CN114648984B - Audio sentence-breaking method and device, computer equipment and storage medium

Info

Publication number: CN114648984B
Authority: CN (China)
Application number: CN202210559476.3A
Other languages: Chinese (zh)
Other versions: CN114648984A
Prior art keywords: sentence, analyzed, output, audio
Inventor: 张欢韵
Current Assignee: Shenzhen Xiaoyudian Digital Technology Co., Ltd.
Original Assignee: Shenzhen Huace Huihong Technology Co., Ltd.
Application filed by Shenzhen Huace Huihong Technology Co., Ltd.
Priority to CN202210559476.3A
Publication of CN114648984A
Application granted
Publication of CN114648984B
Legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/04 - Segmentation; Word boundary detection
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 15/1815 - Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning

Abstract

The invention relates to an audio sentence-breaking method and device, a computer device, and a storage medium. The audio sentence-breaking method provided by the application first judges whether a first sentence to be analyzed is semantically complete; if the first sentence to be analyzed is complete, it judges whether the next sentence (the second sentence to be analyzed) is complete; if the second sentence to be analyzed is incomplete, it judges whether a first merged sentence obtained by splicing the first and second sentences to be analyzed is complete; and if the first merged sentence is incomplete, it further judges whether a second merged sentence obtained by splicing the second sentence to be analyzed with its next sentence is complete. The method thus considers not only whether each individual sentence is complete but also whether adjacent sentences form a complete sentence after splicing, so that the completeness of each output sentence's meaning is ensured and the accuracy of audio sentence breaking is improved.

Description

Audio sentence-breaking method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of speech recognition, and in particular, to an audio sentence-breaking method and apparatus, a computer device, and a storage medium.
Background
Speech recognition is an important part of analyzing offline video information, and sentence segmentation is in turn an important part of speech recognition, since it affects both word-level recognition accuracy and sentence completeness. For sentence segmentation of long audio files, the conventional approach is to split on periods of silence or to judge speech boundaries with an artificial-intelligence model; when the long audio contains repetition, drawn-out syllables, or similar phenomena, a sentence may be split automatically in the wrong place, resulting in inaccurate segmentation.
Disclosure of Invention
The present application provides an audio sentence-breaking method and device, a computer device, and a storage medium, to solve the technical problem that existing audio sentence-breaking methods break sentences inaccurately.
In a first aspect, an audio sentence-breaking method is provided, the method including:
taking a first sentence in a plurality of sentences arranged in sequence as a first sentence to be analyzed, and judging the semantic integrity of the first sentence to be analyzed, wherein the plurality of sentences arranged in sequence are obtained by performing voice recognition on a plurality of audio clips arranged in sequence, and the plurality of audio clips arranged in sequence are obtained by dividing a target audio;
under the condition that the semantics of the first sentence to be analyzed are complete, putting the first sentence to be analyzed into a region to be output, taking the next sentence of the first sentence to be analyzed as a second sentence to be analyzed, and judging the semantic integrity of the second sentence to be analyzed;
under the condition that the semantics of the second sentence to be analyzed is incomplete, merging the second sentence to be analyzed and the sentences in the region to be output to obtain a first merged sentence, and judging the semantic integrity of the first merged sentence;
and under the condition that the semantics of the first merged sentence are incomplete, outputting and emptying the sentence in the area to be output, merging the next sentence of the second sentence to be analyzed with the second sentence to be analyzed to obtain a second merged sentence, taking the second merged sentence as the first sentence to be analyzed, and executing the step of judging the semantic integrity of the first sentence to be analyzed until the plurality of sequentially arranged sentences are all output.
With reference to the first aspect, in a possible implementation manner, the method further includes:
and under the condition that the semantics of the first merged sentence are complete, emptying the sentence in the area to be output, putting the first merged sentence into the area to be output, taking the next sentence of the second sentence to be analyzed as the second sentence to be analyzed, and executing the step of judging the semantic integrity of the second sentence to be analyzed until the plurality of sequentially arranged sentences are all output.
With reference to the first aspect, in a possible implementation manner, the method further includes:
and under the condition that the semantics of the second sentence to be analyzed is complete, outputting the sentence in the region to be output, putting the second sentence to be analyzed into the region to be output, taking the next sentence of the second sentence to be analyzed as the second sentence to be analyzed, and executing the step of judging the semantic integrity of the second sentence to be analyzed until the plurality of sentences arranged in sequence are all output.
With reference to the first aspect, in a possible implementation manner, the method further includes:
and under the condition that the semantics of the first sentence to be analyzed are incomplete, merging the first sentence to be analyzed and the next sentence of the first sentence to be analyzed to obtain a third merged sentence, taking the third merged sentence as the first sentence to be analyzed, and executing the step of judging the semantic integrity of the first sentence to be analyzed.
With reference to the first aspect, in a possible implementation manner, the method further includes:
if the sentence length of the second merged sentence reaches a preset sentence length, putting the second merged sentence into the area to be output, taking a third sentence to be analyzed as the second sentence to be analyzed, and executing the step of judging the semantic integrity of the second sentence to be analyzed until the plurality of sequentially arranged sentences are all output; the third sentence to be analyzed is the sentence next to the second sentence to be analyzed.
With reference to the first aspect, in a possible implementation manner, before taking the first sentence of the plurality of sequentially arranged sentences as the first sentence to be analyzed, the method includes: acquiring the target audio and identifying silences in the target audio; segmenting the target audio according to the silences to obtain the plurality of sequentially arranged audio segments; and performing text recognition on the plurality of sequentially arranged audio segments to obtain the plurality of sequentially arranged sentences.
With reference to the first aspect, in a possible implementation manner, the semantic integrity of a target sentence is determined through a preset semantic integrity model, where the target sentence is the first sentence to be analyzed, the second sentence to be analyzed, or a sentence obtained by merging the first sentence to be analyzed and the second sentence to be analyzed, and determining the semantic integrity of the target sentence through the preset semantic integrity model includes: acquiring a word vector, a sentence vector, and a position vector corresponding to the target sentence; obtaining a coding sequence corresponding to the target sentence according to the word vector, the sentence vector, and the position vector; inputting the coding sequence into the preset semantic integrity model to obtain the completeness probability of the target sentence; and determining the semantic integrity of the target sentence according to the completeness probability.
In a second aspect, an audio sentence-breaking apparatus is provided, the apparatus comprising:
the first judging module is used for taking a first sentence in a plurality of sentences which are arranged in sequence as a first sentence to be analyzed and judging the semantic integrity of the first sentence to be analyzed, wherein the plurality of sentences which are arranged in sequence are obtained by performing voice recognition on a plurality of audio clips which are arranged in sequence, and the plurality of audio clips which are arranged in sequence are obtained by dividing a target audio;
the second judging module is used for placing the first sentence to be analyzed into a region to be output under the condition that the semantic meaning of the first sentence to be analyzed is complete, taking the next sentence of the first sentence to be analyzed as a second sentence to be analyzed, and judging the semantic integrity of the second sentence to be analyzed;
a third judging module, configured to, when the semantics of the second sentence to be analyzed are incomplete, merge the second sentence to be analyzed and the sentence in the area to be output to obtain a first merged sentence, and judge the semantic integrity of the first merged sentence;
a fourth judging module, configured to output the sentence in the area to be output when the semantics of the first merged sentence are incomplete, merge the next sentence of the second sentence to be analyzed with the second sentence to be analyzed to obtain a second merged sentence, take the second merged sentence as the first sentence to be analyzed, and execute the step of judging the semantic integrity of the first sentence to be analyzed until the plurality of sequentially arranged sentences are all output.
With reference to the second aspect, in one possible design, the second judging module is further configured to: under the condition that the semantics of the first merged sentence are complete, empty the sentence in the area to be output, put the first merged sentence into the area to be output, take the next sentence of the second sentence to be analyzed as the second sentence to be analyzed, and execute the step of judging the semantic integrity of the second sentence to be analyzed until the plurality of sequentially arranged sentences are all output.
With reference to the second aspect, in one possible design, the second judging module is further configured to: under the condition that the semantics of the second sentence to be analyzed are complete, output the sentence in the area to be output, put the second sentence to be analyzed into the area to be output, take the next sentence of the second sentence to be analyzed as the second sentence to be analyzed, and execute the step of judging the semantic integrity of the second sentence to be analyzed until the plurality of sequentially arranged sentences are all output.
With reference to the second aspect, in a possible design, the first judging module is further configured to: under the condition that the semantics of the first sentence to be analyzed are incomplete, merge the first sentence to be analyzed and the next sentence of the first sentence to be analyzed to obtain a third merged sentence, take the third merged sentence as the first sentence to be analyzed, and execute the step of judging the semantic integrity of the first sentence to be analyzed.
With reference to the second aspect, in a possible design, the second judging module is further configured to: if the sentence length of the second merged sentence reaches the preset sentence length, put the second merged sentence into the area to be output, take the third sentence to be analyzed as the second sentence to be analyzed, and execute the step of judging the semantic integrity of the second sentence to be analyzed until the plurality of sequentially arranged sentences are all output, where the third sentence to be analyzed is the sentence next to the second sentence to be analyzed.
With reference to the second aspect, in one possible design, the apparatus further includes a preprocessing module configured to: acquire the target audio and identify silences in the target audio; segment the target audio according to the silences to obtain a plurality of sequentially arranged audio segments; and perform text recognition on the plurality of sequentially arranged audio segments to obtain the plurality of sequentially arranged sentences.
With reference to the second aspect, in one possible design, the apparatus further includes a preset semantic integrity model, configured to: acquire a word vector, a sentence vector, and a position vector corresponding to a target sentence; obtain a coding sequence corresponding to the target sentence according to the word vector, the sentence vector, and the position vector; input the coding sequence into the preset semantic integrity model to obtain the completeness probability of the target sentence; and determine the semantic integrity of the target sentence according to the completeness probability.
In a third aspect, there is provided a computer device comprising a memory and one or more processors for executing one or more computer programs stored in the memory, the one or more processors, when executing the one or more computer programs, causing the computer device to implement the audio sentence-breaking method of the first aspect described above.
In a fourth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the audio sentence-breaking method of the first aspect described above.
The application can realize the following beneficial effects: the method first judges whether the first sentence to be analyzed is complete; if the first sentence to be analyzed is complete, it judges whether the next sentence (the second sentence to be analyzed) is complete; if the second sentence to be analyzed is incomplete, it judges whether the first merged sentence obtained by splicing the first and second sentences to be analyzed is complete; and if the first merged sentence is incomplete, it further judges whether the second merged sentence obtained by splicing the second sentence to be analyzed with its next sentence is complete. The method thus considers not only whether each individual sentence is complete but also whether adjacent sentences form a complete sentence after splicing, so that the completeness of each output sentence's meaning is ensured and the accuracy of audio sentence breaking is improved.
Drawings
Fig. 1 is a schematic flowchart of an audio sentence-breaking method provided in an embodiment of the present application;
fig. 2 is a schematic diagram of text segmentation provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an audio sentence-breaking apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The technical scheme of the application can be applied to various scenes of audio identification, and particularly can be used for a scene of sentence breaking of a long audio file in the scene of audio identification. Further, in a scene of sentence breaking of a long audio file, in some implementation manners, the long audio file in the scene is processed into a plurality of audio segments, the plurality of audio segments are subjected to text conversion to obtain a plurality of sentences, and then semantic integrity judgment is performed on the plurality of sentences in the scene, so that whether the plurality of sentences are complete sentences is determined, and sentence breaking of the long audio file is completed.
In order to facilitate understanding, the technical concept of the present application is explained first. The technical idea of the application is as follows: after the audio is divided in a common way and speech recognition is performed to obtain a plurality of sentences, the semantic integrity of the sentences is judged one by one in order, and a sentence-breaking strategy is determined according to the semantic integrity of each pair of adjacent sentences. When a first sentence with complete semantics is found, whether it is output is decided according to the semantic integrity of its next sentence and according to whether that next sentence is, for example, a modifier of the first sentence. This avoids the situation in which the first sentence is output directly and the next sentence is then output with incomplete semantics, and it ensures that every output sentence is a semantically complete sentence, so that sentences can be broken accurately.
Referring first to the schematic flow chart shown in fig. 1, the present application proposes an audio sentence-breaking method, which includes:
step 101, taking the first sentence of the plurality of sentences arranged in sequence as a first sentence to be analyzed.
The sequentially arranged sentences are obtained by performing speech recognition on a plurality of sequentially arranged audio segments, and the sequentially arranged audio segments are obtained by dividing a target audio. "Sequentially arranged audio segments" means that, after the target audio is divided into a plurality of audio segments, the positional relationship among the segments is unchanged and they remain in the order in which they appear in the target audio. Correspondingly, "sequentially arranged sentences" means the plurality of sentences whose order is consistent with that of the audio segments.
After the target audio is acquired, it is cut into a plurality of sequentially arranged audio segments, and speech recognition (that is, text conversion) is then performed on these audio segments to obtain the plurality of sequentially arranged sentences corresponding to them. Specifically, speech recognition can be performed on the sequentially arranged audio segments by a speech-conversion model composed of a convolutional neural network and connectionist temporal classification (CTC).
Specifically, the target audio is the audio to be broken into sentences. It may be a voice instruction collected by an intelligent robot or an unmanned vehicle, audio information collected by a mobile terminal, a translation terminal, or a smart-home device, or audio information collected during a video interview, and the like; the present application is not limited in this respect. In some possible cases, the target audio may also be audio obtained by separating the images and sound of a video, or audio obtained by recording the sound of a video. For example, the target audio may be audio obtained by separating images and sound during a video interview.
In one embodiment, before the first sentence of the plurality of sequentially arranged sentences is taken as the first sentence to be analyzed, the method includes: acquiring the target audio and identifying silences in the target audio; segmenting the target audio according to the silences to obtain the plurality of sequentially arranged audio segments; and performing text recognition on the plurality of sequentially arranged audio segments to obtain the plurality of sequentially arranged sentences.
A silence may refer to sound whose intensity is not higher than a preset decibel value. After the target audio is obtained, the silences are first used to preliminarily segment the long audio into a plurality of sequentially arranged audio segments, and each audio segment is then converted into a sentence in text form, one audio segment corresponding to one sentence. Specifically, an audio position where the duration of a silence exceeds a preset duration may be used as a cut point, so that the target audio is segmented into the sequentially arranged audio segments according to the length of the silences. For example, sound with an intensity not higher than -50 dBFS may be defined as silence, and a duration of at least 1000 ms may be set as the segmentation condition; under this condition, if a silence exists in the target audio and lasts longer than 1000 ms, that silence is used as a cut point and the target audio is cut into two audio segments; when the target audio contains several silences each lasting longer than 1000 ms, it can be cut into a corresponding number of audio segments. Preliminarily segmenting the target audio based on silence effectively improves the quality of the sentences obtained after text conversion, and thus the efficiency and quality of audio sentence breaking.
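As an illustration of the silence-based pre-segmentation described above, the following minimal sketch assumes the pydub library is available; the -50 dBFS and 1000 ms values follow the example in this paragraph, and transcribe() is only a placeholder for whichever speech-recognition engine (for example, the CNN + CTC speech-conversion model mentioned earlier) is actually used.

```python
# Minimal sketch of the silence-based pre-segmentation (assumes pydub).
from pydub import AudioSegment
from pydub.silence import split_on_silence


def split_target_audio(path: str):
    """Cut the target audio into sequentially arranged audio segments."""
    audio = AudioSegment.from_file(path)
    # A stretch of sound no louder than -50 dBFS lasting at least 1000 ms is
    # treated as a silence and used as a cut point, as in the example above.
    return split_on_silence(audio, min_silence_len=1000, silence_thresh=-50)


def transcribe(segment) -> str:
    """Placeholder for any speech-recognition engine (e.g. a CNN + CTC model)."""
    raise NotImplementedError("plug a speech-recognition engine in here")


def audio_to_sentences(path: str):
    """Return the plurality of sequentially arranged sentences."""
    return [transcribe(segment) for segment in split_target_audio(path)]
```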
In one embodiment, because speakers pause while talking, the target audio usually contains filler (hesitation) words such as "ah", "um", and "uh". After the audio is preliminarily broken at silences and speech recognition is performed, such filler words uttered during pauses are recognized as stand-alone sentences. Therefore, after speech recognition yields the plurality of sequentially arranged sentences, these sentences also need to be screened to remove the filler words, so as to reduce their influence on semantic recognition and improve the accuracy of audio sentence breaking. Specifically, a filler-word data table storing filler words is obtained, the plurality of sentences are screened against this table, and sentences consisting of filler words are identified and deleted.
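A sketch of this screening step is given below; the contents of the filler-word table are illustrative, since a real system would load its own filler-word data table.

```python
# Sketch of the filler-word screening step; the table contents are illustrative.
FILLER_WORDS = {"啊", "嗯", "呃", "哦"}  # roughly "ah", "um", "uh", "oh"


def remove_filler_sentences(sentences):
    """Drop sentences that consist only of filler (hesitation) words."""
    return [s for s in sentences if s.strip() not in FILLER_WORDS]
```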
In one embodiment, because certain words used in industries such as live-streaming e-commerce, interviews, and course videos are uncommon, speech recognition is not always accurate, and a domain lexicon is needed for error correction. Therefore, after text recognition is performed on the plurality of sequentially arranged audio segments, any rarely used or complex character recognized in isolation is first converted into its pinyin; a pre-established table of domain-vocabulary vertical relations is then traversed, and the word in the specific domain corresponding to that pinyin is selected, so that the pinyin is converted into the specific term of that domain. Illustratively, in "for details, please see the usage guide of the Jingdong | e | card", the isolated character is not a common spoken word and is replaced by its pinyin "yi"; since this text belongs to the Jingdong vertical domain, the related keyword "Jingdong E-card" in the lexicon matches the preceding and following words, and after the surrounding fragments are merged the sentence becomes "for details, please see the usage guide of the Jingdong E-card".
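The correction step can be sketched as follows, assuming the pypinyin package for the character-to-pinyin conversion; the domain-vocabulary table shown is a hypothetical stand-in for the pre-established vertical-relation table.

```python
# Sketch of pinyin-based correction against a domain lexicon (assumes pypinyin;
# DOMAIN_LEXICON is a hypothetical vertical-relation table).
from pypinyin import lazy_pinyin

DOMAIN_LEXICON = {
    "jingdong": {"yi": "京东E卡"},  # pinyin of a rare character -> domain term
}


def correct_rare_word(rare_word: str, domain: str) -> str:
    """Replace a rare or complex word with the matching domain-specific term."""
    pinyin = "".join(lazy_pinyin(rare_word))
    return DOMAIN_LEXICON.get(domain, {}).get(pinyin, rare_word)
```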
Step 102, judging the semantic integrity of the first sentence to be analyzed.
Semantic integrity indicates whether the semantics of a sentence are complete. Specifically, a sentence that can express a complete meaning has semantic integrity and is a complete sentence; a sentence that cannot fully express a meaning lacks semantic integrity and is an incomplete sentence. Whether the semantics of a sentence are complete may specifically mean whether the grammatical structure of the sentence is complete and/or whether the sentence can independently express a complete meaning.
The semantic integrity of the first sentence to be analyzed can be judged through the semantic integrity model; for the related implementation, reference may be made to the description below. If the semantics of the first sentence to be analyzed are complete, the first sentence to be analyzed is a complete sentence that can be output as such, and step 103 is executed; if the semantics of the first sentence to be analyzed are incomplete, the first sentence to be analyzed is an incomplete sentence, the preliminary sentence break is in error, and step 117 is executed.
Step 103, putting the first sentence to be analyzed into the area to be output.
The to-be-output area is used for storing the sentences to be output.
If the first sentence to be analyzed is not the last of the plurality of sequentially arranged sentences, the judgment of each sentence has not yet been completed and subsequent sentences still need to be analyzed, so step 104 is executed; if the first sentence to be analyzed is the last of the plurality of sequentially arranged sentences, the judgment of each sentence has been completed, and step 106 is executed.
And 104, taking the next statement of the first statement to be analyzed as a second statement to be analyzed.
Wherein the next sentence of the first sentence to be analyzed means a sentence arranged after the first sentence to be analyzed among the plurality of sentences arranged in order.
And 105, judging the semantic integrity of the second statement to be analyzed.
The semantic integrity of the second sentence to be analyzed can be judged through the semantic integrity model. For the specific implementation of judging the semantic integrity of the second sentence to be analyzed by the semantic integrity model, reference may be made to the following description.
If the semantics of the second sentence to be analyzed are complete, the second sentence to be analyzed and the sentence in the area to be output are two independent, semantically complete sentences, and the second sentence to be analyzed is not a modifier of the first sentence to be analyzed or the like; the sentence in the area to be output can therefore be output as a complete sentence, that is, step 106 is executed. If the semantics of the second sentence to be analyzed are incomplete, the second sentence to be analyzed and the sentence in the area to be output may be two semantically associated sentences, for example the second sentence to be analyzed may be a modifier of the sentence in the area to be output; to avoid outputting the second sentence to be analyzed with incomplete semantics, a further judgment is needed, and step 109 is executed.
And 106, outputting the statement in the region to be output.
Wherein, if the sentence in the area to be output does not contain the last of the plurality of sequentially arranged sentences, some sentences have not yet been output, and step 107 is executed; if the sentence in the area to be output contains the last of the plurality of sequentially arranged sentences, all of the sequentially arranged sentences have been output, and the flow ends.
And step 107, putting the second statement to be analyzed into the area to be output.
If the second sentence to be analyzed is not the last of the plurality of sequentially arranged sentences, the judgment of each sentence has not yet been completed and subsequent sentences still need to be analyzed, so step 108 is executed; if the second sentence to be analyzed is the last of the plurality of sequentially arranged sentences, the judgment of each sentence has been completed, and step 106 is executed.
Step 108, taking the next sentence of the second sentence to be analyzed as the second sentence to be analyzed, and executing step 105.
In steps 105 to 108, the sentence in the area to be output is output only when the semantics of the second sentence to be analyzed are complete; this prevents the second sentence to be analyzed from being output later with incomplete semantics because it is, for example, a modifier of the sentence in the area to be output. When the semantics of the second sentence to be analyzed are complete, the second sentence to be analyzed is placed into the area to be output, its next sentence is taken as the new second sentence to be analyzed, and the step of judging the semantic integrity of the second sentence to be analyzed (step 105) is executed again, which amounts to judging the semantic integrity of the later sentence in a new pair of adjacent sentences, so that every sentence output from the area to be output is semantically complete.
And step 109, merging the second statement to be analyzed and the statement in the area to be output to obtain a first merged statement.
In an embodiment, after the second sentence to be analyzed and the sentence in the area to be output are merged into the first merged sentence, it is further determined whether the sentence length of the first merged sentence reaches a preset sentence length. If it does not, step 110 is executed; if it does, the sentence in the area to be output is output and the area is emptied, the second sentence to be analyzed is placed into the area to be output, the next sentence of the second sentence to be analyzed is taken as the new second sentence to be analyzed, and step 105 is executed. Checking the length of the merged sentence avoids overly long sentences that would slow down processing, improves the processing speed, and better matches people's language habits.
Step 110, judging the semantic integrity of the first merged statement.
The semantic integrity of the first combined statement can be judged through the semantic integrity model. For the embodiments related to determining the semantic integrity of the first merged sentence through the semantic integrity model, reference may be made to the following description.
If the semantics of the first merged sentence are incomplete, the second sentence to be analyzed is not semantically associated with the sentence in the area to be output; it is instead part of a new sentence rather than, for example, a modifier of the sentence in the area to be output. The sentence in the area to be output can therefore be output, that is, step 111 is executed. If the semantics of the first merged sentence are complete, the second sentence to be analyzed is associated with the sentence in the area to be output, and the first merged sentence therefore needs to be placed into the area to be output as a complete sentence, that is, step 115 is executed.
Step 111, outputting and emptying the sentence in the area to be output, and merging the next sentence of the second sentence to be analyzed with the second sentence to be analyzed to obtain a second merged sentence.
When the second sentence to be analyzed (which can be understood as the sentence following a first sentence with complete semantics) is incomplete, it is first merged with the sentence in the area to be output into the first merged sentence, and the semantic integrity of the first merged sentence is judged so as to determine whether the second sentence to be analyzed is semantically associated with the sentence in the area to be output. The sentence in the area to be output is output only when no such semantic association exists, which ensures that, when it is later output, the second sentence to be analyzed belongs to a semantically complete sentence.
Step 112, determine whether the sentence length of the second merged sentence reaches the preset sentence length.
Wherein, if the sentence length of the second merged sentence does not reach the preset sentence length, step 113 is executed; if it does, step 114 is executed. Checking the length of the merged sentence avoids overly long sentences that would slow down processing, improves the processing speed, and better matches people's language habits.
Step 113, taking the second merged sentence as the first sentence to be analyzed, and executing step 102.
If the sentence length of the second merged sentence does not reach the preset sentence length, the second merged sentence is taken as the first sentence to be analyzed, which amounts to starting to judge the semantic integrity of the earlier sentence in a new pair of adjacent sentences.
Step 114, putting the second merged sentence into the area to be output, taking the third sentence to be analyzed as the second sentence to be analyzed, and executing step 105.
Wherein the third sentence to be analyzed is the sentence next to the second sentence to be analyzed.
Step 115, emptying the sentence in the area to be output, and putting the first merged sentence into the area to be output.
If the semantics of the first merged sentence are complete, the first merged sentence is a complete sentence and the second sentence to be analyzed is, for example, a modifier of the sentence in the area to be output; merging the second sentence to be analyzed with the sentence in the area to be output supplements its semantics, so the preliminary sentence break is corrected once.
Step 116, taking the next sentence of the second sentence to be analyzed as the second sentence to be analyzed, and executing step 105.
Here, taking the next sentence of the second sentence to be analyzed as the second sentence to be analyzed and judging its semantic integrity amounts to judging the semantic integrity of the later sentence in a new pair of adjacent sentences, so that every sentence output from the area to be output is semantically complete.
Step 117, merging the first to-be-analyzed sentence with a next sentence of the first to-be-analyzed sentence to obtain a third merged sentence, taking the third merged sentence as the first to-be-analyzed sentence, and executing step 102.
Merging the first sentence to be analyzed with its next sentence supplements the semantics of the first sentence to be analyzed; judging the semantic integrity of the resulting third merged sentence verifies and corrects the preliminary sentence break once, thereby improving the accuracy of sentence breaking.
It can be understood that, if the semantic integrity model determines that the semantics of the third merged sentence are incomplete, the third merged sentence is merged with the next sentence to obtain a fourth merged sentence, the fourth merged sentence is taken as the first sentence to be analyzed, and step 102 is executed again. It can be seen that, if one round of semantic supplementation of the first sentence to be analyzed does not yet yield a complete sentence, semantic supplementation is performed again on that basis, until the supplemented first sentence to be analyzed forms a complete sentence.
Illustratively, take the long audio "I now / want to eat watermelon / and also strawberries" as an example, which contains the three sentences "I now", "want to eat watermelon", and "and also strawberries". The semantic integrity model judges that the first sentence to be analyzed, "I now", is semantically incomplete, so semantic supplementation is performed on it: the first sentence to be analyzed, "I now", is merged with the second sentence to be analyzed, "want to eat watermelon", to obtain the third merged sentence "I now want to eat watermelon". The semantic integrity model judges that the third merged sentence is semantically complete, so "I now want to eat watermelon" is placed into the area to be output as a sentence to be output, the next sentence, "and also strawberries", is taken as the first sentence to be analyzed, and the step of judging the semantic integrity of the first sentence to be analyzed is executed again. In this example, merging the first sentence to be analyzed, "I now", with the second sentence to be analyzed, "want to eat watermelon", supplements the semantics of "I now"; judging the semantic integrity of the resulting third merged sentence "I now want to eat watermelon" verifies and corrects the preliminary sentence break once, thereby improving the accuracy of sentence breaking.
Illustratively, take the long audio "I now / especially / want to eat watermelon" as an example, which contains the sentences "I now", "especially", and "want to eat watermelon". The semantic integrity model judges that the first sentence to be analyzed, "I now", is semantically incomplete, so it is merged with the second sentence to be analyzed, "especially", to obtain the third merged sentence "I now especially". The semantic integrity model judges that "I now especially" is still semantically incomplete, so it is merged with the next sentence, "want to eat watermelon", to obtain the fourth merged sentence "I now especially want to eat watermelon", which is taken as the first sentence to be analyzed, and the step of judging the semantic integrity of the first sentence to be analyzed is executed again. In this example, if one round of semantic supplementation of the first sentence to be analyzed does not yield a complete sentence, semantic supplementation is performed again on that basis, until the supplemented first sentence to be analyzed forms a complete sentence; in this way the preliminary sentence break is verified and corrected, and the accuracy of sentence breaking is improved.
In a specific example, refer to fig. 2. The technical solution proposed in the present application is explained with the text shown in fig. 2, which corresponds to a long audio meaning "what I want to tell everyone about is a place you have all heard of, namely Ciqikou in Chongqing"; a, b, c, d, e, f, and g are the plurality of sequentially arranged sentences. First, all the segmented sentences are preprocessed, and sentence b, which is a filler word, is removed. The semantic integrity of sentence a is then judged; the semantic integrity model judges that a is not a complete sentence, so the next sentence c is appended to a to form the sentence "ac". The semantic integrity model judges that "ac" is not a complete sentence, so sentence d is appended to obtain the sentence "acd". The semantic integrity model judges that "acd" is a complete sentence, so "acd" is placed into the area to be output. Sentence e is then judged; the semantic integrity model judges that e is not a complete sentence, so e is merged with the sentence "acd" in the area to be output to obtain the sentence "acde". The semantic integrity model judges that "acde" is not a complete sentence, so e is merged with the next sentence f to obtain the sentence "ef". The semantic integrity model judges that "ef" is a complete sentence, so the sentence "acd" in the area to be output is output and "ef" is placed into the area to be output. Sentence g is then judged; the semantic integrity model judges that g is not a complete sentence, so g is merged with the sentence "ef" in the area to be output to obtain the sentence "efg"; the semantic integrity model judges that "efg" is a complete sentence, so "efg" is placed into the area to be output and then output. At this point the sentence breaking of the long audio is complete, yielding the complete sentence "acd" ("what I want to tell everyone about is a place you have all heard of") and the complete sentence "efg" ("namely Ciqikou in Chongqing").
The method first judges whether the first sentence to be analyzed is complete; if it is, it judges whether the next sentence (the second sentence to be analyzed) is complete; if the second sentence to be analyzed is incomplete, it judges whether the first merged sentence obtained by splicing the first and second sentences to be analyzed is complete; and if the first merged sentence is incomplete, it further judges whether the second merged sentence obtained by splicing the second sentence to be analyzed with its next sentence is complete. The method thus considers not only whether each individual sentence is complete but also whether adjacent sentences form a complete sentence after splicing, so that the completeness of each output sentence's meaning is ensured and the accuracy of audio sentence breaking is improved.
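The overall flow of steps 101 to 117 can be sketched as the following loop. It is a simplified illustration rather than a full implementation: is_complete stands for the semantic integrity model described below, and the preset sentence length of 50 characters is an assumed value.

```python
def break_sentences(sentences, is_complete, max_len=50):
    """Sketch of the flow of fig. 1 (steps 101-117).

    sentences   -- the plurality of sequentially arranged sentences
    is_complete -- callable; True when the semantic integrity model judges
                   a sentence to be semantically complete
    max_len     -- the preset sentence length (50 is an assumed value)
    """
    results = []   # sentences that have been output
    buffer = ""    # the area to be output
    i = 0          # index of the next unconsumed sentence

    while i < len(sentences):
        # Steps 101/102/117: build a semantically complete first sentence to be
        # analyzed, merging forward while it stays incomplete (and not too long).
        first = sentences[i]
        i += 1
        while not is_complete(first) and i < len(sentences) and len(first) < max_len:
            first += sentences[i]
            i += 1
        buffer = first                    # step 103: put into the area to be output

        # Steps 104-116: decide, sentence by sentence, when the buffer may be output.
        while i < len(sentences):
            second = sentences[i]         # steps 104/108/116
            i += 1
            if is_complete(second):       # step 105 -> steps 106/107
                results.append(buffer)    # output the area to be output
                buffer = second
                continue
            merged = buffer + second      # step 109: first merged sentence
            if len(merged) >= max_len:    # length check described under step 109
                results.append(buffer)
                buffer = second
                continue
            if is_complete(merged):       # step 110 -> steps 115/116
                buffer = merged
                continue
            # Step 111: no semantic association; output the buffer and restart the
            # outer loop so that `second` becomes a new first sentence to be
            # analyzed and is merged with its next sentence (second merged sentence).
            results.append(buffer)
            buffer = ""
            i -= 1
            break

        if buffer:                        # flush once all sentences are consumed
            results.append(buffer)
            buffer = ""
    return results
```

With a trained semantic integrity model supplying is_complete, this loop reproduces the trace of the fig. 2 example described above, outputting "acd" and then "efg".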
In one embodiment, the semantic integrity of a target sentence is judged through the preset semantic integrity model. The target sentence may be any sentence whose semantic integrity needs to be determined in the embodiment of fig. 1; specifically, it may be the first sentence to be analyzed, the second sentence to be analyzed, or a merged sentence, where the merged sentence may be the first merged sentence in step 109, the second merged sentence in step 111, the third merged sentence, and so on.
Judging the semantic integrity of the target sentence through the preset semantic integrity model includes the following steps: acquiring a word vector, a sentence vector, and a position vector corresponding to the target sentence; obtaining a coding sequence corresponding to the target sentence according to the word vector, the sentence vector, and the position vector; inputting the coding sequence into the preset semantic integrity model to obtain the completeness probability of the target sentence; and determining the semantic integrity of the target sentence according to the completeness probability.
The word vector is the original vector of each word in the sentence; it can be initialized randomly or pre-trained with an algorithm such as Word2Vec and used as an initial value.
The sentence vector is used to distinguish two sentences; it can be understood as indicating which sentence a word belongs to.
The position vector is used to distinguish the different meanings of the same word at different positions: because the same word carries different meanings at different positions (for example, "I love you" versus "you love me"), the model adds a different position vector to words at different positions in order to distinguish them.
Specifically, the coding sequence corresponding to the target sentence can be obtained by superimposing the word vector, sentence vector, and position vector corresponding to the target sentence. After the coding sequence is obtained, it can be fed, together with a specific code (the classification token [CLS]), into the semantic integrity model, which performs feature extraction and semantic integrity recognition on them to obtain the completeness probability of the target sentence, that is, the probability that the target sentence is a complete sentence. If the completeness probability of the target sentence is greater than or equal to a preset probability threshold, the semantics of the target sentence are determined to be complete; if it is less than the preset probability threshold, the semantics of the target sentence are determined to be incomplete.
In a specific embodiment, the semantic integrity model may include a Bidirectional Encoder Representations from Transformers (BERT) model and a Deep Neural Network (DNN) model. The BERT model performs feature extraction on the coding sequence and the specific code corresponding to the target sentence to obtain the semantic features of the target sentence; the DNN model obtains the completeness probability of the target sentence from those semantic features. In this embodiment, the semantic features of the target sentence may be the feature vector output by the encoder at the position corresponding to the specific code, and the DNN model may include fully connected layers, a pooling layer, and Dropout.
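A minimal sketch of such a model is shown below, assuming the HuggingFace transformers and PyTorch packages; the pretrained checkpoint name, the layer sizes of the DNN head, and the 0.5 threshold are illustrative assumptions. The completeness_probability() method is what the is_complete callable in the earlier loop sketch would wrap.

```python
# Sketch of the BERT + DNN semantic integrity model (assumes transformers/PyTorch;
# checkpoint name, head sizes and threshold are illustrative).
import torch
from torch import nn
from transformers import BertModel, BertTokenizer


class SemanticIntegrityModel(nn.Module):
    def __init__(self, pretrained: str = "bert-base-chinese"):
        super().__init__()
        # The tokenizer prepends the [CLS] classification token and produces the
        # token/segment/position inputs that BERT sums into its coding sequence.
        self.tokenizer = BertTokenizer.from_pretrained(pretrained)
        self.bert = BertModel.from_pretrained(pretrained)
        self.head = nn.Sequential(            # the DNN part: fully connected + Dropout
            nn.Linear(self.bert.config.hidden_size, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, 1),
        )

    def completeness_probability(self, sentence: str) -> float:
        inputs = self.tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():                 # inference-only sketch
            features = self.bert(**inputs).last_hidden_state[:, 0]  # [CLS] feature
            return torch.sigmoid(self.head(features)).item()

    def is_complete(self, sentence: str, threshold: float = 0.5) -> bool:
        return self.completeness_probability(sentence) >= threshold
```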
In one embodiment, the preset semantic integrity model is trained in advance. During training, a large amount of speech data is obtained from a speech corpus, and the text obtained after converting this speech data to characters is cut to obtain a large number of sentences; each sentence is then labeled with a corresponding completeness label, and each sentence together with its completeness label is fed into the untrained semantic integrity model for training, so that the completeness judgments output by the trained model are highly consistent with the corresponding completeness labels. The trained semantic integrity model thus acquires the ability to judge sentence completeness, and on this basis the technical solution provided by the application can complete the sentence-breaking processing of long audio information.
In one embodiment, when the text obtained after converting the speech data is cut, sentences cut at commas are used as positive samples, and each sentence in the positive samples is labeled complete; sentences cut at random positions are used as negative samples, and each sentence in the negative samples is labeled incomplete. For example, with one-hot encoding, the label of a positive sample can be (1, 0) and the label of a negative sample can be (0, 1); alternatively, the positive class can be represented by 1 and the negative class by 0. Illustratively, for "I love eating fruit, and he loves eating vegetables.", the comma-cut sentences "I love eating fruit" and "he loves eating vegetables" are taken as positive samples of the complete type, while randomly cut fragments such as "I love eating", "eating fruit and he", and "he" are taken as negative samples of the incomplete type; a semantic integrity model for judging whether a sentence is complete can then be obtained by training with these positive and negative samples.
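The sample construction can be sketched as follows; the punctuation set and the random-cut strategy are simplified assumptions of the comma-based positive samples and randomly segmented negative samples described above.

```python
# Sketch of positive/negative training-sample construction (simplified assumptions).
import random


def build_samples(texts):
    """Return (sentence, label) pairs: 1 = complete, 0 = incomplete."""
    samples = []
    for text in texts:
        # Positive samples: clauses cut at commas (and sentence-final periods).
        clauses = [c for c in text.replace("。", "，").split("，") if c]
        samples += [(clause, 1) for clause in clauses]
        # Negative samples: cut each clause at a random position so that the
        # resulting fragment cannot express a complete meaning.
        for clause in clauses:
            if len(clause) > 2:
                cut = random.randint(1, len(clause) - 1)
                samples.append((clause[:cut], 0))
    return samples
```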
In this embodiment, the training process gives the trained semantic integrity model the ability to judge sentence completeness, and specifically the ability to calculate the probability that the semantics of a sentence are complete. The trained semantic integrity model calculates the completeness probability of a sentence and compares it with a preset threshold: if the completeness probability is greater than the preset threshold, the target sentence is output as a complete sentence; otherwise the target sentence is output as an incomplete sentence. The completeness probability accurately indicates whether a sentence is complete, and by adjusting the probability threshold, different judgment standards can be set according to actual requirements, which broadens the applicability of the scheme.
As shown in fig. 3, the present application proposes an audio sentence-breaking device, the device comprising:
the first determining module 501 is configured to use a first sentence of the multiple sequentially arranged sentences as a first sentence to be analyzed, and determine semantic integrity of the first sentence to be analyzed, where the multiple sequentially arranged sentences are obtained by performing speech recognition on multiple sequentially arranged audio segments, and the multiple sequentially arranged audio segments are obtained by dividing a target audio.
A second judging module 502, configured to, when the semantic of the first to-be-analyzed sentence is complete, put the first to-be-analyzed sentence into the to-be-output area, and use a next sentence of the first to-be-analyzed sentence as a second to-be-analyzed sentence, so as to judge the semantic integrity of the second to-be-analyzed sentence.
A third determining module 503, configured to, when the semantic of the second sentence to be analyzed is incomplete, merge the second sentence to be analyzed and the sentence in the area to be output to obtain a first merged sentence, and determine the semantic integrity of the first merged sentence.
A fourth determining module 504, configured to output the sentence in the region to be output when the semantic of the first merged sentence is incomplete; and merging the next statement of the second statement to be analyzed with the second statement to be analyzed to obtain a second merged statement, taking the second merged statement as the first statement to be analyzed, and executing the step of judging the semantic integrity of the first statement to be analyzed until the plurality of sequentially arranged statements are all output.
In the device, whether the first sentence to be analyzed is complete is judged first; if it is, whether the next sentence (the second sentence to be analyzed) is complete is judged; if the second sentence to be analyzed is incomplete, whether the first merged sentence obtained by splicing the first and second sentences to be analyzed is complete is judged; and if the first merged sentence is incomplete, whether the second merged sentence obtained by splicing the second sentence to be analyzed with its next sentence is complete is further judged. The device thus considers not only whether each individual sentence is complete but also whether adjacent sentences form a complete sentence after splicing, so that the completeness of each output sentence's meaning is ensured and the accuracy of audio sentence breaking is improved.
Fig. 4 is a schematic structural diagram of a computer device provided in an embodiment of the present application. The computer device 60 includes a processor 601 and a memory 602; the processor 601 is connected to the memory 602, for example through a bus.
The processor 601 is configured to enable the computer device 60 to perform the corresponding functions in the methods of fig. 1-4. The processor 601 may be a central processing unit (CPU), a network processor (NP), a hardware chip, or any combination thereof. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The memory 602 is used for storing program code and the like. The memory 602 may include a volatile memory (VM), such as a random access memory (RAM); the memory 602 may also include a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 602 may also include a combination of the above kinds of memories.
In some possible implementations, the processor 601 may call the program code to:
taking a first sentence in a plurality of sentences arranged in sequence as a first sentence to be analyzed, and judging the semantic integrity of the first sentence to be analyzed, wherein the plurality of sentences arranged in sequence are obtained by segmenting the text obtained by performing speech recognition on the target audio (see the sketch following these steps);
under the condition that the semantics of the first sentence to be analyzed are complete, putting the first sentence to be analyzed into a region to be output, taking the next sentence of the first sentence to be analyzed as a second sentence to be analyzed, and judging the semantic integrity of the second sentence to be analyzed;
under the condition that the semantics of the second sentence to be analyzed are incomplete, merging the second sentence to be analyzed with the sentence in the region to be output to obtain a first merged sentence, and judging the semantic integrity of the first merged sentence;
and under the condition that the semantics of the first merged sentence are incomplete, outputting and clearing the sentence in the region to be output, merging the next sentence of the second sentence to be analyzed with the second sentence to be analyzed to obtain a second merged sentence, taking the second merged sentence as the first sentence to be analyzed, and executing the step of judging the semantic integrity of the first sentence to be analyzed until the plurality of sentences arranged in sequence are all output.
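The preprocessing referenced in the first step, namely dividing the target audio at silent intervals and recognizing each segment, can be sketched as follows. The pydub library is an assumed choice, recognize() is a hypothetical placeholder for whatever speech-recognition service is used, and the silence parameters shown are illustrative, not values taken from this application.

from typing import List
from pydub import AudioSegment
from pydub.silence import split_on_silence

def recognize(segment: AudioSegment) -> str:
    """Hypothetical stand-in for an actual speech-recognition call."""
    raise NotImplementedError

def audio_to_sentences(path: str) -> List[str]:
    audio = AudioSegment.from_file(path)             # acquire the target audio
    segments = split_on_silence(                     # divide the audio at silent intervals
        audio,
        min_silence_len=300,                         # illustrative: 300 ms counts as silence
        silence_thresh=audio.dBFS - 16,              # illustrative energy threshold
        keep_silence=100,
    )
    return [recognize(seg) for seg in segments]      # sequentially arranged sentences

The resulting list can then be passed to break_sentences() from the earlier sketch to obtain the final output sentences.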
It should be noted that, for the implementation of each operation, reference may also be made to the corresponding description in the foregoing method embodiments; the processor 601 may also cooperate with other functional hardware to perform other operations in the foregoing method embodiments.
Embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, the computer program including program instructions, which, when executed by a computer, cause the computer to execute the method according to the foregoing embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present application and is not intended to limit its scope; the present application is not limited thereto, and equivalent variations and modifications fall within the scope of the present application.

Claims (10)

1. An audio sentence-breaking method, the method comprising:
taking a first sentence in a plurality of sentences arranged in sequence as a first sentence to be analyzed, and judging the semantic integrity of the first sentence to be analyzed, wherein the plurality of sentences arranged in sequence are obtained by performing speech recognition on a plurality of audio segments arranged in sequence, and the plurality of audio segments arranged in sequence are obtained by dividing a target audio;
under the condition that the semantics of the first sentence to be analyzed are complete, putting the first sentence to be analyzed into a region to be output, taking the next sentence of the first sentence to be analyzed as a second sentence to be analyzed, and judging the semantic integrity of the second sentence to be analyzed;
under the condition that the semantics of the second sentence to be analyzed are incomplete, merging the second sentence to be analyzed with the sentence in the region to be output to obtain a first merged sentence, and judging the semantic integrity of the first merged sentence;
and under the condition that the semantics of the first merged sentence are incomplete, outputting and clearing the sentence in the region to be output, merging the next sentence of the second sentence to be analyzed with the second sentence to be analyzed to obtain a second merged sentence, taking the second merged sentence as the first sentence to be analyzed, and executing the step of judging the semantic integrity of the first sentence to be analyzed until the plurality of sentences arranged in sequence are all output.
2. The method of claim 1, further comprising:
and under the condition that the semantics of the first merged sentence are complete, clearing the sentence in the region to be output, putting the first merged sentence into the region to be output, taking the next sentence of the second sentence to be analyzed as the second sentence to be analyzed, and executing the step of judging the semantic integrity of the second sentence to be analyzed until the plurality of sentences arranged in sequence are all output.
3. The method of claim 1, further comprising:
and under the condition that the semantics of the second sentence to be analyzed are complete, outputting the sentence in the region to be output, putting the second sentence to be analyzed into the region to be output, taking the next sentence of the second sentence to be analyzed as the second sentence to be analyzed, and executing the step of judging the semantic integrity of the second sentence to be analyzed until the plurality of sentences arranged in sequence are all output.
4. The method of claim 1, further comprising:
and under the condition that the semantics of the first sentence to be analyzed are incomplete, merging the first sentence to be analyzed with the next sentence of the first sentence to be analyzed to obtain a third merged sentence, taking the third merged sentence as the first sentence to be analyzed, and executing the step of judging the semantic integrity of the first sentence to be analyzed.
5. The method of claim 1, further comprising:
if the sentence length of the second merged sentence reaches a preset sentence length, putting the second merged sentence into the region to be output, taking a third sentence to be analyzed as the second sentence to be analyzed, and executing the step of judging the semantic integrity of the second sentence to be analyzed until the plurality of sentences arranged in sequence are all output; the third sentence to be analyzed is the next sentence of the second sentence to be analyzed.
6. The method according to claim 1, wherein before taking a first sentence of the plurality of sentences arranged in sequence as a first sentence to be analyzed, the method further comprises:
acquiring the target audio and identifying silent tones in the target audio;
dividing the target audio according to the silent tones to obtain a plurality of audio segments arranged in sequence;
and performing speech recognition on the plurality of audio segments arranged in sequence to obtain the plurality of sentences arranged in sequence.
7. The method according to claim 1, wherein the semantic integrity of a target sentence is judged according to a preset semantic integrity model, the target sentence being the first sentence to be analyzed, the second sentence to be analyzed, or a sentence obtained by merging the first sentence to be analyzed, and the judging the semantic integrity of the target sentence according to the preset semantic integrity model comprises:
acquiring a word vector, a sentence vector and a position vector corresponding to the target statement;
obtaining a coding sequence corresponding to the target statement according to the word vector, the sentence vector and the position vector;
inputting the coding sequence into the preset semantic integrity model to obtain the integrity probability of the target sentence;
and determining the semantic integrity of the target statement according to the integrity probability.
8. An audio sentence-breaking apparatus, the apparatus comprising:
the first judging module is used for taking a first sentence in a plurality of sentences arranged in sequence as a first sentence to be analyzed, and judging the semantic integrity of the first sentence to be analyzed, wherein the plurality of sentences arranged in sequence are obtained by performing speech recognition on a plurality of audio segments arranged in sequence, and the plurality of audio segments arranged in sequence are obtained by dividing a target audio;
the second judging module is used for placing the first sentence to be analyzed into a region to be output under the condition that the semantics of the first sentence to be analyzed are complete, taking the next sentence of the first sentence to be analyzed as a second sentence to be analyzed, and judging the semantic integrity of the second sentence to be analyzed;
the third judging module is used for merging the second sentence to be analyzed with the sentence in the region to be output to obtain a first merged sentence under the condition that the semantics of the second sentence to be analyzed are incomplete, and judging the semantic integrity of the first merged sentence;
and the fourth judging module is used for outputting the sentence in the region to be output under the condition that the semantics of the first merged sentence are incomplete, merging the next sentence of the second sentence to be analyzed with the second sentence to be analyzed to obtain a second merged sentence, taking the second merged sentence as the first sentence to be analyzed, and executing the step of judging the semantic integrity of the first sentence to be analyzed until the plurality of sentences arranged in sequence are all output.
9. A computer device comprising memory and one or more processors to execute one or more computer programs stored in the memory, the one or more processors, when executing the one or more computer programs, causing the computer device to implement the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-7.
CN202210559476.3A 2022-05-23 2022-05-23 Audio sentence-breaking method and device, computer equipment and storage medium Active CN114648984B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210559476.3A CN114648984B (en) 2022-05-23 2022-05-23 Audio sentence-breaking method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210559476.3A CN114648984B (en) 2022-05-23 2022-05-23 Audio sentence-breaking method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114648984A CN114648984A (en) 2022-06-21
CN114648984B true CN114648984B (en) 2022-08-19

Family

ID=81996445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210559476.3A Active CN114648984B (en) 2022-05-23 2022-05-23 Audio sentence-breaking method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114648984B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157951A (en) * 2016-08-31 2016-11-23 北京华科飞扬科技股份公司 Carry out automatic method for splitting and the system of audio frequency punctuate
CN107146602A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 A kind of audio recognition method, device and electronic equipment
CN108549628A (en) * 2018-03-16 2018-09-18 北京云知声信息技术有限公司 The punctuate device and method of streaming natural language information
CN109389999A (en) * 2018-09-28 2019-02-26 北京亿幕信息技术有限公司 A kind of high performance audio-video is made pauses in reading unpunctuated ancient writings method and system automatically
WO2020233380A1 (en) * 2019-05-21 2020-11-26 华为技术有限公司 Missing semantic completion method and apparatus
CN113761209A (en) * 2021-09-17 2021-12-07 泰康保险集团股份有限公司 Text splicing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114648984A (en) 2022-06-21

Similar Documents

Publication Publication Date Title
US11314921B2 (en) Text error correction method and apparatus based on recurrent neural network of artificial intelligence
CN107301860B (en) Voice recognition method and device based on Chinese-English mixed dictionary
CN107195295B (en) Voice recognition method and device based on Chinese-English mixed dictionary
CN108447471B (en) Speech recognition method and speech recognition device
CN111797632B (en) Information processing method and device and electronic equipment
CN108682420B (en) Audio and video call dialect recognition method and terminal equipment
US9934452B2 (en) Pruning and label selection in hidden Markov model-based OCR
CN110750993A (en) Word segmentation method, word segmentation device, named entity identification method and system
CN111444330A (en) Method, device and equipment for extracting short text keywords and storage medium
US11120802B2 (en) Diarization driven by the ASR based segmentation
CN107564528B (en) Method and equipment for matching voice recognition text with command word text
CN110853628A (en) Model training method and device, electronic equipment and storage medium
CN112784581A (en) Text error correction method, device, medium and electronic equipment
CN110633475A (en) Natural language understanding method, device and system based on computer scene and storage medium
CN112927679A (en) Method for adding punctuation marks in voice recognition and voice recognition device
CN112509561A (en) Emotion recognition method, device, equipment and computer readable storage medium
CN105869628A (en) Voice endpoint detection method and device
US10468031B2 (en) Diarization driven by meta-information identified in discussion content
CN115858776B (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN114648984B (en) Audio sentence-breaking method and device, computer equipment and storage medium
CN114999463B (en) Voice recognition method, device, equipment and medium
CN116187308A (en) Intention recognition method, apparatus, electronic device, and computer-readable storage medium
CN111970311B (en) Session segmentation method, electronic device and computer readable medium
CN111681644B (en) Speaker segmentation method, device, equipment and storage medium
CN113792166A (en) Information acquisition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 2601, 2602, 2603, 2606, Zhongzhou building, No. 3088, Jintian Road, Gangxia community, Futian street, Futian District, Shenzhen, Guangdong 518000

Patentee after: Shenzhen Xiaoyudian Digital Technology Co.,Ltd.

Address before: 2601, 2602, 2603, 2606, Zhongzhou building, No. 3088, Jintian Road, Gangxia community, Futian street, Futian District, Shenzhen, Guangdong 518000

Patentee before: Shenzhen Huace Huihong Technology Co.,Ltd.