CN109446376B - Method and system for classifying voice through word segmentation - Google Patents


Info

Publication number
CN109446376B
CN109446376B (application CN201811290932.9A)
Authority
CN
China
Prior art keywords
audio
word segmentation
participle
semantic
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811290932.9A
Other languages
Chinese (zh)
Other versions
CN109446376A (en)
Inventor
魏誉荧
Current Assignee
Guangdong Genius Technology Co Ltd
Original Assignee
Guangdong Genius Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Genius Technology Co Ltd
Priority to CN201811290932.9A
Publication of CN109446376A
Application granted
Publication of CN109446376B
Legal status: Active

Abstract

The invention provides a method and a system for classifying voice by word segmentation. The method comprises the following steps: acquiring a corpus sample library, and establishing an audio library and a semantic slot according to the corpus samples in the corpus sample library; acquiring voice audio; comparing the voice audio with the word-segment audio in the audio library, and generating the matched word-segment audio contained in the voice audio; merging identical word-segment audio, and counting the frequency with which each merged word-segment audio appears in the voice audio; obtaining, according to the semantic slot, the segment semantics corresponding to each word-segment audio; selecting, according to the segment semantics and the frequencies, the semantics corresponding to one or more semantic sets as classification labels for the voice audio; and classifying the voice audio according to the classification labels. The invention classifies the content of voice audio quickly and accurately through word segmentation, so that the voice audio is stored in an organized way and is easy to search later.

Description

Method and system for classifying voice through word segmentation
Technical Field
The present invention relates to the field of speech recognition technology, and more particularly, to a method and system for classifying speech by word segmentation.
Background
With the rapid development of the internet, daily life has become increasingly intelligent, and people have grown accustomed to using intelligent terminals to meet all kinds of needs. As artificial-intelligence technology matures, terminals of every kind grow ever more capable. Voice interaction, as one of the mainstream modes of human-computer interaction on intelligent terminals, is increasingly popular with users, who receive a large amount of speech every day.
A user may choose to store voice information that he or she finds valuable. Typically, however, the information is stored under a default path and a default name without any classification, which makes it tedious to find a particular voice message later. Alternatively, the user must attach a classification label to each voice message one by one and file it into the corresponding class; this process is cumbersome, and the user may forget to do it or be interrupted, again making it troublesome to find the required voice information afterwards.
Therefore, a method for classifying speech is needed that stores voice information intelligently by category, so that the required voice information can be found quickly and accurately later.
Disclosure of Invention
The invention aims to provide a method and a system for classifying voice by word segmentation, which classify the content of voice audio quickly and accurately through word segmentation, so that the voice audio is stored in an organized way and is easy to search later.
The technical scheme provided by the invention is as follows:
the invention provides a method for classifying voice by word segmentation, comprising the following steps:
acquiring a corpus sample library, and establishing an audio library and a semantic slot according to the corpus samples in the corpus sample library;
acquiring voice audio;
comparing the voice audio with the word-segment audio in the audio library, and generating the matched word-segment audio contained in the voice audio;
merging identical word-segment audio, and counting the frequency with which each merged word-segment audio appears in the voice audio;
obtaining, according to the semantic slot, the segment semantics corresponding to each word-segment audio;
selecting, according to the segment semantics and the frequencies, one or more semantics as classification labels for the voice audio;
and classifying the voice audio according to the classification labels.
Further, the acquiring of a corpus sample library and the establishing of an audio library and a semantic slot according to the corpus samples in the corpus sample library specifically include:
obtaining the corpus sample library, and segmenting the corpus samples in it with a word-segmentation technique to obtain the word segments contained in each corpus sample;
acquiring the audio corresponding to each word segment, and establishing the audio library from the word-segment audio and the corresponding word segments;
and acquiring the semantics corresponding to each word segment, and establishing the semantic slot from the segment semantics and the word segments.
Further, the acquiring of a corpus sample library and the establishing of an audio library and a semantic slot according to the corpus samples in the corpus sample library further include:
obtaining the sample semantics corresponding to each corpus sample and the part of speech corresponding to each word segment;
analyzing the sentence structure of the corpus sample by combining the sample semantics, the segment semantics and the parts of speech;
and, if a word segment belongs to the keywords of the sentence structure, marking it as a key segment.
Further, the selecting, according to the segment semantics and the frequencies, of the semantics corresponding to one or more semantic sets as the classification labels of the voice audio specifically includes:
forming a semantic set for each segment semantics, and merging semantic sets whose semantics are identical or similar;
and selecting, in combination with the frequency with which each word-segment audio appears in the voice audio, the semantics corresponding to one or more merged semantic sets as the classification labels of the voice audio.
Further, before the selecting of the semantics corresponding to one or more merged semantic sets as the classification labels of the voice audio, the method includes:
acquiring, according to the audio library, the target word segments corresponding to the word-segment audio;
and judging whether the target word segments contain key segments;
the selecting of the semantics corresponding to one or more merged semantic sets as the classification labels of the voice audio then specifically includes:
if the target word segments contain key segments, selecting the semantic sets corresponding to the key segments;
and selecting, in combination with the frequency with which the key segments appear in the voice audio, the semantics corresponding to one or more of those semantic sets as the classification labels of the voice audio.
The present invention also provides a system for classifying speech by word segmentation, comprising:
a database establishing module, which acquires a corpus sample library and establishes an audio library and a semantic slot according to the corpus samples in the corpus sample library;
a voice acquisition module, which acquires voice audio;
a matching module, which compares the voice audio acquired by the voice acquisition module with the word-segment audio in the audio library established by the database establishing module, and generates the matched word-segment audio contained in the voice audio;
a processing module, which merges the identical word-segment audio obtained by the matching module and counts the frequency with which each merged word-segment audio appears in the voice audio;
a semantic acquisition module, which obtains, according to the semantic slot established by the database establishing module, the segment semantics corresponding to the word-segment audio obtained by the matching module;
an analysis module, which selects one or more semantics as the classification labels of the voice audio according to the segment semantics acquired by the semantic acquisition module and the frequencies counted by the processing module;
and a classification module, which classifies the voice audio according to the classification labels selected by the analysis module.
Further, the database establishing module specifically includes:
a word segmentation unit, which acquires the corpus sample library and segments the corpus samples in it with a word-segmentation technique to obtain the word segments contained in each corpus sample;
an acquisition unit, which acquires the audio corresponding to the word segments obtained by the word segmentation unit, and the semantics corresponding to those word segments;
and a database establishing unit, which establishes the audio library from the word-segment audio acquired by the acquisition unit and the corresponding word segments, and establishes the semantic slot from the segment semantics acquired by the acquisition unit and the corresponding word segments.
Further, in the database establishing module:
the acquisition unit further acquires the sample semantics corresponding to each corpus sample and the part of speech corresponding to each word segment obtained by the word segmentation unit;
a parsing unit analyzes the sentence structure of the corpus sample by combining the sample semantics, the segment semantics and the parts of speech acquired by the acquisition unit;
and a labeling unit marks a word segment as a key segment if the parsing unit determines that it belongs to the keywords of the sentence structure.
Further, the analysis module specifically includes:
a merging unit, which forms a semantic set for each segment semantics and merges semantic sets whose semantics are identical or similar;
and an analysis unit, which selects, in combination with the frequency with which each word-segment audio appears in the voice audio, the semantics corresponding to one or more merged semantic sets as the classification labels of the voice audio.
Further, the analysis module further includes:
a target segment acquiring unit, which acquires, according to the audio library, the target word segments corresponding to the word-segment audio;
and a judging unit, which judges whether the target word segments acquired by the target segment acquiring unit contain key segments;
the analysis unit specifically includes:
a selecting subunit, which selects the semantic sets corresponding to the key segments if the judging unit determines that the target word segments contain key segments;
and an analysis subunit, which selects, in combination with the frequency with which the key segments appear in the voice audio, the semantics corresponding to one or more of the semantic sets selected by the selecting subunit as the classification labels of the voice audio.
The method and the system for classifying voice by word segmentation provided by the invention bring at least one of the following beneficial effects:
1. A corpus sample library is formed by collecting a large number of corpus samples, from which an audio library and a semantic slot are established, so that acquired voice audio can later be matched against them to obtain its classification labels.
2. Merging identical word-segment audio within the voice audio narrows the range from which the classification labels are selected.
3. Selecting the classification labels in combination with the frequency with which each word-segment audio appears in the voice audio ensures that the selected labels represent the intent of the voice audio as closely as possible.
4. Classification labels are selected intelligently through word segmentation and the voice audio is then classified accordingly, so that all voice audio is stored in order and the target voice audio can be found quickly and accurately later.
Drawings
The above features, technical features, advantages and implementations of the method and system for classifying speech by word segmentation are further explained below, in a clearly understandable manner, through a description of preferred embodiments with reference to the accompanying drawings.
FIG. 1 is a flow chart of a first embodiment of a method of classifying speech by word segmentation in accordance with the present invention;
FIG. 2 is a flow chart of a second embodiment of a method of classifying speech by word segmentation in accordance with the present invention;
FIGS. 3 and 4 are flow charts of a third embodiment of a method for classifying speech by word segmentation according to the present invention;
FIG. 5 is a schematic diagram of a fourth embodiment of a system for classifying speech by word segmentation according to the present invention;
FIG. 6 is a schematic diagram of a fifth embodiment of a system for classifying speech by word segmentation according to the present invention;
FIG. 7 is a diagram illustrating a sixth embodiment of a system for classifying speech by word segmentation according to the present invention.
The reference numbers illustrate:
1000 system for classifying speech by word segmentation
1100 database establishing module; 1110 word segmentation unit; 1120 acquisition unit; 1130 database establishing unit; 1140 parsing unit; 1150 labeling unit
1200 voice acquisition module
1300 matching module
1400 processing module
1500 semantic acquisition module
1600 analysis module; 1610 merging unit; 1620 target segment acquiring unit; 1630 judging unit
1640 analysis unit; 1641 selecting subunit; 1642 analysis subunit
1700 classification module
Detailed Description
To illustrate the embodiments of the present invention and the technical solutions of the prior art more clearly, the following description refers to the accompanying drawings. Obviously, the drawings described below are only some examples of the invention; a person skilled in the art can derive other drawings and embodiments from them without inventive effort.
For simplicity, the drawings schematically show only the parts relevant to the present invention and do not represent the actual structure of a product. In addition, to keep the drawings concise and understandable, components having the same structure or function are in some drawings only schematically illustrated or only partially labeled. In this document, "one" means not only "exactly one" but may also mean "more than one".
A first embodiment of the present invention, as shown in fig. 1, is a method for classifying speech by word segmentation, comprising:
S100, acquiring a corpus sample library, and establishing an audio library and a semantic slot according to the corpus samples in the corpus sample library.
Specifically, a large number of corpus samples are collected to build the corpus sample library, and every corpus sample is then analyzed to obtain the word segments it contains, together with their audio, semantics and the like, from which the audio library and the semantic slot are established.
S200, acquiring voice audio.
Specifically, the voice audio is acquired. It may be voice input by the user in real time: for example, while talking with other users by voice, the user may feel that one or more utterances carry valuable information that may be needed later, so the information should be saved, and saved by category to make later searching and viewing easy.
The audio may also be downloaded or recorded. Recorded audio, for example, can carry a large amount of information that the user has no time to listen through piece by piece, so all of it needs to be classified in order to find the needed audio quickly and accurately among a large volume of recordings.
S300, comparing the voice audio with the word-segment audio in the audio library, and generating the matched word-segment audio contained in the voice audio.
Specifically, the acquired voice audio is matched one by one against the word-segment audio in the audio library summarized from a large number of corpus samples. Whenever a word-segment audio in the library matches some part of the acquired voice audio, the corresponding word-segment audio is generated at that part, so that the voice audio is divided into a number of word-segment audio pieces.
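The matching step can be sketched as follows. This is an illustrative simplification only, not the patent's implementation: each audio stream is assumed to be reduced to a sequence of comparable fingerprint tokens, and `audio_library` (a hypothetical structure) maps each word segment to the token sequence of its segment audio.

```python
def split_into_segments(voice_audio, audio_library):
    """Greedily scan the voice audio for word-segment audio matches.

    voice_audio: list of fingerprint tokens (an assumed representation).
    audio_library: dict mapping word segment -> its token sequence.
    Returns the word segments matched, in order of appearance.
    """
    segments = []
    i = 0
    while i < len(voice_audio):
        for word, fingerprint in audio_library.items():
            n = len(fingerprint)
            if voice_audio[i:i + n] == fingerprint:
                segments.append(word)  # a library entry matches this part
                i += n
                break
        else:
            i += 1  # no library entry matches here; skip one token
    return segments
```

Real systems would compare acoustic features rather than exact tokens; the greedy scan merely mirrors the "match one by one, split into pieces" description above.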
S400, merging identical word-segment audio, and counting the frequency with which each merged word-segment audio appears in the voice audio.
Specifically, all the split word-segment audio pieces are identified and identical ones are merged; the frequency of each merged word-segment audio in the voice audio is then counted from the number of pieces before merging.
For example, suppose a voice audio is split into 10 word-segment audio pieces, among which "animal" appears 5 times, "what" appears 3 times and "yes" appears 2 times. After identical pieces are merged, 3 word-segment audios remain, with frequencies of 0.5 for "animal", 0.3 for "what" and 0.2 for "yes".
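The merge-and-count step, including the worked example above, can be sketched as a frequency table. This is a minimal illustration; representing each segment audio by a hashable label is an assumption.

```python
from collections import Counter

def segment_frequencies(segment_audios):
    """Merge identical segment audio and compute each one's frequency
    as (count before merging) / (total number of pieces)."""
    counts = Counter(segment_audios)       # merging identical pieces
    total = len(segment_audios)
    return {seg: count / total for seg, count in counts.items()}
```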
S500, obtaining, according to the semantic slot, the segment semantics corresponding to each word-segment audio.
Specifically, when a word-segment audio in the audio library matches a part of the acquired voice audio, the word-segment audio for that part is generated; the word segment corresponding to it is then obtained from the audio library, and the segment semantics corresponding to the word-segment audio is obtained from that word segment and the semantic slot.
S600, selecting one or more semantics as the classification labels of the voice audio according to the segment semantics and the frequencies.
Specifically, given the segment semantics corresponding to each word-segment audio and the frequency with which each word-segment audio appears in the voice audio, the segment semantics are sorted by frequency in descending order, and the one or more semantics ranked first are selected as the classification labels of the voice audio.
In this method the system analyzes the acquired voice audio and classifies it intelligently; equally, however, the user may select the classification labels for the voice audio according to his or her own understanding.
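The sort-and-select rule of S600 can be sketched as below; `semantic_slot` is assumed to be a plain mapping from word segment to its semantics, which is one possible realization of the semantic slot.

```python
def pick_labels(frequencies, semantic_slot, k=1):
    """Sort segment semantics by descending frequency and keep the top k
    as the classification labels of the voice audio."""
    ranked = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    return [semantic_slot.get(segment, segment) for segment, _ in ranked[:k]]
```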
S700, classifying the voice audio according to the classification labels.
Specifically, once the classification labels of the voice audio are obtained, whether selected intelligently by the system or chosen by the user, the acquired voice audio is classified and stored according to them, which makes later searching easy.
In this embodiment, a corpus sample library is formed by collecting a large number of corpus samples, from which an audio library and a semantic slot are established, so that acquired voice audio can later be matched against them to obtain its classification labels.
Classification labels are selected intelligently through word segmentation and the voice audio is then classified accordingly, so that all voice audio is stored in order and the target voice audio can be found quickly and accurately later.
Merging identical word-segment audio within the voice audio narrows the range from which the classification labels are selected, and selecting them in combination with the frequency with which each word-segment audio appears ensures that the selected labels represent the intent of the voice audio as closely as possible.
A second embodiment of the present invention, shown in fig. 2, is an optimization of the first embodiment and includes:
S110, acquiring a corpus sample library, and segmenting the corpus samples in it with a word-segmentation technique to obtain the word segments contained in each corpus sample.
Specifically, a large number of corpus samples are collected to build the corpus sample library. Corpus samples are not limited to written text; they also include speech, audio and the like, the only difference being that such samples must first be converted into text before further processing.
The corpus samples are segmented with a word-segmentation technique: the structure of each sentence in a corpus sample is analyzed, the part of speech of every word in the sentence is identified, and according to those parts of speech the whole sentence is divided into words, phrases and similar units. In this way the word segments contained in the corpus sample, and their parts of speech, are obtained.
S120, acquiring the audio corresponding to each word segment, and establishing the audio library from the word-segment audio and the corresponding word segments.
Specifically, the audio corresponding to each word segment is acquired. Because of factors such as the user's age and accent, the same word segment may correspond to several different audios, so as many different audios of the same word segment as possible are collected, which allows the user's speech to be recognized comprehensively and without omission. The audio library is then established from all the audios, with the correspondence between word segments and audios recorded in it.
S130, acquiring the semantics corresponding to each word segment, and establishing the semantic slot from the segment semantics and the word segments.
Specifically, all the word segments contained in all the corpus samples are obtained, and the semantic slot is established from those word segments and their corresponding segment semantics, with the correspondence between word segments and segment semantics recorded in it.
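Steps S110 to S130 can be sketched together as below. The helpers `segment`, `audio_of` and `semantics_of` are hypothetical stand-ins for a word-segmentation routine, an audio lookup and a semantics lookup; the patent does not specify their implementations.

```python
def build_libraries(corpus_samples, segment, audio_of, semantics_of):
    """Build the audio library and the semantic slot from a corpus
    sample library, recording word segment -> audio(s) and
    word segment -> semantics correspondences."""
    audio_library, semantic_slot = {}, {}
    for sample in corpus_samples:
        for word in segment(sample):
            if word not in audio_library:
                # the same segment may have several audios (age, accent, ...)
                audio_library[word] = audio_of(word)
                semantic_slot[word] = semantics_of(word)
    return audio_library, semantic_slot
```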
S140, obtaining the sample semantics corresponding to each corpus sample and the part of speech corresponding to each word segment.
S150, analyzing the sentence structure of the corpus sample by combining the sample semantics, the segment semantics and the parts of speech.
S160, if a word segment belongs to the keywords of the sentence structure, marking it as a key segment.
Specifically, the sample semantics of each corpus sample and the part of speech of each word segment are obtained, and the sentence structure of the corpus sample is then analyzed by combining the sample semantics, the segment semantics and the parts of speech.
The part of speech of each word segment is examined first: if a segment is a connective or belongs to another word class without concrete meaning, it has little influence on the semantics of the corpus sample, so segments of this type are excluded first.
Next, the influence of each remaining segment on the semantics of the corpus sample is judged: if the sample can still be understood when the segment is deleted, the segment is immaterial; otherwise it is a keyword for understanding the sample's semantics. Finally, the segments determined to be keywords are marked as key segments.
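The two-stage test of S140 to S160 can be sketched as below. `pos_of` and `understandable_without` are hypothetical stand-ins for a part-of-speech lookup and a semantic-intelligibility test, and the `FUNCTION_POS` tag set is an assumed example.

```python
FUNCTION_POS = {"conjunction", "particle", "preposition"}  # assumed tags

def mark_key_segments(segments, pos_of, understandable_without):
    """Mark the word segments whose removal breaks the sample's meaning."""
    key_segments = set()
    for word in segments:
        if pos_of(word) in FUNCTION_POS:
            continue  # connectives etc. carry no concrete meaning
        if not understandable_without(word):
            key_segments.add(word)  # deleting it breaks the semantics
    return key_segments
```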
S200, acquiring voice audio.
S300, comparing the voice audio with the word-segment audio in the audio library, and generating the matched word-segment audio contained in the voice audio.
S400, merging identical word-segment audio, and counting the frequency with which each merged word-segment audio appears in the voice audio.
S500, obtaining, according to the semantic slot, the segment semantics corresponding to each word-segment audio.
S600, selecting one or more semantics as the classification labels of the voice audio according to the segment semantics and the frequencies.
S700, classifying the voice audio according to the classification labels.
In this embodiment, the corpus samples are segmented with a word-segmentation technique to establish the audio library and the semantic slot, and the sentence structure of each corpus sample is analyzed by combining the sample semantics, the segment semantics and the parts of speech to determine the key segments, so that voice audio can later be recognized and the corresponding classification labels selected.
A third embodiment of the present invention, shown in figs. 3 and 4, is an optimization of the first and second embodiments and includes:
S100, acquiring a corpus sample library, and establishing an audio library and a semantic slot according to the corpus samples in the corpus sample library.
S200, acquiring voice audio.
S300, comparing the voice audio with the word-segment audio in the audio library, and generating the matched word-segment audio contained in the voice audio.
S400, merging identical word-segment audio, and counting the frequency with which each merged word-segment audio appears in the voice audio.
S500, obtaining, according to the semantic slot, the segment semantics corresponding to each word-segment audio.
S610, forming a semantic set for each segment semantics, and merging semantic sets whose semantics are identical or similar.
Specifically, each segment semantics forms its own semantic set; the semantics of the sets are then compared, sets with identical or similar semantics are merged, and the set remaining after a merge is any one of the sets that were merged.
For example, the semantic sets "cup" and "water cup" (two near-synonymous segments in the original language) may be merged, the remaining set being either of the two. If before merging "cup" appeared in the voice audio with probability 0.3 and "water cup" with probability 0.1, the merged set has probability 0.4.
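The merge of S610, including the worked example above, can be sketched as below. `similar_groups` is an assumed input listing which semantics count as identical or similar; the patent does not say how similarity is detected.

```python
def merge_similar_sets(frequencies, similar_groups):
    """Merge semantic sets listed as similar, summing their frequencies;
    the first member present is kept as the representative set."""
    merged = dict(frequencies)
    for group in similar_groups:
        present = [s for s in group if s in merged]
        if len(present) > 1:
            total = sum(merged.pop(s) for s in present)
            merged[present[0]] = total  # remaining set is one of the merged
    return merged
```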
S620, acquiring, according to the audio library, the target word segments corresponding to the word-segment audio.
S630, judging whether the target word segments contain key segments.
Specifically, the target word segment corresponding to each word-segment audio is determined from the correspondence between word segments and audio recorded in the audio library, and it is then judged whether the target word segments contain key segments; if they do, those key segments represent the semantics of the voice audio to a certain extent.
S640, selecting, in combination with the frequency with which the word-segment audio appears in the voice audio, the semantics corresponding to one or more merged semantic sets as the classification labels of the voice audio.
Specifically, if the target word segments contain key segments, the key segments represent the semantics of the voice audio to a certain extent, and the classification labels are selected in combination with the frequency with which the key segments appear in the voice audio. If the target word segments contain no key segments, the classification labels are selected according to the frequency with which the target word segments appear in the voice audio.
The S640 specifically includes, in combination with the frequency of the word segmentation audio appearing in the voice audio, selecting, as the classification label of the voice audio, a semantic corresponding to one or more merged semantic sets:
s641, if the target participle contains the key participle, selecting a semantic set corresponding to the key participle.
S642, selecting, in combination with the frequency with which the key participles appear in the voice audio, the semantics corresponding to one or more semantic sets of the key participles as the classification label of the voice audio.
Specifically, if the target participle contains a key participle, the semantic sets corresponding to the key participles are selected and arranged in order of the frequency with which the key participles appear in the voice audio, and the semantics corresponding to the one or more top-ranked semantic sets are then selected as the classification label.
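The selection rule of steps S630–S642 can be sketched as a frequency ranking; the function name, the `top_k` parameter, and the representation of participles as strings are hypothetical choices made only for illustration:

```python
def select_labels(frequencies, key_participles, top_k=2):
    """Select classification labels for a voice audio.
    If any target participle is a key participle, rank only the key
    participles' semantic sets by frequency; otherwise rank all target
    participles. Returns up to `top_k` semantics as labels."""
    keys_present = {w: f for w, f in frequencies.items() if w in key_participles}
    pool = keys_present if keys_present else frequencies
    ranked = sorted(pool.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:top_k]]

frequencies = {"animal": 0.5, "what": 0.3, "yes": 0.2}
print(select_labels(frequencies, key_participles={"animal"}, top_k=1))  # ['animal']
```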
S700, classifying the voice audio according to the classification label.
In this embodiment, the voice audio is compared with the audio library and the semantic slot established from corpus samples to obtain the target participles corresponding to the participle fragment audios in the voice audio; the target participles are then matched against the key participles, and a classification label is selected according to the frequency with which the participle fragment audios appear in the voice audio, so that the voice audio is classified.
A fourth embodiment of the present invention, as shown in fig. 5, is a system 1000 for classifying speech by word segmentation, comprising:
the database establishing module 1100 obtains a corpus sample library, and establishes an audio library and a semantic slot according to the corpus samples in the corpus sample library.
Specifically, the database establishing module 1100 collects and acquires a large number of corpus samples to establish a corpus sample library, and then analyzes all of the corpus samples to obtain the participles they contain together with the corresponding audios and semantics, thereby establishing an audio library and a semantic slot.
The voice acquiring module 1200 acquires a voice audio.
Specifically, the voice acquiring module 1200 acquires voice audio, which may be voice input by a user in real time. For example, while communicating with other users by voice, a user may feel that one or more utterances contain valuable information that may be needed later; such information needs to be saved, and to facilitate subsequent searching and viewing it needs to be saved in a classified manner.
In addition, the audio may also be downloaded or recorded. For example, recorded audio may contain a large amount of information that the user has no time to review item by item, so all of the audio needs to be classified so that the audio the user needs can be found quickly and accurately among a large number of recordings.
The matching module 1300 compares the speech audio obtained by the speech obtaining module 1200 with the segmentation audio in the audio library established by the database establishing module 1100, and generates matching segmentation audio in the speech audio.
Specifically, the matching module 1300 compares the acquired voice audio one by one against the participle audios in the audio library summarized from a large number of corpus samples; when a participle audio in the library matches a part of the acquired voice audio, a participle fragment audio corresponding to that part is generated, thereby splitting the voice audio into a plurality of participle fragment audios.
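A greatly simplified sketch of this splitting step follows. Real matching compares acoustic features; here a string stands in for the voice audio and a set of known strings stands in for the participle audios in the audio library (both stand-ins are assumptions for illustration):

```python
def split_into_segments(speech, audio_library):
    """Greedily split a 'voice audio' into participle segment 'audios'
    by longest match against the audio library; unmatched units are
    skipped one at a time."""
    segments, i = [], 0
    while i < len(speech):
        match = None
        for length in range(len(speech) - i, 0, -1):  # try longest first
            candidate = speech[i:i + length]
            if candidate in audio_library:
                match = candidate
                break
        if match is None:
            i += 1  # nothing in the library matches here; move on
            continue
        segments.append(match)
        i += len(match)
    return segments

library = {"what", "animal", "is", "this"}
print(split_into_segments("whatanimalisthis", library))
# ['what', 'animal', 'is', 'this']
```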
The processing module 1400 is configured to combine the same segment audio obtained by the matching module 1300, and count the frequency of each combined segment audio appearing in the speech audio.
Specifically, the processing module 1400 identifies all of the split participle fragment audios, merges the identical ones, and then counts the frequency with which each merged participle fragment audio appears in the voice audio, where each merged audio is counted according to its number of occurrences before merging.
For example, 10 participle fragment audios are split from a certain voice audio, in which the participle fragment audio "animal" appears 5 times, "what" appears 3 times, and "yes" appears 2 times; after the identical participle fragment audios are merged, 3 participle fragment audios are obtained, with "animal" appearing at a frequency of 0.5, "what" at a frequency of 0.3, and "yes" at a frequency of 0.2.
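The merging and frequency statistics above reduce to counting; participle segment audios are represented by plain identifiers here, a simplification of the actual audio data:

```python
from collections import Counter

def merge_and_count(segment_audios):
    """Merge identical participle segment audios and return the frequency
    (occurrences before merging divided by the total number of segments)
    of each merged audio in the voice audio."""
    counts = Counter(segment_audios)
    total = len(segment_audios)
    return {segment: count / total for segment, count in counts.items()}

# The example from the text: 10 segments, of which "animal" appears
# 5 times, "what" 3 times and "yes" 2 times.
segments = ["animal"] * 5 + ["what"] * 3 + ["yes"] * 2
print(merge_and_count(segments))  # frequencies 0.5, 0.3 and 0.2
```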
The semantic acquiring module 1500 acquires the semantic of the word segmentation segment corresponding to the audio of the word segmentation segment obtained by the matching module 1300 according to the semantic slot established by the database establishing module 1100.
Specifically, when a participle audio in the audio library matches a part of the acquired voice audio, a participle fragment audio corresponding to that part is generated; the participle corresponding to that fragment audio is then obtained from the audio library, and the semantic acquiring module 1500 obtains the participle fragment semantics corresponding to the fragment audio according to the participle and the semantic slot.
The analysis module 1600 selects one or more semantics as the classification tags of the voice audio according to the word segmentation semantic acquired by the semantic acquisition module 1500 and the frequency counted by the processing module 1400.
Specifically, according to the participle fragment semantics corresponding to the participle fragment audios and the frequency with which each participle fragment audio appears in the voice audio, the analysis module 1600 arranges the participle fragment semantics in descending order of frequency and selects the one or more top-ranked semantics as the classification label of the voice audio.
In the above, the system analyzes the acquired voice audio and classifies it intelligently; equally, however, the user may select the classification label corresponding to the voice audio according to his or her own understanding.
A classification module 1700 configured to classify the voice audio according to the classification label selected by the analysis module 1600.
Specifically, the classification module 1700 obtains a classification tag corresponding to the voice audio, and classifies and stores the obtained voice audio according to the classification tag regardless of whether the classification tag is intelligently selected by the system or is selected by the user, so as to facilitate subsequent searching.
In this embodiment, a corpus sample library is formed by collecting a large number of corpus samples, and an audio library and a semantic slot are then established, so that acquired voice audio can subsequently be matched and the classification label corresponding to the voice audio obtained.
According to the method, the classification labels of the voice audios are intelligently selected through word segmentation, and then the voice audios are classified, so that all the voice audios are stored in order, and the target voice audio can be conveniently and rapidly and accurately searched in the follow-up process.
Merging the identical participle fragment audios in the voice audio narrows the selection range of the classification labels, and selecting the labels in combination with the frequency with which each participle fragment audio appears in the voice audio allows the selected labels to represent the intent of the voice audio to the greatest extent.
A fifth embodiment of the present invention is a preferable embodiment of the fourth embodiment, and as shown in fig. 6, the fifth embodiment includes:
the database establishing module 1100 obtains a corpus sample library, and establishes an audio library and a semantic slot according to the corpus samples in the corpus sample library.
The database establishing module 1100 specifically includes:
the word segmentation unit 1110 obtains the corpus sample library, and performs word segmentation on the corpus samples in the corpus sample library according to a word segmentation technology to obtain words included in the corpus samples.
Specifically, the word segmentation unit 1110 collects and acquires a large number of corpus samples to establish a corpus sample library. Corpus samples include not only written text but also speech, audio, and the like; the difference is that speech and audio samples must first be converted into the corresponding text information before subsequent processing.
Word segmentation of a corpus sample proceeds by judging the structure of each sentence in the sample, identifying the part of speech of the words in each sentence, and dividing each whole sentence into words, phrases, and the like according to those parts of speech, thereby obtaining the participles contained in the corpus sample and their corresponding parts of speech.
The obtaining unit 1120 obtains the word segmentation audio corresponding to the word segmentation obtained by the word segmentation unit 1110.
The database creating unit 1130 creates an audio database according to the word segmentation audio obtained by the obtaining unit 1120 and the word segmentation obtained by the corresponding word segmentation unit 1110.
Specifically, the obtaining unit 1120 obtains the audio corresponding to each participle. Owing to factors such as the user's age and accent, the same participle may correspond to multiple audios, so as many different audios of the same participle as possible are obtained, enabling user speech to be recognized comprehensively later without omission. The database establishing unit 1130 then establishes an audio library from all the audios and establishes the correspondence between participles and audios in the library.
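The two-way correspondence the database establishing unit maintains can be sketched as follows; the class name and the use of file-name strings as audio identifiers are hypothetical:

```python
from collections import defaultdict

class AudioLibrary:
    """Minimal sketch of the audio library: one participle may map to
    several audios (different accents, ages, etc.), and each audio maps
    back to its participle for lookup after matching."""
    def __init__(self):
        self.participle_to_audios = defaultdict(list)
        self.audio_to_participle = {}

    def add(self, participle, audio_id):
        self.participle_to_audios[participle].append(audio_id)
        self.audio_to_participle[audio_id] = participle

lib = AudioLibrary()
lib.add("animal", "animal_north_accent.wav")  # hypothetical audio ids
lib.add("animal", "animal_south_accent.wav")
print(lib.audio_to_participle["animal_south_accent.wav"])  # animal
```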
The obtaining unit 1120 obtains the semantic meaning of the segmented word corresponding to the segmented word obtained by the segmentation unit 1110.
The database building unit 1130 builds a semantic slot according to the word segmentation semantics acquired by the acquiring unit 1120 and the corresponding word segmentation obtained by the word segmentation unit 1110.
Specifically, the obtaining unit 1120 obtains all the participles included in all the corpus samples, and the database establishing unit 1130 establishes a semantic slot according to all the participles and the participle semantics corresponding to the participles, and establishes a corresponding relationship between the participles and the participle semantics in the semantic slot.
The database building module 1100 further comprises:
the obtaining unit 1120 obtains the semantic meaning of the corpus sample corresponding to the corpus sample and the part of speech corresponding to the participle obtained by the participle unit 1110.
An analyzing unit 1140, which analyzes the sentence structure of the corpus sample in combination with the corpus sample semantic, the participle semantic and the part of speech obtained by the obtaining unit 1120.
A labeling unit 1150, configured to label a participle as a key participle if the parsing unit 1140 determines that the participle is a keyword in the sentence structure.
Specifically, the obtaining unit 1120 obtains the corpus sample semantics corresponding to the corpus sample and the part-of-speech corresponding to each participle, and the parsing unit 1140 parses the sentence structure of the corpus sample by combining the corpus sample semantics, the participle semantics, and the part-of-speech of the participle.
The parsing unit 1140 first determines the part of speech of each participle; if a participle's part of speech has no practical meaning (for example, a conjunction), the participle has little influence on the semantics of the corpus sample and can be excluded first.
The parsing unit 1140 then determines the influence of each participle's semantics on the corpus sample semantics by judging whether the corpus sample semantics can still be understood if that participle is deleted; if they can, the participle is not significant, otherwise the participle is a keyword for understanding the corpus sample semantics. Finally, the labeling unit 1150 labels the participles determined to be keywords as key participles.
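The two judgments above (part-of-speech filtering, then the deletion test) can be sketched as follows; the set of no-meaning parts of speech and the `still_understandable` oracle, which stands in for the semantic judgment, are assumptions for illustration:

```python
NO_MEANING_POS = {"conjunction", "particle"}  # parts of speech with no practical meaning

def mark_key_participles(participles, still_understandable):
    """Return the participles labeled as key participles.
    `participles` is a list of (word, part_of_speech) pairs;
    `still_understandable(words)` judges whether the corpus sample
    semantics can still be understood from the remaining words."""
    keys = []
    for i, (word, pos) in enumerate(participles):
        if pos in NO_MEANING_POS:
            continue  # excluded first: little influence on the semantics
        remaining = [w for j, (w, _) in enumerate(participles) if j != i]
        if not still_understandable(remaining):
            keys.append(word)  # deleting it breaks understanding: a keyword
    return keys

sample = [("what", "pronoun"), ("animal", "noun"), ("and", "conjunction")]
# Toy oracle: the sample is understandable only while "animal" survives.
print(mark_key_participles(sample, lambda words: "animal" in words))  # ['animal']
```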
The voice acquiring module 1200 acquires a voice audio.
The matching module 1300 compares the speech audio obtained by the speech obtaining module 1200 with the segmentation audio in the audio library established by the database establishing module 1100, and generates matching segmentation audio in the speech audio.
The processing module 1400 is configured to combine the same segment audio obtained by the matching module 1300, and count the frequency of each combined segment audio appearing in the speech audio.
The semantic acquiring module 1500 acquires the semantic of the word segmentation segment corresponding to the audio of the word segmentation segment obtained by the matching module 1300 according to the semantic slot established by the database establishing module 1100.
The analysis module 1600 selects semantics corresponding to one or more semantic sets as the classification tags of the voice audio according to the word segmentation semantics acquired by the semantic acquisition module 1500 and the frequency counted by the processing module 1400.
A classification module 1700 configured to classify the voice audio according to the classification label selected by the analysis module 1600.
In this embodiment, the corpus samples are segmented according to a segmentation technology, so as to establish an audio library and a semantic slot, and the sentence structure of the corpus samples is analyzed by combining the corpus sample semantics, the segmentation semantics and the part of speech of the segmentation, so as to determine key segmentation, so as to identify voice audio in the following process, and select corresponding classification tags.
A sixth embodiment of the present invention is a preferable embodiment of the fourth and fifth embodiments, and as shown in fig. 7, the sixth embodiment includes:
the database establishing module 1100 obtains a corpus sample library, and establishes an audio library and a semantic slot according to the corpus samples in the corpus sample library.
The voice acquiring module 1200 acquires a voice audio.
The matching module 1300 compares the speech audio obtained by the speech obtaining module 1200 with the segmentation audio in the audio library established by the database establishing module 1100, and generates matching segmentation audio in the speech audio.
The processing module 1400 is configured to combine the same segment audio obtained by the matching module 1300, and count the frequency of each combined segment audio appearing in the speech audio.
The semantic acquiring module 1500 acquires the semantic of the word segmentation segment corresponding to the audio of the word segmentation segment obtained by the matching module 1300 according to the semantic slot established by the database establishing module 1100.
The analysis module 1600 selects one or more semantic meanings as the classification tags of the voice audio according to the word segmentation semantic meaning obtained by the semantic meaning obtaining module 1500 and the frequency counted by the processing module 1400.
The analysis module 1600 specifically includes:
the merging unit 1610 is configured to form a corresponding semantic set according to the participle fragment semantics and to merge the semantic sets with the same or similar semantics.
Specifically, the merging unit 1610 forms a corresponding semantic set for each participle fragment semantics, then identifies the semantics of each set, merges the semantic sets with the same or similar semantics, and represents each merged result by any one of the sets merged into it.
For example, the semantic sets "cup" and "water cup" have similar semantics and may be merged; the remaining set after merging is "cup" or "water cup". If "cup" appeared in the voice audio with a probability of 0.3 before merging and "water cup" with a probability of 0.1, the probability of the merged set is 0.4.
The target participle obtaining unit 1620 obtains the target participle corresponding to the participle fragment audio according to the audio library.
The determining unit 1630 determines whether the target participle acquired by the target participle obtaining unit 1620 contains the key participle.
Specifically, the target participle obtaining unit 1620 determines the target participle corresponding to the participle fragment audio according to the correspondence between participles and participle audios in the audio library, and the determining unit 1630 then determines whether the target participle contains the above-mentioned key participle; if it does, the key participle can represent the semantics of the voice audio to a certain extent.
The analysis unit 1640, in combination with the frequency with which the participle fragment audio appears in the voice audio, selects the semantics corresponding to one or more semantic sets merged by the merging unit 1610 as the classification label of the voice audio.
Specifically, if the target participle contains a key participle, the key participle can represent the semantics of the voice audio to a certain extent, and the classification label is selected in combination with the frequency with which the key participle appears in the voice audio. If the target participle does not contain a key participle, the classification label is selected according to the frequency with which the target participle appears in the voice audio.
The analysis unit 1640 specifically includes:
a selecting subunit 1641, which selects the semantic set corresponding to the key participle if the determining unit 1630 determines that the target participle contains the key participle.
An analyzing subunit 1642, which selects, in combination with the frequency with which the key participles appear in the voice audio, the semantics corresponding to one or more of the semantic sets of the key participles selected by the selecting subunit 1641 as the classification label of the voice audio.
Specifically, if the target participle contains a key participle, the selecting subunit 1641 selects the semantic set corresponding to the key participle; the analyzing subunit 1642 arranges the semantic sets corresponding to the key participles in order of the frequency with which the key participles appear in the voice audio, and then selects the semantics corresponding to the one or more top-ranked semantic sets as the classification label.
A classification module 1700 configured to classify the voice audio according to the classification label selected by the analysis module 1600.
In this embodiment, the voice audio is compared with the audio library and the semantic slot established from corpus samples to obtain the target participles corresponding to the participle fragment audios in the voice audio; the target participles are then matched against the key participles, and a classification label is selected according to the frequency with which the participle fragment audios appear in the voice audio, so that the voice audio is classified.
It should be noted that the above embodiments can be freely combined as necessary. The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A method of classifying speech by word segmentation, comprising:
acquiring a corpus sample library, and establishing an audio library and a semantic slot according to a corpus sample in the corpus sample library;
acquiring voice audio;
comparing the voice audio with the word segmentation audio in the audio library, and generating matched word segmentation audio in the voice audio;
merging the same word segmentation audio, and counting the frequency of each merged word segmentation audio in the voice audio;
obtaining the semantic of the word segmentation segment corresponding to the audio frequency of the word segmentation segment according to the semantic slot;
selecting one or more semantics as the classification labels of the voice audio according to the word segmentation semantic meanings and the frequency;
and classifying the voice audio according to the classification label.
2. The method of claim 1, wherein the obtaining of the corpus sample library, the establishing of the audio library and the semantic slot according to the corpus samples in the corpus sample library comprises:
obtaining the corpus sample library, and performing word segmentation on the corpus samples in the corpus sample library according to a word segmentation technology to obtain word segments contained in the corpus samples;
acquiring word segmentation audio corresponding to the word segmentation, and establishing an audio library according to the word segmentation audio and the corresponding word segmentation;
and acquiring the word segmentation semantics corresponding to the word segmentation, and establishing a semantic slot according to the word segmentation semantics and the word segmentation.
3. The method of claim 2, wherein the obtaining of the corpus sample library, the establishing of the audio library and the semantic slot according to the corpus samples in the corpus sample library further comprises:
obtaining the semantic meaning of the corpus sample corresponding to the corpus sample and the part of speech corresponding to the participle;
analyzing sentence structures of the corpus samples by combining the corpus sample semantics, the participle semantics and the part of speech;
and if the participles belong to the keywords in the sentence structure, marking the participles as key participles.
4. The method of claim 3, wherein the selecting one or more semantics as the classification label of the speech audio according to the segmentation segment semantics and the frequency specifically comprises:
forming a corresponding semantic set according to the word segmentation segment semantics, and merging the semantic sets with the same or similar semantics;
and selecting the semantics corresponding to one or more combined semantic sets as the classification labels of the voice audio by combining the frequency of the word segmentation audio in the voice audio.
5. The method of claim 4, wherein the selecting, in combination with the frequency of the word segmentation segment audio appearing in the voice audio, the semantics corresponding to one or more merged semantic sets as the classification label of the voice audio comprises:
acquiring target participles corresponding to the participle fragment audios according to the audio library;
judging whether the target participle contains the key participle;
the selecting, in combination with the frequency of the segmented speech segment audio appearing in the speech audio, a semantic corresponding to one or more merged semantic sets as the classification label of the speech audio specifically includes:
if the target participle contains the key participle, selecting a semantic set corresponding to the key participle;
and selecting the semantics corresponding to the semantic set of one or more key participles as the classification labels of the voice audio by combining the frequency of the key participles appearing in the voice audio.
6. A system for classifying speech by segmentation, comprising:
the database establishing module is used for acquiring a corpus sample library and establishing an audio library and a semantic slot according to the corpus samples in the corpus sample library;
the voice acquisition module is used for acquiring voice audio;
the matching module is used for comparing the voice audio acquired by the voice acquisition module with the word segmentation audio in the audio library established by the database establishment module and generating matched word segmentation audio in the voice audio;
the processing module is used for merging the same word segmentation segment audios obtained by the matching module and counting the frequency of each merged word segmentation segment audio appearing in the voice audio;
the semantic acquisition module is used for acquiring the participle fragment semantics corresponding to the participle fragment audio frequency obtained by the matching module according to the semantic slot established by the database establishing module;
the analysis module selects one or more semantics as the classification labels of the voice audio according to the word segmentation segment semantics acquired by the semantic acquisition module and the frequency counted by the processing module;
and the classification module classifies the voice audio according to the classification label selected by the analysis module.
7. The system for classifying speech by word segmentation according to claim 6, wherein the database creation module specifically comprises:
the word segmentation unit is used for acquiring the corpus sample library, and performing word segmentation on the corpus samples in the corpus sample library according to a word segmentation technology to obtain word segments contained in the corpus samples;
the acquisition unit is used for acquiring the word segmentation audio corresponding to the word segmentation obtained by the word segmentation unit;
the database establishing unit is used for establishing an audio database according to the word segmentation audio acquired by the acquiring unit and the word segmentation obtained by the corresponding word segmentation unit;
the acquisition unit is used for acquiring the participle semanteme corresponding to the participle obtained by the participle unit;
and the database establishing unit is used for establishing a semantic slot according to the word segmentation semantics acquired by the acquiring unit and the word segmentation acquired by the corresponding word segmentation unit.
8. The system for classifying speech by participle according to claim 7, wherein said database building module further comprises:
the obtaining unit is used for obtaining the semantic meaning of the corpus sample corresponding to the corpus sample and the part of speech corresponding to the participle obtained by the participle unit;
the parsing unit is used for parsing the sentence structure of the corpus sample by combining the corpus sample semantics, the participle semantics and the part of speech acquired by the acquisition unit;
and the marking unit is used for marking the participle as a key participle if the participle analyzed by the analyzing unit belongs to the key word in the sentence structure.
9. The system for classifying speech by word segmentation according to claim 8, wherein the analysis module specifically comprises:
the merging unit is used for forming a corresponding semantic set according to the participle fragment semantics and merging the semantic sets with the same or similar semantics;
and the analysis unit is used for selecting, in combination with the frequency of the word segmentation segment audio appearing in the voice audio, the semantics corresponding to one or more semantic sets merged by the merging unit as the classification label of the voice audio.
10. The system for classifying speech by participle according to claim 9, wherein said analysis module further comprises:
the target word segmentation acquiring unit acquires target words corresponding to the word segmentation segment audio according to the audio library;
the judging unit is used for judging whether the target participle acquired by the target participle acquiring unit contains the key participle or not;
the analysis unit specifically includes:
a selecting subunit, wherein if the judging unit judges that the target participle contains the key participle, a semantic set corresponding to the key participle is selected;
and the analysis subunit selects, in combination with the frequency of the key participles appearing in the voice audio, the semantics corresponding to one or more semantic sets of the key participles selected by the selecting subunit as the classification labels of the voice audio.
CN201811290932.9A 2018-10-31 2018-10-31 Method and system for classifying voice through word segmentation Active CN109446376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811290932.9A CN109446376B (en) 2018-10-31 2018-10-31 Method and system for classifying voice through word segmentation

Publications (2)

Publication Number Publication Date
CN109446376A CN109446376A (en) 2019-03-08
CN109446376B true CN109446376B (en) 2021-06-25



Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1682279A (en) * 2002-09-16 2005-10-12 松下电器产业株式会社 System and method of media file access and retrieval using speech recognition
US7734461B2 (en) * 2006-03-03 2010-06-08 Samsung Electronics Co., Ltd Apparatus for providing voice dialogue service and method of operating the same
CN104462600A (en) * 2014-12-31 2015-03-25 科大讯飞股份有限公司 Method and device for achieving automatic classification of calling reasons
CN105488077A (en) * 2014-10-10 2016-04-13 腾讯科技(深圳)有限公司 Content tag generation method and apparatus
CN105912521A (en) * 2015-12-25 2016-08-31 乐视致新电子科技(天津)有限公司 Method and device for parsing voice content
CN106778862A (en) * 2016-12-12 2017-05-31 上海智臻智能网络科技股份有限公司 Information classification method and device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8041570B2 (en) * 2005-05-31 2011-10-18 Robert Bosch Corporation Dialogue management using scripts

Non-Patent Citations (1)

Title
Spoken language understanding method based on two-stage classification; Wu Weilin et al.; 《计算机研究与发展》 (Journal of Computer Research and Development); May 31, 2008 (No. 5); pp. 861-867 *

Also Published As

Publication number Publication date
CN109446376A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN109446376B (en) Method and system for classifying voice through word segmentation
CN108829893B (en) Method and device for determining video label, storage medium and terminal equipment
CN108287858B (en) Semantic extraction method and device for natural language
CN105824959B (en) Public opinion monitoring method and system
WO2015185019A1 (en) Semantic comprehension-based expression input method and apparatus
CN109255053B (en) Resource searching method, device, terminal, server and computer readable storage medium
CN111639177B (en) Text extraction method and device
CN109241332B (en) Method and system for determining semantics through voice
US20040163035A1 (en) Method for automatic and semi-automatic classification and clustering of non-deterministic texts
CN111324713B (en) Automatic replying method and device for conversation, storage medium and computer equipment
US9652997B2 (en) Method and apparatus for building emotion basis lexeme information on an emotion lexicon comprising calculation of an emotion strength for each lexeme
CN112131876A (en) Method and system for determining standard problem based on similarity
KR101473239B1 (en) Category and Sentiment Analysis System using Word pattern.
CN111125457A (en) Deep cross-modal Hash retrieval method and device
CN110168527B (en) Information processing device, information processing method, and information processing program
CN109635125B (en) Vocabulary atlas building method and electronic equipment
CN114742032A (en) Interactive data analysis method, apparatus, device, medium, and program product
CN109543049B (en) Method and system for automatically pushing materials according to writing characteristics
CN109800430B (en) Semantic understanding method and system
KR101440887B1 (en) Method and apparatus of recognizing business card using image and voice information
CN115580758A (en) Video content generation method and device, electronic equipment and storage medium
CN114880496A (en) Multimedia information topic analysis method, device, equipment and storage medium
CN115018255A (en) Tourist attraction evaluation information quality validity analysis method based on integrated learning data mining technology
CN116303983A (en) Keyword recommendation method and device and electronic equipment
CN114298048A (en) Named entity identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant