CN115240655A

CN115240655A - Chinese voice recognition system and method based on deep learning

Info

Publication number: CN115240655A
Application number: CN202210848331.5A
Authority: CN
Inventors: 张年乾
Original assignee: Shenzhen Lingjing Technology Co ltd
Current assignee: Shenzhen Lingjing Technology Co ltd
Priority date: 2022-07-19
Filing date: 2022-07-19
Publication date: 2022-10-25

Abstract

The invention provides a Chinese speech recognition system and method based on deep learning, relating to the technical field of speech recognition. The system comprises: a voice acquisition module for receiving Chinese voice segments to be recognized in real time and sorting them based on a time sequence; a voice recognition module for constructing a Chinese voice recognition model and, based on the sorting result, sequentially inputting the acquired Chinese voice segments to be recognized into the model for voice recognition to obtain a voice text; and a correction module for performing grammar correction on the obtained voice text based on a preset Chinese grammar to obtain the final voice recognition text. By constructing a Chinese speech recognition model that recognizes the acquired segments in sequence, and by correcting the recognized speech text according to Chinese grammar, the accuracy of Chinese speech recognition is ensured and the effect of Chinese speech recognition is improved.

Description

Chinese voice recognition system and method based on deep learning

Technical Field

The invention relates to the technical field of voice recognition, in particular to a Chinese voice recognition system and method based on deep learning.

Background

At present, with the rapid improvement of computer processing capability, speech recognition technology is developing rapidly, and its application is increasingly changing how people work and live. Speech recognition is now widely applied in fields such as speech input systems, speech control systems and intelligent dialogue inquiry systems.

However, most existing speech recognition systems only perform simple recognition of the speech to be recognized and cannot check the recognized speech text against Chinese semantics. As a result, logic errors or grammar errors remain in the recognition result, wrongly written characters in the result cannot be effectively corrected, and the speech recognition effect is greatly reduced.

Therefore, the invention provides a Chinese speech recognition system and method based on deep learning.

Disclosure of Invention

The invention provides a Chinese voice recognition system and method based on deep learning, which are used for sequentially recognizing an acquired Chinese voice segment to be recognized by constructing a Chinese voice recognition model and correcting a recognized voice text according to Chinese grammar, so that the accuracy of Chinese voice recognition is ensured, and the effect of Chinese voice recognition is improved.

The invention provides a Chinese speech recognition system based on deep learning, which comprises:

the voice acquisition module is used for receiving the Chinese voice segment to be recognized in real time and sequencing the Chinese voice segment to be recognized based on a time sequence;

the voice recognition module is used for constructing a Chinese voice recognition model and sequentially inputting the acquired Chinese voice segments to be recognized into the Chinese voice recognition model for voice recognition based on the sequencing result to obtain a voice text;

and the correction module is used for carrying out grammar correction on the obtained voice text based on the preset Chinese grammar to obtain the final voice recognition text.

Preferably, the Chinese speech recognition system based on deep learning includes:

the voice acquisition unit is used for monitoring the current acoustic characteristics of the user in real time and determining the current voice state of the user based on the acoustic characteristics, wherein the voice state is either voiced (the user is speaking) or unvoiced (the user is silent);

and the voice recording unit is used for acquiring the Chinese voice uttered by the user when the voice state is voiced, and storing the acquired Chinese voice to obtain the Chinese voice segment to be recognized;

the voice processing subunit is used for acquiring the obtained Chinese voice segment to be recognized, and performing spectrum analysis on the Chinese voice segment to be recognized to obtain an audio map corresponding to the Chinese voice segment to be recognized;

the voice screening subunit is used for determining a first peak frequency point corresponding to the Chinese voice segment to be identified at each moment based on the audio frequency map, acquiring a noise audio frequency map corresponding to a noise signal, and determining a second peak frequency point of the noise signal based on the noise audio frequency map;

the voice screening subunit is configured to compare the first peak frequency point with the second peak frequency point, screen out a target peak frequency point at which the first peak frequency point is greater than the second peak frequency point, and determine the to-be-identified Chinese voice segment corresponding to the target peak frequency point as an effective to-be-identified Chinese voice segment.

the time determining unit is used for acquiring the obtained Chinese speech segment to be recognized and processing the Chinese speech segment to be recognized to obtain a speech signal corresponding to each frame;

the time determining unit is further configured to determine time domain information of the Chinese speech segment to be recognized based on the speech signal corresponding to each frame, and match the time domain information with the speech signal corresponding to each frame;

and the sorting unit is used for determining a time sequence corresponding to the Chinese speech segment to be recognized based on the matching result and sorting the Chinese speech segment to be recognized based on the ascending sequence of the time sequence, wherein the Chinese speech segment to be recognized is at least one segment.

Preferably, the Chinese speech recognition system based on deep learning includes:

the result acquiring subunit is used for acquiring the sequencing result of the Chinese speech segment to be recognized and determining the target number of the Chinese speech segment to be recognized based on the sequencing result;

the tag obtaining subunit is used for extracting the acoustic features of the Chinese speech segment to be recognized and determining the speech type of the Chinese speech segment to be recognized based on the acoustic features;

and the marking subunit is used for acquiring a target number of marking labels from a preset label database based on the voice type and marking the Chinese voice segment to be recognized based on the target number of marking labels.

Preferably, the Chinese speech recognition system based on deep learning comprises:

the data acquisition unit is used for acquiring a voice training text and calling accents with different timbres from a preset voice library to read the voice training text, obtaining audio data of the voice training text read in each of those accents;

the data processing unit is used for preprocessing the audio data, converting the audio data into a corresponding spectrogram based on a preprocessing result, and determining an effective region in the audio data based on the spectrogram;

the model construction unit is used for determining the characteristic parameters of the audio data based on the effective region, acquiring the corresponding relation between Chinese pinyin and Chinese characters, training the characteristic parameters based on the corresponding relation, and constructing a Chinese voice recognition model based on the training result;

the speech recognition unit is used for sequentially inputting the acquired Chinese speech segments to be recognized into the Chinese speech recognition model, analyzing the received Chinese speech segments to be recognized based on a preset syntax analysis tree in the Chinese speech recognition model, and determining a starting point and an ending point of each sentence in the Chinese speech segments to be recognized;

the speech recognition unit is used for performing first splitting on each Chinese speech segment to be recognized based on the starting point and the end point, obtaining a sentence set of each Chinese speech segment to be recognized based on a first splitting result, and extracting syllable attributes contained in each sentence of Chinese speech in the sentence set;

the voice recognition unit is used for carrying out second splitting on the Chinese voice of each sentence based on the syllable attribute and obtaining Chinese words contained in the Chinese voice of each sentence based on a second splitting result;

the voice recognition unit is also used for extracting pronunciation characteristics of the Chinese vocabulary, and processing the pronunciation characteristics based on the corresponding relation between the Chinese pinyin and the Chinese characters to obtain vocabulary texts corresponding to the Chinese vocabulary;

and the text splicing unit is used for splicing the vocabulary texts corresponding to the Chinese vocabularies contained in each sentence of Chinese voice to obtain the voice text corresponding to the Chinese voice segment to be recognized.

the voice recognition subunit is configured to acquire a statement set of each to-be-recognized Chinese voice segment obtained based on the first splitting result, construct an acoustic model at the same time, and perform acoustic recognition on each sentence of Chinese voice in the statement set based on the acoustic model;

the identity determining subunit is used for determining the sound characteristics corresponding to the Chinese speech of the adjacent sentence based on the acoustic recognition result and comparing the sound characteristics corresponding to the Chinese speech of the adjacent sentence;

and the result determining subunit is used for determining that the users corresponding to the Chinese voices of the adjacent sentences are the same when the comparison result determines that the sound characteristics corresponding to the Chinese voices of the adjacent sentences are consistent, and uniformly labeling the voice texts corresponding to the Chinese voices of the adjacent sentences, otherwise, determining that the users corresponding to the Chinese voices of the adjacent sentences are different, and performing distinguishing labeling on the voice texts corresponding to the Chinese voices of the adjacent sentences.

the text acquisition unit is used for acquiring the Chinese speech segment to be recognized, constructing a pronunciation change recognition model at the same time, and inputting the Chinese speech segment to be recognized into the pronunciation change recognition model for processing to obtain the intonation information of the Chinese speech segment to be recognized;

the intention determining unit is used for acquiring a voice text obtained by identifying the Chinese voice segment to be identified and combining the intonation information and the voice text to determine the target intention of the Chinese voice segment to be identified;

the semantic determining unit is used for performing semantic analysis on the voice text based on the target intention to obtain a semantic analysis result, acquiring a preset Chinese grammar checking rule and performing grammar checking on the voice text based on the semantic analysis result;

the grammar correcting unit is used for determining a target position of the abnormal voice text in the voice text when the grammar checking result judges that the voice text has wrong grammar, and determining the logic relation of the context of the abnormal voice text based on the target position;

the grammar correction unit is used for splitting the abnormal voice text at the target position to obtain N text keywords, and rearranging the N text keywords based on the logic relation and a preset Chinese grammar rule to obtain a corrected voice text;

the text verification unit is used for performing character verification on the corrected voice text based on the target intention, determining the difference characters in the voice text based on a verification result and determining the target pinyin of the difference characters;

the character replacing unit is used for mapping the target pinyin and each preset noun in a preset noun library one by one and determining a target replacing character based on a mapping result;

the character replacing unit is also used for replacing the difference characters based on the target replacing characters and obtaining a final voice recognition text based on a replacing result.

the voice recognition text acquisition unit is used for acquiring a final voice recognition text and determining the text size of the final voice recognition text;

and the capacity allocation unit is used for allocating a target storage space in a preset storage area based on the text size and storing the final voice recognition text in the target storage space.

The invention provides a Chinese speech recognition method based on deep learning, which comprises the following steps:

step 1: receiving Chinese voice segments to be recognized in real time, and sequencing the Chinese voice segments to be recognized based on a time sequence;

step 2: constructing a Chinese voice recognition model, and inputting the acquired Chinese voice segments to be recognized into the Chinese voice recognition model in sequence based on the sequencing result to perform voice recognition to obtain a voice text;

step 3: carrying out grammar correction on the obtained voice text based on the preset Chinese grammar to obtain the final voice recognition text.
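The three method steps can be sketched as a small pipeline. This is only an illustrative skeleton under stated assumptions: the `Segment` class, its `start_time` field, and the `recognize` and `correct` callables are hypothetical placeholders and are not defined by the patent, which does not specify how the model or the grammar correction is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """A received speech segment (hypothetical structure)."""
    start_time: float  # assumed: time at which the segment was spoken, in seconds
    audio: bytes       # assumed: raw audio payload


def run_pipeline(
    segments: List[Segment],
    recognize: Callable[[bytes], str],
    correct: Callable[[str], str],
) -> str:
    # Step 1: sort the received segments based on the time sequence.
    ordered = sorted(segments, key=lambda s: s.start_time)
    # Step 2: feed the segments to the recognition model in that order.
    texts = [recognize(s.audio) for s in ordered]
    # Step 3: apply grammar correction to the recognized text.
    return correct("".join(texts))


if __name__ == "__main__":
    # Stub recognizer and corrector, used only to show the data flow.
    demo = [Segment(2.0, b"b"), Segment(1.0, b"a")]
    print(run_pipeline(demo, lambda audio: audio.decode(), str.upper))
```

Passing the recognizer and the corrector as callables keeps the ordering logic independent of whichever model and grammar rules are actually used.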

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a block diagram of a deep learning based Chinese speech recognition system according to an embodiment of the present invention;

FIG. 2 is a block diagram of a speech acquisition module of a deep learning-based Chinese speech recognition system according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method for Chinese speech recognition based on deep learning according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it should be understood that they are presented herein only to illustrate and explain the present invention and not to limit the present invention.

Example 1:

This embodiment provides a Chinese speech recognition system based on deep learning, as shown in FIG. 1, including:

In this embodiment, the Chinese speech segment to be recognized refers to a received set of sentences for speech recognition, where each sentence is one speech segment.

In this embodiment, the time sequence is used to represent the order in which the different Chinese speech segments to be recognized occur, i.e. the order in which they were spoken.

In this embodiment, sorting the Chinese speech segments to be recognized based on the time sequence refers to arranging the acquired speech segments in that order.

In this embodiment, the speech text refers to the text information obtained by recognizing a received Chinese speech segment to be recognized, that is, the Chinese characters corresponding to that segment.

In this embodiment, the preset Chinese grammar is set in advance and includes definitions of the positions and logical relationships of subjects and verbs.

The beneficial effects of the above technical scheme are: the Chinese speech recognition model is constructed to sequentially recognize the acquired Chinese speech segments to be recognized and correct the recognized speech text according to the Chinese grammar, so that the accuracy of Chinese speech recognition is ensured, and the effect of Chinese speech recognition is improved.

Example 2:

On the basis of embodiment 1, this embodiment provides a Chinese speech recognition system based on deep learning; as shown in FIG. 2, the speech acquisition module includes:

In this embodiment, the acoustic feature is used to determine whether the user currently has speaking behavior.

The beneficial effects of the above technical scheme are: by accurately judging the current speaking behavior of the user, the generated Chinese speech is acquired and stored in time when the user speaks, and convenience is brought to accurate and effective recognition of the Chinese speech of the user.
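The voice state decision in this embodiment can be illustrated with a simple frame-energy check. The patent does not say which acoustic feature is monitored, so short-time energy is an assumption here, as are the function names, the frame length of 160 samples and the threshold of 0.01.

```python
import math
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def voice_states(
    samples: Sequence[float],
    frame_len: int = 160,      # assumed frame length
    threshold: float = 0.01,   # assumed energy threshold
) -> List[bool]:
    """Return True for frames judged voiced and False for frames judged unvoiced."""
    states = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        states.append(frame_energy(frame) > threshold)
    return states


if __name__ == "__main__":
    silence = [0.0] * 160
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
    print(voice_states(silence + tone))
```

A recording unit built on this sketch would store only the frames marked `True`.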

Example 3:

On the basis of embodiment 2, this embodiment provides a Chinese speech recognition system based on deep learning, where the speech recording unit includes:

the voice processing subunit is used for acquiring the obtained Chinese voice segment to be recognized, and performing spectrum analysis on the Chinese voice segment to be recognized to obtain an audio map corresponding to the Chinese voice segment to be recognized;

In this embodiment, the audio map refers to the representation obtained by spectrum analysis of the Chinese speech segment to be recognized, which makes it possible to distinguish the effective speech signal from the noise signal in the segment.

In this embodiment, the first peak frequency point refers to the magnitude of the audio value corresponding to each frame of the Chinese speech segment to be recognized in the time domain.

In this embodiment, the noise audio map refers to the audio maps corresponding to various kinds of noise.

In this embodiment, the second peak frequency point refers to the magnitude of the audio value corresponding to the noise.

In this embodiment, the target peak frequency point refers to a point of the Chinese speech signal at which the first peak frequency point is greater than the second peak frequency point.

In this embodiment, the effective Chinese speech segment to be recognized refers to the speech signal, free of other interference factors, obtained after removing the noise signals from the Chinese speech segment to be recognized.

The beneficial effects of the above technical scheme are: the obtained Chinese speech segment to be recognized is converted into the corresponding audio map, and the audio map corresponding to the noise signal is obtained at the same time, so that the noise signal in the Chinese speech segment to be recognized is eliminated through the audio map, the effectiveness of the Chinese speech segment to be recognized is guaranteed, and the recognition effect of the Chinese speech segment to be recognized is improved.
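The peak comparison described in this embodiment can be sketched as follows. This is a minimal illustration, assuming that the "peak frequency point" of a frame is the largest magnitude in its discrete Fourier transform; the patent does not give the exact computation, and the function names are hypothetical. A naive DFT is used to keep the example dependency-free, so it is only suitable for short frames.

```python
import cmath
import math
from typing import List, Sequence


def peak_magnitude(frame: Sequence[float]) -> float:
    """Largest magnitude among the non-negative-frequency DFT bins of a frame."""
    n = len(frame)
    best = 0.0
    for k in range(n // 2 + 1):
        acc = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        best = max(best, abs(acc))
    return best


def effective_frames(
    frames: List[Sequence[float]], noise_frame: Sequence[float]
) -> List[int]:
    """Indices of frames whose spectral peak exceeds the peak of the noise."""
    noise_peak = peak_magnitude(noise_frame)
    return [i for i, f in enumerate(frames) if peak_magnitude(f) > noise_peak]


if __name__ == "__main__":
    n = 32
    noise = [0.05 * math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
    speech = [1.0 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
    quiet = [0.0] * n
    print(effective_frames([speech, quiet], noise))
```

Frames that are not returned would be treated as noise and dropped, leaving the effective Chinese speech segment.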

Example 4:

On the basis of embodiment 1, this embodiment provides a Chinese speech recognition system based on deep learning, where the speech acquisition module includes:

In this embodiment, the time domain information refers to a time range related to the received chinese speech segment to be recognized.

In this embodiment, the time sequence is used to represent the specific time corresponding to each Chinese speech segment to be recognized.

The beneficial effects of the above technical scheme are: the specific time sequence of each Chinese speech segment to be recognized is confirmed by determining the time domain information related to the Chinese speech segment to be recognized, so that the obtained Chinese speech segments to be recognized are conveniently sequenced through the specific time sequence, the recognition efficiency of the Chinese speech segments to be recognized is improved, and the recognition effect of the Chinese speech segments to be recognized is guaranteed.
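One way to picture the time determining unit and the sorting unit is to derive a time range for each segment from its frames and then sort in ascending order. The sketch below assumes each segment is described by the index of its first sample and its number of frames; these inputs, the `TimedSegment` type, and the default frame length and sample rate are hypothetical and not taken from the patent.

```python
from typing import List, NamedTuple, Tuple


class TimedSegment(NamedTuple):
    segment_id: str
    start: float  # seconds
    end: float    # seconds


def time_range(
    first_sample: int, num_frames: int, frame_len: int, sample_rate: int
) -> Tuple[float, float]:
    """Time-domain information of a segment, derived from its frames."""
    start = first_sample / sample_rate
    end = (first_sample + num_frames * frame_len) / sample_rate
    return start, end


def sort_segments(
    raw: List[Tuple[str, int, int]],
    frame_len: int = 160,       # assumed samples per frame
    sample_rate: int = 16000,   # assumed sampling rate in Hz
) -> List[TimedSegment]:
    """Attach a time range to each (id, first_sample, num_frames) and sort ascending."""
    timed = [
        TimedSegment(sid, *time_range(first, n, frame_len, sample_rate))
        for sid, first, n in raw
    ]
    return sorted(timed, key=lambda s: s.start)


if __name__ == "__main__":
    for seg in sort_segments([("b", 32000, 100), ("a", 0, 50)]):
        print(seg)
```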

Example 5:

On the basis of embodiment 4, this embodiment provides a Chinese speech recognition system based on deep learning, where the sorting unit includes:

the result acquiring subunit is used for acquiring the sorting result of the Chinese speech segment to be recognized and determining the target number of the Chinese speech segment to be recognized based on the sorting result;

In this embodiment, the target number is a specific number used to characterize the acquired chinese speech segment to be recognized.

In this embodiment, the acoustic features refer to the sound characteristics of the Chinese speech segment to be recognized, including timbre and intonation.

In this embodiment, the preset label database is set in advance and is used for storing the marking labels corresponding to different voice types.

In this embodiment, the marking labels refer to marking symbols used to distinguish different Chinese speech segments to be recognized; they allow the segments to be told apart quickly and make it easier to determine the speech type of each segment.

The beneficial effects of the above technical scheme are: the acoustic characteristics of the Chinese speech segment to be recognized are determined, and the speech type of the speech segment to be recognized is accurately and effectively judged according to the acoustic characteristics, so that the fact that different Chinese speech segments to be recognized are marked by selecting proper marking labels according to the speech type is facilitated, the orderliness of recognition of the Chinese speech segment to be recognized is guaranteed, and meanwhile, the recognition efficiency and the accuracy are facilitated to be improved.
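The marking step can be illustrated with a small lookup. This is a sketch under assumptions: the contents of `LABEL_DATABASE`, the voice type names, and the numbering scheme are all hypothetical, since the patent only states that labels are fetched from a preset label database according to the voice type and that one label is needed per segment.

```python
from typing import Dict, List

# Hypothetical preset label database: voice type -> label prefix.
LABEL_DATABASE: Dict[str, str] = {"question": "Q", "statement": "S"}


def label_segments(
    voice_types: List[str], label_db: Dict[str, str] = LABEL_DATABASE
) -> List[str]:
    """Return one marking label per segment, in the sorted segment order."""
    counters: Dict[str, int] = {}
    labels = []
    for voice_type in voice_types:
        prefix = label_db.get(voice_type, "U")  # "U" marks an unknown voice type
        counters[prefix] = counters.get(prefix, 0) + 1
        labels.append(f"{prefix}{counters[prefix]}")
    return labels


if __name__ == "__main__":
    print(label_segments(["statement", "question", "statement"]))
```

The number of labels returned equals the target number of segments, so each segment receives exactly one label.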

Example 6:

On the basis of embodiment 1, this embodiment provides a Chinese speech recognition system based on deep learning, where the speech recognition module includes:

and the text splicing unit is used for splicing the vocabulary text corresponding to the Chinese vocabulary contained in each sentence of Chinese speech to obtain the speech text corresponding to the Chinese speech segment to be recognized.

In this embodiment, the preset voice library is used for storing accents with different timbres, so that the Chinese voice recognition model can be trained accurately and effectively.

In this embodiment, the speech training text is set in advance, and the text information corresponding to the speech is known.

In this embodiment, the preprocessing may be denoising or the like processing on the audio data.

In this embodiment, the spectrogram refers to a spectral analysis view, the abscissa of which is time, the ordinate of which is frequency, and the coordinate point value of which is voice data energy.

In this embodiment, the effective region refers to the speech segment carrying key characterization information, extracted by filtering the acquired audio data.

In this embodiment, the characteristic parameter refers to a value of the audio data and a fluctuation range corresponding to the intonation.

In the embodiment, the preset syntax analysis tree is set in advance and is used for identifying the acquired Chinese speech segment to be identified according to the Chinese grammar, so that the accuracy and efficiency of identification are improved conveniently.

In this embodiment, the start point and the end point are used to characterize the beginning and end of each sentence.

In this embodiment, the first splitting refers to splitting the Chinese speech segment to be recognized into a plurality of sentences, sentence by sentence.

In this embodiment, the sentence set refers to the set of sentences obtained by splitting a Chinese speech segment to be recognized.

In this embodiment, the syllable attribute refers to whether each word contained in a sentence is monosyllabic or disyllabic.

In this embodiment, the second splitting refers to splitting each sentence of Chinese speech into a plurality of Chinese words, word by word.

In this embodiment, the pronunciation characteristics refer to pronunciation characteristics of each Chinese vocabulary.

In this embodiment, the vocabulary text refers to the Chinese characters corresponding to the vocabulary speech in each sentence of Chinese speech.
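The last two stages defined above, mapping each word's pronunciation to characters and splicing the vocabulary texts into the speech text, can be sketched with a toy lookup table. The `PINYIN_LEXICON` entries, the tone-numbered pinyin format and the function names are illustrative assumptions; in the patent this correspondence is learned by the recognition model rather than stored in a hand-written dictionary.

```python
from typing import Dict, List

# Hypothetical pinyin-to-character correspondence (tone numbers appended).
PINYIN_LEXICON: Dict[str, str] = {
    "ni3 hao3": "你好",
    "shi4 jie4": "世界",
}


def words_to_text(
    word_pinyins: List[str],
    lexicon: Dict[str, str] = PINYIN_LEXICON,
    unknown: str = "?",
) -> str:
    """Map each word's pinyin to its vocabulary text and splice the results."""
    return "".join(lexicon.get(p, unknown) for p in word_pinyins)


def sentences_to_text(sentences: List[List[str]]) -> str:
    """Splice the text of every sentence, ending each with a full stop."""
    return "".join(words_to_text(s) + "。" for s in sentences)


if __name__ == "__main__":
    print(sentences_to_text([["ni3 hao3", "shi4 jie4"]]))
```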

In this embodiment, analyzing the received chinese speech segment to be recognized based on the preset parse tree in the chinese speech recognition model includes:

acquiring an obtained Chinese voice segment to be recognized, converting the Chinese voice segment to be recognized into a corresponding feature vector, and determining a feature sequence corresponding to the Chinese voice segment to be recognized based on the feature vector;

calculating the word sequence recognized by the Chinese speech recognition model to the Chinese speech segment to be recognized based on the characteristic sequence, and calculating the recognition accuracy of the Chinese speech segment to be recognized based on the word sequence, wherein the method specifically comprises the following steps:

calculating the recognized word sequence of the Chinese speech segment to be recognized according to the following formula:

$$M = \underset{m}{\arg\max}\,\left[\log_2 P(\alpha \mid m) + \eta \cdot \log_2 P(m)\right];$$

wherein M represents the recognized word sequence of the Chinese speech segment to be recognized; P(α|m) represents the acoustic model, namely the probability that the output acoustic feature is the feature sequence α given the preset word sequence m, with a value range of (0,1); P(m) represents the language model, namely the probability value of the preset word sequence m, with a value range of (0,1); η represents an adjustable parameter with a value range of (0,1); argmax[·] returns the argument that maximizes the bracketed expression, that is, the word sequence for which the combined acoustic-model and language-model score is largest when recognizing the Chinese speech segment to be recognized;

determining the total number K of recognized words of the Chinese speech segment to be recognized based on the word sequence M;

calculating the recognition accuracy of the Chinese speech segment to be recognized according to the following formula:

wherein the value given by the formula represents the recognition accuracy of the Chinese speech segment to be recognized, with a value range of (0,1); ω represents an error factor with a value range of (0.02,0.05); K represents the total number of recognized words of the Chinese speech segment to be recognized; δ represents the number of wrongly recognized words of the Chinese speech segment to be recognized; σ represents the number of missed words of the Chinese speech segment to be recognized;

comparing the calculated identification accuracy with a preset accuracy;

if the recognition accuracy is greater than or equal to the preset accuracy, judging that the Chinese speech segment to be recognized is qualified for recognition;

otherwise, judging that the Chinese speech segment to be recognized is unqualified in recognition, and performing speech recognition on the Chinese speech segment to be recognized again until the recognition accuracy is greater than or equal to the preset accuracy.

The feature vector refers to a statement vector about the Chinese speech segment to be recognized, which is obtained after vectorization processing is performed on the Chinese speech segment to be recognized.

The feature sequence refers to a vocabulary sequence corresponding to each feature vector.

The preset accuracy is set in advance and is used for judging whether the recognition accuracy of the Chinese speech segment to be recognized meets the preset requirement or not.
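The word-sequence formula in this embodiment, M = argmax[log₂ P(α|m) + η·log₂ P(m)], can be made concrete with a small search over a finite candidate list. The sketch below assumes the acoustic and language model probabilities are already available as dictionaries keyed by candidate word sequence; the function name `decode` and the example probabilities are illustrative and do not come from the patent.

```python
import math
from typing import Dict, Tuple


def decode(
    acoustic: Dict[str, float],   # candidate m -> P(alpha | m)
    language: Dict[str, float],   # candidate m -> P(m)
    eta: float = 0.5,             # adjustable weight, expected in (0, 1)
) -> Tuple[str, float]:
    """Return the candidate maximizing log2 P(alpha|m) + eta * log2 P(m)."""
    best_seq, best_score = None, -math.inf
    for seq, p_acoustic in acoustic.items():
        p_language = language.get(seq, 0.0)
        if p_acoustic <= 0.0 or p_language <= 0.0:
            continue  # the logarithm is undefined for zero probability
        score = math.log2(p_acoustic) + eta * math.log2(p_language)
        if score > best_score:
            best_seq, best_score = seq, score
    if best_seq is None:
        raise ValueError("no candidate has non-zero probability")
    return best_seq, best_score


if __name__ == "__main__":
    acoustic = {"A": 0.5, "B": 0.25}
    language = {"A": 0.0625, "B": 0.5}
    print(decode(acoustic, language, eta=0.5))
    print(decode(acoustic, language, eta=0.1))
```

With these example numbers, a larger η lets the language model overrule the acoustically preferred candidate, while a smaller η keeps the acoustic preference, which is the trade-off the adjustable parameter controls.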

The beneficial effects of the above technical scheme are: the voice training text is read with accents of different timbres to obtain audio data covering those accents, and this audio data is processed and used for training, so that the Chinese voice recognition model is constructed accurately and reliably; the acquired Chinese voice segments to be recognized are then split and recognized by the model, which guarantees the recognition accuracy and recognition effect and makes the obtained voice text accurate and effective.

Example 7:

On the basis of embodiment 6, this embodiment provides a Chinese speech recognition system based on deep learning, where the speech recognition unit includes:

and the result determining subunit is used for determining that the users corresponding to the Chinese voices of the adjacent sentences are the same and uniformly labeling the voice texts corresponding to the Chinese voices of the adjacent sentences when the comparison result determines that the sound characteristics corresponding to the Chinese voices of the adjacent sentences are consistent, or else, determining that the users corresponding to the Chinese voices of the adjacent sentences are different and distinguishing and labeling the voice texts corresponding to the Chinese voices of the adjacent sentences.

In this embodiment, the acoustic model is used to analyze the sound characteristics of the Chinese speech segment to be recognized, including timbre and tone.

In this embodiment, the sound feature may be, for example, how deep or thin the voice is in the Chinese speech of the adjacent sentences.

In this embodiment, unified labeling means labeling the Chinese speech of adjacent sentences as belonging to the same user, so that the obtained speech texts can be told apart.

In this embodiment, distinguishing labeling means labeling the Chinese speech of adjacent sentences as belonging to different users, so that the obtained speech texts can be told apart.

The beneficial effects of the above technical scheme are: by constructing the acoustic model and performing acoustic feature recognition on the Chinese speech of the adjacent sentence in the Chinese speech segment to be recognized through the acoustic model, accurate judgment is conveniently performed on users corresponding to different sentences in the Chinese speech segment to be recognized, so that the speech text obtained by recognition is conveniently and orderly managed, and the recognition effect of the Chinese speech segment to be recognized is improved.
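The adjacent-sentence comparison can be sketched by treating each sentence's sound characteristics as a numeric vector and comparing neighbours with cosine similarity. Both the vector representation and the similarity threshold of 0.9 are assumptions made for illustration; the patent only states that the sound characteristics of adjacent sentences are compared for consistency.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_speakers(
    features: List[Sequence[float]], threshold: float = 0.9  # assumed threshold
) -> List[int]:
    """Give adjacent sentences the same label when their features are consistent."""
    labels: List[int] = []
    current = 0
    for i, feat in enumerate(features):
        if i > 0 and cosine(features[i - 1], feat) < threshold:
            current += 1  # sound characteristics differ: mark a different user
        labels.append(current)
    return labels


if __name__ == "__main__":
    print(label_speakers([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]))
```

Because only neighbouring sentences are compared, a speaker who returns later in the segment receives a new label in this sketch; matching against all earlier speakers would need an extra lookup.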

Example 8:

On the basis of embodiment 1, this embodiment provides a Chinese speech recognition system based on deep learning, where the correction module includes:

the text acquisition unit is used for acquiring the Chinese speech segment to be recognized, constructing a pronunciation change recognition model, inputting the Chinese speech segment to be recognized into the pronunciation change recognition model and processing the Chinese speech segment to be recognized to obtain intonation information of the Chinese speech segment to be recognized;

In this embodiment, the pronunciation change recognition model is used for recognizing pronunciation changes in the Chinese speech segment to be recognized.

In this embodiment, the intonation information refers to how the intonation changes over the Chinese speech segment to be recognized, which facilitates determining the speech intention of the user.

In this embodiment, the target intent refers to the purpose that the Chinese speech segment to be recognized is meant to express.

In this embodiment, the semantic analysis refers to analyzing the acquired voice text to determine the meaning expressed by the voice text.

In this embodiment, the preset Chinese grammar check rule is set in advance, and is used to check the grammar of the Chinese speech segment to be recognized.

In this embodiment, the abnormal speech text refers to the part of the acquired speech text that contains incorrect grammar.

In this embodiment, the target position refers to the position of the abnormal speech text within the obtained speech text.

In this embodiment, the text keyword refers to the Chinese vocabulary contained in the sentence in which the abnormal speech text is located.

In this embodiment, the preset Chinese grammar rule is set in advance.

In this embodiment, word checking refers to checking the words in the obtained speech text to determine whether any erroneous word exists.

In this embodiment, the difference word refers to an erroneous word existing in the obtained speech text.

In this embodiment, the target pinyin refers to the pronunciation corresponding to the difference word.

In this embodiment, the preset noun library is set in advance and is used for storing different characters.

In this embodiment, the target replacement word refers to a Chinese character that is homophonic with the difference word but has a different written form.

The beneficial effects of the above technical scheme are: by performing a grammar check on the obtained voice text, grammatical errors in the voice text are corrected in time when they exist, and the Chinese characters in the voice text are then checked after grammar correction, so that the accuracy of the finally obtained voice recognition text is ensured, and the recognition effect of the Chinese voice segment to be recognized is ensured.
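As a non-limiting illustration, replacing a difference word with a homophonic target replacement word could be sketched as follows. The library contents, the toneless pinyin key, the small vocabulary used to judge whether a candidate fits its context, and the name `replace_difference_word` are hypothetical; the embodiment does not state how the difference word is located or how a candidate is selected.

```python
def replace_difference_word(text, index, pinyin, library, vocabulary):
    """Replace the character at `index` with a homophone that fits its context.

    text:       the recognized speech text
    index:      position of the difference word (assumed already located)
    pinyin:     target pinyin of the difference word (toneless in this sketch)
    library:    hypothetical preset library mapping pinyin -> candidate characters
    vocabulary: hypothetical set of known two-character words used as context
    Returns the text unchanged if no candidate forms a known word.
    """
    original = text[index]
    for candidate in library.get(pinyin, []):
        if candidate == original:
            continue  # a replacement must differ in written form
        left = text[index - 1] + candidate if index > 0 else ""
        right = candidate + text[index + 1] if index + 1 < len(text) else ""
        if left in vocabulary or right in vocabulary:
            return text[:index] + candidate + text[index + 1:]
    return text


if __name__ == "__main__":
    library = {"men": ["门", "们", "闷"]}
    vocabulary = {"我们", "学校"}
    print(replace_difference_word("我门去学校", 1, "men", library, vocabulary))
```

The neighbor-word check stands in for whatever context model a real system would use; it is chosen here only because it keeps the example self-contained.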

Example 9:

on the basis of embodiment 1, this embodiment provides a Chinese speech recognition system based on deep learning, where the correction module includes:

In this embodiment, the preset storage area is set in advance and includes storage spaces of different sizes.

In this embodiment, the target storage space refers to a storage area for storing the acquired final speech recognition text.

The beneficial effects of the above technical scheme are: by determining the file size of the finally obtained speech recognition text, a corresponding storage space can be allocated for the speech recognition text and the text can be stored in it, so that the storage effect of the recognition result of the Chinese speech segment to be recognized is improved, and the recognition effect of the Chinese speech segment to be recognized is guaranteed.
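As a non-limiting illustration, choosing a target storage space from the preset storage area according to file size could be sketched as follows. Measuring the size as the UTF-8 byte length, picking the smallest space that fits, and the name `select_storage_space` are assumptions of this sketch rather than details given in the embodiment.

```python
def select_storage_space(text, spaces):
    """Pick the smallest preset storage space that can hold the text.

    text:   the final speech recognition text
    spaces: hypothetical preset storage area, mapping a space name to its
            capacity in bytes
    Returns the chosen space name, or None if no space is large enough.
    """
    size = len(text.encode("utf-8"))  # file size assumed to be UTF-8 bytes
    fitting = [(capacity, name) for name, capacity in spaces.items()
               if capacity >= size]
    if not fitting:
        return None
    return min(fitting)[1]


if __name__ == "__main__":
    preset_area = {"small": 8, "medium": 64, "large": 1024}
    print(select_storage_space("你好世界", preset_area))
```

A best-fit choice is used here because it wastes the least capacity; other allocation policies would satisfy the description equally well.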

Example 10:

the embodiment provides a method for recognizing Chinese speech based on deep learning, as shown in fig. 3, including:

step 2: constructing a Chinese voice recognition model, and inputting the acquired Chinese voice segments to be recognized into the Chinese voice recognition model in sequence based on the sequencing result to perform voice recognition to obtain a voice text;

step 3: carrying out grammar correction on the obtained voice text based on the preset Chinese grammar to obtain the final voice recognition text.
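As a non-limiting illustration, the overall flow of the method (ordering the segments by time as described for the voice acquisition module, recognizing them in that order, and then correcting the resulting text) could be sketched as follows. The `Segment` type and the `recognize` and `correct` callables are hypothetical placeholders for the Chinese voice recognition model and the grammar correction, neither of which is implemented here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Segment:
    """A received Chinese voice segment (hypothetical representation)."""
    timestamp: float  # time at which the segment was received
    audio: bytes      # raw audio payload


def recognize_segments(
    segments: Iterable[Segment],
    recognize: Callable[[bytes], str],
    correct: Callable[[str], str],
) -> str:
    """Order segments by time, recognize each in turn, then correct the text."""
    ordered = sorted(segments, key=lambda s: s.timestamp)
    voice_text = "".join(recognize(s.audio) for s in ordered)
    return correct(voice_text)


if __name__ == "__main__":
    # Stand-ins only: "recognition" decodes bytes, "correction" fixes one word.
    fake_recognize = lambda audio: audio.decode("utf-8")
    fake_correct = lambda text: text.replace("我门", "我们")
    received = [
        Segment(2.0, "去学校".encode("utf-8")),
        Segment(1.0, "我门".encode("utf-8")),
    ]
    print(recognize_segments(received, fake_recognize, fake_correct))
```

Passing the model and the correction step in as callables keeps the ordering logic separate from them, so either can be swapped without touching the flow.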

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A Chinese speech recognition system based on deep learning, comprising:

2. The system of claim 1, wherein the speech acquisition module comprises:

3. The system of claim 2, wherein the phonetic listing unit comprises:

the voice screening subunit is used for determining a first peak frequency point corresponding to the Chinese voice segment to be identified at each moment based on the audio map, acquiring a noise audio map corresponding to a noise signal, and determining a second peak frequency point of the noise signal based on the noise audio map;

4. The system of claim 1, wherein the speech acquisition module comprises:

5. The deep learning based Chinese speech recognition system of claim 4, wherein the ranking unit comprises:

6. The system of claim 1, wherein the speech recognition module comprises:

the voice recognition unit is used for performing first splitting on each Chinese voice segment to be recognized based on the starting point and the end point, obtaining a sentence set of each Chinese voice segment to be recognized based on a first splitting result, and extracting syllable attributes contained in each sentence of Chinese voice in the sentence set;

the voice recognition unit is used for carrying out second splitting on each sentence of Chinese voice based on the syllable attribute and obtaining Chinese vocabulary contained in each sentence of Chinese voice based on a second splitting result;

7. The system of claim 6, wherein the speech recognition unit comprises:

8. The system of claim 1, wherein the modification module comprises:

and the character replacing unit is also used for replacing the difference characters based on the target replacing characters and obtaining a final voice recognition text based on a replacing result.

9. The system of claim 1, wherein the modification module comprises:

10. A Chinese speech recognition method based on deep learning is characterized by comprising the following steps: