CN109086455B

CN109086455B - Method for constructing voice recognition library and learning equipment

Info

Publication number: CN109086455B
Application number: CN201811002956.XA
Authority: CN
Inventors: 徐杨
Original assignee: Guangdong Genius Technology Co Ltd
Current assignee: Guangdong Genius Technology Co Ltd
Priority date: 2018-08-30
Filing date: 2018-08-30
Publication date: 2021-03-12
Anticipated expiration: 2038-08-30
Also published as: CN109086455A

Abstract

The invention relates to the technical field of electronic equipment and discloses a method for constructing a voice recognition library and a learning device. The method comprises the following steps: recognizing acquired voice information that matches the identity information of a user of the learning device to obtain a plurality of words; calculating the usage frequency of each word and determining the words whose usage frequency is higher than a preset frequency as common words; and storing the common words in association with their related information in the user's voice recognition library. By implementing the embodiment of the invention, the pronunciation and meaning of the common words are stored in a voice recognition library exclusive to the user, so that when the learning device later recognizes voice information input by the user, it can do so according to the meanings of the common words in that library. This reduces the difficulty of recognizing the voice information and improves both the accuracy and the efficiency of voice recognition by the learning device.

Description

Method for constructing voice recognition library and learning equipment

Technical Field

The invention relates to the technical field of electronic equipment, in particular to a method for constructing a voice recognition library and learning equipment.

Background

With the rapid development of learning devices such as family education machines and learning computers, the number of young users of such devices keeps growing. Because young users have relatively weak hands-on ability, they usually control the learning device by voice. The learning device can acquire the voice input by a young user and, by recognizing its content, perform the corresponding operation. In practice, however, different users have different language habits, and a conventional learning device cannot recognize a user's voice information according to that user's specific language habits, so the voice recognition efficiency of the learning device is low.

Disclosure of Invention

The embodiment of the invention discloses a method for constructing a voice recognition library and learning equipment, which can improve the voice recognition efficiency of the learning equipment.

The first aspect of the embodiment of the invention discloses a method for constructing a speech recognition library, which comprises the following steps:

acquiring a plurality of pieces of pre-stored voice information matched with identity information of a learning equipment user;

identifying the plurality of pre-stored voice information, and acquiring words contained in the plurality of pre-stored voice information;

calculating the use frequency of each word, determining a target word with the use frequency higher than a preset frequency from the words, and determining the target word as a common word;

identifying related information corresponding to the common words, and storing the common words and the related information corresponding to the common words in a voice recognition library matched with the identity information of the user in an associated manner, wherein one common word corresponds to one related information, and the related information at least comprises the meanings of the common words and the pronunciation of the common words.

As an optional implementation manner, in the first aspect of the embodiment of the present invention, the acquiring a plurality of pieces of pre-stored voice information that matches with identity information of a user includes:

when a target instruction for constructing the voice recognition library is detected, acquiring identity information of a learning equipment user;

determining a voice key factor of the user from the identity information;

and acquiring a plurality of pieces of pre-stored voice information matched with the voice key factors from a database.

As an optional implementation manner, in the first aspect of the embodiment of the present invention, before the identity information of the user of the learning device is acquired upon detecting the target instruction for constructing the voice recognition library, the method further includes:

when a microphone of the learning device receives target voice input by a user, judging whether the identity information of the user comprises a voice key factor;

if not, identifying the voiceprint of the target voice through a voiceprint identification technology;

extracting a plurality of voiceprint nodes from the voiceprint;

and generating the voice key factor of the target voice by calculating the voiceprint nodes, and storing the voice key factor into the identity information of the user.

As an optional implementation manner, in the first aspect of the embodiment of the present invention, the calculating a usage frequency of each word, determining a target word with the usage frequency greater than a preset frequency from the words, and determining the target word as a common word includes:

calculating the usage count of each word on the basis of the plurality of pieces of pre-stored voice information;

summing the usage counts of all the words to obtain the total usage count;

calculating the usage frequency corresponding to each word from the usage count of that word and the total usage count, wherein one word corresponds to one usage frequency;

and determining the target words with the use frequency higher than the preset frequency from the words and determining the target words as common words.

As an optional implementation manner, in the first aspect of the embodiment of the present invention, after the related information corresponding to the common words is identified and the common words and their related information are stored in association in the voice recognition library matched with the identity information of the user, the method further includes:

when the current voice input by the user is detected, detecting whether a target word voice matched with the pronunciation of any one common word in the voice recognition library exists in the current voice;

if yes, acquiring target related information of the common words corresponding to the target word voice from the voice recognition library;

and performing semantic recognition on the current voice according to the target related information to obtain the semantic corresponding to the current voice.
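The three recognition steps above can be sketched as a simple lookup. This is only an illustrative sketch under stated assumptions: the library layout (a dictionary from common word to a record with `pronunciation` and `meaning` keys), the function names, and the representation of the current voice as a list of pronunciation strings are all hypothetical and are not specified by the source.

```python
from typing import Dict, List

# Hypothetical per-user library: common word -> its related information.
Library = Dict[str, Dict[str, str]]


def find_common_words(heard_pronunciations: List[str], library: Library) -> List[str]:
    """Return the common words whose stored pronunciation occurs in the current voice."""
    heard = set(heard_pronunciations)
    return [word for word, info in library.items() if info["pronunciation"] in heard]


def interpret(heard_pronunciations: List[str], library: Library) -> Dict[str, str]:
    """Map every matched common word to its stored meaning (empty if nothing matches)."""
    matched = find_common_words(heard_pronunciations, library)
    return {word: library[word]["meaning"] for word in matched}
```

In this sketch, an empty result simply means that no common word was detected in the current voice; what the device does in that case is outside the steps listed above.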

A second aspect of the embodiments of the present invention discloses a learning apparatus, including:

the first acquisition unit is used for acquiring a plurality of pieces of pre-stored voice information matched with the identity information of the learning equipment user;

the first identification unit is used for identifying the plurality of pre-stored voice information and acquiring words contained in the plurality of pre-stored voice information;

the calculation unit is used for calculating the use frequency of each word, determining a target word with the use frequency higher than a preset frequency from the words and determining the target word as a common word;

the storage unit is used for identifying related information corresponding to the common words and storing the common words and the related information corresponding to the common words in the voice recognition library matched with the identity information of the user in an associated manner, wherein one common word corresponds to one related information, and the related information at least comprises the meanings of the common words and the pronunciation of the common words.

As an optional implementation manner, in a second aspect of the embodiment of the present invention, the first obtaining unit includes:

the first acquisition subunit is used for acquiring the identity information of the learning equipment user when a target instruction for constructing the voice recognition library is detected;

a first determining subunit, configured to determine a speech key factor of the user from the identity information;

and the second acquisition subunit is used for acquiring a plurality of pieces of pre-stored voice information matched with the voice key factors from the database.

As an alternative implementation, in the second aspect of the embodiment of the present invention, the learning apparatus further includes:

the judging unit is used for judging, when a microphone of the learning device receives a target voice input by the user and before the first acquisition subunit acquires the identity information of the user of the learning device upon detecting the target instruction for constructing the voice recognition library, whether the identity information of the user comprises a voice key factor;

the second identification unit is used for identifying the voiceprint of the target voice through a voiceprint identification technology when the judgment result of the judgment unit is negative;

the extracting unit is used for extracting a plurality of voiceprint nodes from the voiceprints;

and the generating unit is used for generating the voice key factor of the target voice by calculating the voiceprint nodes and storing the voice key factor into the identity information of the user.

As an optional implementation manner, in the second aspect of the embodiment of the present invention, the calculation unit includes:

the first calculating subunit is used for calculating the usage count of each word on the basis of the plurality of pieces of pre-stored voice information;

the first calculating subunit is further configured to sum the usage counts of all the words to obtain the total usage count;

a second calculating subunit, configured to calculate the usage frequency corresponding to each word from the usage count of that word and the total usage count, where one word corresponds to one usage frequency;

and the second determining subunit is used for determining the target words with the use frequency higher than the preset frequency from the words and determining the target words as common words.

As an optional implementation manner, in the second aspect of the embodiment of the present invention, the learning apparatus further includes:

the detection unit is used for detecting, after the storage unit identifies the related information corresponding to the common words and stores the common words and their related information in association in the voice recognition library matched with the identity information of the user, and when the current voice input by the user is detected, whether a target word voice matched with the pronunciation of any one common word in the voice recognition library exists in the current voice;

a second obtaining unit, configured to obtain, from the speech recognition library, target related information of the common term corresponding to the target term speech when a result detected by the detecting unit is yes;

and the third identification unit is used for carrying out semantic identification on the current voice according to the target related information to obtain the semantics corresponding to the current voice.

A third aspect of an embodiment of the present invention discloses an electronic device, including:

a memory storing executable program code;

a processor coupled with the memory;

the processor calls the executable program code stored in the memory to perform part or all of the steps of any one of the methods of the first aspect.

A fourth aspect of the present embodiments discloses a computer-readable storage medium storing a program code, where the program code includes instructions for performing part or all of the steps of any one of the methods of the first aspect.

A fifth aspect of embodiments of the present invention discloses a computer program product, which, when run on a computer, causes the computer to perform some or all of the steps of any one of the methods of the first aspect.

A sixth aspect of the present embodiment discloses an application publishing platform, where the application publishing platform is configured to publish a computer program product, where the computer program product is configured to, when running on a computer, cause the computer to perform part or all of the steps of any one of the methods in the first aspect.

Compared with the prior art, the embodiment of the invention has the following beneficial effects:

in the embodiment of the invention, a plurality of pieces of pre-stored voice information matched with the identity information of a learning equipment user are obtained; identifying a plurality of pre-stored voice messages, and acquiring words contained in the pre-stored voice messages; calculating the use frequency of each word, and determining the words with the use frequency higher than the preset frequency as common words; and identifying related information corresponding to the common words, and storing the common words and the related information corresponding to the common words in a voice recognition library matched with the identity information of the user in an associated manner, wherein the related information at least comprises the meanings and the pronunciations of the common words. Therefore, by implementing the embodiment of the invention, the commonly used words frequently used by the user can be determined from the pre-stored voice information of the user, and the pronunciation and the meaning of the commonly used words are stored in the voice recognition library exclusive to the user, so that the subsequent learning equipment can recognize the voice information according to the meaning of the commonly used words in the voice recognition library when recognizing the voice information input by the user, the difficulty of recognizing the voice information by the learning equipment is reduced, the accuracy of the voice recognition by the learning equipment can be improved, and the voice recognition efficiency of the learning equipment is improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flow chart illustrating a method for constructing a speech recognition library according to an embodiment of the present invention;

FIG. 2 is a flow chart illustrating another method for constructing a speech recognition library according to an embodiment of the present invention;

FIG. 3 is a flow chart illustrating another method for constructing a speech recognition library according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a learning device according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of another learning device disclosed in the embodiment of the present invention;

FIG. 6 is a schematic structural diagram of another learning device disclosed in the embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

The embodiment of the invention discloses a method for constructing a voice recognition library and learning equipment, which can improve the voice recognition efficiency of the learning equipment. Details are given below.

Example one

Referring to fig. 1, fig. 1 is a flowchart illustrating a method for constructing a speech recognition library according to an embodiment of the present invention. As shown in fig. 1, the method for constructing the speech recognition library may include the following steps:

101. The learning device obtains a plurality of pieces of pre-stored voice information matched with the identity information of the learning device user.

In the embodiment of the present invention, the learning device may be an electronic device such as a learning tablet, a family education machine, or a notebook computer, which is not limited in the embodiment of the present invention. One learning device may correspond to one user or to multiple users, which is not limited in the embodiment of the present invention. The identity information of the user may be account information of the user of the learning device (e.g., a unique account code of the user, an identity card number of the user, etc.), or may be physiological characteristic information of the user (e.g., fingerprint information, face information, or a voice key factor of the user, etc.), which is not limited in the embodiment of the present invention. The learning device may acquire all voice information input by the user through the microphone of the learning device and pre-store all of that voice information in a database or memory of the learning device.

As an alternative embodiment, before the learning device performs step 101, the following steps may also be performed:

when detecting that the learning equipment is triggered by any instruction, the learning equipment acquires identity information of a current user;

the learning equipment judges whether a voice recognition library matched with the identity information of the user is stored or not;

if a voice recognition library matched with the identity information of the user is stored, the learning equipment acquires the current date and the construction date of the voice recognition library;

the learning equipment calculates the construction duration of the voice recognition library according to the current date and the construction date;

the learning equipment judges whether the construction time length is greater than a preset time length;

and if the construction duration is greater than the preset duration, the learning equipment acquires a plurality of pieces of pre-stored voice information matched with the identity information of the user of the learning equipment.

By implementing this embodiment, the voice recognition library can be updated at preset intervals. Because the language habits of the user may change over time, updating the voice recognition library regularly makes the voice recognition of the learning device more accurate.
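The age check in the optional steps above can be sketched as follows. This is a minimal sketch, not the patented implementation: the function name `library_is_stale` and the use of calendar dates with a `timedelta` threshold are assumptions made for illustration. The source only states that the construction duration is derived from the current date and the construction date and compared with a preset duration.

```python
from datetime import date, timedelta


def library_is_stale(build_date: date, today: date, preset_duration: timedelta) -> bool:
    """Return True when the library has existed for longer than the preset duration."""
    construction_duration = today - build_date  # elapsed time since the library was built
    return construction_duration > preset_duration
```

When the function returns True, the device would go on to step 101 and rebuild the library from the pre-stored voice information. The comparison is strict, matching the "greater than" wording of the steps.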

102. The learning device identifies a plurality of pieces of pre-stored voice information and obtains words contained in the pre-stored voice information.

In the embodiment of the present invention, the pre-stored voice information may be a word, a sentence, or a passage of speech, and the words obtained from the pre-stored voice information may stand in a containing-and-contained relationship with one another. When the pre-stored voice information is a word, the learning device may determine that word as the word contained in the pre-stored voice information. When the pre-stored voice information is a sentence, the learning device may recognize all words contained in the sentence and determine them as the words contained in the pre-stored voice information. For example, the pre-stored voice information may be "call for a small step", and the words it contains may include "give", "step", "make" and "call"; because the words may contain one another, they may also include "make call", so that in summary the words contained in the pre-stored voice information may be "give", "step", "make", "call" and "make call". When the pre-stored voice information is a passage of speech, the learning device may recognize all words contained in the passage and determine them as the words contained in the pre-stored voice information.

As an alternative embodiment, the learning device may recognize a plurality of pieces of pre-stored voice information, and the manner of obtaining words included in the plurality of pieces of pre-stored voice information may include the following steps:

the learning equipment performs voice recognition on a plurality of pre-stored voice messages to obtain text information corresponding to each pre-stored voice message, wherein one pre-stored voice message corresponds to one text information;

the learning equipment carries out semantic analysis on each piece of text information and divides each piece of text information into a plurality of target words;

the learning equipment integrates the target words corresponding to each piece of text information to generate a target word library;

the learning equipment combines the same target words in the target word library, and determines the target words in the combined target word library as words contained in a plurality of pre-stored voice messages.

By implementing this embodiment, the voice information can be converted into text information, and the text information can then be analyzed to obtain the words it contains, which are the words contained in the plurality of pieces of voice information.
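The splitting and merging steps above can be sketched as follows, starting from text that has already been produced by voice recognition. This is an illustrative sketch only: the function names are hypothetical, and whitespace splitting is a placeholder for the semantic analysis described in the source, which is not specified further and would in practice require a proper word segmenter.

```python
from typing import Iterable, List


def segment(text: str) -> List[str]:
    """Placeholder segmentation: split on whitespace (an assumption, not the source's method)."""
    return text.split()


def collect_words(texts: Iterable[str]) -> List[str]:
    """Pool the target words of every text and merge identical ones, keeping first-seen order."""
    target_word_library = [word for text in texts for word in segment(text)]
    return list(dict.fromkeys(target_word_library))  # dict keys are unique and ordered
```

For instance, `collect_words(["call mom", "call dad now"])` yields `["call", "mom", "dad", "now"]`. Merging discards repetition counts, so the usage counts needed in step 103 would be computed from the pre-stored voice information itself.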

103. The learning equipment calculates the use frequency of each word, determines a target word with the use frequency higher than the preset frequency from the words and determines the target word as a common word.

In the embodiment of the present invention, the usage frequency may be determined by calculating a ratio of the number of times each word appears in all the pre-stored voice information to the total number of times all the words appear in all the pre-stored voice information. The preset frequency may be preset for the learning device, or may be set by the user of the learning device, which is not limited in the embodiments of the present invention.
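The ratio described above can be sketched as follows. The function names and the representation of each piece of voice information as a list of already recognized words are assumptions made for illustration. The strict comparison follows the "higher than the preset frequency" wording.

```python
from collections import Counter
from typing import Dict, List, Sequence


def usage_frequencies(utterances: Sequence[List[str]]) -> Dict[str, float]:
    """Usage frequency of a word = its usage count / total usage count of all words."""
    counts = Counter(word for utterance in utterances for word in utterance)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def common_words(utterances: Sequence[List[str]], preset_frequency: float) -> List[str]:
    """Return the words whose usage frequency is strictly higher than the preset frequency."""
    frequencies = usage_frequencies(utterances)
    return [word for word, freq in frequencies.items() if freq > preset_frequency]
```

With the utterances `[["call", "mom"], ["call", "dad"], ["call", "mom"]]`, the word "call" has a usage count of 3 out of a total of 6, so its usage frequency is 0.5.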

104. The learning equipment identifies the related information corresponding to the common words and stores the common words and the related information corresponding to the common words in a voice identification library matched with the identity information of the user in a related mode, wherein one common word corresponds to one related information, and the related information at least comprises the meanings of the common words and the pronunciation of the common words.

In the embodiment of the present invention, the related information corresponding to a common word may further include information such as the time period in which the common word is used most frequently, which is not limited in the embodiment of the present invention. The identity information of each user can be matched with only one voice recognition library; this one-to-one matching makes the collection of the user's common words more accurate and avoids repeated storage or omission. The voice recognition library may be stored in a memory of the learning device, or in a server (such as a cloud server) to which the learning device has been connected in advance, which is not limited in the embodiment of the present invention.
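One possible in-memory layout for such a library is sketched below. The class and field names are hypothetical, and a nested dictionary keyed by a user identifier is only one way to satisfy the constraints stated above: one library per user identity, and one related-information record per common word.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RelatedInfo:
    meaning: str        # meaning of the common word
    pronunciation: str  # pronunciation of the common word


@dataclass
class LibraryStore:
    # user identity -> (common word -> related information)
    libraries: Dict[str, Dict[str, RelatedInfo]] = field(default_factory=dict)

    def store(self, user_id: str, word: str, info: RelatedInfo) -> None:
        """Associate a common word with its related information in the user's library."""
        self.libraries.setdefault(user_id, {})[word] = info

    def lookup(self, user_id: str, word: str) -> RelatedInfo:
        """Fetch the related information of a common word; raises KeyError if absent."""
        return self.libraries[user_id][word]
```

Because each word is a dictionary key, storing the same common word twice overwrites the earlier record instead of duplicating it.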

In the method described in fig. 1, the voice information can be recognized according to the meanings of the common words in the voice recognition library, which reduces the difficulty of recognizing the voice information and improves the accuracy, and thereby the efficiency, of voice recognition by the learning device. Updating the voice recognition library at regular intervals also makes the voice recognition of the learning device more accurate. In addition, the words contained in the voice information can be identified more accurately.

Example two

Referring to fig. 2, fig. 2 is a flow chart illustrating another method for constructing a speech recognition library according to an embodiment of the present invention. As shown in fig. 2, the method for constructing the speech recognition library may include the following steps:

201. When a microphone of the learning device receives target voice input by a user, the learning device judges whether the identity information of the user comprises a voice key factor. If so, the process is ended; if not, step 202 to step 210 are executed.

In the embodiment of the present invention, when the learning device receives the target voice input by the user through the microphone, if this is the first time the learning device has received voice from the user, the identity information corresponding to the user in the learning device does not yet include a voice key factor. The learning device therefore needs to recognize the target voice input by the user to obtain the user's voice key factor.

202. The learning device recognizes the voiceprint of the target voice through a voiceprint recognition technology.

In the embodiment of the invention, the Voiceprint (Voiceprint) is a sound wave frequency spectrum carrying language information, and the Voiceprint not only has specificity, but also has the characteristic of relative stability, so that the identity information of a user can be determined through the voice key factor in the Voiceprint of the user. The learning device can extract the voice characteristics in the target voice by utilizing the voiceprint recognition technology, and recognizes the voiceprint in the target voice according to the voice characteristics.

203. The learning device extracts a number of voiceprint nodes from the voiceprint.

In the embodiment of the present invention, the voiceprint node may be a node capable of obviously expressing the characteristics of the user voiceprint, and the number of the voiceprint nodes included in the user voiceprint is not limited.

204. The learning equipment calculates the voiceprint nodes to generate a voice key factor of the target voice, and stores the voice key factor into the identity information of the user.

In the embodiment of the invention, the learning device can comprehensively analyze the plurality of voiceprint nodes to obtain the voice key factor specific to the user. Through the voice key factor, the learning device may obtain the plurality of pieces of pre-stored voice information corresponding to the user, and may further perform operations related to the user's specific identity information, which is not limited in the embodiment of the present invention.

In the embodiment of the present invention, by implementing the steps 201 to 204, when the user uses the speech recognition technology of the learning device for the first time, the speech of the user can be analyzed, and the speech key factor of the user is obtained and stored, so that the learning device can quickly determine the information matched with the user according to the speech key factor of the user.
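The control flow of steps 201 to 204 can be sketched as follows. The source does not specify how the voice key factor is calculated from the voiceprint nodes, so the component-wise mean used here is purely a stand-in assumption. The function names, the `voice_key_factor` key, and the representation of voiceprint nodes as numeric vectors are likewise hypothetical.

```python
from typing import Dict, List, Sequence


def derive_key_factor(nodes: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in calculation (assumption): component-wise mean of the voiceprint node vectors."""
    count = len(nodes)
    return [sum(column) / count for column in zip(*nodes)]


def ensure_key_factor(identity: Dict[str, object], nodes: Sequence[Sequence[float]]) -> bool:
    """Step 201: do nothing if a key factor is already stored; otherwise derive and store it.

    Returns True when a new key factor was stored in the identity information.
    """
    if "voice_key_factor" in identity:
        return False
    identity["voice_key_factor"] = derive_key_factor(nodes)
    return True
```

On the first call for a user the key factor is written into the identity information. Later calls leave it untouched, which mirrors the "if so, the process is ended" branch of step 201.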

205. When a target instruction for constructing a speech recognition library is detected, the learning device acquires identity information of a user of the learning device.

In the embodiment of the present invention, the target instruction for constructing the voice recognition library may be triggered automatically by the learning device (for example, the learning device may trigger the target instruction once every predetermined period), or may be triggered actively by the user of the learning device (for example, the user may trigger the target instruction by pressing the corresponding key on the learning device). When the learning device triggers the target instruction automatically, the learning device can acquire the identity information of the user currently logged in to the learning device. When the target instruction is triggered by the user, the learning device can acquire biological characteristic information of the user currently triggering the target instruction, such as a fingerprint, a voice key factor, an iris or a face image, and obtain the identity information of the current user from it.

206. The learning device determines the speech key factor of the user from the identity information.

In the embodiment of the present invention, the identity information may include a plurality of contents, such as a name, an age, a sex, an identification number, a voice key factor, and the like of the user, and if the learning device stores the voice key factor of the user in advance, the learning device may obtain the voice key factor of the user through the identity information of the user.

207. The learning device acquires a plurality of pieces of pre-stored voice information matched with the voice key factors from the database.

In the embodiment of the invention, when the learning device pre-stores voice information, it can tag the voice information with the identity information of the user, so that the learning device can later acquire the voice information matched with the user's identity information through any item of that identity information.

In the embodiment of the present invention, by implementing the above steps 205 to 207, a plurality of pieces of pre-stored voice information matched with the voice key factor of the user can be obtained, and because the voice key factor has uniqueness, the voice information corresponding to the user can be accurately obtained through the voice key factor.
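Steps 205 to 207 amount to a filtered lookup, sketched below under stated assumptions: the record type `StoredVoice`, the field names, and matching by exact equality of an opaque key-factor string are all hypothetical. A real voiceprint match would more likely use a similarity measure, which the source does not describe.

```python
from typing import Dict, List, NamedTuple


class StoredVoice(NamedTuple):
    key_factor: str  # voice key factor the recording was tagged with when pre-stored
    audio_ref: str   # hypothetical reference to the stored audio


def voices_for_user(identity: Dict[str, str], database: List[StoredVoice]) -> List[StoredVoice]:
    """Determine the key factor from the identity information, then fetch matching recordings."""
    key_factor = identity.get("voice_key_factor")
    if key_factor is None:
        return []  # no key factor stored yet for this user
    return [voice for voice in database if voice.key_factor == key_factor]
```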

208. The learning device identifies a plurality of pieces of pre-stored voice information and obtains words contained in the pre-stored voice information.

209. The learning equipment calculates the use frequency of each word, determines a target word with the use frequency higher than the preset frequency from the words and determines the target word as a common word.

210. The learning equipment identifies the related information corresponding to the common words and stores the common words and the related information corresponding to the common words in a voice identification library matched with the identity information of the user in a related mode, wherein one common word corresponds to one related information, and the related information at least comprises the meanings of the common words and the pronunciation of the common words.

As an alternative implementation, the way that the learning device identifies the related information corresponding to the common terms and stores the common terms and the related information corresponding to the common terms in association with each other in the speech recognition library matched with the identity information of the user may include the following steps:

the learning equipment identifies a plurality of pronunciations of the common words from a plurality of pieces of pre-stored voice information and identifies a plurality of meanings of the common words corresponding to the pre-stored voice information;

the learning device determines a target pronunciation with the most use times from the plurality of pronunciations and determines a target meaning with the most use times from the plurality of meanings;

the learning equipment combines the target pronunciation and the target meaning to generate related information corresponding to the common words;

and the learning equipment associates the common words with the related information to generate a common word information set, and stores the common word information set into a voice recognition library matched with the identity information of the user.

In this implementation, because the user may use the same common word in different situations, a common word may correspond to several pronunciations and several meanings. The learning device can therefore combine the pronunciation and the meaning that are used most often for the common word to generate its related information, and store that related information in association with the common word in the user's own speech recognition library, so that the learning device can later determine the meaning of the common word more quickly when recognizing the user's voice information.
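The selection of the most-used pronunciation and meaning can be sketched as follows. This is only an illustration: the `(pronunciation, meaning)` pair representation, the dictionary layout of the information set, and the sample values are assumptions, not part of the specification.

```python
from collections import Counter


def build_common_word_entry(word, observations):
    """Build the information set for one common word.

    `observations` is a list of (pronunciation, meaning) pairs, one per
    occurrence of the word in the pre-stored voice information.
    """
    pronunciation_counts = Counter(p for p, _ in observations)
    meaning_counts = Counter(m for _, m in observations)
    # Target pronunciation and target meaning: the most frequently used ones.
    target_pronunciation = pronunciation_counts.most_common(1)[0][0]
    target_meaning = meaning_counts.most_common(1)[0][0]
    # Associate the word with its related information.
    return {word: {"pronunciation": target_pronunciation,
                   "meaning": target_meaning}}


observations = [
    ("xing2", "to walk"),
    ("hang2", "a row"),
    ("xing2", "to walk"),
    ("xing2", "acceptable"),
]
entry = build_common_word_entry("行", observations)
```

Note that the pronunciation and the meaning are counted independently, as the steps describe, so the stored pair is not guaranteed to have occurred together in any single utterance.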

In the method described in fig. 2, the speech information can be recognized according to the meaning of the common words in the speech recognition library, so that the difficulty of recognizing the speech information by the learning device is reduced, and the accuracy of speech recognition by the learning device can be improved, thereby improving the efficiency of speech recognition by the learning device. The identity information of the user can be determined through the voice key factor in the voiceprint of the user, and the accuracy of determining the identity information of the user is improved. In addition, target instructions for constructing the speech recognition library can be triggered in various ways, so that the construction of the speech recognition library is more flexible.

Example three

Referring to fig. 3, fig. 3 is a flow chart illustrating another method for constructing a speech recognition library according to an embodiment of the present invention. As shown in fig. 3, the method for constructing the speech recognition library may include the following steps:

301. The learning device obtains a plurality of pieces of pre-stored voice information matched with the identity information of the user of the learning device.

302. The learning device identifies a plurality of pieces of pre-stored voice information and obtains words contained in the pre-stored voice information.

303. The learning device calculates the number of times of use of each word based on a plurality of pre-stored voice messages.

In the embodiment of the present invention, each of the words recognized from the plurality of pieces of pre-stored voice information appears at least once in that voice information. The learning device may count the number of times each word appears in the plurality of pieces of pre-stored voice information and take that count as the number of times the word is used.

304. The learning device sums the number of times each word is used to obtain the total number of times all the words are used.

In the embodiment of the present invention, the numbers of times the individual words are used are added together to obtain the total number of times of use, that is, the total number of word occurrences in the plurality of pieces of pre-stored voice information.

305. The learning device calculates the use frequency corresponding to each word according to the number of times the word is used and the total number of times of use, wherein one word corresponds to one use frequency.

In the embodiment of the invention, the learning device can take the number of times any word is used, calculate the ratio of that number to the total number of times of use, and determine the ratio as the use frequency of that word; the learning device can repeat this calculation for every word to obtain the use frequency corresponding to each word.

306. The learning equipment determines target words with the use frequency larger than the preset frequency from the words and determines the target words as common words.

In the embodiment of the present invention, by implementing the above steps 303 to 306, the frequently used word of the user is determined by calculating the ratio of the number of times of any word input by the user voice to the total number of times of all words input by the user, so that the frequently used word of the user generated by the learning device is truly related to the language habit of the user, and further, the subsequent learning device can identify the voice information of the user more accurately.
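Steps 303 to 306 amount to computing, for each word, the ratio of its count to the total count of all words and keeping the words whose ratio exceeds the preset frequency. A minimal sketch follows; the function name, the representation of each utterance as a list of already-recognized words, and the threshold value are assumptions for illustration.

```python
from collections import Counter


def find_common_words(utterances, preset_frequency):
    """Return {word: use_frequency} for words above the preset frequency."""
    # Step 303: number of times each word is used.
    counts = Counter(word for utterance in utterances for word in utterance)
    # Step 304: total number of times all words are used.
    total = sum(counts.values())
    if total == 0:
        return {}
    # Steps 305 and 306: per-word ratio, kept only if strictly greater
    # than the preset frequency.
    return {word: count / total
            for word, count in counts.items()
            if count / total > preset_frequency}


utterances = [
    ["open", "math", "book"],
    ["open", "english", "book"],
    ["open", "math"],
]
common = find_common_words(utterances, preset_frequency=0.2)
```

In this sample there are 8 word occurrences in total, so "open" (3 occurrences) has a use frequency of 3/8, "math" and "book" each have 2/8, and "english" (1/8) falls below the threshold.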

307. The learning equipment identifies the related information corresponding to the common words and stores the common words and the related information corresponding to the common words in a voice identification library matched with the identity information of the user in a related mode, wherein one common word corresponds to one related information, and the related information at least comprises the meanings of the common words and the pronunciation of the common words.

308. When a current voice input by the user is detected, the learning device detects whether the current voice contains a target word voice matching the pronunciation of any common word in the voice recognition library; if so, steps 309 to 310 are executed; if not, the flow ends.

In the embodiment of the invention, after acquiring the current voice input by the user, the learning device can compare the current voice with the pronunciations of all the common words in the voice recognition library, and if the pronunciation of any one common word is matched with one part of the voice in the current voice, the learning device can determine the part of the voice as the target word voice and consider the word corresponding to the target word voice as the common word in the voice recognition library, so that the learning device can extract the target word voice from the current voice and search the common word matched with the target word voice from the voice recognition library.

309. The learning equipment acquires target related information of common words corresponding to target word voices from a voice recognition library.

In the embodiment of the invention, one or more pronunciations matched with any part of the current voices can exist in the voice recognition library, so that the learning equipment can extract a plurality of target word voices from the current voices, the learning equipment can acquire target related information of common words corresponding to the target word voices from the voice recognition library, and one target word voice corresponds to one target related information.

310. The learning device performs semantic recognition on the current voice according to the target related information to obtain the semantics corresponding to the current voice.

In the embodiment of the invention, the related information of the common words in the voice recognition library can comprise the meanings of the common words, the learning device can acquire the meanings of the common words contained in the target related information according to the target related information acquired from the voice recognition library, and can assist in recognizing the semantics corresponding to the current voice according to the meanings of the common words, so that the accuracy of the semantic recognition of the current voice is improved.

In the embodiment of the present invention, by implementing the above steps 308 to 310, the common words included in the voice information can be obtained from the voice recognition library through the voice information currently input by the user, and the voice information of the user can be recognized in an auxiliary manner according to information such as the meaning of the common words, so that the learning device is not required to perform semantic recognition on each word in the voice information, the efficiency of recognizing the voice information input by the user by the learning device is improved, and the voice recognition of the learning device is more accurate.
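The lookup in steps 308 and 309 can be sketched as below. The sketch assumes the current voice has already been split into pronunciation segments represented as strings, and that the library maps each common word to a dictionary with `pronunciation` and `meaning` keys; both are hypothetical simplifications, since the specification does not say how pronunciations are compared.

```python
def match_common_words(current_segments, library):
    """Return (segment, word, meaning) for each segment found in the library."""
    # Index the library by pronunciation so each segment is a single lookup.
    by_pronunciation = {info["pronunciation"]: (word, info)
                        for word, info in library.items()}
    matches = []
    for segment in current_segments:
        if segment in by_pronunciation:            # step 308: target word voice
            word, info = by_pronunciation[segment]
            matches.append((segment, word, info["meaning"]))  # step 309
    return matches


library = {
    "book": {"pronunciation": "b-uh-k", "meaning": "a bound set of pages"},
    "open": {"pronunciation": "oh-p-uh-n", "meaning": "to unfold or start"},
}
current_segments = ["p-l-iy-z", "oh-p-uh-n", "dh-uh", "b-uh-k"]
hits = match_common_words(current_segments, library)
```

An empty result corresponds to the "if not, the flow ends" branch of step 308; a non-empty result supplies the meanings that step 310 uses to assist semantic recognition.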

In the method described in fig. 3, the speech information can be recognized according to the meaning of the common words in the speech recognition library, so that the difficulty of recognizing the speech information by the learning device is reduced, and the accuracy of speech recognition by the learning device can be improved, thereby improving the efficiency of speech recognition by the learning device. The used times of each word and the total used times of all the words can be accurately obtained, so that the calculation result of the use frequency of each word is more accurate. In addition, the common words of the user generated by the learning equipment can be really related to the language habit of the user, and therefore the subsequent learning equipment can identify the voice information of the user more accurately.

Example four

Referring to fig. 4, fig. 4 is a schematic structural diagram of a learning device according to an embodiment of the present invention. As shown in fig. 4, the learning apparatus may include:

a first obtaining unit 401, configured to obtain a plurality of pieces of pre-stored voice information that match the identity information of the learning device user.

As an optional implementation manner, the first obtaining unit 401 may further be configured to:

when detecting that any instruction is triggered on the learning device, acquiring identity information of the current user;

judging whether a voice recognition library matched with the identity information of the user is stored or not;

if a voice recognition library matched with the identity information of the user is stored, acquiring the current date and the construction date of the voice recognition library;

calculating the construction duration of the voice recognition library according to the current date and the construction date;

judging whether the construction time length is greater than a preset time length;

and if the construction duration is greater than the preset duration, acquiring a plurality of pieces of pre-stored voice information matched with the identity information of the user of the learning device.
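The age check on an existing voice recognition library can be sketched as follows. The function name and the use of `datetime.date` and `timedelta` values are assumptions; the case in which no library is stored for the user is not described in this passage and is therefore left out of the sketch.

```python
from datetime import date, timedelta


def library_is_stale(construction_date, current_date, preset_duration):
    """True if the library is older than the preset duration."""
    # Construction duration: time elapsed since the library was built.
    construction_duration = current_date - construction_date
    return construction_duration > preset_duration


built_on = date(2018, 8, 1)
today = date(2018, 8, 30)
stale_weekly = library_is_stale(built_on, today, timedelta(days=7))
stale_monthly = library_is_stale(built_on, today, timedelta(days=30))
```

When the check returns true, the first obtaining unit would go on to acquire the pre-stored voice information so that the library can be rebuilt.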

A first identifying unit 402, configured to identify the plurality of pieces of pre-stored voice information acquired by the first acquiring unit 401, and acquire words included in the plurality of pieces of pre-stored voice information.

As an optional implementation manner, the first recognition unit 402 recognizes a plurality of pieces of pre-stored voice information, and the manner of obtaining words included in the plurality of pieces of pre-stored voice information may specifically be:

performing voice recognition on a plurality of pre-stored voice messages to obtain text information corresponding to each pre-stored voice message, wherein one pre-stored voice message corresponds to one text information;

performing semantic analysis on each piece of text information, and dividing each piece of text information into a plurality of target words;

combining the target words corresponding to each piece of text information to generate a target word library;

and merging the same target words in the target word library, and determining the target words in the merged target word library as words contained in a plurality of pre-stored voice messages.
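The segmentation and merging steps can be illustrated with a short sketch. Whitespace splitting is used here purely as a stand-in for the semantic analysis the text refers to, and the function name is hypothetical; real word segmentation, particularly for Chinese, would need a proper tokenizer.

```python
def words_from_texts(texts):
    """Collect the distinct target words found in the recognized texts."""
    target_word_library = []
    for text in texts:
        # Stand-in for semantic analysis: split each text into target words.
        target_word_library.extend(text.lower().split())
    # Merge identical target words, keeping first-seen order.
    return list(dict.fromkeys(target_word_library))


texts = ["open the book", "close the book"]
words = words_from_texts(texts)
```

Counting of occurrences is deliberately not done here; the merged list only identifies which words exist, and the use frequency is calculated separately by the calculating unit.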

A calculating unit 403, configured to calculate a usage frequency of each word identified by the first identifying unit 402, determine a target word with the usage frequency greater than a preset frequency from the words, and determine the target word as a common word.

A storage unit 404, configured to identify relevant information corresponding to the commonly used words determined by the calculation unit 403, and store the commonly used words and the relevant information corresponding to the commonly used words in a voice recognition library matched with the identity information of the user in an associated manner, where one commonly used word corresponds to one piece of relevant information, and the relevant information at least includes meanings of the commonly used words and pronunciation of the commonly used words.

As an optional implementation manner, the manner of identifying the related information corresponding to the common terms and storing the common terms and the related information corresponding to the common terms in the speech recognition library matched with the identity information of the user in an associated manner by the storage unit 404 may specifically be:

identifying a plurality of pronunciations of the common word from a plurality of pre-stored voice messages, and identifying a plurality of meanings of the common word corresponding to the plurality of pre-stored voice messages;

determining a target pronunciation with the most use times from the plurality of pronunciations and determining a target meaning with the most use times from the plurality of meanings;

generating related information corresponding to the common words by combining the target pronunciation and the target meaning;

and associating the common words with the related information to generate a common word information set, and storing the common word information set into a voice recognition library matched with the identity information of the user.

In the learning device described in fig. 4, the speech information can be recognized according to the meaning of the common words in the speech recognition library, so that the difficulty of recognizing the speech information by the learning device is reduced, and the accuracy of speech recognition by the learning device can be improved, thereby improving the efficiency of speech recognition by the learning device. Updating the speech recognition library in a timely manner can also make the speech recognition of the learning device more accurate. In addition, the words contained in the voice information can be recognized more accurately.

Example five

Referring to fig. 5, fig. 5 is a schematic structural diagram of another learning apparatus according to an embodiment of the present invention. The learning apparatus shown in fig. 5 is obtained by optimizing the learning apparatus shown in fig. 4. Compared to the learning apparatus shown in fig. 4, the first acquisition unit 401 of the learning apparatus shown in fig. 5 may include:

the first obtaining sub-unit 4011 is configured to obtain identity information of a learning device user when a target instruction for constructing a speech recognition library is detected.

The first determining sub-unit 4012 is configured to determine a speech key factor of the user from the identity information acquired by the first acquiring sub-unit 4011.

A second obtaining sub-unit 4013 configured to obtain, from the database, several pieces of pre-stored speech information that match the speech key factor determined by the first determining sub-unit 4012.

In the embodiment of the invention, a plurality of pieces of pre-stored voice information matched with the voice key factors of the user can be acquired, and because the voice key factors are unique, the voice information corresponding to the user can be accurately acquired through the voice key factors.

As an alternative embodiment, the learning apparatus shown in fig. 5 may further include:

a determining unit 405, configured to, before the first obtaining sub-unit 4011 obtains the identity information of the user of the learning device upon detecting a target instruction for constructing a voice recognition library, and when a microphone of the learning device receives a target voice input by the user, determine whether the identity information of the user includes a voice key factor;

a second recognition unit 406, configured to, when the result determined by the determination unit 405 is negative, recognize a voiceprint of the target voice through a voiceprint recognition technique;

an extracting unit 407, configured to extract a plurality of voiceprint nodes from the voiceprint identified by the second identifying unit 406;

the generating unit 408 is configured to generate a voice key factor of the target voice by calculating the voiceprint nodes extracted by the extracting unit 407, and store the voice key factor into the identity information of the user.

By implementing the implementation mode, when the user uses the voice recognition technology of the learning device for the first time, the voice of the user can be analyzed, and the voice key factor of the user can be obtained and stored, so that the learning device can rapidly determine the information matched with the user according to the voice key factor of the user.

In the learning device described in fig. 5, the speech information can be recognized according to the meaning of the common words in the speech recognition library, so that the difficulty of recognizing the speech information by the learning device is reduced, and the accuracy of speech recognition by the learning device can be improved, thereby improving the efficiency of speech recognition by the learning device. The identity information of the user can be determined through the voice key factor in the voiceprint of the user, and the accuracy of determining the identity information of the user is improved. In addition, target instructions for constructing the speech recognition library can be triggered in various ways, so that the construction of the speech recognition library is more flexible.

Example six

Referring to fig. 6, fig. 6 is a schematic structural diagram of another learning apparatus according to an embodiment of the present invention. The learning apparatus shown in fig. 6 is obtained by optimizing the learning apparatus shown in fig. 5. Compared to the learning apparatus shown in fig. 5, the calculation unit 403 of the learning apparatus shown in fig. 6 may include:

a first calculating sub-unit 4031, configured to calculate the number of times each word identified by the first identifying unit 402 is used according to a plurality of pieces of pre-stored voice information acquired by the first acquiring unit 401.

The first calculating subunit 4031 is further configured to integrate the number of times each word is used, and calculate the total number of times all words are used.

A second calculating subunit 4032, configured to calculate, according to the number of times of use and the total number of times of use of each word calculated by the first calculating subunit 4031, a use frequency corresponding to each word, where one word corresponds to one use frequency.

A second determining subunit 4033, configured to determine, from the words, a target word obtained by the second calculating subunit 4032 and having a usage frequency greater than the preset frequency, and determine the target word as a common word.

In the embodiment of the invention, the common words of the user are determined by calculating the proportion of the times of any words input by the voice of the user to the total times of all words input by the user, so that the common words of the user generated by the learning equipment are really related to the language habits of the user, and further the subsequent learning equipment can identify the voice information of the user more accurately.

a detecting unit 409, configured to, after the storage unit 404 identifies the related information corresponding to the common words and stores the common words and the related information corresponding to the common words in association in the voice recognition library matched with the identity information of the user, detect, when a current voice input by the user is detected, whether the current voice contains a target word voice matching the pronunciation of any common word in the voice recognition library;

a second obtaining unit 410, configured to, if the detection result of the detecting unit 409 is yes, obtain target related information of a common word corresponding to the target word voice from the voice recognition library;

the third identifying unit 411 is configured to perform semantic identification on the current speech according to the target related information acquired by the second acquiring unit 410, so as to obtain a semantic corresponding to the current speech.

By implementing the implementation mode, the common words contained in the voice information can be acquired from the voice recognition library through the voice information currently input by the user, the voice information of the user can be recognized in an auxiliary manner according to the information such as the meanings of the common words, the learning equipment is not required to recognize the semantics of each word in the voice information, the efficiency of recognizing the voice information input by the user by the learning equipment is improved, and the voice recognition of the learning equipment is more accurate.

In the learning device described in fig. 6, the speech information can be recognized according to the meaning of the common words in the speech recognition library, so that the difficulty of recognizing the speech information by the learning device is reduced, and the accuracy of speech recognition by the learning device can be improved, thereby improving the efficiency of speech recognition by the learning device. The used times of each word and the total used times of all the words can be accurately obtained, so that the calculation result of the use frequency of each word is more accurate. In addition, the common words of the user generated by the learning equipment can be really related to the language habit of the user, and therefore the subsequent learning equipment can identify the voice information of the user more accurately.

Example seven

Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. As shown in fig. 7, the electronic device may include:

a memory 701 in which executable program code is stored;

a processor 702 coupled to the memory 701;

wherein, the processor 702 calls the executable program code stored in the memory 701 to execute part or all of the steps of the method in the above method embodiments.

The embodiment of the invention also discloses a computer readable storage medium, wherein the computer readable storage medium stores program codes, wherein the program codes comprise instructions for executing part or all of the steps of the method in the above method embodiments.

Embodiments of the present invention also disclose a computer program product, wherein, when the computer program product is run on a computer, the computer is caused to execute part or all of the steps of the method as in the above method embodiments.

The embodiment of the present invention also discloses an application publishing platform, wherein the application publishing platform is used for publishing a computer program product, and when the computer program product runs on a computer, the computer is caused to execute part or all of the steps of the method in the above method embodiments.

It should be appreciated that reference throughout this specification to "an embodiment of the present invention" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrase "in embodiments of the invention" appearing in various places throughout the specification are not necessarily all referring to the same embodiments. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art should also appreciate that the embodiments described in this specification are exemplary and alternative embodiments, and that the acts and modules illustrated are not required in order to practice the invention.

In various embodiments of the present invention, it should be understood that the sequence numbers of the above-mentioned processes do not imply an inevitable order of execution, and the execution order of the processes should be determined by their functions and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

In addition, the terms "system" and "network" are often used interchangeably herein. It should be understood that the term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.

In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A and that B can be determined from A. It should also be understood, however, that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.

It will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by a program instructing related hardware, and the program may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM), or other memory such as magnetic disk memory or tape memory, or any other computer-readable medium that can be used to carry or store data.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-accessible memory. Based on such understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product, which is stored in a memory and includes several requests for causing a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute part or all of the steps of the methods of the embodiments of the present invention.

The method for constructing the speech recognition library and the learning device disclosed by the embodiment of the invention are described in detail, a specific example is applied in the text to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A method for constructing a speech recognition library, the method comprising:

identifying related information corresponding to the common words, and storing the common words and the related information corresponding to the common words in a voice recognition library matched with the identity information of the user in an associated manner, wherein one common word corresponds to one related information, and the related information at least comprises the meanings of the common words and the pronunciation of the common words;

the identifying the related information corresponding to the common terms and storing the common terms and the related information corresponding to the common terms in association with the voice recognition library matched with the identity information of the user includes:

identifying a plurality of pronunciations of the common words from the plurality of pre-stored voice messages, and identifying a plurality of meanings of the common words corresponding to the plurality of pre-stored voice messages;

generating related information corresponding to the common words by combining the target pronunciation and the target meaning; and associating the common words with the related information to generate a common word information set, and storing the common word information set into a voice recognition library matched with the identity information of the user.

2. The method of claim 1, wherein the obtaining a plurality of pre-stored voice messages matching with the identity information of the user comprises:

determining a voice key factor of the user from the identity information;

3. The method of claim 2, wherein before obtaining identity information of a learning device user upon detecting a target instruction for building the speech recognition library, the method further comprises:

extracting a plurality of voiceprint nodes from the voiceprint;

4. The method according to any one of claims 1 to 3, wherein the calculating of the usage frequency of each word, the determining of the target word with the usage frequency greater than a preset frequency from the words and the determining of the target word as a common word comprises:

5. The method according to any one of claims 1 to 3, wherein after the identifying the related information corresponding to the common words and storing the common words and the related information corresponding to the common words in association with the voice recognition library matched with the identity information of the user, the method further comprises:

6. A learning device, comprising:

the storage unit is used for identifying related information corresponding to the common words and storing the common words and the related information corresponding to the common words into the voice recognition library matched with the identity information of the user in an associated manner, wherein one common word corresponds to one related information, and the related information at least comprises the meanings of the common words and the pronunciation of the common words;

7. The learning apparatus according to claim 6, wherein the first acquisition unit includes:

8. The learning apparatus according to claim 7, characterized in that the learning apparatus further comprises:

9. The learning apparatus according to any one of claims 6 to 8, wherein the calculation unit includes:

10. The learning apparatus according to any one of claims 6 to 8, characterized by further comprising: