CN111354349A - Voice recognition method and device and electronic equipment - Google Patents


Publication number
CN111354349A
Authority
CN
China
Prior art keywords
standard
matching
voice
result
engine library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910305011.3A
Other languages
Chinese (zh)
Inventor
苑磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Honghe Innovation Information Technology Co Ltd
Original Assignee
Shenzhen Honghe Innovation Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Honghe Innovation Information Technology Co Ltd
Priority to CN201910305011.3A
Publication of CN111354349A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a voice recognition method and device and electronic equipment. The voice recognition method comprises the following steps: acquiring voice information; matching the voice information against a standard engine library; if the standard engine library produces a match and outputs a first matching result, taking the first matching result as the voice recognition result; if the standard engine library outputs no matching result, matching the voice information against a non-standard engine library; and if the non-standard engine library produces a match and outputs a second matching result, taking the second matching result as the voice recognition result. The invention can improve the accuracy of voice recognition.

Description

Voice recognition method and device and electronic equipment
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech recognition method and apparatus, and an electronic device.
Background
With the development of big data processing technology, massive data can be analyzed and processed in specialized ways to form information assets of great application value. At present, schools equipped with recording and broadcasting equipment generally use it to record the video content of each class. The video content contains rich data such as the voice data, expression data and body-movement data of teachers and students, and big data analysis of this content has potential informational value for the education industry. Existing voice recognition technology can recognize most standard voice content in the video content, but it cannot recognize non-standard voice content (such as regional dialects, popular words and the like).
Disclosure of Invention
In view of the above, the present invention provides a speech recognition method and apparatus, and an electronic device, which can improve the speech recognition accuracy.
Based on the above object, the present invention provides a speech recognition method, comprising:
acquiring voice information;
matching a standard engine library according to the voice information;
if the standard engine library produces a match and outputs a first matching result, taking the first matching result as a voice recognition result;
if the standard engine library outputs no matching result, matching a non-standard engine library according to the voice information;
and if the non-standard engine library is matched and outputs a second matching result, taking the second matching result as a voice recognition result.
Optionally, the non-standard engine library includes a non-standard speech recognition module and a non-standard speech database, and the non-standard speech recognition module is used to recognize the speech information and match a speech recognition result with the non-standard speech database.
Optionally, when the non-standard speech database does not match the speech recognition result, splitting the speech information into a plurality of phrases, inputting the plurality of phrases into the non-standard speech database respectively for matching, to obtain possible phrases of each phrase, and combining and matching the possible phrases of each phrase, to obtain a phrase combination with the maximum matching probability, which is used as the second matching result.
Optionally, the method further includes: and acquiring the voice information from the video information, identifying a face object emitting voice from the video information to obtain a face identification result, and associating the face identification result with the voice identification result.
Optionally, the method further includes: and extracting high-frequency words with the occurrence frequency larger than a certain threshold value from the voice recognition result to be used as keywords.
An embodiment of the present invention further provides a speech recognition apparatus, including:
the acquisition module is used for acquiring voice information;
the standard voice matching module is used for matching a standard engine library according to the voice information, and if a first matching result is output by matching the standard engine library, the first matching result is used as a voice recognition result;
and the non-standard voice matching module is used for matching the non-standard engine library according to the voice information if the standard engine library outputs no matching result, and taking a second matching result as a voice recognition result if the non-standard engine library produces a match and outputs the second matching result.
Optionally, the non-standard engine library includes a non-standard speech recognition module and a non-standard speech database, and the non-standard speech recognition module is used to recognize the speech information and match a speech recognition result with the non-standard speech database.
Optionally, the non-standard engine library further includes:
and the splitting and matching module is used for splitting the voice information into a plurality of phrases when the non-standard voice database is not matched with the voice recognition result, respectively inputting the phrases into the non-standard voice database for matching to obtain possible phrases of each phrase, and combining and matching the possible phrases of each phrase to obtain a phrase combination with the maximum matching probability as the second matching result.
Optionally, the apparatus further comprises:
and the face recognition module is used for recognizing a face object emitting voice from the video information to obtain a face recognition result, and associating the face recognition result with the voice recognition result.
Optionally, the apparatus further comprises:
and the extraction module is used for extracting high-frequency words with the occurrence frequency larger than a certain threshold value from the voice recognition result to be used as key words.
The embodiment of the invention also provides electronic equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the voice recognition method when executing the program.
From the above, the voice recognition method, the voice recognition device and the electronic equipment provided by the invention acquire the voice information from the video information, firstly input the voice information into the standard engine library for recognition, and if the voice information is not recognized, input the voice information into the non-standard engine library for recognition, so as to obtain a voice recognition result; the non-standard engine library can split phrases, match and recognize each phrase respectively, and then combine and match the phrases to finally obtain a combined phrase with the maximum matching probability as a voice recognition result. The invention can improve the breadth and accuracy of voice recognition, and provides a data base for the development and innovation of education industry by analyzing the big data of the video content recorded in the school classroom.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a speech recognition apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used to distinguish two different entities or two different parameters that share the same name. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
The voice recognition method provided by the embodiment of the invention is used for recognizing the voice content from the video content, and comprises the following steps:
acquiring voice information;
matching a standard engine library according to the voice information;
if the standard engine library produces a match and outputs a first matching result, taking the first matching result as a voice recognition result;
if the standard engine library outputs no matching result, matching the non-standard engine library according to the voice information;
and if the non-standard engine library is matched and outputs a second matching result, taking the second matching result as a voice recognition result.
The voice recognition method of the embodiment of the invention establishes a standard engine library and a non-standard engine library, acquires voice information from video content, firstly matches the voice information with the standard engine library, if matching is successful, the standard engine library outputs a matched first matching result, and then converts the first matching result into character information to realize voice recognition; and if the standard engine library cannot recognize the voice information, matching the voice information with the non-standard engine library, if the matching is successful, outputting a matched second matching result by the non-standard engine library, and converting the second matching result into character information to realize voice recognition. The invention can identify standard voice information and non-standard voice information and improve the accuracy of voice identification.
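The two-stage fallback described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Engine` type, `make_dictionary_engine` and the dictionary contents are hypothetical, and plain strings stand in for real voice information.

```python
from typing import Callable, Dict, Optional

# Hypothetical engine type: takes voice information (a plain string stands in
# for audio here) and returns character information, or None when nothing matches.
Engine = Callable[[str], Optional[str]]


def make_dictionary_engine(database: Dict[str, str]) -> Engine:
    """Build a toy engine library that 'recognizes' input by dictionary lookup."""
    def engine(voice_info: str) -> Optional[str]:
        return database.get(voice_info)
    return engine


def recognize(voice_info: str, standard: Engine, non_standard: Engine) -> Optional[str]:
    """Match the standard engine library first, then the non-standard one."""
    first_result = standard(voice_info)
    if first_result is not None:
        return first_result           # first matching result
    return non_standard(voice_info)   # second matching result, or None


# Made-up databases for illustration only.
standard_engine = make_dictionary_engine({"hello": "hello"})
non_standard_engine = make_dictionary_engine({"howdy": "hello"})

print(recognize("hello", standard_engine, non_standard_engine))
print(recognize("howdy", standard_engine, non_standard_engine))
print(recognize("zzz", standard_engine, non_standard_engine))
```

A `None` return corresponds to the case in which neither engine library recognizes the voice information.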
FIG. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention. As shown in the figure, the speech recognition method provided in the embodiment of the present invention includes:
S10: acquiring voice information;
in the embodiment of the invention, the voice information is obtained from the video content of each class recorded by the recording and playing equipment, and the voice information in the video content is identified.
S11: matching a standard engine library according to the voice information;
the standard engine library comprises a standard voice recognition module, a standard voice database and the like. In the embodiment of the invention, the acquired voice information is input into a standard engine library, the voice information is recognized by using a standard voice recognition module, and a voice recognition result is matched from a standard voice database.
S12: if the standard engine library produces a match and outputs a first matching result, taking the first matching result as a voice recognition result;
if the input voice information is standard voice information, the standard voice recognition module recognizes it, a matching first matching result can be output from the standard voice database, and the first matching result is then converted into character information to serve as the voice recognition result.
The standard voice information is Mandarin, standard words and the like, and such standard voice information is stored in the standard voice database.
S13: if the standard engine library outputs no matching result, matching the non-standard engine library according to the voice information;
the non-standard engine library comprises a non-standard voice recognition module, a non-standard voice database and the like. In the embodiment of the invention, if the standard engine library outputs no matching recognition result, the acquired voice information is input into the non-standard engine library, the non-standard voice recognition module is used for recognizing the voice information, and the voice recognition result is matched from the non-standard voice database.
S14: and if the non-standard engine library is matched and outputs a second matching result, taking the second matching result as a voice recognition result.
If the input voice information is non-standard voice information, the non-standard voice recognition module is used to recognize it, and a voice recognition result is matched from the non-standard voice database. If the matching is successful, a second matching result is output and converted into character information as the voice recognition result; if the matching is unsuccessful, a voice prompt is output indicating that the voice information was not recognized.
In an embodiment of the present invention, the non-standard engine library further includes: and the splitting and matching module is used for splitting the voice information into a plurality of phrases when the non-standard voice database does not match the voice recognition result, respectively inputting the phrases into the non-standard voice database for matching to obtain possible phrases of each phrase, and combining and matching the possible phrases of each phrase to obtain a phrase combination with the maximum matching probability as a second matching result. Specifically, the method comprises the following steps:
For non-standard voice information, if the non-standard engine library does not produce a match, it is determined whether the non-standard voice information contains a plurality of phrases. If so, the non-standard voice information is split into a plurality of phrases, the phrases are respectively input into the non-standard engine library for matching, and a plurality of possible phrases is output for each phrase; the possible phrases of each phrase are then combined and matched to obtain the phrase combination with the maximum matching probability as the second matching result. For example, in a certain region a tricycle is commonly called a 'three-bounce', and the acquired non-standard voice information is 'mountain bounce'. The non-standard voice information 'mountain bounce' is input into the non-standard engine library and is not successfully matched, so the non-standard engine library splits 'mountain bounce' into the two phrases 'mountain' and 'bounce', which are respectively input into the non-standard engine library for recognition. For the non-standard voice information 'mountain', the recognition result of the non-standard engine library is possible phrases such as 'three', 'mountain', 'umbrella' and 'fir'; for the non-standard voice information 'bounce', the recognition result is possible phrases such as 'three bounce', 'little bounce' and 'bounce'. The possible phrases are respectively combined and matched to obtain candidate combinations such as 'three-bounce', 'bounce umbrella' and the like, and the phrase combination with the maximum matching probability, 'three-bounce', is selected from them as the second matching result.
The non-standard voice information includes, for example, local dialects, popular words, custom words and words with special tones, and such non-standard voice information is stored in the non-standard voice database.
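The split-and-combine step can be sketched as follows. The description does not specify how the matching probability is computed, so the scoring here is an assumption: each candidate carries a made-up probability, and a combination's score is the product of its candidates' probabilities multiplied by a hypothetical phrase-level weight from the non-standard database. The names `CANDIDATES`, `KNOWN_PHRASES`, `DEFAULT_WEIGHT` and `best_combination` are illustrative only.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical candidates returned for each split phrase, with made-up probabilities.
CANDIDATES: Dict[str, List[Tuple[str, float]]] = {
    "mountain": [("three", 0.4), ("mountain", 0.3), ("umbrella", 0.2), ("fir", 0.1)],
    "bounce": [("bounce", 0.7), ("little bounce", 0.3)],
}

# Hypothetical phrase-level weights for combinations stored in the database.
KNOWN_PHRASES: Dict[str, float] = {"three bounce": 0.9}
DEFAULT_WEIGHT = 0.05  # assumed weight for combinations the database does not know


def best_combination(parts: List[str]) -> Tuple[str, float]:
    """Return the candidate combination with the highest score."""
    best_phrase, best_score = "", 0.0
    # Enumerate every way of picking one candidate per split phrase.
    for combo in product(*(CANDIDATES[part] for part in parts)):
        phrase = " ".join(word for word, _ in combo)
        score = KNOWN_PHRASES.get(phrase, DEFAULT_WEIGHT)
        for _, probability in combo:
            score *= probability
        if score > best_score:
            best_phrase, best_score = phrase, score
    return best_phrase, best_score


phrase, score = best_combination(["mountain", "bounce"])
print(phrase)
```

Exhaustive enumeration is adequate for a handful of short phrases; with many phrases or long candidate lists, a beam search over partial combinations would keep the search tractable.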
In the embodiment of the invention, the non-standard voice database of the non-standard engine library can be updated by a machine learning method so as to improve the breadth and accuracy of voice recognition.
In the embodiment of the present invention, the speech recognition method further includes: and acquiring voice content from the video content, identifying the face object which sends the voice while performing voice identification to obtain a face identification result, and associating the face identification result with the voice identification result. The face recognition result is basic information such as the name of the face object, the voice recognition result is character information converted from the first matching result or the second matching result, and the basic information of the face object is associated with the character information of the voice recognition to form the recognition result of the face object.
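One way to associate a face recognition result with a voice recognition result is sketched below. The description does not state how the association is made; this sketch assumes both results carry timestamps and links a speech segment to the face whose speaking interval overlaps it the most. `SpeechSegment`, `FaceTrack` and `associate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechSegment:
    start: float  # seconds from the start of the video
    end: float
    text: str     # character information from the first or second matching result


@dataclass
class FaceTrack:
    start: float  # interval during which this face is detected as speaking
    end: float
    name: str     # basic information of the face object


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def associate(segment: SpeechSegment, faces: List[FaceTrack]) -> Optional[str]:
    """Return the name of the face overlapping the segment most, or None."""
    best_name, best_overlap = None, 0.0
    for face in faces:
        shared = overlap(segment.start, segment.end, face.start, face.end)
        if shared > best_overlap:
            best_name, best_overlap = face.name, shared
    return best_name


faces = [FaceTrack(0.0, 5.0, "Teacher"), FaceTrack(5.0, 8.0, "Student A")]
segment = SpeechSegment(5.5, 7.0, "three bounce")
print(associate(segment, faces), "said:", segment.text)
```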
In the embodiment of the present invention, the speech recognition method further includes: and extracting high-frequency words with the occurrence frequency larger than a certain threshold value as keywords according to the voice recognition result. And performing voice recognition on the video content of a certain class or the video content of a certain subject within a certain time, and analyzing and processing the character information according to the character information converted from the first matching result or the second matching result to obtain high-frequency words with the occurrence frequency greater than a certain threshold value as keywords.
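Keyword extraction by frequency threshold can be sketched as follows. This sketch assumes the character information has already been segmented into whitespace-separated words; Chinese text would first need a word segmentation step, which is not shown. The function name `extract_keywords` and the sample transcripts are illustrative.

```python
from collections import Counter
from typing import List


def extract_keywords(transcripts: List[str], threshold: int) -> List[str]:
    """Return words that occur more than `threshold` times, most frequent first."""
    counts = Counter(
        word for text in transcripts for word in text.lower().split()
    )
    return [word for word, count in counts.most_common() if count > threshold]


# Made-up transcripts standing in for the recognized text of one class.
lesson = [
    "the triangle has three sides",
    "each triangle has three angles",
    "draw a triangle",
]
print(extract_keywords(lesson, threshold=2))
```

In practice a stop-word filter would usually be applied before counting, so that common function words do not dominate the keyword list.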
Fig. 2 is a schematic diagram of a speech recognition apparatus according to an embodiment of the present invention. As shown in the drawings, a speech recognition apparatus provided in an embodiment of the present invention is configured to recognize speech content from video content, and the apparatus includes:
the acquisition module is used for acquiring voice information;
the standard voice matching module is used for matching the standard engine library according to the voice information, and taking a first matching result as a voice recognition result if the standard engine library is matched and outputs the first matching result;
and the non-standard voice matching module is used for matching the non-standard engine library according to the voice information if the standard engine library outputs no matching result, and taking the second matching result as a voice recognition result if the non-standard engine library produces a match and outputs the second matching result.
The voice recognition device of the embodiment of the invention establishes a standard engine library and a non-standard engine library, the acquisition module acquires voice information from video content, the standard voice matching module is firstly utilized to match the voice information with the standard engine library, if the matching is successful, the standard engine library outputs a matched first matching result, and then the first matching result is converted into character information to realize voice recognition; if the standard engine library can not recognize the voice information, the non-standard voice matching module is used for matching the voice information with the non-standard engine library, if the matching is successful, the non-standard engine library outputs a matched second matching result, and then the second matching result is converted into character information to realize voice recognition. The invention can identify standard voice information and non-standard voice information and improve the accuracy of voice identification.
In the embodiment of the invention, the standard engine library comprises a standard voice recognition module, a standard voice database and the like. The acquired voice information is input into the standard engine library, the voice information is recognized by using the standard voice recognition module, and a voice recognition result is matched from the standard voice database. If the input voice information is standard voice information, the standard voice recognition module recognizes it, a matching first matching result can be output from the standard voice database, and the first matching result is then converted into character information to serve as the voice recognition result.
The standard voice information is Mandarin, standard words and the like, and such standard voice information is stored in the standard voice database.
In the embodiment of the invention, the non-standard engine library comprises a non-standard voice recognition module, a non-standard voice database and the like. If the standard engine library outputs no matching recognition result, the acquired voice information is input into the non-standard engine library, the non-standard voice recognition module is used for recognizing the voice information, and the voice recognition result is matched from the non-standard voice database. If the input voice information is non-standard voice information, the non-standard voice recognition module is used to recognize it, and a voice recognition result is matched from the non-standard voice database. If the matching is successful, a second matching result is output and converted into character information as the voice recognition result; if the matching is unsuccessful, a voice prompt is output indicating that the voice information was not recognized.
In an embodiment of the present invention, the non-standard engine library further includes:
and the splitting and matching module is used for splitting the voice information into a plurality of phrases when the non-standard voice database does not match the voice recognition result, respectively inputting the phrases into the non-standard voice database for matching to obtain possible phrases of each phrase, and combining and matching the possible phrases of each phrase to obtain a phrase combination with the maximum matching probability as a second matching result.
For non-standard voice information, if the non-standard engine library does not produce a match, it is determined whether the non-standard voice information contains a plurality of phrases. If so, the non-standard voice information is split into a plurality of phrases, the phrases are respectively input into the non-standard engine library for matching, and a plurality of possible phrases is output for each phrase; the possible phrases of each phrase are then combined and matched to obtain the phrase combination with the maximum matching probability as the second matching result. For example, in a certain region a tricycle is commonly called a 'three-bounce', and the acquired non-standard voice information is 'mountain bounce'. The non-standard voice information 'mountain bounce' is input into the non-standard engine library and is not successfully matched, so the non-standard engine library splits 'mountain bounce' into the two phrases 'mountain' and 'bounce', which are respectively input into the non-standard engine library for recognition. For the non-standard voice information 'mountain', the recognition result of the non-standard engine library is possible phrases such as 'three', 'mountain', 'umbrella' and 'fir'; for the non-standard voice information 'bounce', the recognition result is possible phrases such as 'three bounce', 'little bounce' and 'bounce'. The possible phrases are respectively combined and matched to obtain candidate combinations such as 'three-bounce', 'bounce umbrella' and the like, and the phrase combination with the maximum matching probability, 'three-bounce', is selected from them as the second matching result.
The non-standard voice information includes, for example, local dialects, popular words, custom words and words with special tones, and such non-standard voice information is stored in the non-standard voice database.
In the embodiment of the invention, the non-standard voice database of the non-standard engine library can be updated by a machine learning method so as to improve the breadth and accuracy of voice recognition.
In an embodiment of the present invention, the speech recognition apparatus further includes:
and the face recognition module is used for recognizing the face object which sends out the voice from the video content to obtain a face recognition result, and associating the face recognition result with the voice recognition result. The face recognition result is basic information such as the name of the face object, the voice recognition result is character information converted from the first matching result or the second matching result, and the basic information of the face object is associated with the character information of the voice recognition to form the recognition result of the face object.
In an embodiment of the present invention, the speech recognition apparatus further includes:
and the extraction module is used for extracting high-frequency words with the occurrence frequency larger than a certain threshold value as the keywords according to the voice recognition result. And performing voice recognition on the video content of a certain class or the video content of a certain subject within a certain time, and analyzing and processing the character information according to the character information converted from the first matching result or the second matching result to obtain high-frequency words with the occurrence frequency greater than a certain threshold value as keywords.
In view of the above object, an embodiment of the present invention further provides an apparatus for performing the speech recognition method. The device comprises:
one or more processors, and a memory.
The apparatus for performing the voice recognition method may further include: an input device and an output device.
The processor, memory, input device, and output device may be connected by a bus or other means.
The memory, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the speech recognition method in embodiments of the present invention. The processor executes various functional applications of the server and data processing by running nonvolatile software programs, instructions and modules stored in the memory, namely, implements the voice recognition method of the above-described method embodiment.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created according to use of the apparatus performing the voice recognition method, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory remotely located from the processor, and these remote memories may be connected to the apparatus performing the voice recognition method via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device may receive input numeric or character information and generate key signal inputs related to user settings and function control of the device performing the voice recognition method. The output device may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the speech recognition method of any of the method embodiments described above. The technical effect of the embodiment of the device for executing the voice recognition method is the same as or similar to that of any method embodiment.
The embodiment of the invention also provides a non-transitory computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions can execute the speech recognition method in any method embodiment. Embodiments of the non-transitory computer storage medium may be the same or similar in technical effect to any of the method embodiments described above.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes in the methods of the above embodiments may be implemented by a computer program that can be stored in a computer-readable storage medium and that, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. The technical effect of the embodiment of the computer program is the same as or similar to that of any of the method embodiments described above.
Furthermore, the apparatuses, devices, etc. described in the present disclosure may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, etc., and may also be large terminal devices, such as a server, etc., and therefore the scope of protection of the present disclosure should not be limited to a specific type of apparatus, device. The client disclosed by the present disclosure may be applied to any one of the above electronic terminal devices in the form of electronic hardware, computer software, or a combination of both.
Furthermore, the method according to the present disclosure may also be implemented as a computer program executed by a CPU, which may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method of the present disclosure.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) described herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the invention, features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (11)

1. A speech recognition method, comprising:
acquiring voice information;
matching a standard engine library according to the voice information;
if the standard engine library matches and outputs a first matching result, taking the first matching result as a voice recognition result;
if the standard engine library does not match and output a result, matching a non-standard engine library according to the voice information;
and if the non-standard engine library matches and outputs a second matching result, taking the second matching result as the voice recognition result.
2. The method of claim 1, wherein the non-standard engine library comprises a non-standard speech recognition module and a non-standard speech database, and wherein the speech information is recognized by the non-standard speech recognition module and the speech recognition result is matched from the non-standard speech database.
3. The method according to claim 2, wherein when the non-standard speech database does not match the speech recognition result, the speech information is split into a plurality of phrases, the phrases are respectively input into the non-standard speech database for matching to obtain possible phrases of each phrase, and the possible phrases of each phrase are combined and matched to obtain a phrase combination with the maximum matching probability as the second matching result.
4. The method of claim 1, further comprising: acquiring the voice information from video information, identifying a face object emitting voice from the video information to obtain a face recognition result, and associating the face recognition result with the voice recognition result.
5. The method of claim 1, further comprising: extracting high-frequency words with an occurrence frequency larger than a certain threshold value from the voice recognition result to be used as keywords.
6. A speech recognition apparatus, comprising:
the acquisition module is used for acquiring voice information;
the standard voice matching module is used for matching a standard engine library according to the voice information, and if a first matching result is output by matching the standard engine library, the first matching result is used as a voice recognition result;
and the non-standard voice matching module is used for matching the non-standard engine library according to the voice information if the standard engine library does not match and output a result, and taking a second matching result as the voice recognition result if the non-standard engine library matches and outputs the second matching result.
7. The apparatus of claim 6, wherein the non-standard engine library comprises a non-standard speech recognition module and a non-standard speech database, and wherein the non-standard speech recognition module is used to recognize the speech information and match speech recognition results from the non-standard speech database.
8. The apparatus of claim 7, wherein the non-standard engine library further comprises:
and the splitting and matching module is used for splitting the voice information into a plurality of phrases when the non-standard voice database is not matched with the voice recognition result, respectively inputting the phrases into the non-standard voice database for matching to obtain possible phrases of each phrase, and combining and matching the possible phrases of each phrase to obtain a phrase combination with the maximum matching probability as the second matching result.
9. The apparatus of claim 6, further comprising:
the face recognition module is used for recognizing a face object emitting voice from the video information to obtain a face recognition result, and associating the face recognition result with the voice recognition result.
10. The apparatus of claim 6, further comprising:
the extraction module is used for extracting high-frequency words with an occurrence frequency larger than a certain threshold value from the voice recognition result to be used as keywords.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 5 when executing the program.
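
As an illustration only, the flow recited in claims 1, 3 and 5 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the names recognize, best_phrase_combination and extract_keywords are hypothetical, each engine library is modeled as a plain function that returns None when it has no match, and the "matching probability" of a phrase combination is assumed to be the product of the per-phrase probabilities, which the claims do not specify.

```python
import math
from collections import Counter
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical model: a matcher maps voice information to a result, or None if unmatched.
Matcher = Callable[[str], Optional[str]]
# Hypothetical model: a candidate phrase paired with its matching probability.
Candidate = Tuple[str, float]


def recognize(voice: str, standard: Matcher, non_standard: Matcher) -> Optional[str]:
    """Claim 1: try the standard engine library first, then the non-standard one."""
    first = standard(voice)
    if first is not None:
        return first            # first matching result
    return non_standard(voice)  # second matching result, or None


def best_phrase_combination(
    phrases: Sequence[str],
    candidates_for: Callable[[str], List[Candidate]],
) -> Optional[List[str]]:
    """Claim 3: combine per-phrase candidates and keep the most probable combination."""
    options = [candidates_for(p) for p in phrases]
    if not options or any(not o for o in options):
        return None  # no phrases, or some phrase has no candidate at all
    # Assumption: a combination's score is the product of its phrase probabilities.
    best = max(product(*options), key=lambda combo: math.prod(p for _, p in combo))
    return [text for text, _ in best]


def extract_keywords(words: Sequence[str], threshold: int) -> List[str]:
    """Claim 5: keep words whose occurrence count is larger than the threshold."""
    counts = Counter(words)
    return [word for word, count in counts.items() if count > threshold]
```

With independent per-phrase probabilities, enumerating combinations gives the same answer as picking the best candidate for each phrase; the enumeration is kept so that the product can be swapped for a joint score over the whole combination if one is available.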
CN201910305011.3A 2019-04-16 2019-04-16 Voice recognition method and device and electronic equipment Pending CN111354349A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910305011.3A CN111354349A (en) 2019-04-16 2019-04-16 Voice recognition method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111354349A true CN111354349A (en) 2020-06-30

Family

ID=71196967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910305011.3A Pending CN111354349A (en) 2019-04-16 2019-04-16 Voice recognition method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111354349A (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160118050A1 (en) * 2014-10-24 2016-04-28 Sestek Ses Ve Iletisim Bilgisayar Teknolojileri Sanayi Ticaret Anonim Sirketi Non-standard speech detection system and method
CN104391673A (en) * 2014-11-20 2015-03-04 百度在线网络技术(北京)有限公司 Voice interaction method and voice interaction device
US20160253990A1 (en) * 2015-02-26 2016-09-01 Fluential, Llc Kernel-based verbal phrase splitting devices and methods
CN105096940A (en) * 2015-06-30 2015-11-25 百度在线网络技术(北京)有限公司 Method and device for voice recognition
CN105872687A (en) * 2016-03-31 2016-08-17 乐视控股(北京)有限公司 Method and device for controlling intelligent equipment through voice
GB2549117A (en) * 2016-04-05 2017-10-11 Chase Information Tech Services Ltd A searchable media player
CN105931643A (en) * 2016-06-30 2016-09-07 北京海尔广科数字技术有限公司 Speech recognition method and apparatus
CN106251859A (en) * 2016-07-22 2016-12-21 百度在线网络技术(北京)有限公司 Voice recognition processing method and apparatus
CN106385548A (en) * 2016-09-05 2017-02-08 努比亚技术有限公司 Mobile terminal and method for generating video captions
CN106910498A (en) * 2017-03-01 2017-06-30 成都启英泰伦科技有限公司 The method for improving voice control command word discrimination
CN107220292A (en) * 2017-04-25 2017-09-29 上海庆科信息技术有限公司 Intelligent dialogue device, reaction type intelligent sound control system and method
CN109065020A (en) * 2018-07-28 2018-12-21 重庆柚瓣家科技有限公司 The identification storehouse matching method and system of multilingual classification
CN109036410A (en) * 2018-08-30 2018-12-18 Oppo广东移动通信有限公司 Audio recognition method, device, storage medium and terminal
CN109450745A (en) * 2018-10-15 2019-03-08 深圳市欧瑞博科技有限公司 Information processing method, device, intelligence control system and intelligent gateway
CN109524017A (en) * 2018-11-27 2019-03-26 北京分音塔科技有限公司 A kind of the speech recognition Enhancement Method and device of user's custom words
CN109360564A (en) * 2018-12-10 2019-02-19 珠海格力电器股份有限公司 Method and device for selecting language identification mode and household appliance

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112017653A (en) * 2020-07-13 2020-12-01 武汉戴美激光科技有限公司 Laser treatment handle with voice recognition function and adjusting method
CN112102833A (en) * 2020-09-22 2020-12-18 北京百度网讯科技有限公司 Voice recognition method, device, equipment and storage medium
CN112102833B (en) * 2020-09-22 2023-12-12 阿波罗智联(北京)科技有限公司 Speech recognition method, device, equipment and storage medium
CN114495931A (en) * 2022-01-28 2022-05-13 达闼机器人股份有限公司 Voice interaction method, system, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109346059B (en) Dialect voice recognition method and electronic equipment
CN110444198B (en) Retrieval method, retrieval device, computer equipment and storage medium
US11482242B2 (en) Audio recognition method, device and server
CN109461437B (en) Verification content generation method and related device for lip language identification
WO2018223796A1 (en) Speech recognition method, storage medium, and speech recognition device
CN112530408A (en) Method, apparatus, electronic device, and medium for recognizing speech
JP2020030408A (en) Method, apparatus, device and medium for identifying key phrase in audio
CN111967224A (en) Method and device for processing dialog text, electronic equipment and storage medium
CN114556328B (en) Data processing method, device, electronic equipment and storage medium
CN110910903B (en) Speech emotion recognition method, device, equipment and computer readable storage medium
US20150179173A1 (en) Communication support apparatus, communication support method, and computer program product
CN105975569A (en) Voice processing method and terminal
CN107844470B (en) Voice data processing method and equipment thereof
CN111259148A (en) Information processing method, device and storage medium
CN111354349A (en) Voice recognition method and device and electronic equipment
CN110826637A (en) Emotion recognition method, system and computer-readable storage medium
US11893813B2 (en) Electronic device and control method therefor
CN110544470B (en) Voice recognition method and device, readable storage medium and electronic equipment
CN114449310A (en) Video editing method and device, computer equipment and storage medium
KR102312993B1 (en) Method and apparatus for implementing interactive message using artificial neural network
CN114429635A (en) Book management method
CN111354377B (en) Method and device for recognizing emotion through voice and electronic equipment
CN115840808A (en) Scientific and technological project consultation method, device, server and computer-readable storage medium
CN109635125B (en) Vocabulary atlas building method and electronic equipment
CN110970030A (en) Voice recognition conversion method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination