CN108962232B

CN108962232B - Voice recognition method and device, storage medium and terminal

Info

Publication number: CN108962232B
Application number: CN201810777632.7A
Authority: CN
Inventors: 王华勇
Original assignee: Shanghai Xiaoyi Technology Co Ltd
Current assignee: Shanghai Xiaoyi Technology Co Ltd
Priority date: 2018-07-16
Filing date: 2018-07-16
Publication date: 2021-01-01
Anticipated expiration: 2038-07-16
Also published as: CN108962232A

Abstract

A speech recognition method and apparatus, a storage medium, and a terminal are provided. The speech recognition method comprises: entering a proper noun recognition mode; acquiring speech input by a user and recognizing the speech to obtain a recognition result; and, when a word combination satisfying a preset combination rule exists in the recognition result, retaining only the homophone in the word combination, where the word combination comprises, in sequence, a noun, a preset associated word, and a homophone of the noun. The technical solution of the invention can improve the accuracy of recognizing proper nouns.

Description

Voice recognition method and device, storage medium and terminal

Technical Field

The present invention relates to the field of speech processing technologies, and in particular, to a speech recognition method and apparatus, a storage medium, and a terminal.

Background

In the prior art, speech recognition of a user's speech is generally performed on the basis of words stored in a knowledge base. The knowledge base may pre-store words commonly used in daily life, domain-specific terms, and the like.

However, when the knowledge base does not store proper nouns such as personal names, place names, and brand names, recognition errors occur when the user inputs such words by voice, and the user experience is poor.

Disclosure of Invention

The technical problem solved by the invention is how to improve the accuracy of recognizing proper nouns.

In order to solve the foregoing technical problem, an embodiment of the present invention provides a speech recognition method, which includes: entering a proper noun recognition mode; acquiring speech input by a user and recognizing the speech to obtain a recognition result; and, when a word combination satisfying a preset combination rule exists in the recognition result, retaining only the homophone in the word combination, where the word combination comprises, in sequence, a noun, a preset associated word, and a homophone of the noun.

Optionally, retaining only the homophone in the word combination includes: determining at least one character of the noun that is homophonic with the homophone; and retaining the at least one character.

Optionally, entering the proper noun recognition mode includes: entering the proper noun recognition mode in response to a trigger command from the user.

Optionally, the speech recognition method further includes: feeding back the retained recognition result to the user or storing it in a lexicon.

Optionally, the noun is selected from a name of a person, a name of a place, a name of an object, or a name of a brand.

Optionally, the preset associated word is selected from preset connective particles, for example the word rendered as "of" in "Liu of Liu Bei".

In order to solve the above technical problem, an embodiment of the present invention further discloses a speech recognition apparatus, including: a mode entering module adapted to enter a proper noun recognition mode; a speech recognition module adapted to acquire speech input by a user and recognize the speech to obtain a recognition result; and a processing module adapted to retain only the homophone in a word combination when a word combination satisfying a preset combination rule exists in the recognition result, where the word combination comprises, in sequence, a noun, a preset associated word, and a homophone of the noun.

Optionally, the processing module includes: a determining unit adapted to determine at least one character of the noun that is homophonic with the homophone; and a retention unit adapted to retain the at least one character.

Optionally, the mode entering module enters the proper noun recognition mode in response to a trigger command from the user.

Optionally, the speech recognition apparatus further includes: a feedback module adapted to feed back the retained recognition result to the user or store it in a lexicon.

An embodiment of the invention also discloses a storage medium on which computer instructions are stored, where the steps of the speech recognition method are performed when the computer instructions are run.

An embodiment of the invention also discloses a terminal comprising a memory and a processor, where the memory stores computer instructions that can run on the processor, and the processor performs the steps of the speech recognition method when running the computer instructions.

Compared with the prior art, the technical solution of the embodiments of the invention has the following beneficial effects:

The technical solution of the invention enters a proper noun recognition mode; acquires speech input by a user and recognizes the speech to obtain a recognition result; and, when a word combination satisfying a preset combination rule exists in the recognition result, retains only the homophone in the word combination, where the word combination comprises, in sequence, a noun, a preset associated word, and a homophone of the noun. The technical solution takes into account how users habitually read out proper nouns, and processes word combinations satisfying the preset combination rule in the proper noun recognition mode, that is, retains only the homophone in each such word combination. Proper nouns such as personal names and place names can therefore be recognized, which improves both the accuracy of speech recognition and the user experience.

Further, at least one character of the noun that is homophonic with the homophone is determined, and the at least one character is retained. In the technical solution of the invention, the noun in the word combination contains at least one character that is homophonic with the homophone, and that character is the one the user intends to express, so it can be retained as the final recognition result for subsequent steps. This avoids retaining a wrong homophone and ensures that proper nouns are recognized accurately.

Further, the proper noun recognition mode is entered in response to a trigger command from the user. Because an additional operation, namely the proper noun recognition operation, is performed in the proper noun recognition mode and consumes more power, the technical solution enters the mode only when the user issues the trigger command, so that proper nouns can be recognized while power consumption is kept down.

Drawings

FIG. 1 is a flow chart of a speech recognition method according to an embodiment of the present invention;

FIG. 2 is a flow chart of another speech recognition method according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention.

Detailed Description

As described in the background, when proper nouns such as personal names, place names, and brand names are not stored in the knowledge base, recognition errors occur when the user inputs such words by voice, and the user experience is poor.

The technical solution takes into account how users habitually read out proper nouns, and processes word combinations satisfying the preset combination rule in the proper noun recognition mode, that is, retains only the homophone in each such word combination. Proper nouns such as personal names and place names can therefore be recognized, which improves both the accuracy of speech recognition and the user experience.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.

FIG. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention.

The speech recognition method shown in FIG. 1 can be implemented as computer program instructions and executed on any terminal device, such as a mobile phone or a computer.

The speech recognition method shown in FIG. 1 may comprise the following steps:

Step S101: entering a proper noun recognition mode;

Step S102: acquiring speech input by a user, and recognizing the speech to obtain a recognition result;

Step S103: when a word combination satisfying a preset combination rule exists in the recognition result, retaining only the homophone in the word combination, where the word combination comprises, in sequence, a noun, a preset associated word, and a homophone of the noun.

In the implementation of step S101, the terminal device may enter a proper noun recognition mode. After entering the proper noun recognition mode, the processing of the word combinations meeting the preset combination rules in the subsequent steps can be triggered.

Specifically, when the terminal device is not in the proper noun recognition mode and speech input by the user is acquired, the speech is recognized directly to obtain a recognition result, and the recognition result is not processed further. That is, all the words in the recognition result obtained by recognizing the speech are retained.

Correspondingly, when the terminal device is in the proper noun recognition mode and speech input by the user is acquired, the speech is first recognized to obtain a recognition result. The recognition result comprises all the characters obtained by recognizing the speech.
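The difference between the two modes can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: `recognize` is a hypothetical stand-in for whatever speech recognition engine is used, `postprocess` stands for the step S103 processing, and the two `fake_*` helpers exist only so that the sketch runs on its own.

```python
from typing import Callable


def handle_speech(
    audio: bytes,
    recognize: Callable[[bytes], str],
    proper_noun_mode: bool,
    postprocess: Callable[[str], str],
) -> str:
    """Recognize `audio`; post-process only in proper noun recognition mode."""
    result = recognize(audio)        # step S102: raw recognition result
    if not proper_noun_mode:
        return result                # every recognized word is kept
    return postprocess(result)       # step S103 runs only in this mode


# Hypothetical stand-ins (assumptions, not part of the source text).
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")     # pretend the audio is already text


def fake_postprocess(text: str) -> str:
    return text.replace("Liu of Liu Bei", "Liu")


print(handle_speech(b"Liu of Liu Bei", fake_recognize, False, fake_postprocess))
print(handle_speech(b"Liu of Liu Bei", fake_recognize, True, fake_postprocess))
```

Outside the mode the recognizer output is returned untouched; inside the mode the same output passes through the rule-based processing first.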

Specifically, the speech input by the user may be received directly from the user, or may be retrieved from another device, application, or database.

It is to be understood that any suitable existing algorithm may be used to perform speech recognition on the speech, and the embodiment of the present invention is not limited in this respect.

Further, in a specific implementation of step S103, the terminal device is in the proper noun recognition mode, which indicates that word combinations satisfying the preset combination rule in the recognition result can be processed. The preset combination rule is set in advance and may take the form: noun + preset associated word + homophone of the noun. A word combination satisfying the preset combination rule therefore comprises, in sequence, a noun, a preset associated word, and a homophone of the noun. Examples are "Liu of Liu Bei" and "Fei of Zhang Fei".

Specifically, if a word combination satisfying the preset combination rule exists in the recognition result, only the homophone in the word combination is retained. The noun and the preset associated word in the word combination can be removed. For example, for the word combination "Liu of Liu Bei", only the homophone "Liu" is retained; for the word combination "Fei of Zhang Fei", only the homophone "Fei" is retained.
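The matching and retention described above can be sketched in a few lines. The sketch rests on several assumptions that are not in the text: that the romanized examples correspond to the Chinese strings 刘备的刘 and 张飞的飞 with 的 as the associated word, that candidate nouns come from a small known list (`NOUNS`), and that pronunciations come from a toy lookup table (`PINYIN`). All names are hypothetical.

```python
# Toy pronunciation table (assumption): character -> toneless pinyin.
PINYIN = {"刘": "liu", "备": "bei", "张": "zhang", "飞": "fei", "的": "de"}
NOUNS = ["刘备", "张飞"]       # hypothetical list of known nouns
ASSOCIATED_WORDS = ["的"]      # hypothetical preset associated word ("of")


def is_homophone_of(char, noun):
    """True if `char` sounds like at least one character of `noun`."""
    sound = PINYIN.get(char)
    return sound is not None and any(PINYIN.get(c) == sound for c in noun)


def match_at(text, i):
    """Return the index of the homophone if a combination starts at `i`."""
    for noun in NOUNS:
        for assoc in ASSOCIATED_WORDS:
            end = i + len(noun) + len(assoc)
            if (text.startswith(noun + assoc, i) and end < len(text)
                    and is_homophone_of(text[end], noun)):
                return end
    return None


def keep_homophones(text):
    """Replace each 'noun + associated word + homophone' with the homophone."""
    out, i = [], 0
    while i < len(text):
        end = match_at(text, i)
        if end is None:
            out.append(text[i])      # ordinary character: keep as is
            i += 1
        else:
            out.append(text[end])    # keep only the homophone
            i = end + 1
    return "".join(out)


print(keep_homophones("刘备的刘张飞的飞"))
```

Text that does not complete the pattern, such as a noun followed by the associated word and nothing else, is left unchanged.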

In this embodiment, the retained homophones make up the proper noun. More specifically, the proper noun is the combination of adjacent retained homophones, for example a personal name, a place name, or a brand name. In other words, when the user speaks according to the preset combination rule, the embodiment of the present invention can resolve the proper noun on the basis of that rule.

In this embodiment, retaining only the homophone in the word combination means that the homophone replaces the word combination. For example, "Fei" replaces "Fei of Zhang Fei".

It should be noted that different preset combination rules may be set according to different application scenarios or different user expression habits.

In a specific application scenario of the invention, the user says "my colleague's name is Liu of Liu Bei, Fei of Zhang Fei". In the prior art, the recognition result reproduces the whole utterance: "my colleague's name is Liu of Liu Bei, Fei of Zhang Fei". In the embodiment of the present invention, after steps S101 to S103 shown in FIG. 1, the final recognition result is "my colleague's name is Liu Fei". Compared with the prior-art recognition result, the embodiment of the invention recognizes the proper noun accurately and improves the accuracy of speech recognition.

In a specific application of the invention, the noun may be selected from a name of a person, a name of a place, a name of an object, or a name of a brand. The preset associated word is selected from preset connective particles, for example the word rendered as "of" in the examples above.

It should be noted that, in different application scenarios, the noun and the preset associated word may also be user-defined, which is not limited in this embodiment of the present invention.

In an embodiment of the present invention, referring to FIG. 2, step S103 shown in FIG. 1 may include the following steps: Step S201: determining at least one character of the noun that is homophonic with the homophone; Step S202: retaining the at least one character.

In a specific implementation, because speech recognition algorithms differ, the homophone of the noun in the recognition result may be the same character as one in the noun, or a different character with the same pronunciation. For example, when the user says "Fei of Zhang Fei" (张飞的飞), the recognition result may be "张飞的飞", in which the final character matches the noun, or "张飞的非", in which the final character 非 ("not") is a different character pronounced the same as 飞 ("fly").

To ensure accurate recognition of proper nouns and avoid retaining a wrong character, at least one character in the noun that is homophonic with the homophone can be determined and retained. For example, whether the recognition result is "张飞的飞" or "张飞的非", the character in the noun "张飞" that is homophonic with 飞 or 非 is determined to be 飞, so the character finally retained is 飞.
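Steps S201 and S202 can be illustrated with a short sketch. It assumes a toy pronunciation table (`PINYIN`) in place of a real pronunciation dictionary, and it assumes that when several characters of the noun share the sound, the first one is kept; neither choice is specified in the text, and the function names are hypothetical.

```python
from typing import List, Optional

# Toy pronunciation table (assumption): character -> toneless pinyin.
PINYIN = {"张": "zhang", "飞": "fei", "非": "fei"}


def homophonic_characters(noun: str, recognized: str) -> List[str]:
    """Step S201: characters of `noun` that sound like `recognized`."""
    sound = PINYIN.get(recognized)
    if sound is None:
        return []
    return [c for c in noun if PINYIN.get(c) == sound]


def retain(noun: str, recognized: str) -> Optional[str]:
    """Step S202: keep the noun's own character, not the recognized one."""
    matches = homophonic_characters(noun, recognized)
    # Assumption: take the first match; None means the rule is not satisfied.
    return matches[0] if matches else None


print(retain("张飞", "飞"))
print(retain("张飞", "非"))
```

Both calls yield the character taken from the noun, so a recognizer that outputs 非 instead of 飞 no longer affects the final result.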

In an embodiment of the present invention, step S101 shown in FIG. 1 may include the following step: entering the proper noun recognition mode in response to a trigger command from the user.

Specifically, the user's trigger command may be a voice, a gesture operation, a touch screen operation, a key operation, or the like.

Because an additional operation, namely the proper noun recognition operation, is performed in the proper noun recognition mode and consumes more power, the embodiment of the invention enters the mode only when the user issues the trigger command, so that proper nouns can be recognized while power consumption is kept down.
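Trigger-driven mode entry can be sketched as a small state holder. The class and enum names below are hypothetical, and the `exit_mode` method is an added assumption, since the text does not describe when or how the mode is left.

```python
from enum import Enum, auto


class Trigger(Enum):
    """Kinds of trigger command mentioned in the description."""
    VOICE = auto()
    GESTURE = auto()
    TOUCH = auto()
    KEY = auto()


class ModeController:
    """Tracks whether the proper noun recognition mode is active."""

    def __init__(self) -> None:
        # Off by default, so the extra processing runs only on request.
        self.proper_noun_mode = False

    def on_trigger(self, trigger: Trigger) -> None:
        """Enter the mode in response to any supported trigger command."""
        if isinstance(trigger, Trigger):
            self.proper_noun_mode = True

    def exit_mode(self) -> None:
        """Leave the mode (assumption: not described in the source)."""
        self.proper_noun_mode = False


controller = ModeController()
controller.on_trigger(Trigger.VOICE)
print(controller.proper_noun_mode)
```

Keeping the flag off until a trigger arrives mirrors the power-saving rationale: the additional processing is skipped unless the user has asked for it.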

In a preferred embodiment of the present invention, after step S103 shown in FIG. 1, the method may further include the following step: feeding back the retained recognition result to the user or storing it in a lexicon.

Specifically, all word combinations in the recognition result are processed, and once only the homophones in those word combinations have been retained, the retained recognition result is obtained. The retained recognition result is the final recognition result and can be fed back to the user. It can also be stored in a lexicon; adding the proper noun obtained by speech recognition to the lexicon allows that proper noun to be recognized directly in subsequent speech recognition.
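A minimal sketch of this final step is given below. It assumes that the processing step hands over a list of segments, each flagged as either ordinary text or a retained homophone; that representation, and the names `Segment`, `collect_proper_nouns`, and `finish`, are illustrative assumptions. The text offers feedback or storage as alternatives, whereas the sketch does both for brevity.

```python
from typing import List, Set, Tuple

# (text, True if the text is a retained homophone)
Segment = Tuple[str, bool]


def collect_proper_nouns(segments: List[Segment]) -> List[str]:
    """Join each run of adjacent retained homophones into one proper noun."""
    nouns: List[str] = []
    run: List[str] = []
    for text, retained in segments:
        if retained:
            run.append(text)
        elif run:
            nouns.append("".join(run))
            run = []
    if run:
        nouns.append("".join(run))
    return nouns


def finish(segments: List[Segment], lexicon: Set[str]) -> str:
    """Store new proper nouns in `lexicon`; return the text to feed back."""
    lexicon.update(collect_proper_nouns(segments))
    return "".join(text for text, _ in segments)


lexicon: Set[str] = set()
segments = [("my colleague's name is ", False), ("Liu", True), ("Fei", True)]
print(finish(segments, lexicon))
print(lexicon)
```

The homophones are joined without a separator, as Chinese characters would be; once the joined name is in the lexicon, a later recognition pass can match it directly.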

Referring to fig. 3, the embodiment of the invention further discloses a voice recognition device 30. The speech recognition apparatus 30 may include a mode entering module 301, a speech recognition module 302, and a processing module 303.

The mode entering module 301 is adapted to enter a proper noun recognition mode; the speech recognition module 302 is adapted to acquire speech input by a user and recognize the speech to obtain a recognition result; the processing module 303 is adapted to retain only the homophone in a word combination when a word combination satisfying a preset combination rule exists in the recognition result, where the word combination comprises, in sequence, a noun, a preset associated word, and a homophone of the noun.
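One way to picture how the three modules fit together is the sketch below. The class names follow the module names in the text, but the constructor arguments (`engine`, `keep_homophones`) and the `handle` method are hypothetical; the text does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModeEnteringModule:            # module 301
    active: bool = False

    def enter(self) -> None:
        self.active = True


@dataclass
class SpeechRecognitionModule:       # module 302
    engine: Callable[[bytes], str]   # hypothetical recognizer callback

    def recognize(self, audio: bytes) -> str:
        return self.engine(audio)


@dataclass
class ProcessingModule:              # module 303
    keep_homophones: Callable[[str], str]  # hypothetical rule processor

    def process(self, result: str) -> str:
        return self.keep_homophones(result)


@dataclass
class SpeechRecognitionApparatus:    # apparatus 30
    mode: ModeEnteringModule
    recognizer: SpeechRecognitionModule
    processor: ProcessingModule

    def handle(self, audio: bytes) -> str:
        result = self.recognizer.recognize(audio)
        # Module 303 is applied only while the mode is active.
        return self.processor.process(result) if self.mode.active else result


apparatus = SpeechRecognitionApparatus(
    ModeEnteringModule(),
    SpeechRecognitionModule(lambda audio: audio.decode("utf-8")),
    ProcessingModule(lambda text: text.replace("Fei of Zhang Fei", "Fei")),
)
apparatus.mode.enter()
print(apparatus.handle(b"Fei of Zhang Fei"))
```

Passing the recognizer and the rule processor in as callbacks keeps the sketch independent of any particular recognition engine, in line with the statement that any suitable existing algorithm may be used.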

When the terminal device is in the proper noun recognition mode, word combinations satisfying the preset combination rule in the recognition result can be processed. The preset combination rule is set in advance and may take the form: noun + preset associated word + homophone of the noun. A word combination satisfying the preset combination rule therefore comprises, in sequence, a noun, a preset associated word, and a homophone of the noun. Examples are "Liu of Liu Bei" and "Fei of Zhang Fei".

The embodiment of the invention takes into account how users habitually read out proper nouns, and processes word combinations satisfying the preset combination rule in the proper noun recognition mode, that is, retains only the homophone in each such word combination. Proper nouns such as personal names and place names can therefore be recognized, which improves both the accuracy of speech recognition and the user experience.

In an embodiment of the present invention, the processing module 303 shown in FIG. 3 may include a determining unit (not shown) adapted to determine at least one character of the noun that is homophonic with the homophone, and a retention unit (not shown) adapted to retain the at least one character.

In another embodiment of the present invention, the mode entering module 301 enters the proper noun recognition mode in response to a trigger command from the user.

The speech recognition apparatus 30 shown in FIG. 3 may further include a feedback module (not shown) adapted to feed back the retained recognition result to the user or store it in a lexicon.

For more details of the operation principle and the operation mode of the speech recognition apparatus 30, reference may be made to the relevant descriptions in fig. 1 to fig. 2, which are not described herein again.

An embodiment of the invention also discloses a storage medium on which computer instructions are stored; when the computer instructions are run, the steps of the speech recognition method shown in FIG. 1 or FIG. 2 can be performed. The storage medium may include a ROM, a RAM, a magnetic disk, an optical disc, or the like. The storage medium may also include a non-volatile memory or a non-transitory memory.

An embodiment of the invention also discloses a terminal, which may comprise a memory and a processor, where the memory stores computer instructions that can run on the processor. When running the computer instructions, the processor may perform the steps of the speech recognition method shown in FIG. 1 or FIG. 2. The terminal includes, but is not limited to, a mobile phone, a computer, a tablet computer, and other terminal devices.

Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A speech recognition method, comprising:

entering a proper noun recognition mode;

acquiring voice input by a user, and recognizing the voice to obtain a recognition result;

when a word combination satisfying a preset combination rule exists in the recognition result, retaining only the homophone in the word combination, wherein the word combination comprises, in sequence, a noun, a preset associated word, and a homophone of the noun, and the preset associated word is selected from preset connective particles.

2. The speech recognition method of claim 1, wherein retaining only the homophone in the word combination comprises:

determining at least one character of the noun that is homophonic with the homophone;

retaining the at least one character.

3. The speech recognition method of claim 1, wherein the entering the proper noun recognition mode comprises:

entering the proper noun recognition mode in response to a trigger command from the user.

4. The speech recognition method of claim 1, further comprising:

feeding back the retained recognition result to the user or storing the retained recognition result in a lexicon.

5. The speech recognition method of any one of claims 1 to 4, wherein the noun is selected from a name of a person, a name of a place, a name of an object, or a name of a brand.

6. A speech recognition apparatus, comprising:

a mode entering module adapted to enter a proper noun recognition mode;

a speech recognition module adapted to acquire speech input by a user and recognize the speech to obtain a recognition result;

and a processing module adapted to retain only the homophone in a word combination when a word combination satisfying a preset combination rule exists in the recognition result, wherein the word combination comprises, in sequence, a noun, a preset associated word, and a homophone of the noun, and the preset associated word is selected from preset connective particles.

7. The speech recognition apparatus of claim 6, wherein the processing module comprises:

a determining unit adapted to determine at least one character of the noun that is homophonic with the homophone;

a retention unit adapted to retain the at least one character.

8. The speech recognition apparatus of claim 6, wherein the mode entering module enters the proper noun recognition mode in response to a trigger command from the user.

9. The speech recognition apparatus of claim 6, further comprising:

a feedback module adapted to feed back the retained recognition result to the user or store the retained recognition result in a lexicon.

10. The speech recognition apparatus of any one of claims 6 to 9, wherein the noun is selected from a name of a person, a name of a place, a name of an object, or a name of a brand.

11. A storage medium having stored thereon computer instructions, wherein the computer instructions are operable to perform the steps of the speech recognition method of any one of claims 1 to 5.

12. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the speech recognition method according to any one of claims 1 to 5.