CN108962232B - Voice recognition method and device, storage medium and terminal - Google Patents


Info

Publication number
CN108962232B
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CN201810777632.7A
Other languages
Chinese (zh)
Other versions
CN108962232A (en)
Inventor
王华勇
Current Assignee
Shanghai Xiaoyi Technology Co Ltd
Original Assignee
Shanghai Xiaoyi Technology Co Ltd
Priority date
2018-07-16
Filing date
2018-07-16
Publication date
2021-01-01
Application filed by Shanghai Xiaoyi Technology Co Ltd
Priority to CN201810777632.7A
Publication of CN108962232A
Application granted
Publication of CN108962232B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms

Abstract

A voice recognition method and device, a storage medium and a terminal are provided. The voice recognition method comprises: entering a proper noun recognition mode; acquiring voice input by a user and recognizing the voice to obtain a recognition result; and, when the recognition result contains a word combination satisfying a preset combination rule, retaining only the homophone in the word combination, where the word combination consists of a noun, a preset associated word and a homophone of the noun, arranged in that order. The technical scheme of the invention can improve the accuracy of recognizing proper nouns.

Description

Voice recognition method and device, storage medium and terminal
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to a speech recognition method and apparatus, a storage medium, and a terminal.
Background
In the prior art, speech recognition of a user's speech is generally performed on the basis of words stored in a knowledge base, which may be pre-populated with everyday vocabulary, domain-specific terminology and the like.
However, when the knowledge base does not contain a proper noun such as a person's name, a place name or a brand name, recognition errors occur when the user speaks that word, and the user experience suffers.
Disclosure of Invention
The technical problem solved by the invention is how to improve the accuracy of recognizing proper nouns.
To solve the foregoing technical problem, an embodiment of the present invention provides a speech recognition method including: entering a proper noun recognition mode; acquiring voice input by a user and recognizing the voice to obtain a recognition result; and, when the recognition result contains a word combination satisfying a preset combination rule, retaining only the homophone in the word combination, where the word combination consists of a noun, a preset associated word and a homophone of the noun, arranged in that order.
Optionally, retaining the homophone in the word combination includes: determining at least one character of the noun that is homophonic with the homophone; and retaining the at least one character.
Optionally, entering the proper noun recognition mode includes entering the proper noun recognition mode in response to a trigger command from the user.
Optionally, the speech recognition method further includes feeding back the retained recognition result to the user or storing it in a word bank.
Optionally, the noun is selected from a name of a person, a name of a place, a name of an object, or a name of a brand.
Optionally, the preset associated word is the particle 的 or 地 (both pronounced de).
To solve the above technical problem, an embodiment of the present invention further discloses a speech recognition apparatus, including: a mode entering module adapted to enter a proper noun recognition mode; a voice recognition module adapted to acquire voice input by a user and recognize the voice to obtain a recognition result; and a processing module adapted to retain only the homophone in a word combination when the recognition result contains a word combination satisfying a preset combination rule, the word combination consisting of a noun, a preset associated word and a homophone of the noun, arranged in that order.
Optionally, the processing module includes: a determining unit adapted to determine at least one character of the noun that is homophonic with the homophone; and a retaining unit adapted to retain the at least one character.
Optionally, the mode entering module enters the proper noun recognition mode in response to a trigger command of the user.
Optionally, the speech recognition apparatus further includes a feedback module adapted to feed back the retained recognition result to the user or store it in a word bank.
Optionally, the noun is selected from a name of a person, a name of a place, a name of an object, or a name of a brand.
Optionally, the preset associated word is the particle 的 or 地 (both pronounced de).
An embodiment of the invention further discloses a storage medium storing computer instructions which, when executed, perform the steps of the speech recognition method.
An embodiment of the invention further discloses a terminal comprising a memory and a processor, the memory storing computer instructions executable on the processor, and the processor performing the steps of the speech recognition method when executing the computer instructions.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
In the technical scheme of the invention, a proper noun recognition mode is entered; voice input by a user is acquired and recognized to obtain a recognition result; and when the recognition result contains a word combination satisfying a preset combination rule, only the homophone in the word combination is retained, the word combination consisting of a noun, a preset associated word and a homophone of the noun, arranged in that order. The scheme takes into account how users habitually describe the characters of proper nouns: in the proper noun recognition mode, word combinations satisfying the preset combination rule are processed by retaining only their homophones. Proper nouns such as personal names and place names can therefore be recognized, improving speech recognition accuracy and the user experience.
Further, at least one character of the noun that is homophonic with the homophone is determined and retained. The noun in the word combination contains at least one character homophonic with the homophone, and that character is the one the user intends, so it can be retained as the final recognition result for use in subsequent steps. This avoids retaining an incorrect homophone and makes proper noun recognition accurate.
Further, the proper noun recognition mode is entered in response to a trigger command from the user. Because the proper noun recognition mode performs an additional operation, namely proper noun recognition, it consumes more power; entering the mode only when the user issues a trigger command allows proper nouns to be recognized by voice while keeping power consumption low.
Drawings
FIG. 1 is a flow chart of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another speech recognition method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention.
Detailed Description
As described in the background, when proper nouns such as personal names, place names and brand names are not stored in the knowledge base, a recognition error occurs when the user speaks such a word, and the user experience suffers.
The technical scheme of the invention takes into account how users habitually describe the characters of proper nouns: in the proper noun recognition mode, word combinations satisfying the preset combination rule are processed by retaining only their homophones. Proper nouns such as personal names and place names can therefore be recognized, improving speech recognition accuracy and the user experience.
To make the above objects, features and advantages of the present invention clearer, embodiments are described in detail below with reference to the drawings.
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention.
The speech recognition method shown in FIG. 1 can be executed by a computer, for example as computer program instructions, on any terminal device such as a mobile phone or a computer.
The speech recognition method shown in FIG. 1 may comprise the following steps:
Step S101: entering a proper noun recognition mode;
Step S102: acquiring voice input by a user, and recognizing the voice to obtain a recognition result;
Step S103: when the recognition result contains a word combination satisfying a preset combination rule, retaining only the homophone in the word combination, the word combination consisting of a noun, a preset associated word and a homophone of the noun, arranged in that order.
In a specific implementation of step S101, the terminal device enters the proper noun recognition mode. Entering this mode enables the processing, in the subsequent steps, of word combinations that satisfy the preset combination rule.
Specifically, when the terminal device is not in the proper noun recognition mode and acquires voice input by the user, it recognizes the voice directly to obtain a recognition result, and the result is not processed further; that is, all words in the recognition result are retained.
Correspondingly, when the terminal device is in the proper noun recognition mode and acquires the user's voice, it first recognizes the voice to obtain a recognition result containing all characters produced by speech recognition.
Specifically, the voice input by the user may be received directly, or retrieved from another device, application or database.
It is to be understood that any existing speech recognition algorithm may be used to recognize the voice; the embodiment of the present invention is not limited in this respect.
Further, in a specific implementation of step S103, the terminal device being in the proper noun recognition mode indicates that word combinations in the recognition result that satisfy the preset combination rule are to be processed. The preset combination rule may be configured in advance, for example as noun + preset associated word + homophone of the noun. A word combination satisfying this rule therefore consists of a noun, a preset associated word and a homophone of the noun, arranged in that order. Examples are 刘备的刘 ("the Liu of Liu Bei") and 张飞的飞 ("the Fei of Zhang Fei").
Specifically, if the recognition result contains a word combination satisfying the preset combination rule, only the homophone in the word combination is retained; the noun and the preset associated word are removed. For example, from 刘备的刘 ("the Liu of Liu Bei") only the homophone 刘 (Liu) is retained, and from 张飞的飞 ("the Fei of Zhang Fei") only the homophone 飞 (Fei) is retained.
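The retention step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the reference nouns (`NOUNS`), the toy toneless pinyin table (`PINYIN`) and the function names are hypothetical, and the associated words are taken to be the particles 的 and 地. A character counts as a homophone when it shares a reading with some character of the noun.

```python
import re

# Hypothetical toy pinyin table (tones ignored); a real system needs a full dictionary.
PINYIN = {"刘": "liu", "流": "liu", "备": "bei", "张": "zhang", "飞": "fei", "非": "fei"}

# Hypothetical reference nouns the user may cite, and the assumed associated words.
NOUNS = ["刘备", "张飞"]
PARTICLES = "的地"

# noun + associated word + one trailing character, e.g. 刘备的刘
RULE = re.compile("(" + "|".join(map(re.escape, NOUNS)) + ")([" + PARTICLES + "])(.)")


def is_homophone(ch: str, noun: str) -> bool:
    """True if ch shares a (toneless) reading with some character of noun."""
    reading = PINYIN.get(ch)
    return reading is not None and any(PINYIN.get(c) == reading for c in noun)


def keep_homophones(text: str) -> str:
    """Replace every rule-matching word combination with its homophone only."""

    def repl(m: re.Match) -> str:
        noun, _particle, ch = m.groups()
        # Rule satisfied: keep only the homophone; otherwise leave the text unchanged.
        return ch if is_homophone(ch, noun) else m.group(0)

    return RULE.sub(repl, text)
```

For example, `keep_homophones("我同事的名字是刘备的刘张飞的飞")` returns `"我同事的名字是刘飞"`. A fixed noun list is only a stand-in; a practical system would also need a way to locate the noun, such as word segmentation.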
In this embodiment, the retained homophones form the proper noun: adjacent retained homophones combine into a personal name, place name or brand name. In other words, when the user speaks according to the preset combination rule, the embodiment of the present invention can resolve the proper noun from the speech.
In this embodiment, retaining only the homophone means that the homophone replaces the whole word combination; for example, 飞 replaces 张飞的飞.
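The point that adjacent retained homophones together make up the proper noun can be sketched as follows. The sketch assumes a hypothetical noun list and the particles 的 and 地, and, for brevity, treats a character as homophonic only when it literally occurs in the noun. Two word combinations are taken to belong to the same name when one ends exactly where the next begins.

```python
import re
from typing import List

# Hypothetical reference nouns; 的/地 are assumed to be the associated words.
NOUNS = ["刘备", "张飞"]
RULE = re.compile("(" + "|".join(map(re.escape, NOUNS)) + ")[的地](.)")


def extract_proper_nouns(text: str) -> List[str]:
    """Join homophones retained from back-to-back word combinations into names."""
    names: List[str] = []
    current = ""
    last_end = -1
    for m in RULE.finditer(text):
        noun, ch = m.group(1), m.group(2)
        if ch not in noun:  # simplification: homophone means the same character here
            continue
        if current and m.start() != last_end:  # a gap ends the previous name
            names.append(current)
            current = ""
        current += ch
        last_end = m.end()
    if current:
        names.append(current)
    return names
```

With this sketch, `extract_proper_nouns("我同事的名字是刘备的刘张飞的飞")` yields `["刘飞"]`, while a separator between the combinations, as in `"刘备的刘和张飞的飞"`, yields two one-character names.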
It should be noted that different preset combination rules may be set according to different application scenarios or different user expression habits.
In a specific application scenario of the invention, the user says 我同事的名字是刘备的刘张飞的飞 ("My colleague's name is the Liu of Liu Bei, the Fei of Zhang Fei"). A prior-art recognizer transcribes this sentence literally, so the name itself never appears in the result. In the embodiment of the present invention, after steps S101 to S103 shown in FIG. 1, the final recognition result is 我同事的名字是刘飞 ("My colleague's name is Liu Fei"). Compared with the prior-art result, the embodiment recognizes the proper noun accurately and improves speech recognition accuracy.
By taking into account how users habitually describe the characters of proper nouns, and by processing word combinations that satisfy the preset combination rule in the proper noun recognition mode so that only their homophones are retained, the technical scheme can recognize proper nouns such as personal names and place names, improving speech recognition accuracy and the user experience.
In a specific application of the invention, the noun may be a person's name, a place name, an object name or a brand name, and the preset associated word is the particle 的 or 地.
It should be noted that, in different application scenarios, the nouns and the preset associated words may also be user-defined; the embodiment of the present invention is not limited in this respect.
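Because the nouns and associated words may be user-defined, the rule can be held as data and compiled into a pattern on demand. The sketch below is hypothetical (the `CombinationRule` class and its field names do not come from the source); it defaults the associated words to 的 and 地 and captures the noun, the associated word and the trailing candidate homophone as three groups.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class CombinationRule:
    """Hypothetical user-configurable rule: noun + associated word + one character."""

    nouns: List[str]
    associated_words: List[str] = field(default_factory=lambda: ["的", "地"])

    def pattern(self) -> "re.Pattern[str]":
        # Escape every entry so user-supplied text is matched literally.
        nouns = "|".join(re.escape(n) for n in self.nouns)
        links = "|".join(re.escape(w) for w in self.associated_words)
        # Groups: 1 = noun, 2 = associated word, 3 = candidate homophone.
        return re.compile(f"({nouns})({links})(.)")
```

For instance, `CombinationRule(nouns=["王菲"]).pattern().search("歌手叫王菲的菲").groups()` gives `("王菲", "的", "菲")`, and a user could supply a different associated word such as 里的 without changing any code.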
In an embodiment of the present invention, referring to FIG. 2, step S103 shown in FIG. 1 may include the following steps: step S201: determining at least one character of the noun that is homophonic with the homophone; step S202: retaining the at least one character.
In a specific implementation, because speech recognition algorithms differ, the homophone of the noun in the recognition result may be the same character as one in the noun, or a different character with the same pronunciation. For example, when the user says 张飞的飞, the recognition result may be 张飞的飞 or 张飞的非 (非, "not", is also pronounced fei).
To ensure accurate recognition of proper nouns and avoid retaining a wrong character, the character of the noun that is homophonic with the homophone is determined and retained. For example, whether the recognition result ends in 飞 or 非, the character of 张飞 homophonic with it is 飞, so 飞 is the character finally retained.
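Steps S201 and S202 can be sketched as a lookup that returns the noun's own character rather than the recognized one. The pinyin table below is a hypothetical toy (tones ignored) and `resolve_homophone` is an illustrative name; when several characters of the noun share the reading, this sketch simply keeps the first.

```python
from typing import Optional

# Hypothetical toneless pinyin table; a real system would need a complete one.
PINYIN = {"刘": "liu", "流": "liu", "备": "bei", "张": "zhang", "飞": "fei", "非": "fei"}


def resolve_homophone(noun: str, heard: str) -> Optional[str]:
    """Return the character of noun that sounds like heard (steps S201/S202).

    Keeping the noun's character instead of the recognized one corrects a
    mis-transcribed homophone such as 非 to the intended 飞.
    """
    reading = PINYIN.get(heard)
    if reading is None:
        return None  # unknown reading: no character can be determined
    for ch in noun:
        if PINYIN.get(ch) == reading:
            return ch  # first matching character of the noun is retained
    return None
```

Here `resolve_homophone("张飞", "非")` and `resolve_homophone("张飞", "飞")` both return `"飞"`, which is exactly the behavior the example above describes.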
In an embodiment of the present invention, step S101 shown in FIG. 1 may include entering the proper noun recognition mode in response to a trigger command from the user.
Specifically, the user's trigger command may be a voice command, a gesture, a touch-screen operation, a key press, or the like.
Because the proper noun recognition mode performs an additional operation, namely proper noun recognition, and therefore consumes more power, the embodiment of the invention enters the mode only when the user issues a trigger command, so that proper nouns can be recognized by voice while power consumption is kept low.
In a preferred embodiment of the present invention, step S103 shown in FIG. 1 may further include feeding back the retained recognition result to the user or storing it in a word bank.
Specifically, all word combinations in the recognition result are processed, and once only their homophones have been retained, the retained recognition result is obtained. This is the final recognition result and can be fed back to the user. It can also be stored in a word bank; adding the proper noun obtained by speech recognition to the word bank allows that proper noun to be recognized directly in subsequent speech recognition.
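The source does not say how the word bank is consulted during later recognition. One simple possibility, assumed here rather than taken from the text, is to prefer an n-best recognition hypothesis that contains a stored proper noun. `WordBank`, `store` and `pick_hypothesis` are hypothetical names.

```python
from typing import List, Set


class WordBank:
    """Hypothetical in-memory word bank of proper nouns learned in the mode."""

    def __init__(self) -> None:
        self._words: Set[str] = set()

    def store(self, word: str) -> None:
        """Add a proper noun obtained from a retained recognition result."""
        self._words.add(word)

    def pick_hypothesis(self, n_best: List[str]) -> str:
        """Return the first hypothesis containing a stored word, else the top one.

        Assumes n_best is non-empty and ordered from most to least likely.
        """
        for hypothesis in n_best:
            if any(word in hypothesis for word in self._words):
                return hypothesis
        return n_best[0]
```

Once `刘飞` has been stored, a later n-best list `["我同事叫刘非", "我同事叫刘飞"]` resolves to the second hypothesis; with an empty bank the top hypothesis is returned unchanged.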
Referring to FIG. 3, an embodiment of the invention further discloses a speech recognition apparatus 30, which may include a mode entering module 301, a speech recognition module 302 and a processing module 303.
The mode entering module 301 is adapted to enter a proper noun recognition mode; the speech recognition module 302 is adapted to acquire voice input by a user and recognize the voice to obtain a recognition result; and the processing module 303 is adapted to retain only the homophone in a word combination when the recognition result contains a word combination satisfying a preset combination rule, the word combination consisting of a noun, a preset associated word and a homophone of the noun, arranged in that order.
When the terminal device is in the proper noun recognition mode, word combinations in the recognition result that satisfy the preset combination rule are processed. The preset combination rule may be configured in advance, for example as noun + preset associated word + homophone of the noun, so a matching word combination consists of a noun, a preset associated word and a homophone of the noun, arranged in that order. Examples are 刘备的刘 ("the Liu of Liu Bei") and 张飞的飞 ("the Fei of Zhang Fei").
Specifically, if the recognition result contains a word combination satisfying the preset combination rule, only the homophone in the word combination is retained, and the noun and the preset associated word are removed. For example, only 刘 is retained from 刘备的刘, and only 飞 from 张飞的飞.
In this embodiment, the retained homophones form the proper noun: adjacent retained homophones combine into a personal name, place name or brand name. In other words, when the user speaks according to the preset combination rule, the embodiment can resolve the proper noun from the speech.
In this embodiment of the invention, the way users habitually describe the characters of proper nouns is taken into account, and word combinations satisfying the preset combination rule are processed in the proper noun recognition mode by retaining only their homophones. Proper nouns such as personal names and place names can therefore be recognized, improving speech recognition accuracy and the user experience.
In an embodiment of the present invention, the processing module 303 shown in FIG. 3 may include a determining unit (not shown) adapted to determine at least one character of the noun that is homophonic with the homophone, and a retaining unit (not shown) adapted to retain the at least one character.
In another embodiment of the present invention, the mode entering module 301 enters the proper noun recognition mode in response to a trigger command from the user.
The speech recognition apparatus 30 shown in FIG. 3 may further include a feedback module (not shown) adapted to feed back the retained recognition result to the user or store it in a word bank.
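The module structure of apparatus 30 can be sketched as a small class into which the recognizer and the rule-based post-processor are injected. Everything here is hypothetical scaffolding: `asr` stands in for any existing recognizer, `post_process` for the processing module's rule, and the list `word_bank` for the feedback module's storage. Post-processing runs only after a user trigger, in line with the power-saving rationale given for the trigger command.

```python
from typing import Callable, List


class SpeechRecognitionApparatus:
    """Hypothetical sketch of apparatus 30 (modules 301-303 plus feedback module)."""

    def __init__(self, asr: Callable[[bytes], str], post_process: Callable[[str], str]) -> None:
        self.asr = asr                    # speech recognition module 302
        self.post_process = post_process  # processing module 303
        self.proper_noun_mode = False     # state managed by mode entering module 301
        self.word_bank: List[str] = []    # storage used by the feedback module

    def on_trigger(self) -> None:
        """Mode entering module 301: enter the mode in response to a user trigger."""
        self.proper_noun_mode = True

    def handle(self, audio: bytes) -> str:
        text = self.asr(audio)
        if not self.proper_noun_mode:
            return text  # outside the mode every recognized word is kept
        retained = self.post_process(text)
        self.word_bank.append(retained)  # feedback module: store the retained result
        return retained  # feedback module: return the retained result to the user
```

In use, the same utterance is returned verbatim before `on_trigger()` is called and in its post-processed form afterwards, so the extra processing cost is only paid once the user asks for it.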
For more details of the working principle and operation of the speech recognition apparatus 30, refer to the descriptions of FIG. 1 and FIG. 2; they are not repeated here.
An embodiment of the invention further discloses a storage medium storing computer instructions which, when executed, perform the steps of the speech recognition method shown in FIG. 1 or FIG. 2. The storage medium may include a ROM, a RAM, a magnetic disk, an optical disc and the like, and may also include non-volatile or non-transitory memory.
An embodiment of the invention further discloses a terminal that may include a memory and a processor, the memory storing computer instructions executable on the processor. When executing the computer instructions, the processor may perform the steps of the speech recognition method shown in FIG. 1 or FIG. 2. The terminal includes, but is not limited to, a mobile phone, a computer, a tablet computer or another terminal device.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (12)

1. A speech recognition method, comprising:
entering a proper noun recognition mode;
acquiring voice input by a user, and recognizing the voice to obtain a recognition result;
when the recognition result contains a word combination satisfying a preset combination rule, retaining only the homophone in the word combination, wherein the word combination consists of a noun, a preset associated word and a homophone of the noun, arranged in that order, and the preset associated word is the particle 的 or 地.
2. The speech recognition method of claim 1, wherein retaining the homophone in the word combination comprises:
determining at least one character of the noun that is homophonic with the homophone;
retaining the at least one character.
3. The speech recognition method of claim 1, wherein the entering the proper noun recognition mode comprises:
entering the proper noun recognition mode in response to a trigger command from the user.
4. The speech recognition method of claim 1, further comprising:
feeding back the retained recognition result to the user or storing it in a word bank.
5. The speech recognition method of any one of claims 1 to 4, wherein the noun is selected from a name of a person, a name of a place, a name of an object, or a name of a brand.
6. A speech recognition apparatus, comprising:
a mode entering module adapted to enter a proper noun recognition mode;
a voice recognition module adapted to acquire voice input by a user and recognize the voice to obtain a recognition result; and
a processing module adapted to retain only the homophone in a word combination when the recognition result contains a word combination satisfying a preset combination rule, wherein the word combination consists of a noun, a preset associated word and a homophone of the noun, arranged in that order, and the preset associated word is the particle 的 or 地.
7. The speech recognition device of claim 6, wherein the processing module comprises:
a determining unit adapted to determine at least one character of the noun that is homophonic with the homophone;
a retaining unit adapted to retain the at least one character.
8. The speech recognition device of claim 6, wherein the mode entry module enters the proper noun recognition mode in response to a trigger command from the user.
9. The speech recognition device of claim 6, further comprising:
a feedback module adapted to feed back the retained recognition result to the user or store it in a word bank.
10. The speech recognition apparatus of any one of claims 6 to 9, wherein the noun is selected from a name of a person, a name of a place, a name of an object, or a name of a brand.
11. A storage medium having stored thereon computer instructions, wherein the computer instructions are operable to perform the steps of the speech recognition method of any one of claims 1 to 5.
12. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the speech recognition method according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810777632.7A CN108962232B (en) 2018-07-16 2018-07-16 Voice recognition method and device, storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810777632.7A CN108962232B (en) 2018-07-16 2018-07-16 Voice recognition method and device, storage medium and terminal

Publications (2)

Publication Number Publication Date
CN108962232A (en) 2018-12-07
CN108962232B (en) 2021-01-01

Family

ID=64481819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810777632.7A Expired - Fee Related CN108962232B (en) 2018-07-16 2018-07-16 Voice recognition method and device, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN108962232B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109841216B (en) * 2018-12-26 2020-12-15 珠海格力电器股份有限公司 Voice data processing method and device and intelligent terminal

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN1151489C (en) * 2000-11-15 2004-05-26 中国科学院自动化研究所 Voice recognition method for Chinese personal name place name and unit name
KR100931790B1 (en) * 2002-12-18 2009-12-14 주식회사 케이티 Recognition dictionary generation method using phonetic name list in speech recognition system and method of processing similar phonetic name using same
CN1835077B (en) * 2005-03-14 2011-05-11 台达电子工业股份有限公司 Automatic speech recognizing input method and system for Chinese names
JP5132430B2 (en) * 2008-05-29 2013-01-30 インターナショナル・ビジネス・マシーンズ・コーポレーション Information processing apparatus, information processing method, and program for generating first and last name candidates
CN108108373B (en) * 2016-11-25 2020-09-25 阿里巴巴集团控股有限公司 Name matching method and device
CN107609098B (en) * 2017-09-11 2019-02-01 北京金堤科技有限公司 Searching method and device

Also Published As

Publication number Publication date
CN108962232A (en) 2018-12-07

Similar Documents

Publication Publication Date Title
CN106658129B (en) Terminal control method and device based on emotion and terminal
EP2713255B1 (en) Method and electronic device for prompting character input
US8812302B2 (en) Techniques for inserting diacritical marks to text input via a user device
WO2015169134A1 (en) Method and apparatus for phonetically annotating text
CN116628157A (en) Parameter collection and automatic dialog generation in dialog systems
JP2000315096A5 (en)
CN113168305A (en) Expediting interaction with a digital assistant by predicting user responses
CN108121455B (en) Identification correction method and device
US10712933B2 (en) Terminal and method for controlling terminal
CN110334242B (en) Method and device for generating voice instruction suggestion information and electronic equipment
AU2017268604B2 (en) Accumulated retrieval processing method, device, terminal, and storage medium
CN108962232B (en) Voice recognition method and device, storage medium and terminal
WO2024067307A1 (en) Display method and apparatus for virtual keyboard, electronic device and storage medium
US20180173691A1 (en) Predicting text by combining attempts
US8386236B2 (en) Method for prompting by suggesting stroke order of chinese character, electronic device, and computer program product
JP6160115B2 (en) Information processing apparatus, presentation material optimization method, and program
US10056080B2 (en) Identifying contacts using speech recognition
CN109712613A (en) Semantic analysis library update method, device and electronic equipment
WO2017159207A1 (en) Processing execution device, method for controlling processing execution device, and control program
CN108153574B (en) Application processing method and device and electronic equipment
CN104182061A (en) Multiword input method and equipment
CN112835494A (en) Voice recognition result error correction method and device
US9542569B2 (en) Information processing system, information processing apparatus, storage medium having stored therein information processing program, and method of storing saved data
KR102503581B1 (en) Display method for providing customized studio space
CN113504836B (en) Information input method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210101

Termination date: 20210716