CN115810359A - Speech recognition method and device, storage medium and electronic device

Info

Publication number: CN115810359A
Application number: CN202211193604.3A
Authority: CN
Inventors: 邓邱伟; 彭强
Original assignee: Qingdao Haier Technology Co Ltd; Haier Smart Home Co Ltd; Haier Uplus Intelligent Technology Beijing Co Ltd
Current assignee: Qingdao Haier Technology Co Ltd; Haier Smart Home Co Ltd; Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date: 2022-09-28
Filing date: 2022-09-28
Publication date: 2023-03-17

Abstract

The application discloses a speech recognition method and device, a storage medium and an electronic device, and relates to the technical field of smart home. The method comprises: recognizing an error correction intention from a target speech to be converted into text, wherein the error correction intention is used for describing characters to be used in the text converted from the speech; when the error correction intention is recognized from the target speech, dividing the target speech into a first paragraph and a second paragraph, wherein the first paragraph carries the target error correction intention in the target speech; performing text conversion on the second paragraph to obtain a candidate text; and correcting the candidate text by using the first paragraph to obtain the target text corresponding to the target speech.

Description

Speech recognition method and device, storage medium and electronic device

Technical Field

The application relates to the technical field of smart home, in particular to a voice recognition method and device, a storage medium and an electronic device.

Background

As artificial intelligence technologies mature, more and more intelligent devices are entering people's lives. These devices can interact with people and make production and daily life more convenient, and voice interaction is one of the most frequently used interaction modes. In the field of voice interaction, speech recognition is performed on the user's speech to convert it into the corresponding text, so as to obtain the key information or the user's intention carried in the speech. The speech recognition result can then be used directly to control the running state of an intelligent device, so that the device works according to the intention contained in the user's speech. However, in the current technology, speech recognition errors easily occur because of the user's accent, homophones, algorithm limitations and similar problems, so that the intelligent device may not work according to the intention expected by the user.

No effective solution has yet been proposed for problems in the related art such as low speech recognition efficiency.

Disclosure of Invention

The embodiment of the application provides a voice recognition method and device, a storage medium and an electronic device, so as to at least solve the problems of low voice recognition efficiency and the like in the related art.

According to an embodiment of the present application, there is provided a speech recognition method including: identifying error correction intentions from target voice to be converted into text, wherein the error correction intentions are used for describing words in the text converted by using the voice; under the condition that the intention of error correction is recognized from the target voice, dividing the target voice into a first paragraph and a second paragraph, wherein the first paragraph carries the intention of error correction in the target voice; performing text conversion on the second paragraph to obtain a candidate text; and correcting the candidate text by using the first paragraph to obtain a target text corresponding to the target voice.

Optionally, recognizing the error correction intention from the target speech to be converted into text includes: receiving a text conversion request, wherein the text conversion request is used for requesting that the target speech be converted into text; and in response to the text conversion request, recognizing a sentence description in a target format from the target speech, wherein the target format is a language expression format used for describing the written form of a character to be corrected.

Optionally, recognizing the sentence description in the target format from the target speech includes at least one of: retrieving, from the target speech, a sentence description that describes the structure of the character to be corrected; and retrieving, from the target speech, a sentence description that includes a word using the character to be corrected.

Optionally, retrieving, from the target speech, the sentence description that includes a word using the character to be corrected includes: acquiring a target character string corresponding to the target speech; when a first character string corresponding to a reference keyword exists in the target character string, detecting whether a second character string, located before the first character string in the target character string, includes a third character string located after the first character string; when the second character string includes the third character string, matching the second character string against the word character strings included in a target dictionary; and when the target dictionary contains a word character string that matches part or all of the second character string, determining that the target speech includes a sentence description that includes a word using the character to be corrected.

Optionally, dividing the target speech into a first paragraph and a second paragraph includes: dividing the target speech into a plurality of speech segments according to semantics; extracting, from the plurality of speech segments, a target speech segment whose semantics express an error correction intention as the first paragraph, and determining the speech segments other than the target speech segment as the second paragraph.

Optionally, the modifying the candidate text by using the first paragraph to obtain a target text corresponding to the target speech includes: converting the first paragraph into a target error correction word; and correcting corresponding characters in the candidate text by using the target error correction characters to obtain the target text.

Optionally, the converting the first paragraph into the target error correction text includes: converting the first paragraph into a target character string, wherein the target character string is used for indicating pronunciation of the first paragraph; extracting a key character string expressing the target error correction intention from the target character string; and acquiring the target error correction characters corresponding to the key character strings.

Optionally, extracting, from the target character string, the key character string expressing the target error correction intention includes: when the language expression format of the target character string describes the structure of the target error correction character, determining the target character string as the key character string, wherein the target error correction character is the character that corresponds to the key character string among character strings and characters having a correspondence; and when the language expression format of the target character string describes a target word that uses the target error correction character, determining the character string corresponding to the target word as the key character string, wherein the target error correction character is the character that corresponds to the key character string among character strings and characters having a correspondence.

According to another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned speech recognition method when running.

According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the voice recognition method through the computer program.

In the embodiment of the application, the error correction intention is recognized from the target voice to be converted into the text, wherein the error correction intention is used for describing characters in the text converted by using the voice; under the condition that the error correction intention is recognized from the target voice, dividing the target voice into a first paragraph and a second paragraph, wherein the first paragraph carries the target error correction intention in the target voice; performing text conversion on the second paragraph to obtain a candidate text; and modifying the candidate text by using the first paragraph to obtain a target text corresponding to the target voice, namely, the target voice comprises the first paragraph carrying the target error correction intention and a second paragraph to be converted into text content, under the condition that the error correction intention is recognized from the target voice, modifying the candidate text of the second paragraph by using the content of the first paragraph carrying the error correction intention, and outputting the target text, so that the output target text can be matched with the intention of the target voice. By adopting the technical scheme, the problems of low voice recognition efficiency and the like in the related technology are solved, and the technical effect of improving the voice recognition efficiency is realized.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.

FIG. 1 is a diagram illustrating a hardware environment of a speech recognition method according to an embodiment of the present application;

FIG. 2 is a flow chart of a method of speech recognition according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an alternative speech segment division according to an embodiment of the present application;

FIG. 4 is a flow chart of an alternative text error correction process according to an embodiment of the application;

FIG. 5 is a flow chart of an alternative text error correction process according to an embodiment of the application;

FIG. 6 is a block diagram of a speech recognition apparatus according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

According to an aspect of an embodiment of the present application, a speech recognition method is provided. The method is widely applicable to whole-house intelligent digital control scenarios such as smart home (Smart Home), smart household, smart home device ecosystems, and smart residence (Intelligence House) ecosystems. Optionally, FIG. 1 is a schematic diagram of a hardware environment of a speech recognition method according to an embodiment of the present application, and in this embodiment the method may be applied to the hardware environment formed by the terminal device 102 and the server 104 shown in FIG. 1. As shown in FIG. 1, the server 104 is connected to the terminal device 102 through a network and may be configured to provide a service (e.g., an application service) for the terminal or for a client installed on the terminal. A database may be set up on the server or independently of the server to provide data storage services for the server 104, and cloud computing and/or edge computing services may be configured on the server or independently of the server to provide data computing services for the server 104.

The network may include, but is not limited to, at least one of: a wired network and a wireless network. The wired network may include, but is not limited to, at least one of: a wide area network, a metropolitan area network, and a local area network. The wireless network may include, but is not limited to, at least one of: WIFI (Wireless Fidelity) and Bluetooth. The terminal device 102 may be, but is not limited to, a PC, a mobile phone, a tablet computer, a smart air conditioner, a smart range hood, a smart refrigerator, a smart oven, a smart cooktop, a smart washing machine, a smart water heater, smart washing equipment, a smart dishwasher, smart projection equipment, a smart TV, a smart clothes-drying rack, smart curtains, smart audio-video equipment, a smart socket, a smart stereo, a smart speaker, smart fresh-air equipment, smart kitchen and bathroom equipment, smart bathroom equipment, a smart floor-sweeping robot, a smart window-cleaning robot, a smart floor-mopping robot, smart air purification equipment, a smart steamer, a smart microwave oven, a smart kitchen water heater, a smart water purifier, a smart water dispenser, a smart door lock, and the like.

In this embodiment, a speech recognition method is provided and is applied to the above terminal device. FIG. 2 is a flowchart of a speech recognition method according to an embodiment of the present application. As shown in FIG. 2, the flow includes the following steps:

Step S202, recognizing an error correction intention from a target speech to be converted into text, wherein the error correction intention is used for describing characters to be used in the text converted from the speech;

Step S204, when the error correction intention is recognized from the target speech, dividing the target speech into a first paragraph and a second paragraph, wherein the first paragraph carries the target error correction intention in the target speech;

Step S206, performing text conversion on the second paragraph to obtain a candidate text;

Step S208, correcting the candidate text by using the first paragraph to obtain a target text corresponding to the target speech.

Through the steps, the target voice comprises the first paragraph carrying the target error correction intention and the second paragraph to be converted into the text content, and when the error correction intention is recognized from the target voice, the candidate text of the second paragraph is corrected by using the content of the first paragraph carrying the error correction intention, and the target text is output, so that the output target text can be matched with the intention of the target voice. By adopting the technical scheme, the problems of low voice recognition efficiency and the like in the related technology are solved, and the technical effect of improving the voice recognition efficiency is realized.
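The four steps S202 to S208 can be sketched end to end as follows. This is only a minimal illustration under stated assumptions: the speech is assumed to be already transcribed into text segments, and the tables `PINYIN` and `DESCRIPTIONS`, as well as every function name, are hypothetical stand-ins rather than anything defined by the application.

```python
# Hypothetical toy data: every name and table here is an assumption for illustration.
PINYIN = {"流": "liu", "刘": "liu", "先": "xian", "生": "sheng"}

# Assumed preset descriptions mapped to the character each one describes.
DESCRIPTIONS = {"文刀刘": "刘"}


def recognize_intent(segments):
    """S202: return the segments that match a known error correction description."""
    return [s for s in segments if s in DESCRIPTIONS]


def split_speech(segments, first):
    """S204: the first paragraph carries the intention; the rest is the second."""
    second = [s for s in segments if s not in first]
    return first, second


def to_candidate_text(second):
    """S206: stand-in for speech-to-text; segments are already transcribed here."""
    return "".join(second)


def correct(candidate, first):
    """S208: replace characters that sound like each described character."""
    out = candidate
    for description in first:
        target = DESCRIPTIONS[description]
        out = "".join(
            target if PINYIN.get(ch) == PINYIN[target] else ch for ch in out
        )
    return out


def recognize(segments):
    first = recognize_intent(segments)
    if not first:  # no error correction intention: plain text conversion
        return to_candidate_text(segments)
    first, second = split_speech(segments, first)
    return correct(to_candidate_text(second), first)
```

In this sketch, `recognize(["流先生", "文刀刘"])` treats the second segment as the first paragraph and uses it to fix the homophone in the first segment.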

In the technical solution provided in step S202, the error correction intention in the target speech may be recognized by identifying a target keyword in the speech that indicates the error correction intention, and then determining the speech content that includes the target keyword as the speech content corresponding to the error correction intention. For example, when the target speech is detected to include a keyword such as "change" or "modify", it is determined that an error correction intention exists in the target speech, and the speech content including the keyword is used as the speech content for error correction.
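A keyword check of this kind might look like the sketch below. It assumes the speech has already been transcribed into text segments; the keyword tuple and the function name are illustrative assumptions, with only "change" and "modify" taken from the example in the text.

```python
# Assumed keyword list; the description names "change" and "modify" as examples.
CORRECTION_KEYWORDS = ("change", "modify")


def find_correction_segments(segments):
    """Return (has_intent, segments that contain a correction keyword)."""
    hits = [
        s for s in segments
        if any(keyword in s.lower() for keyword in CORRECTION_KEYWORDS)
    ]
    return bool(hits), hits
```

Plain substring matching is the simplest possible choice here; a production system would more likely match on recognized tokens to avoid accidental hits inside longer words.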

Optionally, in this embodiment, the error correction intention may be recognized by identifying the format of the speech content in the target speech: when the recognized speech includes speech content in a target format, that content is determined to be the speech content corresponding to the error correction intention. The target format is a language expression format for describing a character to be corrected, such as a well-known idiom or line of poetry in which one or more characters have the same pronunciation as characters in another part of the target speech.

Optionally, in this embodiment, the error correction intention may describe the character to be converted by describing its composition structure, its stroke order, a commonly used word containing it, or a well-known sentence containing it. For example, "wen dao liu" describes the structure of the Chinese character 刘 ("liu") and thereby indicates that the character to be substituted into the text is 刘; "one left-falling stroke and one right-falling stroke" is the stroke order of the Chinese character 人 ("ren", person) and thereby indicates that the character to be substituted is 人; and a phrase such as "the di of zhuangyuan jidi" names a well-known idiom to indicate the character 第 ("di").

In the technical solution provided in step S204, the first paragraph and the second paragraph may appear in the target speech in any order. For example, the first paragraph may precede the second paragraph, the second paragraph may precede the first paragraph, or, when the target speech includes the second paragraph to be converted into text and a plurality of first paragraphs, the second paragraph may be at the beginning of the target speech, at the end of the target speech, or inserted between the first paragraphs. This solution does not limit the order.

Optionally, in this embodiment, the second paragraph may be one continuous piece of speech content in the target speech, or may be obtained by splicing multiple discontinuous pieces of speech content in the target speech. For example, when dividing, the target speech may be broken into sentences at short pauses in the speech to obtain a plurality of speech segments; the target speech segment expressing the error correction intention is used as the first paragraph, and the speech segments other than the target speech segment are used as the second paragraph. Alternatively, if all of the recognized speech segments turn out to be first paragraphs, the error correction intention of each first paragraph may be recognized, the speech content corresponding to the character to be corrected may be extracted from each error correction intention, and the extracted content may be spliced in order to form the second paragraph. For example, if the text the user wants to output is "I want to go to Mount Tai", the user may speak a target speech in which the characters of that sentence are given through error correction descriptions; the content corresponding to each described character is then extracted and spliced in order as the second paragraph, while the descriptions themselves serve as the first paragraphs used to correct it.
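The pause-based sentence breaking mentioned above can be sketched as follows, assuming the recognizer already yields word-level timestamps. The `(text, start, end)` tuple layout and the 0.3 second threshold are illustrative assumptions, not values from the application.

```python
def split_on_pauses(words, max_gap=0.3):
    """Group (text, start, end) tuples into segments.

    A silence longer than max_gap seconds between two words starts a new segment.
    """
    segments, current, last_end = [], [], None
    for text, start, end in words:
        if current and start - last_end > max_gap:
            segments.append(current)
            current = []
        current.append(text)
        last_end = end
    if current:
        segments.append(current)
    return segments
```

Each resulting segment can then be classified as a first paragraph or as part of the second paragraph, and the second-paragraph segments spliced in their original order.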

In the technical solution provided in step S206, the candidate text may be a character text, or may be a pinyin text corresponding to the target speech.

In the technical solution provided in step S208, the method for correcting the candidate text using the first paragraph may include, but is not limited to, changing characters in the candidate text or adding character contents to the candidate text, for example, determining a target character to be corrected according to the first paragraph, and replacing a character in the candidate text that has the same pronunciation as the target character with the target character, or determining a target character to be added in the candidate text and an adding position of the target character in the candidate text according to the first paragraph, so as to obtain the target text after adding the target character to the candidate text.
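The two correction modes described for step S208, replacing a same-pronunciation character and adding a character at a position, could be sketched as below. The `PINYIN` table is a toy assumption standing in for a full pronunciation lexicon, and both function names are hypothetical.

```python
# Toy pronunciation table (an assumption; a real system would use a full lexicon).
PINYIN = {"人": "ren", "仁": "ren", "好": "hao"}


def replace_homophones(candidate, target):
    """Replace every character that shares the target character's pronunciation."""
    sound = PINYIN[target]
    return "".join(target if PINYIN.get(ch) == sound else ch for ch in candidate)


def insert_character(candidate, target, position):
    """Add the target character at the given index of the candidate text."""
    return candidate[:position] + target + candidate[position:]
```

Replacing every homophone is a deliberate simplification; if the candidate text contains several characters with the same pronunciation, a real implementation would need additional context to decide which one the description refers to.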

As an alternative embodiment, the recognizing the intention of error correction from the target speech to be converted into text includes:

receiving a text conversion request, wherein the text conversion request is used for requesting the target voice to be converted into text;

and in response to the text conversion request, recognizing a sentence description in a target format from the target speech, wherein the target format is a language expression format used for describing the written form of a character to be corrected.

Optionally, in this embodiment, recognizing the sentence description in the target format from the target speech may include, but is not limited to, obtaining the pinyin text corresponding to the target speech and then recognizing the sentence description in the target format from that pinyin text.

Optionally, in this embodiment, the sentence description in the target format may include, but is not limited to, a sentence describing the writing structure of a character, a sentence describing the stroke order of a character, a word formed with a character, or a line of poetry containing a character. For example, "wen dao liu" is a sentence describing the writing structure of the character 刘 ("liu"), and it describes the written form of the character pronounced "liu" that is to be corrected in the candidate text. Likewise, "one left-falling stroke and one right-falling stroke" describes the stroke order of the character 人 ("ren"), and it describes the written form of the character pronounced "ren" that is to be corrected in the candidate text. Similarly, a phrase such as "the di of zhuangyuan jidi" describes the written form of the character pronounced "di" that is to be corrected in the candidate text.

As an alternative embodiment, the recognizing the statement description in the target format from the target speech includes at least one of:

retrieving, from the target speech, a sentence description that describes the structure of the character to be corrected;

and retrieving, from the target speech, a sentence description that includes a word using the character to be corrected.

Optionally, in this embodiment, the sentence description that describes the structure of the character to be corrected may be retrieved from the target speech by matching the target speech against preset character structure description sentences. For example, the target speech is converted into a pinyin text, and the pinyin text of each preset structure description sentence is matched against the pinyin text corresponding to the target speech, so as to identify the sentence description in the first format in the target speech.
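Matching preset structure descriptions against the pinyin text of the target speech might be sketched like this. The pinyin is assumed to be available as a list of syllables, and the `STRUCTURE_DESCRIPTIONS` table, including the "mu zi li" entry, is a hypothetical preset rather than data from the application.

```python
# Hypothetical preset structure descriptions, stored as pinyin syllable tuples.
STRUCTURE_DESCRIPTIONS = {
    ("wen", "dao", "liu"): "刘",
    ("mu", "zi", "li"): "李",
}


def find_structure_description(syllables):
    """Return (start, end, character) for the first preset description found."""
    for start in range(len(syllables)):
        for pattern, character in STRUCTURE_DESCRIPTIONS.items():
            end = start + len(pattern)
            if tuple(syllables[start:end]) == pattern:
                return start, end, character
    return None
```

The returned index range marks the first paragraph inside the pinyin text, so the syllables outside it can be treated as the second paragraph.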

Optionally, in this embodiment, the sentence description that includes a word using the character to be corrected may be retrieved from the target speech by matching the target speech against preset description words. For example, suppose the sentence description in the second format contained in the target speech is "the di of zhuangyuan jidi", which uses the word "zhuangyuan jidi" to describe the character "di". The target speech is matched against the preset description words, and when a preset word matches the pronunciation of "zhuangyuan jidi" in the target speech, "the di of zhuangyuan jidi" is determined to be a sentence description in the second format.

Fig. 3 is a schematic diagram of an alternative voice segment division according to an embodiment of the present application, as shown in fig. 3, which may include, but is not limited to, the following:

S301, acquiring the target speech, wherein the target speech includes two parts: a second paragraph to be converted into text and a first paragraph for correcting the content of the second paragraph;

S302, matching the target speech against preset character structure description sentences to obtain a sentence description that describes the structure of the character to be corrected, and determining that sentence description as a first paragraph;

S303, matching the target speech against preset description words to obtain a sentence description that includes a word using the character to be corrected, and determining that sentence description as a first paragraph;

S304, determining the content of the target speech other than the first paragraph as the second paragraph.
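The flow S301 to S304 can be sketched as a single pass over the segments. This assumes the target speech has already been split into transcribed segments and that the two preset collections are simple sets of strings; the function name and the exact-match test are illustrative simplifications.

```python
def divide_segments(segments, structure_descriptions, word_descriptions):
    """Split transcribed segments into (first_paragraphs, second_paragraph)."""
    first, second = [], []
    for segment in segments:                   # S301: walk the target speech
        if segment in structure_descriptions:  # S302: structure description
            first.append(segment)
        elif segment in word_descriptions:     # S303: word-based description
            first.append(segment)
        else:                                  # S304: everything else
            second.append(segment)
    return first, second
```

For example, with `{"wen dao liu"}` as the structure presets, the segments `["liu xian sheng", "wen dao liu"]` divide into one first paragraph and one second-paragraph segment.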

As an alternative embodiment, the retrieving, from the target speech, a sentence description including a word using the word to be corrected includes:

acquiring a target character string corresponding to the target voice;

under the condition that a first character string corresponding to a reference keyword exists in the target character string, detecting whether a second character string in the target character string, which is positioned in front of the first character string, comprises a third character string positioned behind the first character string;

matching the second character string with a word character string included in a target dictionary if the second character string includes the third character string;

and when the target dictionary contains a word character string that matches part or all of the second character string, determining that the target speech includes a sentence description that includes a word using the character to be corrected.

Optionally, in this embodiment, the reference keyword is the keyword that points to the target error correction character within a sentence description that uses a word to describe the character to be corrected. The reference keyword may include, but is not limited to, "de" (的, "of") and "the Xth character". For example, the sentence description "the di of zhuangyuan jidi" uses the word "zhuangyuan jidi" to describe the character "di" to be corrected, and the keyword "de" indicates that the "di" among the four characters of "zhuangyuan jidi" is the character to be corrected. Alternatively, the sentence description "the fourth character of zhuangyuan jidi" uses the same word, and the keyword "the fourth character" indicates that the character in the fourth position of "zhuangyuan jidi", namely "di", is the character to be corrected.

Optionally, in this embodiment, the second character string including the third character string means that part or all of the second character string is the same as the third character string. For example, for the sentence description "the di of zhuangyuan jidi", the reference keyword is "de", the second character string is the pinyin text corresponding to "zhuangyuan jidi", the third character string is the pinyin text of the "di" that follows the reference keyword "de", and the "di" in the second character string is the same as the third character string.
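The reference keyword check and dictionary match can be illustrated with the sketch below. It assumes the target character string is a list of pinyin syllables with one syllable per character, that the reference keyword is "de", and that the idiom 状元及第 ("zhuang yuan ji di") is the word in question; the dictionary contents and the function name are hypothetical.

```python
REFERENCE_KEYWORD = "de"  # first character string (assumed keyword)

# Hypothetical target dictionary: pinyin of a known word mapped to its characters.
TARGET_DICTIONARY = {("zhuang", "yuan", "ji", "di"): "状元及第"}


def find_word_description(syllables):
    """Return (word, character) for a '<word> de <syllable>' pattern, else None.

    Only the first occurrence of the keyword is examined.
    """
    if REFERENCE_KEYWORD not in syllables:
        return None
    k = syllables.index(REFERENCE_KEYWORD)
    second = syllables[:k]          # second string: everything before the keyword
    third = syllables[k + 1:k + 2]  # third string: the syllable right after it
    if not third or third[0] not in second:
        return None
    # Match the whole second string first, then progressively shorter tails of it.
    for start in range(len(second)):
        tail = second[start:]
        word = TARGET_DICTIONARY.get(tuple(tail))
        if word is not None and third[0] in tail:
            # Pick the first character of the word with the requested pronunciation.
            return word, word[tail.index(third[0])]
    return None
```

Trying shorter tails of the second string corresponds to the case where only part of it matches a dictionary word, for instance when ordinary speech precedes the description.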

Optionally, in this embodiment, the words in the target dictionary may be determined from the recognition results of reference speech in a historical time period, or may be preset well-known idioms or words. For example, the reference text obtained by performing speech recognition on reference speech at a historical time is consulted, and the words or sentences in that reference text are used as target words for subsequent text recognition.

As an alternative embodiment, the dividing the target speech into a first paragraph and a second paragraph includes:

dividing the target speech into a plurality of speech segments according to semantics;

extracting a target voice segment with a semantic meaning expressing an error correction intention from the plurality of voice segments as the first paragraph, and determining other voice segments except the target voice segment from the plurality of voice segments as the second paragraph.

Optionally, in this embodiment, the target speech segment may be extracted from the plurality of speech segments by detecting whether a speech segment contains a target keyword corresponding to the error correction intention, so that the speech segment including the target keyword is used as the first paragraph. The target keyword may include, but is not limited to, "delete", "modify", "change", and the like. The target keyword may also be determined according to the user's historical speech habits. For example, if the user has commonly used the two keywords "modify" and "change" for text error correction over a past period of time, then "modify" and "change" may be used directly as target keywords and each speech segment is checked for them. Alternatively, reference word vectors may be calculated for those two keywords; each speech segment is then segmented into words, a word vector is calculated for each word, and a word in the speech segment whose vector matches a reference word vector is taken as the target keyword.
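The word vector variant could be sketched as follows. The text does not say how vectors are compared, so cosine similarity with a 0.9 threshold is an assumption here, as are the function names and the idea of passing the embeddings in as a plain dictionary.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_target_keyword(tokens, vectors, reference_vectors, threshold=0.9):
    """Return the first token whose vector matches a reference vector, else None."""
    for token in tokens:
        vector = vectors.get(token)
        if vector is None:  # token has no embedding; skip it
            continue
        if any(cosine(vector, ref) >= threshold for ref in reference_vectors):
            return token
    return None
```

Compared with exact keyword lookup, this lets a segment count as a first paragraph when it uses a word close in meaning to the user's habitual keywords, at the cost of needing an embedding for each token.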

Optionally, in this embodiment, the target speech segment may be extracted from the plurality of speech segments by detecting whether an error correction template corresponding to the error correction intention exists in a speech segment. The error correction template is a text pattern used to describe the target error correction word, for example by describing its writing order, by describing its character structure, or by naming a reference word that contains it. For example, the error correction template "muzi li" (木子李) describes the character structure of the Chinese character "李" (li), and the error correction template "zhuangyuan jidi de di" (状元及第的第) describes the Chinese character "第" (di) by way of a familiar idiom that contains it. The error correction template may be determined by recognizing the voice habits of the user, or may be set manually by the user as needed.

As an optional embodiment, the modifying the candidate text by using the first paragraph to obtain a target text corresponding to the target speech includes:

converting the first paragraph into a target error correction word;

and correcting corresponding characters in the candidate text by using the target error correction characters to obtain the target text.

Optionally, in this embodiment, the target error correction word may be generated according to the target description sentence in the first paragraph that describes the form of the word to be corrected, or may be determined as the error correction word corresponding to the target description sentence from stored description sentences and error correction words having a correspondence. For example, where the description sentence describes the stroke order of a character or its component structure, a character generation model is used to generate the target error correction word corresponding to the target description sentence.

In the foregoing embodiment, the first paragraph carries the target error correction intention in the target speech. The first paragraph may contain a sentence description in a first format, which describes the structure of the word to be corrected, or a sentence description in a second format, which describes a word that uses the word to be corrected, so that the target error correction word can be determined from the first paragraph. Fig. 4 is a first optional text error correction flowchart according to an embodiment of the application. As shown in Fig. 4, the flow may include, but is not limited to, the following steps:

S401, obtaining the target voice, wherein the target voice comprises two parts, namely a second paragraph to be converted into text and a first paragraph for correcting the content of the second paragraph, and converting the target text into a pinyin-format character string;

S402, storing, in a preset dictionary, sentence descriptions that describe the structure of characters to be corrected (that is, the radicals or components of a Chinese character and the character they form), together with the pinyin character string corresponding to each sentence description, such as "立早章" (lizaozhang), "弓长张" (gongchangzhang), "白勺的" (baishaode/baishaodi), and "土也地" (tuyede/tuyedi);

S403, determining whether a sentence description comprising the structure of the character to be corrected exists in the target voice by matching the pinyin character string corresponding to the target text against the pinyin character strings corresponding to the stored sentence descriptions;

S404, in a case where such a sentence description exists, determining the target error correction character, such as "章" (zhang) or "张" (zhang), according to the sentence description;

S405, detecting whether the text corresponding to the second paragraph of the target voice (the candidate text) contains a character to be corrected that has the same pronunciation as the target error correction character;

S406, in a case where the character to be corrected exists, replacing the character to be corrected in the candidate text corresponding to the target voice with the target error correction character;

S407, outputting the error-corrected target text.
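The flow of Fig. 4 can be sketched in a few lines. This is an illustrative sketch only: the dictionary contents, the per-character pinyin table `CHAR_PINYIN` (a stand-in for a real pinyin converter), the Chinese characters paired with the romanized examples, and the function names are all assumptions, and the sketch replaces every homophone in the candidate text, which is a simplification.

```python
from typing import Dict, Optional

# Hypothetical preset dictionary (S402): pinyin of a structure description
# mapped to the character that the description spells out.
STRUCTURE_DICT: Dict[str, str] = {
    "lizaozhang": "章",
    "gongchangzhang": "张",
}

# Hypothetical per-character pinyin table standing in for a real converter.
CHAR_PINYIN: Dict[str, str] = {
    "章": "zhang",
    "张": "zhang",
    "三": "san",
    "文": "wen",
}


def find_correction_char(first_paragraph_pinyin: str) -> Optional[str]:
    """S403/S404: look for a stored structure description in the pinyin string."""
    for description, char in STRUCTURE_DICT.items():
        if description in first_paragraph_pinyin:
            return char
    return None


def correct_candidate(candidate: str, first_paragraph_pinyin: str) -> str:
    """S405 to S407: replace homophones of the target error correction character."""
    target = find_correction_char(first_paragraph_pinyin)
    if target is None:
        return candidate  # no structure description found, nothing to correct
    target_pinyin = CHAR_PINYIN[target]
    return "".join(
        target if CHAR_PINYIN.get(ch) == target_pinyin else ch
        for ch in candidate
    )


corrected = correct_candidate("文张", "lizaozhangdezhang")
```

Here the candidate text "文张" contains "张", a homophone of the target error correction character "章", so the sketch yields "文章".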

Fig. 5 is a second optional text error correction flowchart according to an embodiment of the application. As shown in Fig. 5, the flow may include, but is not limited to, the following steps:

S501, obtaining the target voice, wherein the target voice comprises two parts, namely a second paragraph to be converted into text and a first paragraph for correcting the content of the second paragraph, and converting the target text into a pinyin-format character string;

S502, detecting whether a character with the same pronunciation as the keyword exists in the target text;

S503, in a case where a character with the same pronunciation as the keyword "的" (de, "of") exists in the target text, detecting whether a target word precedes the keyword and whether the character immediately following the keyword has the same pronunciation as a character in the target word, for example "状元及第的第" (zhuangyuanjidi de di), which contains the homophone "di";

S504, detecting whether the target word is in a preset dictionary, where the preset dictionary records well-known words, words commonly used by the user, and the pinyin-format character strings of those words, such as "guest" (jiabin), "wisdom" (zhihui), "jiayidingding", "状元及第" (zhuangyuanjidi), and so on;

S505, in a case where the target word exists in the preset dictionary, determining the character in the target word that has the same pronunciation as the character following the keyword as the target error correction character;

S506, replacing the character to be corrected in the candidate text corresponding to the target voice with the target error correction character;

S507, outputting the error-corrected target text.
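The flow of Fig. 5 can be illustrated in the same spirit. This is a minimal sketch, assuming the first paragraph has already been converted into a list of pinyin syllables and the candidate text into (character, pinyin) pairs; the dictionary contents, the Chinese characters paired with the romanized examples, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

KEYWORD = "de"  # pinyin of the keyword, assumed here to be the particle "的"

# Hypothetical preset dictionary (S504): word mapped to its pinyin syllables.
WORD_DICT: Dict[str, List[str]] = {
    "状元及第": ["zhuang", "yuan", "ji", "di"],
    "嘉宾": ["jia", "bin"],
    "智慧": ["zhi", "hui"],
}


def find_correction_char(syllables: List[str]) -> Optional[Tuple[str, str]]:
    """S502 to S505: return (target error correction character, its pinyin)."""
    for i, syllable in enumerate(syllables):
        # The keyword needs a word before it and a syllable after it.
        if syllable != KEYWORD or i == 0 or i + 1 >= len(syllables):
            continue
        following = syllables[i + 1]
        for word, word_syllables in WORD_DICT.items():
            n = len(word_syllables)
            precedes = syllables[max(0, i - n):i] == word_syllables
            if precedes and following in word_syllables:
                return word[word_syllables.index(following)], following
    return None


def apply_correction(candidate: List[Tuple[str, str]], char: str, pinyin: str) -> str:
    """S506/S507: replace every candidate character pronounced `pinyin` with `char`."""
    return "".join(char if py == pinyin else ch for ch, py in candidate)


found = find_correction_char(["zhuang", "yuan", "ji", "di", "de", "di"])
```

With this input, the syllable after the keyword ("di") also occurs in the preceding dictionary word, so the sketch selects "第" as the target error correction character.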

As an alternative embodiment, the converting the first paragraph into the target error correction text includes:

converting the first paragraph into a target character string, wherein the target character string is used for indicating pronunciation of the first paragraph;

extracting a key character string expressing the target error correction intention from the target character string;

and acquiring the target error correction characters corresponding to the key character strings.

Optionally, in this embodiment, the target character string may be a pinyin character string of the first paragraph, or a flat letter character string, or a character string of an English word, which is not limited in this embodiment.

Optionally, in this embodiment, the key character string is extracted from the target character string by detecting whether the target character string includes a character string with the same pronunciation as a keyword corresponding to the target error correction intention or as an error correction template. For example, if the converted pinyin-format target character string is "lizaozhang", it is identical to the pinyin of the error correction template "立早章" and is therefore determined to be the key character string. Alternatively, if the converted pinyin-format target character string includes the pinyin character string "genggai" corresponding to the keyword "change", the character string corresponding to that pinyin character string can be determined to be the key character string.

As an alternative embodiment, the extracting, from the target character string, the key character string expressing the target error correction intention includes:

determining the target character string as the key character string in a case where the language expression format of the target character string is one that describes the structure of the target error correction character, wherein, among the character strings and characters having a correspondence, the character corresponding to the key character string is the target error correction character;

and determining the character string corresponding to the target word as the key character string in a case where the language expression format of the target character string is one that describes a target word using the target error correction character, wherein, among the character strings and characters having a correspondence, the character corresponding to the key character string is the target error correction character.
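The two cases above can be sketched as a small dispatcher. This is only an illustration under assumptions: the target character string is taken to be space-separated pinyin, the second format is assumed to have the shape "<word> de <syllable>", and both lookup tables and the function name `extract_key_string` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical correspondence tables between character strings and characters.
STRUCTURE_STRINGS: Dict[str, str] = {
    "li zao zhang": "章",
    "gong chang zhang": "张",
}
WORD_STRINGS: Dict[str, str] = {
    "zhuang yuan ji di": "状元及第",
    "jia bin": "嘉宾",
}


def extract_key_string(target: str) -> Optional[Tuple[str, str]]:
    """Return (key character string, target error correction character)."""
    # Case 1: the whole target string describes the character structure,
    # so the target string itself is the key character string.
    if target in STRUCTURE_STRINGS:
        return target, STRUCTURE_STRINGS[target]
    # Case 2: the target string names a word that uses the character,
    # so the string of that word is the key character string.
    head, separator, tail = target.rpartition(" de ")
    if separator and head in WORD_STRINGS:
        syllables = head.split()
        if tail in syllables:
            return head, WORD_STRINGS[head][syllables.index(tail)]
    return None


result = extract_key_string("zhuang yuan ji di de di")
```

In the second case the returned character is the one in the target word whose syllable equals the syllable spoken after the keyword.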

Optionally, in this embodiment, the target word may be determined according to the recognition result of reference speech in a historical time period, or may be a preset well-known idiom or word. For example, a reference text recognized from reference speech at a historical time is consulted, and a word or sentence in that reference text is used as the target word for subsequent text recognition.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method of the embodiments of the present application.

Fig. 6 is a block diagram of a speech recognition apparatus according to an embodiment of the present application. As shown in Fig. 6, the apparatus includes:

a recognition module 62, configured to recognize an error correction intention from a target voice to be converted into text, where the error correction intention is used to describe a word in the text converted from the voice;

a processing module 64, configured to, in a case that an error correction intention is identified from the target speech, divide the target speech into a first paragraph and a second paragraph, where the first paragraph carries the target error correction intention in the target speech;

a conversion module 66, configured to perform text conversion on the second paragraph to obtain a candidate text;

and a correcting module 68, configured to correct the candidate text by using the first paragraph, so as to obtain a target text corresponding to the target speech.
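The cooperation of the four modules can be sketched as a small composition of callables. This is an illustrative sketch, not the apparatus itself: the class name, the use of `bytes` as a stand-in for audio, and the behavior when no error correction intention is recognized (the whole speech is converted directly) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class SpeechRecognitionApparatus:
    """Hypothetical wiring of modules 62, 64, 66 and 68 from Fig. 6."""

    recognize_intent: Callable[[bytes], bool]      # recognition module 62
    split: Callable[[bytes], Tuple[bytes, bytes]]  # processing module 64
    to_text: Callable[[bytes], str]                # conversion module 66
    correct: Callable[[str, bytes], str]           # correcting module 68

    def run(self, target_speech: bytes) -> str:
        # Assumed fallback: with no error correction intention, convert as-is.
        if not self.recognize_intent(target_speech):
            return self.to_text(target_speech)
        first, second = self.split(target_speech)  # (first, second) paragraph
        candidate = self.to_text(second)           # candidate text
        return self.correct(candidate, first)      # target text


# Toy stand-ins so the sketch can run; real modules would process audio.
apparatus = SpeechRecognitionApparatus(
    recognize_intent=lambda s: b"change" in s,
    split=lambda s: (s[s.index(b"change"):], s[: s.index(b"change")]),
    to_text=lambda s: s.decode().strip(),
    correct=lambda text, first: text.upper(),
)
output = apparatus.run(b"hello change it")
```

Keeping each module behind a callable mirrors the block diagram: any module can be swapped without touching the others.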

With the above embodiment, an error correction intention is recognized from the target speech to be converted into text, where the error correction intention is used to describe a word in the text converted from the speech; in a case where the error correction intention is recognized from the target speech, the target speech is divided into a first paragraph and a second paragraph, where the first paragraph carries the target error correction intention in the target speech; text conversion is performed on the second paragraph to obtain a candidate text; and the candidate text is corrected by using the first paragraph to obtain the target text corresponding to the target speech. That is, the target speech includes a first paragraph carrying the target error correction intention and a second paragraph to be converted into text content. When the error correction intention is recognized from the target speech, the candidate text of the second paragraph is corrected using the content of the first paragraph, and the target text is output, so that the output target text matches the intention of the target speech. This technical solution addresses problems in the related art such as low speech recognition efficiency and achieves the technical effect of improving speech recognition efficiency.

Optionally, the recognition module includes: a receiving unit, configured to receive a text conversion request, where the text conversion request is used to request that the target speech be converted into text; and a recognition unit, configured to recognize a sentence description in a target format from the target speech in response to the text conversion request, where the target format is a language expression format used for describing the form of the word to be corrected.

Optionally, the recognition unit is configured to perform at least one of the following operations: retrieving, from the target speech, a sentence description comprising the structure of the word to be corrected; and retrieving, from the target speech, a sentence description including a word that uses the word to be corrected.

Optionally, the recognition unit is further configured to: acquire a target character string corresponding to the target speech; in a case where a first character string corresponding to a reference keyword exists in the target character string, detect whether a second character string located before the first character string in the target character string includes a third character string located after the first character string; in a case where the second character string includes the third character string, match the second character string against the word character strings included in a target dictionary; and in a case where part or all of the second character string matches a word character string in the target dictionary, determine that a sentence description including a word that uses the word to be corrected exists in the target speech.

Optionally, the processing module is configured to: divide the target voice into a plurality of voice segments according to semantics; and extract, from the plurality of voice segments, a target voice segment whose semantics express an error correction intention as the first paragraph, and determine the voice segments other than the target voice segment as the second paragraph.

Optionally, the correcting module includes: a conversion unit, configured to convert the first paragraph into a target error correction character; and a correcting unit, configured to correct the corresponding characters in the candidate text by using the target error correction character to obtain the target text.

Optionally, the conversion unit is configured to: converting the first paragraph into a target character string, wherein the target character string is used for indicating pronunciation of the first paragraph; extracting a key character string expressing the target error correction intention from the target character string; and acquiring the target error correction characters corresponding to the key character strings.

Optionally, the conversion unit is configured to: determine the target character string as the key character string in a case where the language expression format of the target character string is one that describes the structure of the target error correction character, wherein, among the character strings and characters having a correspondence, the character corresponding to the key character string is the target error correction character; and determine the character string corresponding to the target word as the key character string in a case where the language expression format of the target character string is one that describes a target word using the target error correction character, wherein, among the character strings and characters having a correspondence, the character corresponding to the key character string is the target error correction character.

An embodiment of the present application further provides a storage medium including a stored program, where the program executes the speech recognition method according to any one of the above methods.

Alternatively, in the present embodiment, the storage medium may be configured to store program codes for performing the following steps: identifying error correction intentions from target voice to be converted into text, wherein the error correction intentions are used for describing words in the text converted by using the voice; under the condition that the error correction intention is recognized from the target voice, dividing the target voice into a first paragraph and a second paragraph, wherein the first paragraph carries the target error correction intention in the target voice; performing text conversion on the second paragraph to obtain a candidate text; and correcting the candidate text by using the first paragraph to obtain a target text corresponding to the target voice.

Embodiments of the present application further provide an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the steps in any of the above embodiments of the speech recognition method.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program: identifying error correction intentions from target voice to be converted into text, wherein the error correction intentions are used for describing words in the text converted by using the voice; under the condition that the error correction intention is recognized from the target voice, dividing the target voice into a first paragraph and a second paragraph, wherein the first paragraph carries the target error correction intention in the target voice; performing text conversion on the second paragraph to obtain a candidate text; and correcting the candidate text by using the first paragraph to obtain a target text corresponding to the target voice.

Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.

It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.

The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims

1. A method for speech recognition, comprising:

identifying error correction intentions from target voice to be converted into text, wherein the error correction intentions are used for describing words in the text converted by using the voice;

under the condition that the intention of error correction is recognized from the target voice, dividing the target voice into a first paragraph and a second paragraph, wherein the first paragraph carries the intention of error correction in the target voice;

performing text conversion on the second paragraph to obtain a candidate text;

and correcting the candidate text by using the first paragraph to obtain a target text corresponding to the target voice.

2. The method of claim 1, wherein the identifying of an error correction intention from the target voice to be converted into text comprises:

3. The method of claim 2, wherein the recognizing a statement description in a target format from the target speech comprises at least one of:

and retrieving a sentence description comprising a word using the character to be corrected from the target voice.

4. The method according to claim 3, wherein the retrieving, from the target speech, a sentence description including a word using the word to be corrected comprises:

acquiring a target character string corresponding to the target voice;

5. The method of claim 1, wherein the dividing the target voice into a first paragraph and a second paragraph comprises:

dividing the target voice into a plurality of voice segments according to semantics;

6. The method of claim 1, wherein the modifying the candidate text using the first paragraph to obtain a target text corresponding to the target speech comprises:

converting the first paragraph into a target error correction word;

7. The method of claim 6, wherein converting the first paragraph to a target error correction word comprises:

8. The method according to claim 7, wherein the extracting, from the target character string, the key character string expressing the target error correction intention comprises:

and in a case where the language expression format of the target character string is one that describes a target word using the target error correction character, determining the character string corresponding to the target word as the key character string, wherein, among the character strings and characters having a correspondence, the character corresponding to the key character string is the target error correction character.

9. A computer-readable storage medium, comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 8.

10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 8 by means of the computer program.