CN112767923B - Voice recognition method and device - Google Patents

Voice recognition method and device

Info

Publication number
CN112767923B
CN112767923B (application CN202110008353.6A)
Authority
CN
China
Prior art keywords
data
text
pinyin
unvoiced
preset database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110008353.6A
Other languages
Chinese (zh)
Other versions
CN112767923A (en)
Inventor
张伟涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Weimeng Enterprise Development Co ltd
Original Assignee
Shanghai Weimeng Enterprise Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Weimeng Enterprise Development Co ltd
Priority to CN202110008353.6A
Publication of CN112767923A
Application granted
Publication of CN112767923B

Classifications

    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L15/08 Speech classification or search
    • G10L15/26 Speech to text systems
    • G10L2015/086 Recognition of spelled words
    • G06F16/00 Information retrieval; database structures therefor; file system structures therefor
    • G06F16/60 Information retrieval of audio data
    • G06F16/63 Querying
    • G06F16/683 Retrieval characterised by using metadata automatically derived from the content
    • G06F16/685 Retrieval using an automatically derived transcript of audio data, e.g. lyrics

Abstract

The invention discloses a voice recognition method and a voice recognition device. A learned model first converts the voice to be recognized into its corresponding unvoiced pinyin data (pinyin without tone marks), and the matching text is then retrieved from a preset database according to that unvoiced pinyin data to obtain the recognition result. Compared with existing methods that directly learn to output the characters corresponding to the voice, this improves the accuracy of recognizing the voice to be recognized.

Description

Voice recognition method and device
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech recognition method and apparatus.
Background
In the prior art, voice recognition methods are mainly designed for general scenes. In special fields such as catering, however, these methods recognize domain-specific proper nouns with low accuracy, and in natural scenes the recognition rate is further reduced by environmental noise and other interference.
Disclosure of Invention
In view of the foregoing, it is an object of the present invention to provide a speech recognition method and apparatus capable of improving recognition accuracy.
To achieve the above object, the invention provides the following technical solution:
a speech recognition method, comprising:
acquiring voice data to be recognized;
according to the voice data to be recognized, a first detection model is used for obtaining unvoiced pinyin data corresponding to the voice data to be recognized;
and searching a text matched with the unvoiced pinyin data from a preset database according to the obtained unvoiced pinyin data, and outputting the obtained text.
Preferably, retrieving a text matching the unvoiced pinyin data from a preset database according to the obtained unvoiced pinyin data includes:
according to the obtained unvoiced pinyin data, if a text with unvoiced pinyin consistent with the unvoiced pinyin data is not retrieved from the preset database, character data corresponding to the voice data to be recognized are obtained by using a second detection model according to the obtained unvoiced pinyin data;
and searching a text matched with the unvoiced pinyin data or the character data from the preset database according to the obtained unvoiced pinyin data or the character data, and outputting the obtained text.
Preferably, retrieving a text matching the unvoiced pinyin data from a preset database according to the obtained unvoiced pinyin data includes:
and according to the obtained unvoiced pinyin data, if a text with unvoiced pinyin consistent with the unvoiced pinyin data is retrieved from the preset database, outputting the obtained text.
Preferably, the retrieving, from the preset database, a text matched with the unvoiced pinyin data or the text data according to the obtained unvoiced pinyin data or the text data includes:
according to the obtained character data, if no text consistent with the character data is retrieved from the preset database, retrieving from the preset database, according to the obtained unvoiced pinyin data, a text whose unvoiced pinyin has a first similarity with the unvoiced pinyin data that meets a requirement, retrieving from the preset database, according to the obtained character data, a text whose second similarity with the character data meets a requirement, and outputting the obtained text.
Preferably, the method specifically comprises the following steps: retrieving from the preset database, according to the obtained unvoiced pinyin data, texts whose first similarity with the unvoiced pinyin data meets the requirement, retrieving from the preset database, according to the obtained character data, texts whose second similarity with the character data meets the requirement, and merging and deduplicating the two sets of texts.
Preferably, the method specifically comprises the following steps: screening the qualifying texts from the texts retrieved from the preset database according to the first similarity between a retrieved text's unvoiced pinyin and the obtained unvoiced pinyin data, the second similarity between the retrieved text and the obtained character data, and the common-character ratio between the retrieved text and the obtained character data.
Preferably, the method specifically comprises the following steps: summing, for each retrieved text, the first similarity between its unvoiced pinyin and the obtained unvoiced pinyin data, the second similarity between the text and the obtained character data, and the common-character ratio between the text and the obtained character data, and screening the qualifying texts from the retrieved texts according to the summation result.
Preferably, according to the obtained unvoiced pinyin data or the obtained text data, retrieving a text matching the unvoiced pinyin data or the text data from the preset database includes:
and according to the obtained character data, if a text consistent with the character data is retrieved from the preset database, outputting the obtained text.
Preferably, the first detection model and the second detection model are obtained by training on a data set that includes voice data, the character data corresponding to the voice, and the pinyin data corresponding to the voice; the first detection model uses the unvoiced pinyin as its label, and the second detection model uses the characters as its label.
A speech recognition apparatus for performing the speech recognition method described above.
According to the technical scheme above, the voice recognition method and device provided by the invention first acquire the voice data to be recognized, then use the first detection model to obtain the unvoiced pinyin data corresponding to the voice data to be recognized, and further retrieve from a preset database a text matching the unvoiced pinyin data, outputting the obtained text. By learning the voice to be recognized into its corresponding unvoiced pinyin data and retrieving the matching text from the preset database, the method and device improve the accuracy of recognizing the voice to be recognized.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the following drawings show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a speech recognition method according to another embodiment of the present invention;
fig. 3 is a flowchart of a method for retrieving a text matching the unvoiced pinyin data or text data from a predetermined database according to the obtained unvoiced pinyin data or text data according to an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments derived by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention. The method includes the following steps:
S10: Acquire voice data to be recognized.
The voice data to be recognized is voice data obtained through a voice acquisition device.
S11: and according to the voice data to be recognized, using a first detection model to obtain unvoiced pinyin data corresponding to the voice data to be recognized.
The first detection model takes voice data as input data and takes unvoiced pinyin data as output data. The first detection model obtains unvoiced pinyin data corresponding to the voice data by extracting and learning features from the input voice data.
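The label design can be made concrete with a small sketch. This is not part of the patent: it assumes the training transcripts carry tone-numbered pinyin such as "hong2 shao1", whereas the description only states that the first model is labeled with unvoiced pinyin. Under that assumption, the labels are obtained by stripping the tone digits:

```python
import re

def to_unvoiced(pinyin: str) -> str:
    """Strip tone digits from tone-numbered pinyin, e.g. 'hong2 shao1' -> 'hong shao'.

    Assumes tone-numbered transcripts; the patent only specifies that the
    first detection model uses unvoiced (toneless) pinyin as its label.
    """
    return re.sub(r"(?<=[a-z])[1-5]", "", pinyin.lower())
```

Dropping the tones collapses the roughly 1,300 toned Mandarin syllables to about 400 toneless ones, which is the label-count reduction the description later credits for the smaller model.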
S12: and searching a text matched with the unvoiced pinyin data from a preset database according to the obtained unvoiced pinyin data, and outputting the obtained text.
The preset database includes text for matching. And obtaining a recognition result of the voice data to be recognized by retrieving the text matched with the voice data to be recognized from a preset database. In practical application, a corresponding preset database can be established according to a practical application scene.
The text matching the unvoiced pinyin data means that the unvoiced pinyin for the text is at least partially identical to the unvoiced pinyin data. And searching out a text matched with the soundless tone pinyin data from a preset database according to the soundless tone pinyin data corresponding to the obtained speech data to be recognized, and obtaining a recognition result of the speech to be recognized.
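A minimal sketch of the exact-match retrieval in S12, assuming the preset database is an in-memory dictionary keyed by unvoiced pinyin. The `PINYIN` mapping is a hypothetical stand-in for a pinyin library (e.g. pypinyin) that would derive each entry's unvoiced pinyin in practice:

```python
# Hypothetical pinyin mapping, invented for illustration.
PINYIN = {
    "红烧茄子": "hong shao qie zi",
    "红烧肘子": "hong shao zhou zi",
}

def build_index(texts):
    """Index each database text by its unvoiced pinyin (several texts may share a key)."""
    index = {}
    for text in texts:
        index.setdefault(PINYIN[text], []).append(text)
    return index

def exact_match(index, unvoiced_pinyin):
    """Step S12's exact lookup: all texts whose unvoiced pinyin equals the query's."""
    return index.get(unvoiced_pinyin, [])
```

An empty result here is what triggers the fallback to the second detection model in the embodiment of fig. 2.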
By obtaining the unvoiced pinyin data corresponding to the voice to be recognized, the method of this embodiment improves the accuracy of learning the voice, and the matching text retrieved from the preset database according to that unvoiced pinyin data gives the recognition result.
Referring to fig. 2, fig. 2 is a flowchart of a speech recognition method according to another embodiment of the present invention. The method includes the following steps:
S20: Acquire voice data to be recognized.
The voice data to be recognized is obtained through a voice acquisition device, including but not limited to a microphone.
S21: and according to the voice data to be recognized, using a first detection model to obtain unvoiced pinyin data corresponding to the voice data to be recognized.
S22: and searching a text with the unvoiced pinyin consistent with the unvoiced pinyin data from a preset database according to the obtained unvoiced pinyin data.
And searching the text with the silent pinyin of the text consistent with the silent pinyin data from a preset database according to the silent pinyin data corresponding to the voice data to be recognized, which is obtained through the first detection model.
S23: and according to the obtained unvoiced pinyin data, if a text with unvoiced pinyin consistent with the unvoiced pinyin data is retrieved from the preset database, outputting the obtained text. Thereby obtaining a recognition result for the voice data to be recognized.
S24: and according to the obtained unvoiced pinyin data, if a text with unvoiced pinyin consistent with the unvoiced pinyin data is not retrieved from the preset database, using a second detection model to obtain character data corresponding to the voice data to be recognized according to the obtained unvoiced pinyin data.
The second detection model takes the silent pinyin data as input data and takes the text data as output data. The second detection model converts the unvoiced pinyin data into corresponding text data by extracting and learning features from the input unvoiced pinyin data.
And if the text with the unvoiced pinyin consistent with the unvoiced pinyin data corresponding to the voice data to be recognized is not searched from the preset database, inputting the unvoiced pinyin data corresponding to the voice data to be recognized into the second detection model to obtain the character data corresponding to the voice data to be recognized.
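The patent's second detection model is a learned sequence model labeled with characters. As a hypothetical stand-in for illustration only, a greedy longest-match lexicon lookup shows the shape of the pinyin-to-character conversion step (the `LEXICON` entries are invented for this sketch):

```python
# Hypothetical homophone lexicon, invented for this sketch; a real second
# detection model is trained rather than table-driven.
LEXICON = {
    "hong shao": "红烧",
    "qi zi": "妻子",
    "qie zi": "茄子",
}

def pinyin_to_chars(unvoiced: str) -> str:
    """Convert unvoiced pinyin to characters by greedy longest-match over syllables."""
    syllables = unvoiced.split()
    out, i = [], 0
    while i < len(syllables):
        for j in range(len(syllables), i, -1):  # try the longest span first
            key = " ".join(syllables[i:j])
            if key in LEXICON:
                out.append(LEXICON[key])
                i = j
                break
        else:
            out.append(syllables[i])  # no entry: keep the raw syllable
            i += 1
    return "".join(out)
```

Note the ambiguity this step must resolve: "qi zi" maps to 妻子 here, which is exactly the kind of output the later fuzzy retrieval corrects against the preset database.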
S25: and searching a text matched with the unvoiced pinyin data or the character data from the preset database according to the obtained unvoiced pinyin data or the character data, and outputting the obtained text.
The text matching the character data means that the text is at least partially identical to the character data. And retrieving a text matched with the phonetic data of the silent tone from a preset database according to the phonetic data of the silent tone corresponding to the phonetic data to be recognized, or/and retrieving a text matched with the character data from the preset database according to the character data corresponding to the obtained phonetic data to be recognized, so as to obtain a recognition result of the phonetic data to be recognized.
Preferably, referring to fig. 3, the step of retrieving the text matching the unvoiced pinyin data or the text data from the preset database according to the obtained unvoiced pinyin data or the text data may specifically include the following steps:
s250: and retrieving a text consistent with the character data from the preset database according to the obtained character data.
And retrieving a text consistent with the character data from a preset database according to the character data corresponding to the voice data to be recognized, which is obtained through the second detection model.
S251: and according to the obtained character data, if a text consistent with the character data is retrieved from the preset database, outputting the obtained text. A recognition result for the speech data to be recognized is obtained.
S252: according to the obtained character data, if a text which is consistent with the character data is not retrieved from the preset database, retrieving a text which meets the requirement of first similarity between the silent pinyin and the silent pinyin data from the preset database according to the obtained silent pinyin data, retrieving a text which meets the requirement of second similarity between the silent pinyin and the character data from the preset database according to the obtained character data, and outputting the obtained text.
The first similarity represents the similarity between two pinyin data, and the second similarity represents the similarity between two text data.
If no text consistent with the character data obtained by the second detection model is found in the preset database, texts matching the unvoiced pinyin data are retrieved, their first similarity to the unvoiced pinyin data is computed, and the texts meeting the requirement are selected and output. Likewise, texts matching the character data are retrieved, their second similarity to the character data is computed, and the texts meeting the requirement are selected and output.
In practical application, the texts whose first similarity with the unvoiced pinyin data meets the requirement and the texts whose second similarity with the character data meets the requirement can be merged and deduplicated to obtain candidate texts, from which the results are further screened.
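The merge-and-deduplicate step above can be sketched as an order-preserving union of the two retrieved lists (a minimal sketch; it assumes the order within each list reflects retrieval relevance, which the patent does not specify):

```python
def merge_candidates(by_pinyin, by_text):
    """Union of the two retrieved lists; the first occurrence of a text wins."""
    seen, merged = set(), []
    for t in by_pinyin + by_text:
        if t not in seen:
            seen.add(t)
            merged.append(t)
    return merged
```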
Optionally, the qualifying texts are screened from the retrieved texts according to three measures: the first similarity between a retrieved text's unvoiced pinyin and the obtained unvoiced pinyin data, the second similarity between the retrieved text and the obtained character data, and the common-character ratio between the retrieved text and the obtained character data. The screened texts are output as the recognition result for the voice data to be recognized.
Further preferably, these three measures may be summed for each retrieved text, and the qualifying texts screened from the retrieved texts according to the summation result.
In practical application, the retrieved matching texts can be sorted by this sum, and the texts with the larger sums selected and output.
Optionally, the first similarity may be computed over the pinyin characters; the second similarity may be the cosine similarity between vector representations of the texts; and the common-character ratio may use the Jaccard coefficient, i.e. the ratio of the characters shared by two character strings to their total distinct characters.
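Under these choices, the three measures and their sum can be sketched as follows. Two details are assumptions not fixed by the patent: the first similarity is taken as a character-level match ratio (`difflib.SequenceMatcher`), and the text vectors for the cosine are simple character counts rather than learned embeddings:

```python
from collections import Counter
from difflib import SequenceMatcher
import math

def pinyin_similarity(a: str, b: str) -> float:
    """First similarity: character-level match ratio between two pinyin strings."""
    return SequenceMatcher(None, a, b).ratio()

def cosine_similarity(a: str, b: str) -> float:
    """Second similarity: cosine between character-count vectors of the two texts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[ch] * vb[ch] for ch in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def common_char_ratio(a: str, b: str) -> float:
    """Jaccard coefficient: shared characters over total distinct characters."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def score(cand_text: str, cand_pinyin: str, query_text: str, query_pinyin: str) -> float:
    """Sum of the three measures, used to rank candidate texts."""
    return (pinyin_similarity(cand_pinyin, query_pinyin)
            + cosine_similarity(cand_text, query_text)
            + common_char_ratio(cand_text, query_text))
```

A perfect candidate scores 3.0 (each measure maxes at 1.0), so "the texts with the larger sums" are simply those closest to that ceiling.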
The first detection model and the second detection model are obtained by pre-training on a data set that includes voice data, the character data corresponding to the voice, and the pinyin data corresponding to the voice; the first detection model uses the unvoiced pinyin as its label, and the second detection model uses the characters as its label.
The models may be trained on data common to the target application scene, so the data set may consist of such scene-specific data. In practice, public speech data sets may be used when no domain corpus is available.
The method can be applied to the catering field, where the preset database is a dish knowledge base. In one embodiment, the unvoiced pinyin obtained by inputting the voice to be recognized into the first detection model is "hong shao qi zi", and no fully consistent text can be retrieved from the dish knowledge base. "hong shao qi zi" is then input into the second detection model, which outputs the characters 红烧妻子 (literally "braised wife", a mishearing of 红烧茄子, braised eggplant). No fully consistent text can be retrieved for these characters either, so matching texts are retrieved from the dish knowledge base according to both "hong shao qi zi" and 红烧妻子, yielding the top three results 红烧茄子 (braised eggplant), 红烧肘子 (braised pork knuckle) and 红烧丸子 (braised meatballs), which are returned as recognition results for the user to choose from. When the result is empty or all scores after sorting are low, the query may or may not be a new dish name; whether it is a dish name can be judged by a language model trained on the dish knowledge base.
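The walk-through above can be reproduced end to end with a toy dish knowledge base. Everything here is illustrative: the `KB` pinyin values are assumptions, and `SequenceMatcher` stands in for both similarity measures, which the patent leaves open:

```python
from difflib import SequenceMatcher

# Toy dish knowledge base: text -> assumed unvoiced pinyin (in practice a
# pinyin library would derive these values).
KB = {
    "红烧茄子": "hong shao qie zi",   # braised eggplant
    "红烧肘子": "hong shao zhou zi",  # braised pork knuckle
    "红烧丸子": "hong shao wan zi",   # braised meatballs
    "宫保鸡丁": "gong bao ji ding",   # kung pao chicken
}

def rank(query_pinyin: str, query_text: str, top_k: int = 3):
    """Fallback fuzzy retrieval: sum pinyin similarity, text similarity and
    common-character ratio, then return the best-scoring dish names."""
    def jaccard(a, b):
        sa, sb = set(a), set(b)
        return len(sa & sb) / len(sa | sb)
    scored = []
    for text, py in KB.items():
        s = (SequenceMatcher(None, py, query_pinyin).ratio()
             + SequenceMatcher(None, text, query_text).ratio()
             + jaccard(text, query_text))
        scored.append((s, text))
    return [t for _, t in sorted(scored, reverse=True)[:top_k]]

# Misheard query: "qi zi" (妻子, wife) instead of "qie zi" (茄子, eggplant).
candidates = rank("hong shao qi zi", "红烧妻子")
```

With these inputs the intended dish 红烧茄子 ranks first, and the unrelated 宫保鸡丁 falls outside the returned top three, matching the behavior the embodiment describes.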
By obtaining the unvoiced pinyin data corresponding to the voice to be recognized through the first detection model, the method of this embodiment uses far fewer labels than methods that learn voice data with characters as labels, so the model needs fewer parameters during training and can reach higher accuracy.
In addition, existing methods that learn to recognize voice data with characters as labels require a large data set of domain-specific data to train for a specialized field, and their results are hard to control.
Correspondingly, the embodiment of the invention also provides a voice recognition device, which is used for executing the voice recognition method.
The speech recognition device of this embodiment first acquires the voice data to be recognized, then uses the first detection model to obtain the corresponding unvoiced pinyin data, and further retrieves from a preset database a text matching the unvoiced pinyin data, outputting the obtained text. By learning the voice into its unvoiced pinyin data and retrieving the matching text from the preset database, the device improves recognition accuracy compared with existing methods that directly learn the characters corresponding to the voice.
The voice recognition method and device provided by the invention are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (9)

1. A speech recognition method, comprising:
acquiring voice data to be recognized;
according to the voice data to be recognized, a first detection model is used for obtaining unvoiced pinyin data corresponding to the voice data to be recognized;
searching a text matched with the unvoiced pinyin data from a preset database according to the obtained unvoiced pinyin data, and outputting the obtained text;
the retrieving, from a preset database, a text matching the unvoiced pinyin data according to the obtained unvoiced pinyin data includes:
according to the obtained unvoiced pinyin data, if a text with unvoiced pinyin consistent with the unvoiced pinyin data is not retrieved from the preset database, character data corresponding to the voice data to be recognized are obtained by using a second detection model according to the obtained unvoiced pinyin data;
and according to the obtained character data, retrieving a text matched with the character data from the preset database, and outputting the obtained text.
2. The speech recognition method according to claim 1, wherein retrieving, from a preset database, text that matches the unvoiced pinyin data based on the obtained unvoiced pinyin data comprises:
and according to the obtained unvoiced pinyin data, if a text with unvoiced pinyin consistent with the unvoiced pinyin data is retrieved from the preset database, outputting the obtained text.
3. The speech recognition method of claim 1, wherein retrieving, from the pre-set database, text matching the text data based on the obtained text data comprises:
according to the obtained character data, if no text consistent with the character data is retrieved from the preset database, retrieving from the preset database, according to the obtained unvoiced pinyin data, a text whose unvoiced pinyin has a first similarity with the unvoiced pinyin data that meets a requirement, retrieving from the preset database, according to the obtained character data, a text whose second similarity with the character data meets a requirement, and outputting the obtained text.
4. The speech recognition method of claim 3, comprising in particular: and searching a text with the first similarity of the unvoiced pinyin and the unvoiced pinyin data meeting the requirement from the preset database according to the obtained unvoiced pinyin data, searching a text with the second similarity of the character data meeting the requirement from the preset database according to the obtained character data, and merging and de-duplicating the two parts of texts.
5. The speech recognition method according to claim 3, comprising in particular: and screening out a text meeting the requirement from the texts retrieved from the preset database according to a first similarity between the unvoiced pinyin of the text retrieved from the preset database and the obtained unvoiced pinyin data, a second similarity between the text retrieved from the preset database and the obtained character data and a common character ratio between the text retrieved from the preset database and the obtained character data.
6. The speech recognition method of claim 3, comprising in particular: summing a first similarity between the unvoiced pinyin of the text retrieved from the preset database and the obtained unvoiced pinyin data, a second similarity between the text retrieved from the preset database and the obtained text data, and a ratio of common characters between the text retrieved from the preset database and the obtained text data, and screening out a text meeting requirements from the text retrieved from the preset database according to a summation result.
7. The speech recognition method of claim 1, wherein retrieving, from the pre-set database, text matching the text data based on the obtained text data comprises:
and according to the obtained character data, if a text consistent with the character data is retrieved from the preset database, outputting the obtained text.
8. The speech recognition method of claim 1, wherein the first detection model and the second detection model are obtained by training using a data set, the data set comprises voice data, character data corresponding to the voice, and pinyin data corresponding to the voice, the first detection model uses the unvoiced pinyin as its label, and the second detection model uses the characters as its label.
9. A speech recognition apparatus for performing the speech recognition method of any one of claims 1-8.
CN202110008353.6A 2021-01-05 2021-01-05 Voice recognition method and device Active CN112767923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110008353.6A CN112767923B (en) 2021-01-05 2021-01-05 Voice recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110008353.6A CN112767923B (en) 2021-01-05 2021-01-05 Voice recognition method and device

Publications (2)

Publication Number Publication Date
CN112767923A (en) 2021-05-07
CN112767923B (en) 2022-12-23

Family

ID=75699340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110008353.6A Active CN112767923B (en) 2021-01-05 2021-01-05 Voice recognition method and device

Country Status (1)

Country Link
CN (1) CN112767923B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1514387A (en) * 2002-12-31 2004-07-21 中国科学院计算技术研究所 Sound distinguishing method in speech sound inquiry
CN101825953A (en) * 2010-04-06 2010-09-08 朱建政 Chinese character input product with combined voice input and Chinese phonetic alphabet input functions
CN111681669A (en) * 2020-05-14 2020-09-18 上海眼控科技股份有限公司 Neural network-based voice data identification method and equipment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002229590A (en) * 2001-02-01 2002-08-16 Atr Onsei Gengo Tsushin Kenkyusho:Kk Speech recognition system
CN101000766B (en) * 2007-01-09 2011-02-02 黑龙江大学 Chinese intonation base frequency contour generating method based on intonation model
US8977535B2 (en) * 2011-04-06 2015-03-10 Pierre-Henry DE BRUYN Transliterating methods between character-based and phonetic symbol-based writing systems
US8521539B1 (en) * 2012-03-26 2013-08-27 Nuance Communications, Inc. Method for chinese point-of-interest search
CN105389326B (en) * 2015-09-16 2018-08-31 中国科学院计算技术研究所 Image labeling method based on weak matching probability typical relevancy models
JP6708035B2 (en) * 2016-07-19 2020-06-10 株式会社デンソー Utterance content recognition device
CN108682423A (en) * 2018-05-24 2018-10-19 北京奔流网络信息技术有限公司 A kind of audio recognition method and device
CN110164435A (en) * 2019-04-26 2019-08-23 平安科技(深圳)有限公司 Audio recognition method, device, equipment and computer readable storage medium
CN111739514B (en) * 2019-07-31 2023-11-14 北京京东尚科信息技术有限公司 Voice recognition method, device, equipment and medium
CN110853629A (en) * 2019-11-21 2020-02-28 中科智云科技有限公司 Speech recognition digital method based on deep learning
CN111312255A (en) * 2020-04-24 2020-06-19 郑州迈拓信息技术有限公司 Pronunciation self-correcting device for word and pinyin tones based on voice recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An improved incremental speech corpus selection algorithm; Ning Zhenjiang et al.; Journal of the Graduate School of the Chinese Academy of Sciences; 2005-03-15 (No. 02); full text *

Also Published As

Publication number Publication date
CN112767923A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
WO2021232725A1 (en) Voice interaction-based information verification method and apparatus, and device and computer storage medium
US10403282B2 (en) Method and apparatus for providing voice service
KR101309042B1 (en) Apparatus for multi domain sound communication and method for multi domain sound communication using the same
CN107016994B (en) Voice recognition method and device
CN105931644B (en) A kind of audio recognition method and mobile terminal
CN109637537B (en) Method for automatically acquiring annotated data to optimize user-defined awakening model
CN108428446A (en) Audio recognition method and device
CN106486121B (en) Voice optimization method and device applied to intelligent robot
CN105869640B (en) Method and device for recognizing voice control instruction aiming at entity in current page
JP2019061662A (en) Method and apparatus for extracting information
CN105956053B (en) A kind of searching method and device based on the network information
JP7266683B2 (en) Information verification method, apparatus, device, computer storage medium, and computer program based on voice interaction
CN109448704A (en) Construction method, device, server and the storage medium of tone decoding figure
CN109920409B (en) Sound retrieval method, device, system and storage medium
CN110334110A (en) Natural language classification method, device, computer equipment and storage medium
CN111951779A (en) Front-end processing method for speech synthesis and related equipment
CN112309365A (en) Training method and device of speech synthesis model, storage medium and electronic equipment
CN110019741A (en) Request-answer system answer matching process, device, equipment and readable storage medium storing program for executing
KR20060070605A (en) Using domain dialogue model and language model in intelligent robot speech recognition service device and method
CN110675866A (en) Method, apparatus and computer-readable recording medium for improving at least one semantic unit set
JPWO2016178337A1 (en) Information processing apparatus, information processing method, and computer program
KR101677859B1 (en) Method for generating system response using knowledgy base and apparatus for performing the method
Chakraborty et al. Knowledge-based framework for intelligent emotion recognition in spontaneous speech
KR20190059185A (en) Method and system for improving the accuracy of speech recognition technology based on text data analysis for deaf students
CN112767923B (en) Voice recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant