CN115223588B - Child voice phrase matching method based on pinyin distance and sliding window - Google Patents
- Publication number
- CN115223588B CN115223588B CN202210292844.2A CN202210292844A CN115223588B CN 115223588 B CN115223588 B CN 115223588B CN 202210292844 A CN202210292844 A CN 202210292844A CN 115223588 B CN115223588 B CN 115223588B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/169—Annotation, e.g. comment data or footnotes
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The invention discloses a child voice phrase matching method based on pinyin distance and a sliding window, comprising the following steps: collecting children's phrase audio, obtaining transcribed text through a speech recognition model, and labeling whether the given target text phrase is present; converting the target text phrase and the transcribed text into corresponding pinyin sequences, and using a sliding window to find the minimum pinyin distance between the target text phrase and the pinyin sequence of the transcribed text; and calculating an optimal decision interval from the labeling results and the set of minimum distances, where matching succeeds when the minimum distance is smaller than the left endpoint of the interval, fails when it is greater than or equal to the right endpoint, and is referred to manual judgment when it falls within the interval. The invention accounts for the ambiguity of children's pronunciation and the variable length of their utterances, combines the ideas of pinyin distance and the sliding window, and uses manual auxiliary judgment, thereby improving the accuracy of target text phrase matching, judging children's cognitive level more accurately, and being practical.
Description
Technical Field
The invention relates to the field of natural language processing, in particular to a child voice phrase matching method based on pinyin distance and a sliding window.
Background
Today, assessment of children's cognitive ability is one direction of brain science research. One approach is to have a child describe a picture or scene with a phrase and match that description against a target text to judge cognitive correctness. However, because young children cannot yet read and write reliably, they are usually evaluated through their speech, which involves audio collection, transcription, and judgment, greatly increasing the volunteers' workload. To reduce labor cost, machines can take over the transcription and judgment steps. With the development of speech recognition technology, recognition accuracy for adult speech now exceeds 95%, and related products are widely deployed. Children, however, often speak indistinctly, and existing speech recognition models struggle to correct ambiguously pronounced segments, so target text phrases become hard to match and more recordings are misjudged as cognitive errors.
From the pinyin perspective, if two entirely different Chinese characters are pronounced similarly, their pinyin spellings are also similar to some degree. By measuring pinyin distance, characters with similar pronunciation can be matched within a certain tolerance, which addresses the problem above. At present, the pinyin distance is usually taken to be the edit distance between the two pinyin letter strings. This is workable to an extent, but it ignores how similar the initials or finals of the two pinyin actually sound.
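As a concrete illustration of the baseline this paragraph critiques, here is a minimal sketch (not the patent's method) of plain Levenshtein edit distance over pinyin strings. It scores the similar-sounding pair (ci, si) and the dissimilar-sounding pair (ci, bi) identically, which is exactly the limitation noted above:

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance between two strings.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[len(b)]

print(edit_distance("ci", "si"))  # 1 -- c and s sound alike
print(edit_distance("ci", "bi"))  # 1 -- c and b sound very different, same score
```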
During audio collection, given young children's cognitive ability, it is difficult to constrain the length of what they say, and redundant words often appear, which interferes with matching the target text phrase. Therefore, in a long transcription of a child's speech, the possibly matching target text phrase must be located, and a sliding window strategy is suitable for this.
Disclosure of Invention
In view of the above, the present invention aims to provide a child voice phrase matching method based on pinyin distance and a sliding window, so as to locate possibly matching target text phrases in a child's speech and reduce the adverse effects of ambiguous child pronunciation.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a children's voice phrase matching method based on Pinyin distance and sliding window includes the following steps:
Step 1: giving a target text phrase, collecting children's phrase audio, obtaining the transcribed text of the audio through a speech recognition model, and labeling whether the spoken content contains the target text phrase;
step 2: converting the target text phrase and the transcribed text into corresponding pinyin sequences, using a sliding window algorithm to search the pinyin sequence of the transcribed text for the subsequence with the smallest pinyin distance to the target text phrase, and recording the minimum distance, which specifically comprises the following steps:
2.1 Regardless of pinyin tones, converting the target text phrase and the transcribed text into a corresponding pinyin sequence;
2.2 ) Using a sliding window algorithm with window size equal to the number of characters in the target text phrase, slide the window rightward by 1 character at a time, traverse the pinyin sequence of the transcribed text, find the subsequence (of length equal to the window size) with the smallest pinyin distance to the target text phrase, and record that minimum distance. If there are multiple target text phrases, perform this operation for each one to obtain a set of minimum distances whose element count equals the number of target text phrases, and finally take the smallest value in the set as the minimum distance between the transcribed text and the target text phrases;
2.3 ) For two pinyin sequences S = {s_1, s_2, …, s_n} and Q = {q_1, q_2, …, q_n}, there is:
d(S, Q) = [d(s_1, q_1) + d(s_2, q_2) + … + d(s_n, q_n)] ÷ n
where d is the pinyin distance. For the pinyin s_i and q_i of two individual characters, split s_i and q_i each into an initial part and a final part; then:
d(s_i, q_i) = initial distance(s_i, q_i) + final distance(s_i, q_i)
initial distance(s_i, q_i) = initial edit distance(s_i, q_i) × initial weight(s_i, q_i)
where the initial weight(s_i, q_i) is designed manually according to the pronunciation similarity of the initials of s_i and q_i, with weights ranging over [0.5, 1.5], and the final distance(s_i, q_i) is calculated in the same way as the initial distance.
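The pinyin-splitting, weighting, and sliding-window logic of steps 2.1–2.3 can be sketched as follows. This is an illustrative reading rather than the patent's implementation: the initial table and the weight entries below are placeholder assumptions (the patent's weights are hand-designed over [0.5, 1.5] for each initial pair), and final weights are fixed at 1.0 here.

```python
# Pinyin initials, longest first so "zh"/"ch"/"sh" match before "z"/"c"/"s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")
# Hypothetical weights for similar-sounding initial pairs (placeholder values).
INITIAL_WEIGHTS = {("ch", "c"): 0.5, ("zh", "z"): 0.5, ("sh", "s"): 0.5}

def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two strings."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def split_syllable(p: str):
    """Split a toneless pinyin syllable into (initial, final)."""
    for ini in sorted(INITIALS, key=len, reverse=True):
        if p.startswith(ini):
            return ini, p[len(ini):]
    return "", p  # zero-initial syllable such as "an"

def weight(x: str, y: str) -> float:
    return INITIAL_WEIGHTS.get((x, y), INITIAL_WEIGHTS.get((y, x), 1.0))

def syllable_distance(s: str, q: str) -> float:
    si, sf = split_syllable(s)
    qi, qf = split_syllable(q)
    # initial distance = edit distance x weight; final distance analogous,
    # with all final weights fixed at 1.0 in this sketch.
    return edit_distance(si, qi) * weight(si, qi) + edit_distance(sf, qf)

def min_window_distance(target, transcript) -> float:
    """Slide a window of len(target) syllables over the transcript."""
    n = len(target)
    return min(sum(syllable_distance(s, q)
                   for s, q in zip(target, transcript[i:i + n])) / n
               for i in range(len(transcript) - n + 1))
```

With the embodiment's example, this sketch gives syllable_distance("chi", "ci") = 0.5 and a minimum window distance of 0.25 for the target {ya, chi} against the transcript {zhe, shi, ya, ci}, matching the figures in the description.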
Step 3: using the method of step 2, calculate the minimum distance for all the data labeled in step 1 to obtain a set of minimum distances, and derive a decision interval from the set proportion of manual participation. For each minimum distance, if it is smaller than the left endpoint of the interval, the target text phrase is matched successfully; if it is greater than or equal to the right endpoint, matching fails; if it falls within the interval, i.e. including the left endpoint but excluding the right endpoint, whether the target text phrase matches is judged manually. According to the labeling results, for each set manual participation proportion, a sliding window algorithm is used to find the decision interval that maximizes accuracy, which specifically comprises the following steps:
3.1 ) If the minimum distance < left, the target text phrase is matched successfully; if the minimum distance ≥ right, the matching fails; if left ≤ minimum distance < right, whether the target text phrase matches is judged manually;
3.2 ) The set proportions of manual participation form a sequence {0, k_1%, k_2%, …, k_t%}. Sort the m minimum distances of all labeled data calculated in step 2 in ascending order to obtain an ordered array A = {d_1, d_2, …, d_i, …, d_j, …, d_m}. When the manual proportion is k_r%, run a sliding window algorithm over A with window size m × k_r%, letting the current window be (d_i, d_j) with j − i + 1 = m × k_r%; the decision interval [left, right) is determined as follows:
For each candidate decision interval, judge the matching result of all data using the rule in step 3.1), filter out the data requiring manual judgment, compare the rest against the labeled data, and calculate the accuracy of the current decisions. The window is moved rightward by 1 unit at a time, and the decision interval that maximizes the accuracy is taken as the optimal decision interval for manual proportion k_r%.
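Step 3's interval search can be sketched as below, under simplifying assumptions: the distances and labels are made-up toy data, and the raw window edges are used directly as [left, right), whereas the embodiment additionally smooths each edge by averaging it with its sorted neighbours.

```python
def best_interval(distances, labels, manual_ratio):
    """labels[i] is True when the target phrase is actually present."""
    pairs = sorted(zip(distances, labels))   # ascending by minimum distance
    m = len(pairs)
    w = round(m * manual_ratio)              # window size = m x k_r%
    best_acc, best_iv = -1.0, None
    for i in range(m - w + 1):
        left, right = pairs[i][0], pairs[i + w - 1][0]
        # Auto-judged items fall outside [left, right); per rule 3.1),
        # predict "match" exactly when distance < left.
        auto = [(d, y) for d, y in pairs if not (left <= d < right)]
        if not auto:
            continue
        acc = sum((d < left) == y for d, y in auto) / len(auto)
        if acc > best_acc:
            best_acc, best_iv = acc, (left, right)
    return best_acc, best_iv

# Toy data: three matches with small distances, three non-matches with larger ones.
acc, iv = best_interval([0.1, 0.2, 1.0, 1.1, 2.0, 2.1],
                        [True, True, True, False, False, False],
                        manual_ratio=1 / 3)
```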
Compared with the prior art, the invention has the following technical effects:
compared with the traditional approach of computing pinyin similarity using only the edit distance between pinyin strings, the child voice phrase matching method based on pinyin distance and a sliding window of the present invention takes the pronunciation similarity of initials and finals into account, constructs a weight matrix for the edit distances between initials and between finals, and thus refines the calculation of pinyin distance. Meanwhile, the decision interval is determined from a large amount of data and is statistically meaningful.
The invention accounts for the ambiguity of children's pronunciation and the redundancy of their speech, improves the accuracy of target text phrase matching, judges children's cognitive level more accurately, and is practical.
Drawings
Fig. 1 is a schematic flow chart of an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to specific examples and figures.
Examples
Referring to FIG. 1, the invention provides a child voice phrase matching method based on pinyin distance and a sliding window, which comprises the following steps:
Step 1: obtaining, through a speech recognition model, the transcribed text of a child's utterance with fuzzy pronunciation (pinyin "zhe shi ya ci", roughly "this is teeth" with the last syllable mispronounced), the given target text phrase being "teeth" (pinyin "ya chi");
step 2: converting the target text phrase and the transcribed text into corresponding pinyin sequences, using a sliding window algorithm to search the pinyin sequence of the transcribed text for the subsequence with the smallest pinyin distance to the target text phrase, and recording the minimum distance, which specifically comprises the following steps:
2.1 Regardless of pinyin tones, converting the target text phrase and the transcribed text into corresponding pinyin sequences { ya, chi }, { zhe, shi, ya, ci }, respectively;
2.2 ) Using a sliding window algorithm with window size equal to the number of characters in the target text phrase (here 2), slide the window rightward by 1 character at a time, traverse the pinyin sequence {zhe, shi, ya, ci} of the transcribed text, and find the subsequence (of length equal to the window size) with the smallest pinyin distance to the target text phrase, recording the minimum distance d_min = min{ d({ya, chi}, {zhe, shi}), d({ya, chi}, {shi, ya}), d({ya, chi}, {ya, ci}) }, where d is the pinyin distance. If there are multiple target text phrases, perform this operation for each one to obtain a set of minimum distances whose element count equals the number of target text phrases, and finally take the smallest value in the set as the minimum distance between the transcribed text and the target text phrases;
2.3 ) The distance between the two pinyin sequences S = {ya, chi} and Q = {ya, ci} is:
d(S, Q) = [d(ya, ya) + d(chi, ci)] ÷ 2 (d is the pinyin distance)
For the pinyin chi and ci of the two individual characters, split chi and ci into the initial parts ch and c and the final parts i and i respectively; then:
d(chi,ci)=d(ch,c)+d(i,i)
d (ch, c) =edit distance (ch, c) ×weight (ch, c)
Wherein the weight (ch, c) is designed manually according to the pronunciation similarity of ch and c and takes the value 0.5, so d(ch, c) = 1 × 0.5 = 0.5 and d(i, i) = 0 × 1.0 = 0, giving d(chi, ci) = 0.5 and d(S, Q) = (0 + 0.5) ÷ 2 = 0.25.
Step 3: using the method of step 2, calculate the minimum distance for all the data labeled in step 1 to obtain a set of minimum distances, and derive a decision interval from the set proportion of manual participation. For each minimum distance, if it is smaller than the left endpoint of the interval, the target text phrase is matched successfully; if it is greater than or equal to the right endpoint, matching fails; if it falls within the interval, i.e. including the left endpoint but excluding the right endpoint, whether the target text phrase matches is judged manually. According to the labeling results, for each set manual participation proportion, a sliding window algorithm is used to find the decision interval that maximizes accuracy, which specifically comprises the following steps:
3.1 ) If the minimum distance < left, the target text phrase is matched successfully; if the minimum distance ≥ right, the matching fails; if left ≤ minimum distance < right, whether the target text phrase matches is judged manually;
3.2 ) The set proportions of manual participation form the sequence {0, 5%, 10%, …, 50%}. Sort the m = 5000 minimum distances of all labeled data calculated in step 2 in ascending order to obtain the ordered array A = {d_1, d_2, …, d_{i−1}, d_i, d_{i+1}, …, d_{j−1}, d_j, d_{j+1}, …, d_m} = {0, 0, …, 1.4, 1.5, 1.5, …, 1.9, 1.9, 1.9, …, 4.0}. When the manual proportion is 5%, run a sliding window algorithm over A with window size 5000 × 5% = 250; for i and j satisfying j − i + 1 = 250, the current window is (d_i, d_j) = (1.5, 1.9), and the decision interval [left, right) is determined as follows:
left = (1.4 + 1.5 + 1.5) ÷ 3 ≈ 1.47
right = (1.9 + 1.9 + 1.9) ÷ 3 = 1.9
for each candidate decision interval, judge the matching result of all data using the rule in step 3.1), filter out the data requiring manual judgment, compare the rest against the labeled data, and calculate the accuracy of the current decisions. The window is moved rightward by 1 unit at a time, and the decision interval [1.5, 1.9) with the maximum accuracy, namely 89.29%, is taken as the optimal decision interval for a manual proportion of 5%.
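Reading the arithmetic in the example above, each endpoint of [left, right) appears to be the mean of the window-edge distance and its two neighbours in the sorted array. As a quick check of the stated values:

```python
# Values taken from the embodiment's example: the left window edge d_i = 1.5
# with sorted neighbours 1.4 and 1.5, the right edge d_j = 1.9 with
# neighbours 1.9 and 1.9.
left = (1.4 + 1.5 + 1.5) / 3    # ~1.47, as in the text
right = (1.9 + 1.9 + 1.9) / 3   # 1.9, as in the text
```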
The foregoing description is only of the preferred embodiments of the invention, and certain modifications may be made thereto without departing from the scope of the invention as defined in the following claims.
Claims (3)
1. A child voice phrase matching method based on pinyin distance and a sliding window, characterized in that the method comprises the following steps:
Step 1: giving a target text phrase, collecting children's phrase audio, obtaining the transcribed text of the audio through a speech recognition model, and labeling whether the spoken content contains the target text phrase;
Step 2: converting the target text phrase and the transcribed text into corresponding pinyin sequences, using a sliding window algorithm to search the pinyin sequence of the transcribed text for the subsequence with the smallest pinyin distance to the target text phrase, and recording the minimum distance;
Step 3: using the method of step 2, calculating the minimum distance for all the data labeled in step 1 to obtain a set of minimum distances, and deriving a decision interval from the set proportion of manual participation, wherein, for each minimum distance, if it is smaller than the left endpoint of the interval, the target text phrase is matched successfully; if it is greater than or equal to the right endpoint, matching fails; if it falls within the interval, i.e. including the left endpoint but excluding the right endpoint, whether the target text phrase matches is judged manually; and, according to the labeling results, for each set manual participation proportion, using a sliding window algorithm to find the decision interval that maximizes accuracy.
2. The method for matching children's voice phrase according to claim 1, wherein the step 2 specifically comprises:
2.1 Regardless of pinyin tones, converting the target text phrase and the transcribed text into a corresponding pinyin sequence;
2.2 ) Using a sliding window algorithm with window size equal to the number of characters in the target text phrase, slide the window rightward by 1 character at a time, traverse the pinyin sequence of the transcribed text, find the subsequence (of length equal to the window size) with the smallest pinyin distance to the target text phrase, and record that minimum distance. If there are multiple target text phrases, perform this operation for each one to obtain a set of minimum distances whose element count equals the number of target text phrases, and finally take the smallest value in the set as the minimum distance between the transcribed text and the target text phrases;
2.3 ) For two pinyin sequences S = {s_1, s_2, …, s_n} and Q = {q_1, q_2, …, q_n}, there is:
d(S, Q) = [d(s_1, q_1) + d(s_2, q_2) + … + d(s_n, q_n)] ÷ n
where d is the pinyin distance. For the pinyin s_i and q_i of two individual characters, split s_i and q_i each into an initial part and a final part; then:
d(s_i, q_i) = initial distance(s_i, q_i) + final distance(s_i, q_i)
initial distance(s_i, q_i) = initial edit distance(s_i, q_i) × initial weight(s_i, q_i)
where the initial weight(s_i, q_i) is designed manually according to the pronunciation similarity of the initials of s_i and q_i, with weights ranging over [0.5, 1.5], and the final distance(s_i, q_i) is calculated in the same way as the initial distance.
3. The method for matching children's voice phrases according to claim 1, wherein the step 3 specifically comprises:
3.1 ) If the minimum distance < left, the target text phrase is matched successfully; if the minimum distance ≥ right, the matching fails; if left ≤ minimum distance < right, whether the target text phrase matches is judged manually;
3.2 ) The set proportions of manual participation form a sequence {0, k_1%, k_2%, …, k_t%}. Sort the m minimum distances of all labeled data calculated in step 2 in ascending order to obtain an ordered array A = {d_1, d_2, …, d_i, …, d_j, …, d_m}. When the manual proportion is k_r%, run a sliding window algorithm over A with window size m × k_r%, letting the current window be (d_i, d_j) with j − i + 1 = m × k_r%; the decision interval [left, right) is determined as follows:
For each candidate decision interval, judge the matching result of all data using the rule in step 3.1), filter out the data requiring manual judgment, compare the rest against the labeled data, and calculate the accuracy of the current decisions. The window is moved rightward by 1 unit at a time, and the decision interval that maximizes the accuracy is taken as the optimal decision interval for manual proportion k_r%.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210292844.2A CN115223588B (en) | 2022-03-24 | 2022-03-24 | Child voice phrase matching method based on pinyin distance and sliding window |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115223588A CN115223588A (en) | 2022-10-21 |
CN115223588B true CN115223588B (en) | 2024-08-13 |
Family
ID=83606923
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210292844.2A Active CN115223588B (en) | 2022-03-24 | 2022-03-24 | Child voice phrase matching method based on pinyin distance and sliding window |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115223588B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2003900584A0 (en) * | 2003-02-11 | 2003-02-27 | Telstra New Wave Pty Ltd | System for predicting speech recognition accuracy and development for a dialog system |
CN107967916A (en) * | 2016-10-20 | 2018-04-27 | 谷歌有限责任公司 | Determine voice relation |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9418152B2 (en) * | 2011-02-09 | 2016-08-16 | Nice-Systems Ltd. | System and method for flexible speech to text search mechanism |
WO2020050820A1 (en) * | 2018-09-04 | 2020-03-12 | Google Llc | Reading progress estimation based on phonetic fuzzy matching and confidence interval |
CN109256152A (en) * | 2018-11-08 | 2019-01-22 | 上海起作业信息科技有限公司 | Speech assessment method and device, electronic equipment, storage medium |
CN112149406B (en) * | 2020-09-25 | 2023-09-08 | 中国电子科技集团公司第十五研究所 | Chinese text error correction method and system |
CN112509609B (en) * | 2020-12-16 | 2022-06-10 | 北京乐学帮网络技术有限公司 | Audio processing method and device, electronic equipment and storage medium |
CN113486155B (en) * | 2021-07-28 | 2022-05-20 | 国际关系学院 | Chinese naming method fusing fixed phrase information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||