CN117292680A - Voice recognition method for power transmission operation detection based on small sample synthesis - Google Patents
- Publication number
- CN117292680A (CN202311215139.3A)
- Authority
- CN
- China
- Prior art keywords
- transmission operation
- detection
- data
- professional
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26—Speech to text systems
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention provides a voice recognition method for power transmission operation and inspection based on small sample synthesis. The method comprises: acquiring a plurality of professional text corpora for power transmission operation and inspection and, from these corpora, both establishing a professional language model and extracting a professional language organization structure; generating a large amount of professional text data by using the professional language organization structure; extracting a small amount of data from the large amount of professional text data, and having a plurality of operation and inspection personnel record it by voice, so as to establish acoustic models of the plurality of personnel; using the acoustic models of the plurality of personnel to train on the professional text data that remains after the small amount of data has been extracted, so as to generate a corresponding voice recognition model, and recognizing test data by using the voice recognition model; decoding the recognition result of the test data by using the established professional language model to obtain a text result; and establishing a semantic recognition model by using the generated large amount of professional text data, and performing semantic recognition analysis on the decoded text result. Professional voice data from power transmission operation and inspection is thereby recognized accurately.
Description
Technical Field
The invention relates to the technical field of voice recognition for mobile power inspection, and in particular to a voice recognition method for power transmission operation and inspection based on small sample synthesis.
Background
In current practice for recording the results of power transmission operation and inspection, pictures, voice recordings and text notes are mostly kept on site, and the results are entered into the system after the inspection is finished. The voice data is usually transcribed manually, which is inefficient and highly repetitive. General-purpose speech recognition models, meanwhile, are often not accurate enough on the specialized vocabulary and line names that occur in operation and inspection speech.
Chinese patent publication No. CN114822545A discloses a method for improving the speech recognition rate in a professional field, intended mainly for recognizing speech in a professional field or a specific industry. A professional field usually involves a large number of technical terms, and each department in the field adds proper nouns with local characteristics, such as equipment names containing place names, work section names and even the names of staff, so the speech recognition error rate is high. That publication proposes a two-level difference-frequency principle and automatically builds a special difference-frequency word bank, comprising a first-level sub-bank that stores local special words and a second-level sub-bank that stores technical terms; pinyin and characters are matched around the difference-frequency special words, and an arbitrary-position conversion mechanism is adopted. This improves recognition accuracy and allows the special vocabulary of a local professional department to be recognized. However, the disclosed method is not suited to the way inspection results are currently recorded in mobile power inspection, nor to the noisy field environment of power transmission operation and inspection.
Therefore, a method for accurately recognizing power transmission operation and inspection speech is needed, to solve the low working efficiency and inaccurate recognition of existing approaches.
Disclosure of Invention
The invention aims to provide a method for accurately recognizing power transmission operation and inspection speech.
To solve the above technical problems, the invention provides a voice recognition method for power transmission operation and inspection based on small sample synthesis, which comprises the following steps:
acquiring a plurality of professional text corpora for power transmission operation and inspection and, from these corpora, both establishing a professional language model and extracting a professional language organization structure;
generating a large amount of professional text data for power transmission operation and inspection by using the professional language organization structure;
extracting a small amount of data from the large amount of professional text data, and having a plurality of operation and inspection personnel record it by voice, so as to establish acoustic models of the plurality of personnel;
using the acoustic models of the plurality of personnel to train on the professional text data that remains after the small amount of data has been extracted, so as to generate a corresponding voice recognition model, and recognizing test data by using the voice recognition model;
decoding the recognition result of the test data by using the established professional language model to obtain a text result; and
establishing a semantic recognition model by using the generated large amount of professional text data, and performing semantic recognition analysis on the decoded text result.
Further, the method comprises judging whether the result of the semantic recognition analysis performed by the semantic recognition model is consistent with the decoded text result:
if it is consistent, outputting the decoded text result;
if it is not consistent, performing semantic correction on the decoded text result based on the result of the semantic recognition analysis.
Further, the step of judging whether the result of the semantic recognition analysis is consistent with the decoded text result specifically comprises the following steps:
extracting key fields from the generated large amount of professional text data, labeling them, and training to generate a corresponding voice recognition model;
performing semantic segmentation and keyword extraction on the text result of the voice recognition by using the established semantic recognition model;
and performing similarity judgment between the extracted keywords and the key fields extracted from the large amount of professional text data.
Further, in the process of performing similarity judgment between the extracted keywords and the key fields extracted from the large amount of professional text data, the similarity judgment satisfies the formula:
WER(n, m) = (S + D + I) / N
where n denotes the n-th field, m denotes the m-th entry in the field database, WER(n, m) is the word error rate between the output words and the reference text, S is the number of substituted words, D is the number of deleted words, I is the number of inserted words, and N is the number of words in the reference text.
Further, the method for performing semantic correction on the decoded text result based on the result of the semantic recognition analysis comprises setting a correction threshold and correcting the field information that falls within the correction interval, so that the following formula is satisfied:
output(n) = data(n, x), if 0 < WER(n) < check_threshold
where check_threshold is the set correction threshold, and data(n, x) is the x-th entry of the n-th field type in the database, for which WER(n, x) = WER(n). That is, the correction method comprises:
if the word error rate is 0, no correction is needed;
if the word error rate is greater than 0 and smaller than the correction threshold, the field information with the highest similarity in the professional text database is retrieved and used for the correction;
if the word error rate is greater than the correction threshold, an information extraction error exists and the field cannot be corrected.
Further, the step of extracting a small amount of data from the large amount of professional text data and having a plurality of operation and inspection personnel record it by voice to establish acoustic models of the plurality of personnel also comprises fusing noise data that simulates the power transmission operation and inspection site with the voice recordings of the personnel, so as to generate more complete and more realistic operation and inspection voice data.
Preferably, the decoding method in the voice recognition process is greedy search, which satisfies the formula:
score_greedy(t) = max(P_t)
where score_greedy(t) is the score at time t, P_t is the probability distribution output by the model at time t, and the max operation selects the highest probability.
Further, the decoding method in the voice recognition process sets a probability threshold, and the output probability of the decoding method is compared with the set probability threshold:
if the output probability is greater than or equal to the set probability threshold, the output is the result of greedy decoding;
if the output probability is smaller than the set probability threshold, the result is output by beam search decoding, such that:
output = greedy decoding output sequence, if score_greedy(t) >= threshold for every time step t; otherwise output = beam_search_output
where threshold is the set judgment threshold:
if score_greedy at every time step is greater than or equal to the set threshold, the decoding result is the greedy decoding output sequence;
if score_greedy(i) at any step i is smaller than the set threshold, the decoding result is the beam search output sequence beam_search_output.
Further, beam_search_output satisfies the formula:
beam_search_output = max(ŷ(i)), i = 1, …, k
where, during beam search decoding, k partial solutions are generated, each of which is a sequence ŷ(i), i = 1, …, k, and max selects the partial solution with the highest score. The score of each partial solution is calculated according to:
score(ŷ(i)) = sum(log P(y_t(i) | y_1(i), …, y_(t-1)(i)))
where y_t(i) is the element of partial solution i at time t, and y_1(i), …, y_(t-1)(i) are the elements of partial solution i at time steps 1 to t-1.
Further, beam search is performed only for voice data whose greedy search score is low, and the judgment is then made, which dynamically balances accuracy against response speed in the decoding process.
Compared with the prior art, the invention has the following beneficial effects:
According to the invention, a small amount of data is extracted from a large amount of professional text data for power transmission operation and inspection and is recorded by voice by different operation and inspection personnel to establish acoustic models of those personnel, after which decoding, semantic analysis and correction are carried out. This realizes a voice recognition method for power transmission operation and inspection based on small sample synthesis that is both accurate and efficient.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
To make the technical solutions and technical effects of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, embodiments of the present invention.
The invention provides a voice recognition method for power transmission operation and inspection based on small sample synthesis, which comprises the following steps:
A plurality of professional text corpora for power transmission operation and inspection are acquired and, from these corpora, a professional language model is established and a professional language organization structure is extracted. Specifically, text corpus information is obtained for the operation and inspection area, such as the names of the power transmission lines, voltage levels, defect positions, defect parts and defect content.
The following is an example corpus entry for defect content in power transmission operation and inspection:
"insulator broken".
The collected text corpus is trained with the KenLM training tool to generate an N-gram language model for the decoding stage of speech recognition. The model estimates probabilities according to:
P(w_n | w_1, …, w_(n-1)) ≈ P(w_n | w_(n-N+1), …, w_(n-1))
that is, the probability of predicting the current (n-th) word given the entire history is approximated by the probability of predicting it given only the preceding N-1 words. The N-gram language model training is additionally optimized with modified Kneser-Ney smoothing, which mitigates problems such as incomplete corpus coverage.
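The N-gram approximation described above can be illustrated with a minimal sketch for N = 2. This is not KenLM and applies no Kneser-Ney smoothing: it uses plain maximum-likelihood counts, and the toy corpus and the function names `train_bigram` and `bigram_prob` are assumptions made for the example.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded[:-1])                  # every token that acts as a history
        bigrams.update(zip(padded[:-1], padded[1:]))  # (history, next word) pairs
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """Unsmoothed estimate of P(word | prev); 0.0 when the history was never seen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# Toy corpus (hypothetical entries in the style of defect descriptions).
corpus = [
    ["insulator", "broken"],
    ["insulator", "damaged"],
    ["conductor", "broken"],
]
uni, bi = train_bigram(corpus)
p = bigram_prob(uni, bi, "insulator", "broken")  # 1 of 2 continuations of "insulator"
```

With unsmoothed counts, any word pair missing from the corpus receives probability zero, which is the gap that smoothing is meant to close.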
A large amount of professional text data for power transmission operation and inspection is then generated by using the professional language organization structure.
The following is an example of generated text data:
"channel inspection, 220kV nest line, No. 28 tower, middle phase, insulator breakage", where "channel inspection" is the inspection type, "220kV" is the voltage level, "nest line" is the line name, "No. 28 tower" is the tower number, "middle phase" is the defect position, and "insulator breakage" is the defect content.
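One plausible way to produce such text in bulk is to treat the organization structure as an ordered list of fields and enumerate combinations of field values. The sketch below assumes exactly that; the field names, the sample values and the function `generate_sentences` are hypothetical and are not taken from the patent.

```python
from itertools import product

# Hypothetical field values; a real system would draw them from the collected corpora.
FIELDS = {
    "inspection_type": ["channel inspection"],
    "voltage_level": ["220kV", "500kV"],
    "line_name": ["nest line"],
    "tower_number": ["No. 28 tower", "No. 29 tower"],
    "defect_position": ["middle phase"],
    "defect_content": ["insulator breakage"],
}

def generate_sentences(fields):
    """Yield (sentence, labeled fields) for every combination of field values."""
    names = list(fields)
    for combo in product(*(fields[name] for name in names)):
        yield " ".join(combo), dict(zip(names, combo))

sentences = list(generate_sentences(FIELDS))
first_text, first_labels = sentences[0]
```

Keeping the field labels next to each sentence means the same generated data can later serve as labeled key fields for the semantic recognition step.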
The professional text data is preprocessed: the original text is converted into phonemes, Chinese and English are segmented, and the data is checked for illegal entries.
A small amount of data is extracted from the large amount of professional text data and recorded by voice by a plurality of operation and inspection personnel to establish acoustic models of those personnel. The MFA tool is used to align the audio with the professional text data and to segment it by duration. Duration, fundamental frequency and energy are then modeled independently: the duration of a phoneme directly affects pronunciation length and overall rhythm; the fundamental frequency is another feature that affects emotion and rhythm; and the energy directly affects the amplitude of the spectrum and hence the volume of the audio. The encoded duration information is used as the input of the speech attribute modeling network; the output of the duration prediction also serves as the input of the fundamental frequency and energy predictions; finally, the duration-expanded output is added to the predicted fundamental frequency and energy outputs and used as the input of the downstream network, according to the formula:
x = x + pitch_embedding + energy_embedding
where x is the encoder output after expansion by the duration information, pitch_embedding is the fundamental frequency embedding vector, and energy_embedding is the energy embedding vector. The result is converted into a mel spectrogram by the decoder and sent to the vocoder to generate audio data.
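The expansion by duration and the element-wise sum in the formula above can be sketched with plain lists. In a trained model the durations and both embeddings come from learned predictors; here they are fixed numbers, and the helper names `length_regulate` and `add_variance` are assumptions for illustration.

```python
def length_regulate(encoder_out, durations):
    """Repeat each phoneme vector once per frame of its duration."""
    expanded = []
    for vec, frames in zip(encoder_out, durations):
        expanded.extend(list(vec) for _ in range(frames))
    return expanded

def add_variance(x, pitch_embedding, energy_embedding):
    """Compute x + pitch_embedding + energy_embedding frame by frame."""
    return [
        [a + p + e for a, p, e in zip(frame_x, frame_p, frame_e)]
        for frame_x, frame_p, frame_e in zip(x, pitch_embedding, energy_embedding)
    ]

encoder_out = [[1.0, 2.0], [3.0, 4.0]]   # two phonemes, hidden size 2
durations = [2, 1]                        # frames per phoneme (made-up values)
x = length_regulate(encoder_out, durations)
pitch = [[0.5, 0.5] for _ in x]           # one embedding per frame
energy = [[0.25, 0.0] for _ in x]
out = add_variance(x, pitch, energy)
```

The three inputs must share the same frame count and hidden size, which is why the expansion by duration happens before the embeddings are added.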
Using the acoustic models of the plurality of personnel, the professional text data that remains after the small amount of data has been extracted is used for training to generate a corresponding voice recognition model, and test data is recognized with the voice recognition model. Specifically, the operation and inspection voice data is trained with an end-to-end model to generate the corresponding voice recognition model; training adopts a pre-training plus fine-tuning approach, which improves training efficiency and suits cases where the data volume is small and many models must be trained.
The recognition result of the test data is decoded by using the established professional language model to obtain a text result. A criterion is additionally set in the decoding process, and a more suitable and efficient decoding method is selected adaptively according to the result of the criterion.
A semantic recognition model is established by using the generated large amount of professional text data, and semantic recognition analysis is performed on the decoded text result.
It is then judged whether the result of the semantic recognition analysis performed by the semantic recognition model is consistent with the decoded text result:
if it is consistent, the decoded text result is output;
if it is not consistent, semantic correction is performed on the decoded text result based on the result of the semantic recognition analysis.
Judging whether the result of the semantic recognition analysis is consistent with the decoded text result specifically comprises the following steps:
Key fields are extracted from the generated large amount of professional text data, labeled, and used for training to generate a corresponding voice recognition model.
The following is an example of text data with its key fields extracted:
channel inspection (inspection type), 220kV (voltage level), nest line (line name), No. 28 tower (tower number), middle phase (defect position), insulator (defect part), damaged (defect content).
Semantic segmentation and keyword extraction are performed on the text result of the voice recognition by using the established semantic recognition model.
Similarity judgment is performed between the extracted keywords and the key fields extracted from the large amount of professional text data; abnormal text field information is removed, and the corrected field information is reorganized to generate the corrected voice recognition text.
The following is an example of abnormal text data:
"channel inspection, bridge east line, oh that is wrong, 220kV nest line, No. 28 tower, middle phase insulator damaged". This example is a slip of the tongue: "bridge east line, oh that is wrong" is marked with an abnormal field label.
The similarity judgment satisfies the formula:
WER(n, m) = (S + D + I) / N
where n denotes the n-th field, m denotes the m-th entry in the field database, WER(n, m) is the word error rate between the output words and the reference text, S is the number of substituted words, D is the number of deleted words, I is the number of inserted words, and N is the number of words in the reference text.
The method for performing semantic correction on the decoded text result based on the result of the semantic recognition analysis further comprises setting a correction threshold and correcting the field information that falls within the correction interval, so that the following formula is satisfied:
output(n) = data(n, x), if 0 < WER(n) < check_threshold
where check_threshold is the set correction threshold, and data(n, x) is the x-th entry of the n-th field type in the database, for which WER(n, x) = WER(n). That is, the correction method comprises:
if the word error rate is 0, no correction is needed;
if the word error rate is greater than 0 and smaller than the correction threshold, the field information with the highest similarity in the professional text database is retrieved and used for the correction;
if the word error rate is greater than the correction threshold, an information extraction error exists and the field cannot be corrected.
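The word error rate and the three-way correction rule above can be sketched as follows. The sketch compares strings character by character, which is an assumption made for the example (word-level tokens work the same way), and the function names, the candidate list and the threshold value 0.3 are all hypothetical. The source does not say what happens when the error rate equals the threshold exactly; this sketch treats that case as uncorrectable.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance, i.e. the minimum S + D + I needed to turn ref into hyp."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(
                prev[j] + 1,             # extra token in the hypothesis (insertion)
                cur[j - 1] + 1,          # reference token missing (deletion)
                prev[j - 1] + (h != r),  # substitution, or a free match
            ))
        prev = cur
    return prev[-1]

def wer(hyp, ref):
    """Error rate (S + D + I) / N, with N the length of the reference."""
    return edit_distance(hyp, ref) / len(ref)

def correct_field(recognized, candidates, check_threshold):
    """Apply the three cases: unchanged, corrected, or uncorrectable."""
    best = min(candidates, key=lambda c: wer(recognized, c))  # most similar entry
    score = wer(recognized, best)
    if score == 0:
        return recognized, "unchanged"
    if score < check_threshold:
        return best, "corrected"
    return recognized, "uncorrectable"  # assumption: equality also lands here

line_names = ["nest line", "bridge east line"]  # hypothetical database entries
result = correct_field("nest lime", line_names, check_threshold=0.3)
```

Here "nest lime" differs from "nest line" by one substituted character out of nine, so its error rate falls inside the correction interval and the database entry replaces it.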
Further, when the small amount of data is extracted from the large amount of professional text data and recorded by voice by a plurality of operation and inspection personnel to establish their acoustic models, noise data that simulates the power transmission operation and inspection site is also fused with the voice recordings of the personnel, so as to generate more complete and more realistic operation and inspection voice data and to improve the robustness of the model.
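The source does not specify how the noise is fused with the recordings. A common choice, assumed here, is additive mixing at a chosen signal-to-noise ratio; the function `mix_at_snr`, the synthetic sine "speech" and the Gaussian "noise" are illustrative stand-ins for real recordings.

```python
import math
import random

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so that the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        repeats = len(speech) // len(noise) + 1
        noise = noise * repeats                 # loop short noise clips
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)  # assumed non-zero
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]

rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]
mixed = mix_at_snr(speech, noise, snr_db=10.0)
```

Sweeping `snr_db` over a range of values is one simple way to cover both quiet and noisy site conditions in the training data.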
The decoding method in the speech recognition process is preferably greedy search, which satisfies the formula:
score_greedy(t) = max(P_t)
where score_greedy(t) is the score at time t, P_t is the probability distribution output by the model at time t, and the max operation selects the highest probability.
The decoding method in the speech recognition process further sets a probability threshold, and the output probability of the decoding method is compared with the set probability threshold:
if the output probability is greater than or equal to the set probability threshold, the output is the result of greedy decoding;
if the output probability is smaller than the set probability threshold, the result is output by beam search decoding, such that:
output = greedy decoding output sequence, if score_greedy(t) >= threshold for every time step t; otherwise output = beam_search_output
where threshold is the set judgment threshold:
if score_greedy at every time step is greater than or equal to the set threshold, the decoding result is the greedy decoding output sequence;
if score_greedy(i) at any step i is smaller than the set threshold, the final decoding result is the beam search output sequence beam_search_output, which satisfies the formula:
beam_search_output = max(ŷ(i)), i = 1, …, k
where, during beam search decoding, k partial solutions are generated, each of which is a sequence ŷ(i), i = 1, …, k, and max selects the partial solution with the highest score. The score of each partial solution is the logarithm of the product of the probabilities of its elements, calculated according to:
score(ŷ(i)) = sum(log P(y_t(i) | y_1(i), …, y_(t-1)(i)))
where y_t(i) is the element of partial solution i at time t, and y_1(i), …, y_(t-1)(i) are the elements of partial solution i at time steps 1 to t-1.
Beam search is more accurate than greedy search but takes longer. Beam search is therefore performed only for voice data whose greedy search score is low, and the judgment is then made, which dynamically balances accuracy against response speed in the decoding process.
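The adaptive choice between the two decoders can be sketched with a toy model. The interface `next_probs(prefix)`, which returns a token-to-probability dictionary, the lookup table and the beam width k = 2 are assumptions made for the example; a real recognizer would supply its own distributions.

```python
import math

def greedy_decode(next_probs, steps):
    """Take the most probable token at each step; also return the per-step scores."""
    seq, scores = [], []
    for _ in range(steps):
        probs = next_probs(tuple(seq))
        token = max(probs, key=probs.get)
        seq.append(token)
        scores.append(probs[token])       # score_greedy(t) = max(P_t)
    return seq, scores

def beam_search(next_probs, steps, k):
    """Keep the k best partial solutions, scored by summed log-probabilities."""
    beams = [((), 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for token, p in next_probs(seq).items():
                if p > 0:
                    candidates.append((seq + (token,), score + math.log(p)))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
    return list(beams[0][0])

def adaptive_decode(next_probs, steps, threshold, k=2):
    """Use the greedy result unless some step score falls below the threshold."""
    seq, scores = greedy_decode(next_probs, steps)
    if all(s >= threshold for s in scores):
        return seq, "greedy"
    return beam_search(next_probs, steps, k), "beam"

# Toy conditional distributions (hypothetical values).
TABLE = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.5, "y": 0.5},
    ("B",): {"x": 0.9, "y": 0.1},
}
model = TABLE.__getitem__
confident = adaptive_decode(model, steps=2, threshold=0.5)
uncertain = adaptive_decode(model, steps=2, threshold=0.55)
```

In this toy case greedy decoding commits to "A" and reaches a sequence probability of 0.3, while beam search finds "B", "x" with probability 0.36; the fallback is only triggered when the threshold exposes the uncertain second step.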
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. A voice recognition method for power transmission operation and inspection based on small sample synthesis, the method comprising:
acquiring a plurality of professional text corpora for power transmission operation and inspection and, from these corpora, both establishing a professional language model and extracting a professional language organization structure;
generating a large amount of professional text data for power transmission operation and inspection by using the professional language organization structure;
extracting a small amount of data from the large amount of professional text data, and having a plurality of operation and inspection personnel record it by voice, so as to establish acoustic models of the plurality of personnel;
using the acoustic models of the plurality of personnel to train on the professional text data that remains after the small amount of data has been extracted, so as to generate a corresponding voice recognition model, and recognizing test data by using the voice recognition model;
decoding the recognition result of the test data by using the established professional language model to obtain a text result; and
establishing a semantic recognition model by using the generated large amount of professional text data, and performing semantic recognition analysis on the decoded text result.
2. The method of claim 1, further comprising determining whether the result of the semantic recognition analysis performed by the semantic recognition model is consistent with the text result obtained by decoding:
if so, outputting the text result obtained by decoding;
if not, performing semantic correction on the text result obtained by decoding based on the result of the semantic recognition analysis.
3. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 2, wherein determining whether the result of the semantic recognition analysis is consistent with the text result obtained by decoding comprises:
extracting key fields from the generated large amount of transmission operation detection professional text data, and labeling and training on them to generate a corresponding voice recognition model;
performing semantic segmentation and keyword extraction on the text result of the voice recognition by using the established semantic recognition model; and
performing similarity judgment between the extracted keywords and the key fields extracted from the large amount of transmission operation detection professional text data.
4. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 3, wherein, in performing the similarity judgment between the extracted keywords and the key fields extracted from the large amount of transmission operation detection professional text data, the similarity judgment satisfies the formula:
WER(n, m) = (S + D + I) / N
wherein n denotes the n-th field, m denotes the m-th data item in the field database, WER(n, m) is the word error rate between the output words and the reference text, S is the number of substituted words, D is the number of deleted words, I is the number of inserted words, and N is the number of words in the reference text.
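The word error rate in claim 4 is conventionally obtained from a minimum edit-distance alignment between the reference and the recognized output. The following sketch computes S, D, I and the resulting rate for token lists. The function names are illustrative and the tokenization (words here, possibly characters for Chinese text) is an assumption, not something the claim specifies.

```python
def edit_counts(reference, hypothesis):
    """Return (S, D, I) for a minimum-cost alignment of hypothesis to reference."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # dp[i][j] = (total cost, S, D, I) for reference[:i] versus hypothesis[:j]
    dp = [[None] * cols for _ in range(rows)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        dp[i][0] = (i, 0, i, 0)      # every reference token deleted
    for j in range(1, cols):
        dp[0][j] = (j, 0, 0, j)      # every hypothesis token inserted
    for i in range(1, rows):
        for j in range(1, cols):
            if reference[i - 1] == hypothesis[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]          # match, no extra cost
                continue
            c, s, d, ins = dp[i - 1][j - 1]
            substitution = (c + 1, s + 1, d, ins)
            c, s, d, ins = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, ins)
            c, s, d, ins = dp[i][j - 1]
            insertion = (c + 1, s, d, ins + 1)
            dp[i][j] = min(substitution, deletion, insertion)
    _, s, d, ins = dp[-1][-1]
    return s, d, ins


def wer(reference, hypothesis):
    """Word error rate (S + D + I) / N, with N the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    s, d, ins = edit_counts(reference, hypothesis)
    return (s + d + ins) / len(reference)


reference = ["check", "tower", "12", "insulator"]
print(wer(reference, ["check", "tower", "12", "isolator"]))  # one substitution
```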
5. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 4, wherein performing semantic correction on the text result obtained by decoding based on the result of the semantic recognition analysis further comprises setting a correction threshold and correcting field information that falls within the correction interval, satisfying the formula:
output(n) = data(n, x), 0 < WER(n) < check_threshold
wherein check_threshold is the set correction threshold, and data(n, x) is the x-th data item of the n-th type in the database, for which WER(n, x) = WER_n; that is, the correction method comprises:
if the word error rate is 0, no correction is needed;
if the word error rate is greater than 0 and smaller than the correction threshold value, the field information with the highest similarity in the transmission operation detection professional text database is taken out for correction adjustment;
if the word error rate is greater than the correction threshold, information extraction errors exist and cannot be corrected.
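The three correction cases of claim 5 can be sketched as a small function. This is an illustration under stated assumptions: the error rate is computed at character level on plain strings, the database is a simple list of field values, and an error rate exactly equal to the threshold is treated as uncorrectable (the claim does not address that case). All names other than check_threshold are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Edit distance between the two sequences divided by the reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def correct_field(recognized, database_entries, check_threshold):
    """Apply the three-way rule; returns (output, status)."""
    best_entry, best_wer = None, float("inf")
    for entry in database_entries:
        score = word_error_rate(entry, recognized)
        if score < best_wer:
            best_entry, best_wer = entry, score
    if best_wer == 0:
        return recognized, "unchanged"        # exact match, no correction needed
    if best_wer < check_threshold:
        return best_entry, "corrected"        # replace with the closest entry
    return recognized, "uncorrectable"        # too far from every entry


# Hypothetical field database and threshold.
fields = ["insulator", "conductor", "tower"]
print(correct_field("insulater", fields, check_threshold=0.3))
```

The threshold trades missed corrections against wrong ones: a larger value repairs more recognition errors but risks replacing a genuinely different word with a database entry.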
6. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 1, wherein
the step of extracting a small amount of data from the large amount of transmission operation detection professional text data and having a plurality of transmission operation detection personnel record it by voice to establish acoustic models of the plurality of transmission operation detection personnel further comprises fusing noise data simulating a transmission operation detection site with the voice input of the plurality of transmission operation detection personnel, to generate more complete and more realistic transmission operation detection voice data.
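Claim 6 does not state how the noise and the speech are fused. One common approach, assumed here only for illustration, is additive mixing at a chosen signal-to-noise ratio. The sketch below works on plain lists of samples; the function name and the snr_db parameter are hypothetical.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the result has the requested SNR in dB.

    Assumption: additive mixing; the noise clip is repeated to cover the
    speech if it is shorter.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in tiled) / len(tiled)
    if noise_power == 0:
        return list(speech)                      # silent noise, nothing to add
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, tiled)]


# Synthetic stand-ins for a recorded utterance and site noise.
speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [0.5, -0.5, 0.25, -0.25]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

Mixing the same utterance at several SNR values is one way to multiply a small set of recordings into a larger, more varied training set.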
7. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 1, wherein the decoding method in the voice recognition process is a greedy search method and satisfies the formula:
score_greedy(t)=max(P_t)
where score_greedy(t) is the score at time t, P_t is the probability distribution output by the model at time t, and the max operation selects the maximum probability.
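Greedy decoding as defined in claim 7 can be sketched in a few lines: at each time step the highest-probability token is taken, and that probability is kept as score_greedy(t). The input format (one list of probabilities per step) and the function name are assumptions; any post-processing a real recognizer might apply, such as merging repeated tokens, is left out.

```python
def greedy_decode(prob_steps, vocab):
    """Return (tokens, scores), where scores[t] = max(P_t)."""
    tokens, scores = [], []
    for probs in prob_steps:
        # Index of the largest probability at this time step.
        best = max(range(len(probs)), key=probs.__getitem__)
        tokens.append(vocab[best])
        scores.append(probs[best])
    return tokens, scores


# Toy distributions over a three-token vocabulary.
vocab = ["a", "b", "c"]
steps = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
tokens, scores = greedy_decode(steps, vocab)
```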
8. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 7, wherein the decoding method in the voice recognition process further comprises setting a probability threshold and comparing the output probability of the decoding method with the set probability threshold:
if the output probability is greater than or equal to the set probability threshold, the greedy decoding result is output;
if the output probability is smaller than the set probability threshold, the result is output by using the beam search decoding method, wherein, with threshold denoting the set probability threshold:
if the greedy output score_greedy(i) is greater than or equal to the set threshold at every time step, the decoding result is the greedy decoding output sequence;
if score_greedy(i) is smaller than the set threshold at any step, the decoding result is the beam search decoding output sequence beam_sreach_output.
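The selection rule of claim 8 amounts to running the cheap greedy pass first and falling back to the more expensive decoder as soon as any step scores below the threshold. The sketch below expresses that rule; the beam decoder is passed in as a callable, because its details are outside this claim. The function and parameter names are hypothetical.

```python
def decode_with_fallback(prob_steps, vocab, threshold, beam_decoder):
    """Return (tokens, method) following the greedy-or-beam rule.

    beam_decoder is any callable taking (prob_steps, vocab) and returning
    a token sequence; it is called only when greedy confidence is too low.
    """
    greedy_tokens = []
    for probs in prob_steps:
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] < threshold:
            # One low-confidence step is enough to switch decoders.
            return beam_decoder(prob_steps, vocab), "beam"
        greedy_tokens.append(vocab[best])
    return greedy_tokens, "greedy"


# Stand-in beam decoder, used only to show which branch is taken.
def placeholder_beam(prob_steps, vocab):
    return ["<beam result>"]


confident = [[0.9, 0.1], [0.2, 0.8]]
uncertain = [[0.9, 0.1], [0.55, 0.45]]
print(decode_with_fallback(confident, ["a", "b"], 0.6, placeholder_beam))
print(decode_with_fallback(uncertain, ["a", "b"], 0.6, placeholder_beam))
```

A higher threshold sends more utterances to the slower decoder, which is the accuracy versus speed trade-off the claims describe.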
9. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 8, wherein beam_sreach_output satisfies the formula:
Beam_sreach_output = max(ŷ(i)), i = 1, …, k
wherein, in the beam search decoding process, k partial solutions are generated, each of which is a sequence ŷ(i), i = 1, …, k, and the score of each partial solution is calculated by the formula:
score(ŷ(i)) = sum(log P(y_t(i) ∣ y_1(i), …, y_(t-1)(i)))
where y_t(i) is the element of partial solution i at time t, and y_1(i), …, y_(t-1)(i) are the elements of partial solution i at time steps 1 to t-1.
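The scoring in claim 9 can be illustrated with a compact beam search that keeps the k best partial solutions and scores each as the sum of log conditional probabilities. The sketch assumes a model exposed as a function from a prefix to a next-token distribution; that interface, the toy model, and all names are hypothetical.

```python
import math


def beam_search(next_probs, steps, k):
    """Return the k best (sequence, score) pairs after a fixed number of steps.

    next_probs(prefix) -> {token: probability} is an assumed model interface.
    score = sum over t of log P(y_t | y_1, ..., y_(t-1)).
    """
    beams = [((), 0.0)]
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, p in next_probs(prefix).items():
                if p > 0:
                    candidates.append((prefix + (token,), score + math.log(p)))
        # Keep only the k highest-scoring partial solutions.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:k]
    return beams


def toy_model(prefix):
    """Toy conditional distribution in which the greedy first choice is not optimal."""
    if not prefix:
        return {"a": 0.6, "b": 0.4}
    if prefix[-1] == "a":
        return {"a": 0.5, "b": 0.5}
    return {"a": 0.9, "b": 0.1}


best_sequence, best_score = beam_search(toy_model, steps=2, k=2)[0]
```

In this toy model a greedy pass starts with "a" (0.6) and can reach a total probability of at most 0.3, while keeping two beams finds "b", "a" with probability 0.36.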
10. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 9, wherein beam search is applied only to voice data for which the greedy search score is low, the need for beam search being determined after the greedy search, so as to dynamically balance accuracy and response speed in the decoding process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311215139.3A CN117292680A (en) | 2023-09-19 | 2023-09-19 | Voice recognition method for power transmission operation detection based on small sample synthesis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117292680A (en) | 2023-12-26 |
Family
ID=89251055
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311215139.3A Pending CN117292680A (en) | 2023-09-19 | 2023-09-19 | Voice recognition method for power transmission operation detection based on small sample synthesis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117292680A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117611393A (en) * | 2024-01-24 | 2024-02-27 | 国网安徽省电力有限公司合肥供电公司 | Big data-based anti-electricity-stealing data acquisition method |
CN117611393B (en) * | 2024-01-24 | 2024-04-05 | 国网安徽省电力有限公司合肥供电公司 | Big data-based anti-electricity-stealing data acquisition method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8185376B2 (en) | Identifying language origin of words | |
CN106297800B (en) | Self-adaptive voice recognition method and equipment | |
US20060064177A1 (en) | System and method for measuring confusion among words in an adaptive speech recognition system | |
CN111640418B (en) | Prosodic phrase identification method and device and electronic equipment | |
CN112397054B (en) | Power dispatching voice recognition method | |
CN112397056B (en) | Voice evaluation method and computer storage medium | |
CN112259083B (en) | Audio processing method and device | |
CN112466279B (en) | Automatic correction method and device for spoken English pronunciation | |
CN111369974A (en) | Dialect pronunciation labeling method, language identification method and related device | |
CN112580340A (en) | Word-by-word lyric generating method and device, storage medium and electronic equipment | |
CN117292680A (en) | Voice recognition method for power transmission operation detection based on small sample synthesis | |
Shivakumar et al. | Kannada speech to text conversion using CMU Sphinx | |
CN113450757A (en) | Speech synthesis method, speech synthesis device, electronic equipment and computer-readable storage medium | |
US20050197838A1 (en) | Method for text-to-pronunciation conversion capable of increasing the accuracy by re-scoring graphemes likely to be tagged erroneously | |
Guillaume et al. | Plugging a neural phoneme recognizer into a simple language model: a workflow for low-resource settings | |
CN113822052A (en) | Text error detection method and device, electronic equipment and storage medium | |
Yoon et al. | Off-Topic Spoken Response Detection with Word Embeddings. | |
Larabi-Marie-Sainte et al. | A new framework for Arabic recitation using speech recognition and the Jaro Winkler algorithm | |
KR20130126570A (en) | Apparatus for discriminative training acoustic model considering error of phonemes in keyword and computer recordable medium storing the method thereof | |
JP3875357B2 (en) | Word / collocation classification processing method, collocation extraction method, word / collocation classification processing device, speech recognition device, machine translation device, collocation extraction device, and word / collocation storage medium | |
Decadt et al. | Transcription of out-of-vocabulary words in large vocabulary speech recognition based on phoneme-to-grapheme conversion | |
JP2010277036A (en) | Speech data retrieval device | |
CN115527551A (en) | Voice annotation quality evaluation method and device, electronic equipment and storage medium | |
CN112908359A (en) | Voice evaluation method and device, electronic equipment and computer readable medium | |
Ma et al. | Low-frequency word enhancement with similar pairs in speech recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||