CN117292680A - Voice recognition method for power transmission operation detection based on small sample synthesis - Google Patents

Publication number
CN117292680A
CN117292680A CN202311215139.3A CN202311215139A CN117292680A CN 117292680 A CN117292680 A CN 117292680A CN 202311215139 A CN202311215139 A CN 202311215139A CN 117292680 A CN117292680 A CN 117292680A
Authority
CN
China
Prior art keywords
transmission operation
detection
data
professional
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311215139.3A
Other languages
Chinese (zh)
Inventor
宋秦风
王金志
范跃祖
李承斌
王淮海
董瑞
王佳佳
徐南锦
田维斌
孙瑞丽
崔垚
陆锦焱
程益伟
邓哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Power Supply Co of State Grid Anhui Electric Power Co Ltd
Feidong Power Supply Co of State Grid Anhui Electric Power Co Ltd
Original Assignee
Hefei Power Supply Co of State Grid Anhui Electric Power Co Ltd
Feidong Power Supply Co of State Grid Anhui Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Power Supply Co of State Grid Anhui Electric Power Co Ltd and Feidong Power Supply Co of State Grid Anhui Electric Power Co Ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention provides a voice recognition method for power transmission operation and inspection based on small sample synthesis. The method comprises: obtaining a plurality of transmission operation and inspection professional text corpora and, from them, establishing a transmission operation and inspection professional language model and extracting a transmission operation and inspection professional language organization structure; generating a large amount of transmission operation and inspection professional text data by using the language organization structure; extracting a small amount of data from the large amount of text data and having a plurality of transmission operation and inspection personnel record it as speech, so as to establish acoustic models of those personnel; training on the text data remaining after the small amount has been extracted, by using the acoustic models of the plurality of personnel, to generate a corresponding voice recognition model, and recognizing test data with the voice recognition model; decoding the recognition result of the test data with the established professional language model to obtain a text result; and establishing a semantic recognition model from the generated large amount of text data and performing semantic recognition analysis on the decoded text result. Professional voice data from power transmission operation and inspection is thereby recognized accurately.

Description

Voice recognition method for power transmission operation detection based on small sample synthesis
Technical Field
The invention relates to the technical field of power mobile inspection voice recognition, in particular to a voice recognition method for power transmission operation inspection based on small sample synthesis.
Background
In current power transmission operation and inspection practice, pictures, voice recordings and text notes are collected on site, and the results are entered into the system after the inspection is finished. The voice data is usually transcribed manually, which is inefficient and highly repetitive. General-purpose speech recognition models, meanwhile, are often not accurate enough on the specialized vocabulary and phrasing of operation and inspection speech.
Chinese patent No. 114822545A discloses a method for improving the speech recognition rate in professional fields, aimed mainly at recognizing speech in a professional field or a specific industry. A professional field usually involves a large number of technical terms, and each department applying it adds proper nouns with local characteristics, such as equipment names containing place names, work section names and even the names of staff, so the speech recognition error rate is high. That patent proposes a two-level difference-frequency principle and automatically builds a difference-frequency special lexicon, comprising a first-level sub-lexicon storing local special words and a second-level sub-lexicon storing technical terms; pinyin and characters are matched around the difference-frequency special words, and an arbitrary-position conversion mechanism is adopted. This improves recognition accuracy and allows the special vocabulary of a local professional department to be recognized. However, the disclosed method suits neither the way inspection results are currently recorded in mobile power inspection nor the noisy field environment of power transmission operation and inspection.
Therefore, a method for accurately recognizing power transmission operation and inspection speech is needed to solve the low working efficiency and inaccurate recognition of existing approaches.
Disclosure of Invention
The invention aims to provide a method for accurately identifying transmission operation detection voice.
In order to solve the technical problems, the invention provides a method for recognizing voice of power transmission operation detection based on small sample synthesis, which comprises the following steps:
acquiring a plurality of transmission operation and inspection professional text corpora and, from these corpora, establishing a transmission operation and inspection professional language model and extracting a transmission operation and inspection professional language organization structure;
generating a large amount of transmission operation and inspection professional text data by using the professional language organization structure;
extracting a small amount of data from the large amount of professional text data, and having a plurality of transmission operation and inspection personnel record it as speech so as to establish acoustic models of the plurality of personnel;
training on the professional text data that remains after the small amount of data has been extracted, by using the acoustic models of the plurality of personnel, to generate a corresponding voice recognition model, and recognizing test data with the voice recognition model;
decoding the recognition result of the test data with the established professional language model to obtain a text result; and
establishing a semantic recognition model from the generated large amount of professional text data, and performing semantic recognition analysis on the decoded text result.
Further, the method comprises judging whether the result of the semantic recognition analysis performed by the semantic recognition model is consistent with the decoded text result:
if it is, outputting the decoded text result;
if it is not, performing semantic correction on the decoded text result based on the result of the semantic recognition analysis.
Further, the step of judging whether the result of the semantic recognition analysis of the text result recognized by the decoding by the semantic recognition model is consistent with the text result recognized by the decoding specifically includes the following steps:
extracting key fields from the generated large amount of transmission operation and detection professional text data, labeling and training to generate a corresponding voice recognition model;
carrying out semantic segmentation and keyword extraction on the text result of the voice recognition by the established semantic recognition model;
and carrying out similarity judgment on the extracted keywords and the key fields extracted from the large amount of transmission operation and inspection professional text data.
Further, when the similarity between the extracted keywords and the key fields extracted from the large amount of transmission operation and inspection professional text data is judged, the similarity judgment satisfies the formula:
WER(n,m)=(S+D+I)/N
wherein n denotes the n-th field type, m denotes the m-th entry in the field database, WER(n,m) is the word error rate between the output words and the reference text, S is the number of substituted words, D is the number of deleted words, I is the number of inserted words, and N is the number of words in the reference text.
Further, performing semantic correction on the decoded text result based on the result of the semantic recognition analysis also comprises setting a correction threshold and correcting the field information that falls within the correction interval, so that the formula is satisfied:
output(n)=data(n,x), 0<WER(n)<check_threshold
wherein check_threshold is the set correction threshold and data(n,x) is the x-th entry of the n-th field type in the database, with WER(n,x)=WER(n). That is, the correction method comprises:
if the word error rate is 0, no correction is needed;
if the word error rate is greater than 0 and smaller than the correction threshold, the field information with the highest similarity is taken from the transmission operation and inspection professional text database and used for the correction;
if the word error rate is greater than the correction threshold, the information was extracted incorrectly and cannot be corrected.
Further, the method includes the steps of extracting a small amount of data from the large amount of transmission operation and detection professional text data, performing voice input by a plurality of transmission operation and detection personnel to establish an acoustic model of the plurality of transmission operation and detection personnel, and fusing noise data of a simulated transmission operation and detection site with the voice input of the plurality of transmission operation and detection personnel to generate more complete and more real transmission operation and detection voice data.
Preferably, the decoding method in the voice recognition process is greedy search, which satisfies the formula:
score_greedy(t)=max(P_t)
where score_greedy(t) is the score at time t, P_t is the probability distribution output by the model at time t, and the max operation selects the largest probability.
Further, the decoding method in the voice recognition process also comprises a set probability threshold, against which the output probability is compared:
if the output probability is greater than or equal to the set probability threshold, the output is the result of greedy decoding;
if the output probability is smaller than the set probability threshold, the result is output by beam search decoding, wherein the selection satisfies the formula:
output=greedy decoding output sequence, if score_greedy(t)>=threshold for every t; output=beam_search_output, otherwise
wherein threshold is the set decision threshold,
if score_greedy at every time step is greater than or equal to the set threshold, the decoding result output is the greedy decoding output sequence;
if score_greedy(i) at any step is smaller than the set threshold, the decoding result output is the beam search decoding output sequence beam_search_output.
Further, beam_search_output satisfies the formula:
beam_search_output=argmax score(y(i)), i=1,…,k
wherein k partial solutions are generated during beam search decoding, each of which is a sequence y(i), i=1,…,k. The score of each partial solution is calculated, and the score satisfies:
score(y(i))=sum(logP(y_t(i)∣y_1(i),…,y_(t-1)(i)))
wherein y_t(i) is the element of partial solution i at time t, and y_1(i),…,y_(t-1)(i) are the elements of partial solution i at time steps 1 to t-1.
Further, the judgment is made first and beam search is then run only on voice data with a low greedy search score, which dynamically balances accuracy against response speed during decoding.
Compared with the prior art, the invention has the following beneficial effects:
according to the invention, a small amount of data is extracted from a large amount of transmission operation and detection professional text data, and voice input is carried out by different transmission operation and detection personnel to establish acoustic models of the different transmission operation and detection personnel, so that decoding, semantic analysis and correction are further carried out, a transmission operation and detection voice recognition method based on small sample synthesis is realized, and the voice recognition is accurate and high in recognition efficiency.
Drawings
FIG. 1 is a flow chart of the present invention;
Detailed Description
In order to make the technical solutions and technical effects of the present invention more clear, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments.
The invention provides a method for recognizing voice of power transmission operation detection based on small sample synthesis, which comprises the following steps:
acquiring a plurality of transmission operation detection professional text corpora, respectively establishing a transmission operation detection professional language model and extracting a transmission operation detection professional language organization structure according to the plurality of transmission operation detection professional text corpora: the method specifically comprises the steps of obtaining text corpus information such as names, voltage levels, defect positions, defect parts, defect basis content and the like of the power transmission lines in the operation and inspection area.
The following is a corpus example of a section of defect content of power transmission operation inspection:
"insulator broken";
The collected text corpus is used to train an N-gram language model with the kenlm training tool, for use in the decoding stage of speech recognition. The model estimates probability as:
P(w_n∣w_1,…,w_(n-1))≈P(w_n∣w_(n-N+1),…,w_(n-1))
that is, the probability of predicting the current (n-th) word given all of the history is approximated by the probability of predicting it given only the preceding N-1 words. Training of the N-gram language model is further optimized with modified Kneser-Ney smoothing, which mitigates problems such as incomplete corpus coverage.
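The N-1 word approximation can be illustrated with a minimal Python sketch for N = 2. This is not the kenlm tool and it applies no Kneser-Ney smoothing; it only shows an unsmoothed count-based estimate of P(current word | previous word). The function names and the toy corpus are assumptions made for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences padded with <s> and </s>."""
    context_counts, bigram_counts = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts

def bigram_prob(context_counts, bigram_counts, prev, cur):
    """Unsmoothed maximum-likelihood estimate of P(cur | prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / context_counts[prev]

# Toy corpus (illustrative only).
corpus = [
    ["insulator", "broken"],
    ["insulator", "damaged"],
    ["tower", "tilted"],
]
ctx, bi = train_bigram(corpus)
p = bigram_prob(ctx, bi, "insulator", "broken")
```

Unseen word pairs get probability 0 in this sketch, which is the kind of sparse-corpus problem that smoothing is meant to address.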
And generating a large amount of transmission operation and examination professional text data by using the transmission operation and examination professional language organization structure.
The following is an example of integrated text data:
"Channel inspection 220kV nest line No. 28 tower middle phase insulator breakage", where "channel inspection" is the inspection type, "220kV" the voltage level, "nest line" the line name, "No. 28 tower" the tower number, "middle phase" the defect position, and "insulator breakage" the defect content.
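One way to picture text generation from the language organization structure is as slot filling: the structure fixes the order of the fields, and sentences are produced by combining candidate values for each field. The following Python sketch assumes that reading; the field names, the English wording of the values, and the extra values ("No. 27 tower", "left phase") are hypothetical and not taken from the patent.

```python
from itertools import product

# Field order follows the example sentence; all value lists are illustrative.
STRUCTURE = [
    "inspection_type", "voltage_level", "line_name",
    "tower_number", "defect_position", "defect_content",
]

FIELD_VALUES = {
    "inspection_type": ["channel inspection"],
    "voltage_level": ["220kV"],
    "line_name": ["nest line"],
    "tower_number": ["No. 27 tower", "No. 28 tower"],
    "defect_position": ["left phase", "middle phase"],
    "defect_content": ["insulator breakage"],
}

def generate_texts(structure, field_values):
    """Yield one sentence per combination of field values, in structure order."""
    pools = [field_values[name] for name in structure]
    for combo in product(*pools):
        yield " ".join(combo)

texts = list(generate_texts(STRUCTURE, FIELD_VALUES))
```

The number of generated sentences is the product of the list lengths, so a modest set of field values can already yield a large text set.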
And carrying out data preprocessing on the transmission operation inspection professional text data, converting an original text into phonemes, segmenting Chinese and English, and simultaneously checking whether illegal data exist.
A small amount of data is extracted from the large amount of transmission operation and inspection professional text data and recorded as speech by a plurality of transmission operation and inspection personnel, so as to establish acoustic models of those personnel. An MFA tool is used to align the audio with the professional text data and to complete duration segmentation. Duration, fundamental frequency and energy are then modeled independently: the duration of a phoneme directly affects pronunciation length and overall rhythm; the fundamental frequency is another feature affecting emotion and rhythm; and the energy directly affects the amplitude of the spectrum and hence the volume of the audio. The encoded duration information is used as the input of the voice attribute modeling network; the output of the duration prediction also serves as the input of the fundamental frequency and energy predictions; finally, it is added to the predicted fundamental frequency and energy outputs and passed to the downstream network, according to the formula:
x=x+pitch_embedding+energy_embedding
wherein x is the output of the encoder after the time length information is unfolded, pitch_embedding is the fundamental frequency embedded vector, and energy_embedding is the energy embedded vector; the output information is converted into mel frequency spectrum by the decoder and sent to the vocoder to generate audio data.
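The addition step can be sketched in plain Python as follows. The sketch assumes that "unfolding" the duration information means repeating each phoneme vector once per predicted frame, and that the pitch and energy embeddings have the same frame count and width as x; the helper names and all numeric values are placeholders, not the patent's network.

```python
def expand_by_duration(encoder_out, durations):
    """Repeat each phoneme vector durations[i] times (one copy per output frame)."""
    frames = []
    for vec, n in zip(encoder_out, durations):
        frames.extend([list(vec) for _ in range(n)])
    return frames

def add_embeddings(x, pitch_embedding, energy_embedding):
    """Frame-wise x = x + pitch_embedding + energy_embedding."""
    return [
        [xv + pv + ev for xv, pv, ev in zip(xf, pf, ef)]
        for xf, pf, ef in zip(x, pitch_embedding, energy_embedding)
    ]

encoder_out = [[1.0, 0.0], [0.0, 1.0]]     # two phonemes, hidden size 2
durations = [2, 1]                          # frames per phoneme
x = expand_by_duration(encoder_out, durations)
pitch_embedding = [[0.5, 0.5]] * len(x)     # placeholder values
energy_embedding = [[0.25, 0.0]] * len(x)   # placeholder values
x = add_embeddings(x, pitch_embedding, energy_embedding)
```

In a real model the embeddings would come from the fundamental frequency and energy predictors rather than constants.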
The acoustic models of the plurality of transmission operation and inspection personnel are used to train on the professional text data remaining after the small amount of data has been extracted, generating a corresponding voice recognition model with which test data is recognized. The operation and inspection voice data is trained with an end-to-end model to generate the voice recognition model. Training follows a pretrain + finetune approach, which improves training efficiency and suits cases with little data and many models to train.
The recognition result of the test data is decoded with the established transmission operation and inspection professional language model to obtain a text result. A decision criterion is added to the decoding process, and the more suitable and efficient decoding method is selected adaptively according to its outcome.
And establishing a semantic recognition model by using the generated large amount of transmission operation and detection professional text data, and carrying out semantic recognition analysis on the decoded and recognized text result.
It is judged whether the result of the semantic recognition analysis performed by the semantic recognition model is consistent with the decoded text result:
if it is, the decoded text result is output;
if it is not, semantic correction is performed on the decoded text result based on the result of the semantic recognition analysis.
Judging whether the result of semantic recognition analysis on the decoded and recognized text result by the semantic recognition model is consistent with the decoded and recognized text result or not, specifically comprising the following steps:
and extracting key fields from the generated large amount of transmission operation and detection professional text data, labeling and training to generate a corresponding voice recognition model.
The following is an example of text data extracted from a key field:
Channel inspection (inspection type) 220kV (voltage level) nest line (line name) No. 28 tower (tower number) middle phase (defect position) insulator (defect part) breakage (defect content).
And carrying out semantic segmentation and keyword extraction on the text result of the voice recognition by the established semantic recognition model.
The similarity between the extracted keywords and the key fields extracted from the large amount of transmission operation and inspection professional text data is judged, abnormal extracted field information is removed, and the corrected field information is reassembled to generate the corrected voice recognition text.
The following is an example of an abnormal piece of text data:
"Channel inspection bridge east line, oh, no, 220kV nest line No. 28 tower middle phase insulator breakage". This example is a misspoken utterance; the fragment "bridge east line, oh, no" is marked with the abnormal-field label.
The similarity judgment satisfies the formula:
WER(n,m)=(S+D+I)/N
wherein n denotes the n-th field type, m denotes the m-th entry in the field database, WER(n,m) is the word error rate between the output words and the reference text, S is the number of substituted words, D is the number of deleted words, I is the number of inserted words, and N is the number of words in the reference text.
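With S, D, I and N as defined, the word error rate is conventionally computed as (S + D + I) / N from a minimum-edit alignment. The Python sketch below shows one standard dynamic-programming way to obtain the counts; the function names and the sample tokens are illustrative, and for Chinese text the tokens could just as well be single characters.

```python
def edit_counts(reference, hypothesis):
    """Return (S, D, I) for a minimum-edit alignment of hypothesis to reference."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # Each cell holds (total edits, substitutions, deletions, insertions).
    table = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)
    for i in range(1, rows):
        for j in range(1, cols):
            if reference[i - 1] == hypothesis[j - 1]:
                table[i][j] = table[i - 1][j - 1]
                continue
            t, s, d, ins = table[i - 1][j - 1]
            substitute = (t + 1, s + 1, d, ins)
            t, s, d, ins = table[i - 1][j]
            delete = (t + 1, s, d + 1, ins)
            t, s, d, ins = table[i][j - 1]
            insert = (t + 1, s, d, ins + 1)
            table[i][j] = min(substitute, delete, insert)
    _, s, d, ins = table[-1][-1]
    return s, d, ins

def wer(reference, hypothesis):
    """WER = (S + D + I) / N, with N the number of words in the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    s, d, ins = edit_counts(reference, hypothesis)
    return (s + d + ins) / len(reference)

reference = ["220kV", "nest", "line", "No.28", "tower"]
hypothesis = ["220kV", "next", "line", "tower"]
score = wer(reference, hypothesis)  # one substitution and one deletion over five words
```

A WER of 0 means the extracted field matches the database entry exactly; larger values mean lower similarity.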
Performing semantic correction on the decoded text result based on the result of the semantic recognition analysis also comprises setting a correction threshold and correcting the field information that falls within the correction interval, so that the formula is satisfied:
output(n)=data(n,x), 0<WER(n)<check_threshold
wherein check_threshold is the set correction threshold and data(n,x) is the x-th entry of the n-th field type in the database, with WER(n,x)=WER(n). That is, the correction method comprises:
if the word error rate is 0, no correction is needed;
if the word error rate is greater than 0 and smaller than the correction threshold, the field information with the highest similarity is taken from the transmission operation and inspection professional text database and used for the correction;
if the word error rate is greater than the correction threshold, the information was extracted incorrectly and cannot be corrected.
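The three-way rule can be sketched in Python as below. The sketch assumes that the word error rates against the database entries of one field type have already been computed and that WER(n) is the smallest of them; a WER exactly equal to the threshold is treated as uncorrectable, which is an assumption, since the text only covers the "greater than" case. The function name, the status strings and the sample values are hypothetical.

```python
def correct_field(extracted, wer_by_entry, check_threshold):
    """Apply the correction rule to one extracted field.

    wer_by_entry maps each database entry of this field type to its WER
    against the extracted text and must be non-empty.
    Returns (status, output_text).
    """
    best_entry = min(wer_by_entry, key=wer_by_entry.get)
    best_wer = wer_by_entry[best_entry]
    if best_wer == 0:
        return "unchanged", extracted        # exact match, no correction needed
    if best_wer < check_threshold:
        return "corrected", best_entry       # replace with the most similar entry
    return "uncorrectable", None             # extraction error, cannot be corrected

# Illustrative WER values for the line-name field.
line_names = {"nest line": 0.5, "bridge east line": 1.0}
status, output = correct_field("next line", line_names, check_threshold=0.6)
```

Lowering check_threshold makes the correction more conservative: fewer fields are replaced and more are reported as uncorrectable.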
Further, a small amount of data is extracted from a large amount of transmission operation and detection professional text data, and voice input is carried out by a plurality of transmission operation and detection personnel to establish an acoustic model of the plurality of transmission operation and detection personnel, wherein the method further comprises the step of fusing noise data of a simulated transmission operation and detection site with voice input of the plurality of transmission operation and detection personnel to generate more complete and more real transmission operation and detection voice data, and the model robustness is improved.
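The text does not say how the site noise is fused with the recordings. A common approach, shown here purely as an assumption, is to add the noise to the speech at a chosen signal-to-noise ratio. The Python sketch below works on plain lists of samples; the function name, the looping of the noise clip and the test signals are illustrative.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaling the noise so the mix has the given SNR in dB."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise clip so it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in tiled) / len(tiled)
    if noise_power == 0:
        return list(speech)  # silent noise clip: nothing to add
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, tiled)]

# Synthetic stand-ins for a recording and a site-noise clip.
speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [1.0, -1.0, 0.5, -0.5]
noisy = mix_at_snr(speech, noise, snr_db=10)
```

Mixing the same utterance at several SNR values is one way to expose the recognizer to a range of field conditions.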
The decoding method in the voice recognition process is preferably greedy search, which satisfies the formula:
score_greedy(t)=max(P_t)
where score_greedy(t) is the score at time t, P_t is the probability distribution output by the model at time t, and the max operation selects the largest probability.
The decoding method in the voice recognition process also comprises a set probability threshold, against which the output probability is compared:
if the output probability is greater than or equal to the set probability threshold, the output is the result of greedy decoding;
if the output probability is smaller than the set probability threshold, the result is output by beam search decoding, wherein the selection satisfies the formula:
output=greedy decoding output sequence, if score_greedy(t)>=threshold for every t; output=beam_search_output, otherwise
wherein threshold is the set decision threshold,
if score_greedy at every time step is greater than or equal to the set threshold, the decoding result output is the greedy decoding output sequence;
if score_greedy(i) at any step is smaller than the set threshold, the final decoding result output is the beam search decoding output sequence beam_search_output. beam_search_output satisfies the formula:
beam_search_output=argmax score(y(i)), i=1,…,k
wherein k partial solutions are generated during beam search decoding, each of which is a sequence y(i), i=1,…,k. The score of each partial solution is calculated by multiplying the probabilities of its elements and taking the logarithm, and the score satisfies:
score(y(i))=sum(logP(y_t(i)∣y_1(i),…,y_(t-1)(i)))
where y_t(i) is the element of partial solution i at time t, and y_1(i),…,y_(t-1)(i) are the elements of partial solution i at time steps 1 to t-1.
Beam search is more accurate than greedy search but takes longer. The judgment is therefore made first, and beam search is run only on voice data with a low greedy search score, which dynamically balances accuracy against response speed during decoding.
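The adaptive choice between the two decoders can be sketched in Python as follows. The sketch assumes a model exposed as a function that maps a prefix to a next-token probability table; `toy_model`, the token names and the threshold values are hypothetical and serve only to show a case where greedy search and beam search disagree.

```python
import math

def greedy_decode(next_probs, max_len, eos="</s>"):
    """Return (sequence, per-step scores), where each score is max(P_t)."""
    seq, scores = [], []
    for _ in range(max_len):
        probs = next_probs(tuple(seq))
        token = max(probs, key=probs.get)
        scores.append(probs[token])
        if token == eos:
            break
        seq.append(token)
    return seq, scores

def beam_search(next_probs, max_len, k, eos="</s>"):
    """Keep the k best partial solutions ranked by summed log-probability."""
    beams = [((), 0.0, False)]  # (sequence, score, finished)
    for _ in range(max_len):
        candidates = []
        for seq, score, done in beams:
            if done:
                candidates.append((seq, score, True))
                continue
            for token, p in next_probs(seq).items():
                if p <= 0:
                    continue
                if token == eos:
                    candidates.append((seq, score + math.log(p), True))
                else:
                    candidates.append((seq + (token,), score + math.log(p), False))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
        if all(done for _, _, done in beams):
            break
    return list(beams[0][0])

def adaptive_decode(next_probs, max_len, threshold, k):
    """Greedy output if every step score >= threshold, else beam search output."""
    seq, scores = greedy_decode(next_probs, max_len)
    if all(s >= threshold for s in scores):
        return seq
    return beam_search(next_probs, max_len, k)

def toy_model(prefix):
    """Hypothetical conditional distribution used only for this example."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"z": 0.95, "w": 0.05},
    }
    return table.get(prefix, {"</s>": 1.0})

confident = adaptive_decode(toy_model, max_len=3, threshold=0.5, k=2)
fallback = adaptive_decode(toy_model, max_len=3, threshold=0.58, k=2)
```

With the lower threshold every greedy step passes and the greedy sequence is returned; with the higher threshold the second greedy step (0.55) fails the check, so the beam search result is used, which here has a higher overall sequence probability than the greedy path.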
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A method for voice recognition of power transmission operation based on small sample synthesis, the method comprising:
acquiring a plurality of transmission operation detection professional text corpora, respectively establishing a transmission operation detection professional language model and extracting a transmission operation detection professional language organization structure according to the plurality of transmission operation detection professional text corpora;
generating a large amount of transmission operation and examination professional text data by using the transmission operation and examination professional language organization structure;
extracting a small amount of data from the large amount of transmission operation and detection professional text data, and performing voice input by a plurality of transmission operation and detection personnel to establish acoustic models of the plurality of transmission operation and detection personnel;
training the rest of the large-amount transmission operation detection professional text data after extracting a small amount of data from the large-amount transmission operation detection professional text data by utilizing the acoustic models of the plurality of transmission operation detection personnel to generate a corresponding voice recognition model, and recognizing test data by utilizing the voice recognition model;
decoding the recognition result of the test data by using the established transmission operation detection professional language model to obtain a text result; and
establishing a semantic recognition model by using the generated large amount of transmission operation and detection professional text data, and carrying out semantic recognition analysis on the text result obtained by decoding.
2. The method of claim 1, further comprising judging whether the result of the semantic recognition analysis performed by the semantic recognition model is consistent with the decoded text result:
if it is, outputting the decoded text result;
if it is not, performing semantic correction on the decoded text result based on the result of the semantic recognition analysis.
3. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 2, wherein determining whether the result of the semantic recognition analysis performed by the semantic recognition model on the decoded and recognized text result is consistent with the decoded and recognized text result comprises the following steps:
extracting key fields from the generated large amount of power transmission operation detection professional text data, labeling them, and training on them to generate a corresponding voice recognition model;
performing semantic segmentation and keyword extraction on the text result of the voice recognition with the established semantic recognition model; and
performing a similarity judgment between the extracted keywords and the key fields extracted from the large amount of power transmission operation detection professional text data.
4. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 3, wherein, in the similarity judgment between the extracted keywords and the key fields extracted from the large amount of power transmission operation detection professional text data, the similarity judgment satisfies the formula:
WER(n, m) = (S + D + I) / N
wherein n represents the n-th field, m represents the m-th data item in the field database, WER(n, m) is the word error rate between the output words and the reference text, S represents the number of substituted words, D represents the number of deleted words, I represents the number of inserted words, and N represents the number of words in the reference text.
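As an illustration of the word error rate used in the similarity judgment, the following is a minimal sketch that counts substitutions, deletions, and insertions with a standard edit-distance alignment. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the claim does not specify how the text is split into words.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Return (S + D + I) / N, with N the number of reference words.

    Splitting on whitespace is an assumption of this sketch.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # prev[j]: minimum edits turning the first i-1 reference words
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

A value of 0 means the extracted keyword matches the database field exactly; larger values mean more edits relative to the length of the reference.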
5. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 4, wherein performing semantic correction on the decoded and recognized text result based on the result of the semantic recognition analysis further comprises setting a correction threshold and correcting the field information that falls within the correction interval, so as to satisfy the formula:
output(n) = data(n, x), 0 < WER(n) < check_threshold
wherein check_threshold is the set correction threshold, and data(n, x) is the x-th data item of the n-th type in the database, for which WER(n, x) = WER(n); that is, the correction method includes:
if the word error rate is 0, no correction is needed;
if the word error rate is greater than 0 and smaller than the correction threshold, the field information with the highest similarity in the power transmission operation detection professional text database is retrieved and used for the correction;
if the word error rate is greater than the correction threshold, an information extraction error exists and the result cannot be corrected.
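The three-way correction rule can be sketched as follows. This is only an illustration under stated assumptions: `correct_field` and its return labels are hypothetical names, the word error rates are assumed to have been computed beforehand for every database entry, and a word error rate exactly equal to the threshold (a case the claim leaves open) is treated here as uncorrectable.

```python
from typing import Dict, Optional, Tuple


def correct_field(
    recognized: str,
    wer_by_candidate: Dict[str, float],
    check_threshold: float,
) -> Tuple[Optional[str], str]:
    """Apply the correction rule to one recognized field.

    wer_by_candidate maps each database entry to its word error rate
    against the recognized field (assumed precomputed).
    """
    if not wer_by_candidate:
        return None, "uncorrectable"
    # The entry with the lowest word error rate is the most similar one
    best, best_wer = min(wer_by_candidate.items(), key=lambda kv: kv[1])
    if best_wer == 0:
        return recognized, "unchanged"
    if best_wer < check_threshold:
        return best, "corrected"
    # Assumption: a rate equal to the threshold is also uncorrectable
    return None, "uncorrectable"
```

For example, with a threshold of 0.6, a field whose closest database entry has a word error rate of 0.5 would be replaced by that entry, while one whose closest entry has a rate of 1.0 would be flagged as an extraction error.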
6. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 1, wherein
the step of extracting a small amount of data from the large amount of power transmission operation detection professional text data and having a plurality of power transmission operation detection personnel read it as voice input to establish the acoustic models of the plurality of power transmission operation detection personnel further comprises fusing noise data of a simulated power transmission operation detection site with the voice input of the plurality of power transmission operation detection personnel, so as to generate more complete and more realistic power transmission operation detection voice data.
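The claim does not say how the noise and the voice input are fused. One common possibility, shown here purely as an assumption, is additive mixing at a chosen signal-to-noise ratio. The function name `mix_at_snr`, the `snr_db` parameter, and the representation of audio as lists of float samples are all hypothetical.

```python
import math
from typing import List


def mix_at_snr(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add noise to speech so that the mixture has the requested SNR in dB.

    Additive mixing at a target SNR is an assumption of this sketch,
    not a technique stated in the claim.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise recording so that it covers the whole utterance
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in tiled) / len(tiled)
    if noise_power == 0:
        return list(speech)
    # Choose the gain so that speech_power / scaled_noise_power == 10 ** (snr_db / 10)
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, tiled)]
```

Lower `snr_db` values simulate a noisier site; mixing each recording at several levels would be one way to enlarge a small recorded sample.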
7. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 1, wherein the decoding method in the voice recognition process is a greedy search method and satisfies the formula:
score_greedy(t) = max(P_t)
where score_greedy(t) represents the score at time t, P_t represents the probability distribution output by the model at time t, and the max operation selects the maximum probability.
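A minimal sketch of greedy decoding under this formula is given below. Representing each P_t as a dictionary from token to probability, and the function name `greedy_decode`, are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def greedy_decode(distributions: List[Dict[str, float]]) -> Tuple[List[str], List[float]]:
    """Pick the most probable token at every time step.

    distributions[t] plays the role of P_t. Returns the token sequence
    together with score_greedy(t) for each step.
    """
    tokens: List[str] = []
    scores: List[float] = []
    for p_t in distributions:
        token = max(p_t, key=p_t.get)  # token with the maximum probability
        tokens.append(token)
        scores.append(p_t[token])      # score_greedy(t) = max(P_t)
    return tokens, scores
```

Greedy decoding needs only one pass over the time steps, which is why it is the faster of the two decoding strategies discussed in these claims.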
8. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 7, wherein the decoding method in the voice recognition process further comprises setting a probability threshold and comparing the output probability of the decoding method with the set probability threshold:
if the output probability is greater than or equal to the set probability threshold, the output of greedy decoding is used as the result;
if the output probability is smaller than the set probability threshold, the result is output by the beam search decoding method, wherein the beam search satisfies the formula:
wherein threshold is the set decision threshold,
if the greedy output score_greedy at every time step is greater than or equal to the set threshold, the decoding result output is the greedy decoding output sequence;
if score_greedy(i) at any one step is smaller than the set threshold, the decoding result output is the beam search decoding output sequence beam_sreach_output.
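The threshold rule can be sketched as a small dispatcher. In this illustration the beam search itself is passed in as a zero-argument callable named `beam_search`, which is a hypothetical interface chosen to keep the example short; the function name `decode_with_fallback` is likewise an assumption.

```python
from typing import Callable, Dict, List


def decode_with_fallback(
    distributions: List[Dict[str, float]],
    threshold: float,
    beam_search: Callable[[], List[str]],
) -> List[str]:
    """Return the greedy sequence if every step is confident enough,
    otherwise return the result of the supplied beam search."""
    greedy_tokens: List[str] = []
    for p_t in distributions:
        token = max(p_t, key=p_t.get)
        if p_t[token] < threshold:
            # One low-confidence step is enough to switch to beam search
            return beam_search()
        greedy_tokens.append(token)
    return greedy_tokens
```

Because the costlier search is invoked only when some greedy score falls below the threshold, confident utterances keep the speed of greedy decoding.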
9. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 8, wherein beam_sreach_output satisfies the output formula:
beam_sreach_output = max(y(i)), i = 1, …, k
wherein, in the beam search decoding process, k partial solutions are generated, each of which is a sequence y(i), i = 1, …, k. The score of each partial solution is calculated, and the score satisfies the formula:
score(y(i)) = sum(log P(y_t(i) | y_1(i), …, y_(t-1)(i)))
where y_t(i) represents the element of partial solution i at time t, and y_1(i), …, y_(t-1)(i) represent the elements of partial solution i at time steps 1 to t-1.
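The scoring rule can be illustrated with a small beam search. This is a sketch under stated assumptions rather than the patented implementation: the model is represented by a hypothetical callable `next_probs` that returns the conditional distribution for a given prefix, and decoding runs for a fixed number of steps.

```python
import math
from typing import Callable, Dict, List, Tuple

Beam = Tuple[Tuple[str, ...], float]


def beam_search(
    next_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    steps: int,
    k: int,
) -> Beam:
    """Keep the k best partial solutions at each step and return the
    highest-scoring sequence together with its score."""
    beams: List[Beam] = [((), 0.0)]
    for _ in range(steps):
        candidates: List[Beam] = []
        for prefix, score in beams:
            for token, p in next_probs(prefix).items():
                if p > 0:
                    # score(y) accumulates log P(y_t | y_1, ..., y_(t-1))
                    candidates.append((prefix + (token,), score + math.log(p)))
        # Retain only the k highest-scoring partial solutions
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return max(beams, key=lambda c: c[1])
```

With k = 1 the procedure reduces to greedy search; a larger k lets the decoder recover sequences whose first token is not the locally most probable one, at the cost of evaluating more candidates per step.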
10. The voice recognition method for power transmission operation detection based on small sample synthesis according to claim 9, wherein the beam search is performed only on voice data for which the greedy search score is judged to be low, so as to dynamically balance accuracy and response speed in the decoding process.
CN202311215139.3A 2023-09-19 2023-09-19 Voice recognition method for power transmission operation detection based on small sample synthesis Pending CN117292680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311215139.3A CN117292680A (en) 2023-09-19 2023-09-19 Voice recognition method for power transmission operation detection based on small sample synthesis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311215139.3A CN117292680A (en) 2023-09-19 2023-09-19 Voice recognition method for power transmission operation detection based on small sample synthesis

Publications (1)

Publication Number Publication Date
CN117292680A true CN117292680A (en) 2023-12-26

Family

ID=89251055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311215139.3A Pending CN117292680A (en) 2023-09-19 2023-09-19 Voice recognition method for power transmission operation detection based on small sample synthesis

Country Status (1)

Country Link
CN (1) CN117292680A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611393A (en) * 2024-01-24 2024-02-27 国网安徽省电力有限公司合肥供电公司 Big data-based anti-electricity-stealing data acquisition method
CN117611393B (en) * 2024-01-24 2024-04-05 国网安徽省电力有限公司合肥供电公司 Big data-based anti-electricity-stealing data acquisition method

Similar Documents

Publication Publication Date Title
US8185376B2 (en) Identifying language origin of words
CN106297800B (en) Self-adaptive voice recognition method and equipment
US20060064177A1 (en) System and method for measuring confusion among words in an adaptive speech recognition system
CN111640418B (en) Prosodic phrase identification method and device and electronic equipment
CN112397054B (en) Power dispatching voice recognition method
CN112397056B (en) Voice evaluation method and computer storage medium
CN112259083B (en) Audio processing method and device
CN112466279B (en) Automatic correction method and device for spoken English pronunciation
CN111369974A (en) Dialect pronunciation labeling method, language identification method and related device
CN112580340A (en) Word-by-word lyric generating method and device, storage medium and electronic equipment
CN117292680A (en) Voice recognition method for power transmission operation detection based on small sample synthesis
Shivakumar et al. Kannada speech to text conversion using CMU Sphinx
CN113450757A (en) Speech synthesis method, speech synthesis device, electronic equipment and computer-readable storage medium
US20050197838A1 (en) Method for text-to-pronunciation conversion capable of increasing the accuracy by re-scoring graphemes likely to be tagged erroneously
Guillaume et al. Plugging a neural phoneme recognizer into a simple language model: a workflow for low-resource settings
CN113822052A (en) Text error detection method and device, electronic equipment and storage medium
Yoon et al. Off-Topic Spoken Response Detection with Word Embeddings.
Larabi-Marie-Sainte et al. A new framework for Arabic recitation using speech recognition and the Jaro Winkler algorithm
KR20130126570A (en) Apparatus for discriminative training acoustic model considering error of phonemes in keyword and computer recordable medium storing the method thereof
JP3875357B2 (en) Word / collocation classification processing method, collocation extraction method, word / collocation classification processing device, speech recognition device, machine translation device, collocation extraction device, and word / collocation storage medium
Decadt et al. Transcription of out-of-vocabulary words in large vocabulary speech recognition based on phoneme-to-grapheme conversion
JP2010277036A (en) Speech data retrieval device
CN115527551A (en) Voice annotation quality evaluation method and device, electronic equipment and storage medium
CN112908359A (en) Voice evaluation method and device, electronic equipment and computer readable medium
Ma et al. Low-frequency word enhancement with similar pairs in speech recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination