CN118588059A - Intelligent auxiliary diagnosis and analysis method, system and storage medium for voice recognition problem - Google Patents


Info

Publication number
CN118588059A
Authority
CN
China
Prior art keywords
language
acoustic
candidates
voice recognition
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410749014.7A
Other languages
Chinese (zh)
Inventor
董鑫
王鑫
陆一帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN202410749014.7A priority Critical patent/CN118588059A/en
Publication of CN118588059A publication Critical patent/CN118588059A/en
Pending legal-status Critical Current

Landscapes

  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The invention provides an intelligent auxiliary diagnosis and analysis method for voice recognition problems. The method comprises: obtaining a call number of a voice recognition engine and an intermediate result log corresponding to the call number, wherein the intermediate result log comprises an acoustic intermediate result of an acoustic model in the voice recognition engine and language intermediate results corresponding to a multi-path language model, the acoustic intermediate result comprises a plurality of acoustic candidates, and the language intermediate results comprise a plurality of language candidates; performing simulation diagnosis on the acoustic model and the multi-path language model to obtain confidence degrees corresponding to the acoustic candidates and confidence degrees corresponding to the language candidates; comparing the language intermediate results corresponding to the multi-path language model to obtain a language fusion result; and displaying the acoustic candidates with their confidence degrees, the language candidates with their confidence degrees, and the language intermediate results and the language fusion result corresponding to the multi-path language model. The invention lowers the threshold for diagnosing voice recognition problems and improves diagnosis efficiency.

Description

Intelligent auxiliary diagnosis and analysis method, system and storage medium for voice recognition problem
Technical Field
The invention relates to the technical field of voice recognition, in particular to an intelligent auxiliary diagnosis and analysis method, system and storage medium for voice recognition problems.
Background
In the field of speech recognition, incorrect recognition results usually have to be investigated by experienced algorithm engineers, who examine and analyze the speech recognition system in light of the recognition problem; diagnosing problems in the recognition process therefore has a high technical threshold. Moreover, even algorithm engineers must review the detailed logs of the recognition process carefully and step by step, and sometimes must simulate the execution of the entire speech recognition engine. As a result, diagnosing a problem typically takes a considerable amount of time.
Disclosure of Invention
In view of the above, the present invention is directed to providing a method, a system and a storage medium for intelligently assisting in diagnosing and analyzing a speech recognition problem, which are used for solving the technical problems of high technical threshold, long time consuming diagnosis, low diagnosis efficiency and the like of diagnosing and analyzing a speech recognition system when the speech recognition result is problematic in the prior art.
In a first aspect, the present invention provides a method for intelligently assisting in diagnosing and analyzing a speech recognition problem, the method comprising the steps of:
acquiring a call number of a voice recognition engine and an intermediate result log corresponding to the call number, wherein the intermediate result log comprises an acoustic intermediate result of an acoustic model in the voice recognition engine and a language intermediate result corresponding to a multi-path language model, the acoustic intermediate result comprises a plurality of acoustic candidates, and the language intermediate result comprises a plurality of language candidates;
performing simulation diagnosis on the acoustic model and the multi-path language model to obtain confidence degrees corresponding to the acoustic candidates and confidence degrees corresponding to the language candidates;
comparing the language intermediate results corresponding to the multi-path language model to obtain a language fusion result;
and displaying the plurality of acoustic candidates and their corresponding confidence degrees, the plurality of language candidates of the multi-path language model and their corresponding confidence degrees, and the language intermediate results and the language fusion result corresponding to the multi-path language model.
Optionally, comparing the language intermediate results corresponding to the multiple language models to obtain a language fusion result, including the following steps:
Comparing the confidence degrees corresponding to the plurality of language candidate items to obtain the language candidate item with the highest confidence degree score;
And taking the language candidate with the highest confidence score as the language intermediate result of the language model corresponding to the candidate.
Optionally, comparing the language intermediate results corresponding to the multiple language models to obtain a language fusion result, and further including the following steps:
Comparing the confidence scores of the language intermediate results corresponding to the multipath language models to obtain the language intermediate result with the highest confidence score;
and taking the language intermediate result with the highest confidence score as the language fusion result.
Optionally, the method further comprises the steps of:
acquiring the input corresponding to the call number of the voice recognition engine, wherein the input comprises the request parameters of the upper-layer application programming interface request;
and displaying the request parameters.
Optionally, the method further comprises the steps of:
acquiring the output corresponding to the call number of the voice recognition engine and the original audio file of the voice recognition;
and aggregating the intermediate result log, the original audio file of the voice recognition, and the input and output corresponding to the call number of the voice recognition engine to obtain an aggregation result.
Optionally, the multi-path language model comprises a one-path model, a Domain-path model, a two-path model and a three-path model.
In a second aspect, the present invention provides an intelligent auxiliary diagnosis and analysis system for voice recognition problems, the system comprising:
A diagnostic data layer configured to store an intermediate result log of a speech recognition engine, the intermediate result log including acoustic intermediate results of acoustic models in the speech recognition engine and language intermediate results corresponding to multiple language models, the acoustic intermediate results including multiple acoustic candidates, the language intermediate results including multiple language candidates;
The diagnosis logic layer is configured to perform simulation diagnosis on the acoustic model and the multi-path language model to obtain confidence degrees corresponding to the plurality of acoustic candidates and confidence degrees corresponding to the plurality of language candidates, and compare language intermediate results corresponding to the multi-path language model to obtain a language fusion result;
And a diagnostic interface layer configured to display the plurality of acoustic candidates and their corresponding confidence degrees, the plurality of language candidates and their corresponding confidence degrees, and the language intermediate results and the language fusion result corresponding to the multi-path language model.
Optionally, the voice recognition problem intelligent auxiliary diagnosis system further comprises:
A diagnostic data layer further configured to store inputs to the speech recognition engine, the inputs to the speech recognition engine including request parameters during a request by an upper layer application programming interface;
A diagnostic interface layer further configured to present the request parameters.
In a third aspect, the present invention provides a device comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
The memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform operations corresponding to the method for intelligent auxiliary diagnostic analysis of speech recognition problems as described above.
In a fourth aspect, the present invention provides a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and when the computer program is executed by a processor, the steps of the intelligent auxiliary diagnosis and analysis method for speech recognition problems are implemented.
According to the first aspect of the invention, simulation diagnosis is performed on the acoustic model and the multi-path language model of the voice recognition engine to obtain the confidence degrees corresponding to the candidates of the acoustic model and the confidence degrees corresponding to the candidates of the multi-path language model, and the candidates are displayed together with their confidence degrees. Through the display interface, an algorithm engineer can see each recognition result of the acoustic model and the language model alongside its confidence degree, and can use the confidence scores to carry out the next step of analysis and repair more quickly. This greatly reduces the workload and time the algorithm engineer spends checking the voice recognition models one by one and improves the efficiency of diagnosing voice recognition problems. In addition, the simulation diagnosis yields the intermediate results and the fusion result of the multi-path language model, so that ordinary upper-layer users of the voice recognition engine, namely application development engineers, product managers or project managers, can also investigate and analyze some problems by checking the intermediate results of the multi-path language model.
Furthermore, because the request parameters of the speech recognition engine are displayed, ordinary users can also investigate, analyze and repair related speech recognition problems from the request parameter page. In addition, the various pieces of information produced by the voice recognition engine during recognition are aggregated, and the aggregation result provides a data basis for the subsequent steps of diagnosing and analyzing the voice recognition problem.
The foregoing description is only an overview of the present invention, and is intended to provide a more thorough understanding of the present invention, and is to be accorded the full scope of the present invention.
Drawings
FIG. 1 shows a schematic flow chart of a method of diagnostic analysis of speech recognition problems according to one embodiment of the invention;
FIG. 2 is a schematic flow chart of a method for comparing the language intermediate results corresponding to the multiple language models in step S130 shown in FIG. 1 to obtain a language fusion result;
FIG. 3 shows a schematic flow chart of a method of diagnostic analysis of speech recognition problems in accordance with another embodiment of the invention;
FIG. 4 shows a schematic flow chart of a method of diagnostic analysis of speech recognition problems in accordance with another embodiment of the invention;
FIG. 5 shows a block diagram of a voice recognition problem-aiding diagnostic analysis system in accordance with an embodiment of the present invention;
FIG. 6 is a block diagram showing the construction of a voice recognition problem-aiding diagnosis analysis apparatus according to an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the application will be readily understood, a more particular description of the application will be rendered by reference to the appended drawings. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present application are shown in the drawings. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "comprising" and "having" and any variations thereof herein are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
FIG. 1 shows a schematic flow chart of a method of diagnosing and analyzing speech recognition problems according to one embodiment of the present invention. As shown in fig. 1, the voice recognition problem diagnosis analysis method includes:
Step S110, obtaining a call number of a voice recognition engine and an intermediate result log corresponding to the call number, wherein the intermediate result log comprises an acoustic intermediate result of an acoustic model in the voice recognition engine and a language intermediate result corresponding to a multi-path language model, the acoustic intermediate result comprises a plurality of acoustic candidates, and the language intermediate result comprises a plurality of language candidates.
In some embodiments, the multi-path language model includes a one-path model, a Domain-path model, a two-path model, and a three-path model. The one-path model is the general recognition model: it suits general speech recognition tasks and covers a wide range of vocabulary and language patterns, but may not achieve the best recognition results in a specific field or scenario. The Domain-path model and the two-path model are language models strengthened and optimized for specific fields or industries, used to improve the accuracy and performance of speech recognition in those fields. The three-path model is a language model strengthened and optimized for a specific project or application scenario; it contains the words, phrases, and expressions specific to that project to further improve recognition accuracy.
During voice recognition, an upper-layer application sends a request with an audio file as input, and the voice recognition engine recognizes the audio and outputs a text result. Each call to the voice recognition engine corresponds to a call number, and each call number has a corresponding record log that includes an intermediate result log. The intermediate result log contains the acoustic intermediate result of the acoustic model in the speech recognition engine and the language intermediate result of each language model path; each of these intermediate results contains a plurality of candidates.
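The relationship between a call number, its intermediate result log, and the candidates can be sketched as a small data structure. This is a minimal illustration only: the class names, field names, path keys and the in-memory `store` dictionary are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Candidate:
    """One recognition hypothesis; the confidence is filled in later by diagnosis."""
    text: str
    confidence: Optional[float] = None


@dataclass
class IntermediateResultLog:
    """Intermediate results recorded for a single call of the engine."""
    call_number: str
    acoustic_candidates: List[Candidate] = field(default_factory=list)
    # Keyed by language-model path, e.g. "one", "domain", "two", "three"
    # (hypothetical key names).
    language_candidates: Dict[str, List[Candidate]] = field(default_factory=dict)


def get_intermediate_log(
    store: Dict[str, IntermediateResultLog], call_number: str
) -> IntermediateResultLog:
    """Look up the log for a call number; fail loudly if the call was never recorded."""
    try:
        return store[call_number]
    except KeyError:
        raise LookupError(f"no intermediate result log for call {call_number!r}") from None


# Usage with made-up data.
store = {
    "call-001": IntermediateResultLog(
        call_number="call-001",
        acoustic_candidates=[Candidate("ni hao"), Candidate("ni hou")],
        language_candidates={"one": [Candidate("hello")], "domain": [Candidate("hollow")]},
    )
}
log = get_intermediate_log(store, "call-001")
```

Keeping the confidence optional reflects the order of the steps: candidates are read from the log in step S110, and confidences only become available after the simulation diagnosis.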
Step S120, performing simulation diagnosis on the acoustic model and the multi-path language model to obtain the confidence degrees corresponding to the plurality of acoustic candidates and the confidence degrees corresponding to the plurality of language candidates.
Simulation diagnosis scoring is performed on the acoustic model of the voice recognition engine and on each path of the language model, yielding the candidates of the acoustic model with their confidence scores, and the candidates of the one-path, Domain-path, two-path and three-path language models with the confidence score of each candidate.
Step S130, comparing the language intermediate results corresponding to the multi-path language model to obtain a language fusion result.
Step S140, displaying the plurality of acoustic candidates and their corresponding confidence degrees, the plurality of language candidates of the multi-path language model and their corresponding confidence degrees, and the language intermediate results and the language fusion result corresponding to the multi-path language model.
According to the scheme of this embodiment of the invention, simulation diagnosis is performed on the acoustic model and the multi-path language model of the voice recognition engine to obtain the confidence degrees corresponding to the candidates of the acoustic model and the confidence degrees corresponding to the candidates of the multi-path language model, and the candidates are displayed together with their confidence degrees. Through the display interface, an algorithm engineer can see each recognition result of the acoustic model and the language model alongside its confidence degree, and can use the confidence scores to carry out the next step of analysis and repair more quickly. This greatly reduces the workload and time the algorithm engineer spends checking the voice recognition models one by one and improves the efficiency of diagnosing voice recognition problems. In addition, the simulation diagnosis yields the intermediate results and the fusion result of the multi-path language model, so that ordinary upper-layer users of the voice recognition engine, namely application development engineers, product managers or project managers, can also investigate and analyze some problems by checking the intermediate results of the multi-path language model.
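As a rough illustration of the display step S140, the sketch below renders candidates and their confidence degrees as plain text. The patent does not describe the layout of the display interface, so the function name `render_report`, the text format and the descending sort order are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Scored = Tuple[str, float]  # (candidate text, confidence)


def render_report(
    acoustic: List[Scored], language: Dict[str, List[Scored]], fusion: str
) -> str:
    """Format candidates per model, highest confidence first, followed by the fusion result."""
    lines = ["Acoustic candidates:"]
    for text, conf in sorted(acoustic, key=lambda c: c[1], reverse=True):
        lines.append(f"  {conf:.2f}  {text}")
    for path, candidates in language.items():
        lines.append(f"Language model [{path}]:")
        for text, conf in sorted(candidates, key=lambda c: c[1], reverse=True):
            lines.append(f"  {conf:.2f}  {text}")
    lines.append(f"Fusion result: {fusion}")
    return "\n".join(lines)


# Usage with made-up scores.
report = render_report(
    acoustic=[("ni hou", 0.50), ("ni hao", 0.90)],
    language={"one": [("hello", 0.70)], "domain": [("hollow", 0.20)]},
    fusion="hello",
)
print(report)
```

Sorting by confidence puts the most likely hypothesis of each model at the top of its section, which is one simple way to make a low-scoring or unexpected winner stand out.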
Fig. 2 is a schematic flowchart of a method for comparing the language intermediate results corresponding to the multiple language models in step S130 shown in fig. 1 to obtain a language fusion result. As shown in fig. 2, the step S130 includes:
Step S131, comparing the confidence degrees corresponding to the language candidates to obtain the language candidate with the highest confidence score.
Step S132, the language candidate with the highest confidence score is used as the language intermediate result of the language model corresponding to the candidate.
Each path of the multi-path language model has its own intermediate result. The one-path model, for example, produces a plurality of candidates; simulation diagnosis on the one-path model yields a confidence score for each candidate, the scores are compared, and the candidate with the highest confidence score is taken as the intermediate result of the one-path model. In the same way, the confidence scores of the candidates of the Domain-path model, the two-path model and the three-path model are compared to obtain the intermediate result of each path.
Step S133, comparing the confidence degrees of the language intermediate results corresponding to the multi-path language model to obtain the language intermediate result with the highest confidence degree.
The confidence scores of the intermediate results obtained for the individual language model paths are compared to find the intermediate result with the highest confidence score and the language model path that produced it.
Step S134, taking the language intermediate result with the highest confidence degree as the language fusion result.
The intermediate result with the highest confidence score is the language fusion result of the language model. For example, if the intermediate result of the one-path model has the highest confidence score among the intermediate results of all language model paths, the intermediate result of the one-path model is taken as the language fusion result and displayed on screen.
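Steps S131 to S134 amount to a two-level maximum: first within each language model path, then across the per-path winners. A minimal sketch follows, assuming each candidate is a `(text, confidence)` pair. The function names, path keys and scores are hypothetical, and the patent does not say how ties are broken (here the first maximum encountered wins).

```python
from typing import Dict, List, Tuple

Scored = Tuple[str, float]  # (candidate text, confidence)


def best_per_path(language: Dict[str, List[Scored]]) -> Dict[str, Scored]:
    """S131-S132: within each path, keep the candidate with the highest confidence."""
    return {
        path: max(candidates, key=lambda c: c[1])
        for path, candidates in language.items()
        if candidates  # a path with no candidates has no intermediate result
    }


def fuse(language: Dict[str, List[Scored]]) -> Tuple[str, str, float]:
    """S133-S134: among the per-path winners, return (path, text, confidence) of the best."""
    winners = best_per_path(language)
    if not winners:
        raise ValueError("no language candidates to fuse")
    path = max(winners, key=lambda p: winners[p][1])
    text, confidence = winners[path]
    return path, text, confidence


# Usage with made-up scores.
language = {
    "one": [("turn on the light", 0.91), ("turn on the night", 0.40)],
    "domain": [("turn on the lights", 0.85)],
    "two": [("turn off the light", 0.30)],
    "three": [],
}
intermediate = best_per_path(language)
fusion = fuse(language)
```

Returning the winning path together with the text matches the description of step S133, which identifies both the best intermediate result and the language model it came from.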
Fig. 3 shows a schematic flow chart of a method of diagnosing and analyzing a speech recognition problem according to another embodiment of the present invention. As shown in fig. 3, the voice recognition problem diagnosis analysis method further includes:
Step S310, obtaining the input corresponding to the call number of the speech recognition engine, the input comprising the request parameters of the upper-layer application programming interface request.
During voice recognition, each call to the voice recognition engine corresponds to a call number, and each call number has a corresponding record log. The record log also contains the detailed input of that call, including the detailed parameters of the upper-layer application programming interface request.
Step S320, the request parameters are displayed.
Table 1 below shows a presentation of some of the request parameters.
In practical application, the user can check and modify the parameter settings used during voice recognition according to the parameter information shown in Table 1. The request parameters are explained below using the second and third rows of Table 1.
The second row of Table 1 shows a request parameter of the upper-layer application named "context.productId" with type "string", meaning that the value of the parameter must be a string. The "yes" in the "required" column means that the parameter is mandatory: its value must be provided in the upper-layer application's request when speech recognition is performed. The parameter is the ID on the open platform that identifies a particular application or project; an example value is 278578090. The third row of Table 1 shows a request parameter named "request.asr.enableVAD" with type "bool", meaning that the value of the parameter must be a boolean. The "no" in the "required" column means that the parameter is optional: the user may decide whether to provide it in the upper-layer application's request. The parameter indicates whether cloud VAD (Voice Activity Detection) is used; it is on by default and can be turned off by setting the value to false. The example values, true and false, illustrate the values the parameter can take.
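The two rows described above can be turned into a small check of a request against the parameter table. This is a sketch under stated assumptions: the parameter names are read here as `context.productId` and `request.asr.enableVAD`, the dotted names are assumed to address a nested request object, and `PARAM_SPECS`, `lookup` and `check_request` are hypothetical names, not part of any real API.

```python
from typing import Any, Dict, List, Tuple

# Only the two rows discussed in the text; name spelling and nesting are assumptions.
# request.asr.enableVAD is optional and, per the text, on by default when omitted.
PARAM_SPECS: Dict[str, Dict[str, Any]] = {
    "context.productId": {"type": str, "required": True},
    "request.asr.enableVAD": {"type": bool, "required": False},
}


def lookup(request: Dict[str, Any], dotted: str) -> Tuple[bool, Any]:
    """Walk a nested dict by dotted name; return (found, value)."""
    node: Any = request
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return False, None
        node = node[key]
    return True, node


def check_request(request: Dict[str, Any]) -> List[str]:
    """Return human-readable problems: missing required parameters or wrong types."""
    problems: List[str] = []
    for name, spec in PARAM_SPECS.items():
        found, value = lookup(request, name)
        if not found:
            if spec["required"]:
                problems.append(f"{name}: required parameter is missing")
            continue
        if not isinstance(value, spec["type"]):
            expected = spec["type"].__name__
            problems.append(f"{name}: expected {expected}, got {type(value).__name__}")
    return problems


# Usage: one valid request, one with a missing ID and a string where a bool is expected.
ok = check_request({"context": {"productId": "278578090"}})
bad = check_request({"request": {"asr": {"enableVAD": "false"}}})
```

A check like this mirrors what a user would do by eye on the request parameter page: confirm that mandatory parameters are present and that each value has the documented type.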
Fig. 4 shows a schematic flow chart of a method of diagnosing and analyzing a speech recognition problem according to another embodiment of the present invention. As shown in fig. 4, the voice recognition problem diagnosis analysis method further includes:
Step S410, obtaining the output corresponding to the call number of the voice recognition engine and the original audio file of the voice recognition.
Step S420, aggregating the intermediate result log, the original audio file of the voice recognition, and the input and output corresponding to the call number of the voice recognition engine to obtain an aggregation result.
During voice recognition, the record log corresponding to each call number of the recognition engine contains, in addition to the intermediate result log and the input of the call, the original audio file of that call and the output of the voice recognition engine. All the information recorded in the log is aggregated to obtain an aggregation result, which provides a data basis for the subsequent steps of diagnosing and analyzing the voice recognition problem.
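The aggregation in step S420 can be pictured as a join on the call number. The sketch below assumes the four kinds of information are available as dictionaries keyed by call number; the record layout, the field names and the helper `missing_parts` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AggregatedCall:
    """Everything recorded for one call of the engine, gathered in one place."""
    call_number: str
    request_input: Optional[Dict[str, Any]]
    output_text: Optional[str]
    audio_path: Optional[str]
    intermediate_log: Optional[Dict[str, Any]]


def aggregate(
    call_number: str,
    inputs: Dict[str, Dict[str, Any]],
    outputs: Dict[str, str],
    audio_files: Dict[str, str],
    intermediate_logs: Dict[str, Dict[str, Any]],
) -> AggregatedCall:
    """Collect the pieces recorded under one call number; absent pieces stay None."""
    return AggregatedCall(
        call_number=call_number,
        request_input=inputs.get(call_number),
        output_text=outputs.get(call_number),
        audio_path=audio_files.get(call_number),
        intermediate_log=intermediate_logs.get(call_number),
    )


def missing_parts(record: AggregatedCall) -> List[str]:
    """List the fields that were not found, so a gap in the logs is visible early."""
    return [name for name, value in vars(record).items() if value is None]


# Usage with made-up data; the audio file is deliberately absent.
record = aggregate(
    "call-001",
    inputs={"call-001": {"context": {"productId": "278578090"}}},
    outputs={"call-001": "hello"},
    audio_files={},
    intermediate_logs={"call-001": {"one": ["hello"]}},
)
gaps = missing_parts(record)
```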
According to this embodiment, because the request parameters of the speech recognition engine are displayed, ordinary users can also investigate, analyze and repair related speech recognition problems from the request parameter page. In addition, the various pieces of information produced by the voice recognition engine during recognition are aggregated, and the aggregation result provides a data basis for the subsequent steps of diagnosing and analyzing the voice recognition problem.
The embodiment of the invention also provides a system for diagnosing and analyzing the voice recognition problem. As shown in fig. 5, the voice recognition problem diagnosis analysis system includes a diagnosis data layer 101, a diagnosis logic layer 102, and a diagnosis interface layer 103.
The diagnostic data layer 101 is configured to store an intermediate result log of the speech recognition engine, the intermediate result log comprising acoustic intermediate results of the acoustic model in the speech recognition engine and language intermediate results corresponding to the multiple language models, the acoustic intermediate results comprising a plurality of acoustic candidates, the language intermediate results comprising a plurality of language candidates.
The diagnosis logic layer 102 is configured to perform simulation diagnosis on the acoustic model and the multiple language models to obtain confidence degrees corresponding to the multiple acoustic candidates and confidence degrees corresponding to the multiple language candidates, and compare the language intermediate results corresponding to the multiple language models to obtain a language fusion result.
The diagnostic interface layer 103 is configured to present the plurality of acoustic candidates and the confidence levels corresponding to the plurality of acoustic candidates, the plurality of language candidates and the confidence levels corresponding to the plurality of language candidates, and the language intermediate results and the language fusion results corresponding to the multiple language models.
In some alternative embodiments, the diagnostic data layer 101 is further configured to store inputs to the speech recognition engine including request parameters during requests from the upper layer application programming interface. Diagnostic interface layer 103 is also configured to present the request parameters stored within diagnostic data layer 101.
In some alternative embodiments, the diagnostic data layer 101 is further configured to store the original audio file of the speech recognition and the output corresponding to the call number of the speech recognition engine. The diagnostic logic layer 102 is further configured to aggregate the intermediate result log, the original audio file of the speech recognition, and the input and output information corresponding to the call number of the speech recognition engine, so as to obtain an aggregate result, and provide a data base for the subsequent step of the diagnostic analysis of the speech recognition problem.
In practical application, when the user is not satisfied with the voice recognition result, the call number generated by the recognition can be input into the voice recognition problem auxiliary diagnosis analysis system to check the results of various auxiliary diagnosis analyses displayed by the diagnosis interface layer, so that the user can analyze and repair the voice recognition problem in the next step.
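The division of work among the three layers can be sketched as three small classes. This is an illustration only: the patent does not specify how simulation diagnosis computes confidence, so the logic layer takes a caller-supplied `scorer` function as a stand-in, and all class, method and key names (including the `"acoustic"` key) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Scored = Tuple[str, float]      # (candidate text, confidence)
RawLog = Dict[str, List[str]]   # "acoustic" or a language path -> candidate texts


class DiagnosticDataLayer:
    """Stores intermediate result logs keyed by call number."""

    def __init__(self) -> None:
        self._logs: Dict[str, RawLog] = {}

    def store(self, call_number: str, log: RawLog) -> None:
        self._logs[call_number] = log

    def load(self, call_number: str) -> RawLog:
        return self._logs[call_number]


class DiagnosticLogicLayer:
    """Scores every candidate and picks the language fusion result."""

    def __init__(self, scorer: Callable[[str, str], float]) -> None:
        # scorer(source, text) is a placeholder for the simulation diagnosis.
        self._scorer = scorer

    def diagnose(self, log: RawLog) -> Tuple[Dict[str, List[Scored]], Optional[str]]:
        scored = {
            source: [(text, self._scorer(source, text)) for text in texts]
            for source, texts in log.items()
        }
        language = [c for source, cands in scored.items() if source != "acoustic" for c in cands]
        best = max(language, key=lambda c: c[1], default=None)
        return scored, (best[0] if best else None)


class DiagnosticInterfaceLayer:
    """Turns a call number into a text report of candidates and confidences."""

    def __init__(self, data: DiagnosticDataLayer, logic: DiagnosticLogicLayer) -> None:
        self._data = data
        self._logic = logic

    def show(self, call_number: str) -> str:
        scored, fusion = self._logic.diagnose(self._data.load(call_number))
        lines = [f"Call {call_number}"]
        for source, candidates in scored.items():
            lines.append(f"[{source}]")
            lines.extend(f"  {conf:.2f}  {text}" for text, conf in candidates)
        lines.append(f"Fusion result: {fusion}")
        return "\n".join(lines)


# Usage with made-up scores standing in for the simulation diagnosis.
fake_scores = {("acoustic", "ni hao"): 0.8, ("one", "hello"): 0.6, ("domain", "hollow"): 0.9}
data = DiagnosticDataLayer()
data.store("call-001", {"acoustic": ["ni hao"], "one": ["hello"], "domain": ["hollow"]})
ui = DiagnosticInterfaceLayer(data, DiagnosticLogicLayer(lambda s, t: fake_scores[(s, t)]))
report = ui.show("call-001")
```

Taking the maximum over all language candidates gives the same winner as first choosing the best candidate per path and then comparing the per-path winners, so the sketch collapses the two comparisons into one.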
An embodiment of the invention also provides a voice recognition problem-aiding diagnosis analysis device, which comprises a processor 201, a memory 202, and a computer program stored in the memory 202 and configured to be executed by the processor 201; the processor 201 implements the voice recognition problem-aiding diagnosis analysis method of any of the embodiments described above when executing the computer program.
The steps of the above-described embodiment of the method for assisting in diagnosing and analyzing a speech recognition problem, such as all the steps of the method for assisting in diagnosing and analyzing a speech recognition problem shown in fig. 1, are implemented when the processor 201 executes a computer program. Or the processor 201, when executing the computer program, implements the functions of the modules/units in the above-described voice recognition problem-aiding diagnosis analysis system, such as the functions of the layers of the voice recognition problem-aiding diagnosis analysis system shown in fig. 5.
By way of example, a computer program may be split into one or more modules, which are stored in the memory 202 and executed by the processor 201 to perform the present invention. One or more of the modules may be a series of computer program instruction segments capable of performing particular functions for describing the execution of a computer program in a speech recognition problem diagnosis and analysis system.
The Processor 201 may be a central processing unit (Central Processing Unit, CPU), other general purpose Processor, digital signal Processor (DIGITAL SIGNAL Processor, DSP), application SPECIFIC INTEGRATED Circuit (ASIC), field-Programmable gate array (Field-Programmable GATEARRAY, FPGA) or other Programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, and the processor 201 is a control center of the voice recognition problem-aiding diagnosis analysis system, and connects the respective parts of the entire voice recognition problem-aiding diagnosis analysis system using various interfaces and lines.
The memory 202 may be used to store computer programs and/or modules, and the processor 201 implements the various functions of the auxiliary diagnosis and analysis system for voice recognition problems by running or executing the computer programs and/or modules stored in the memory 202 and invoking data stored in the memory 202. The memory 202 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required for at least one function, and the like; the data storage area may store data created through use of the auxiliary diagnosis and analysis system for voice recognition problems, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash card (Flash Card), at least one disk storage device, a flash memory device, or other volatile solid-state storage device.
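To make the kind of data such a storage area might hold more concrete, the following Python sketch groups, under one call number, the items the method aggregates: the request parameters (input), the output, the original audio file, and the intermediate result log. The class `DiagnosticRecord`, the function `aggregate`, the field names, and the sample values are all hypothetical; the patent does not prescribe a storage layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DiagnosticRecord:
    """Everything known about one recognition call (hypothetical layout)."""
    call_number: str
    request_params: dict = field(default_factory=dict)    # upper-layer API request parameters
    output: Optional[str] = None                          # recognition output
    audio_path: Optional[str] = None                      # original audio file
    intermediate_log: list = field(default_factory=list)  # acoustic/language intermediate results


def aggregate(call_number, inputs, outputs, audio_files, logs):
    """Join the four sources on the call number.

    Each source is a dict keyed by call number. Pieces that are missing
    for a call are left empty rather than raising an error.
    """
    return DiagnosticRecord(
        call_number=call_number,
        request_params=dict(inputs.get(call_number, {})),
        output=outputs.get(call_number),
        audio_path=audio_files.get(call_number),
        intermediate_log=list(logs.get(call_number, [])),
    )


# Hypothetical sample data for a single call.
inputs = {"call-001": {"sample_rate": 16000, "language": "zh-CN"}}
outputs = {"call-001": "turn on the light"}
audio_files = {"call-001": "audio/call-001.wav"}
logs = {"call-001": [{"stage": "acoustic", "candidates": 3},
                     {"stage": "language", "candidates": 4}]}

record = aggregate("call-001", inputs, outputs, audio_files, logs)
print(record.call_number, record.output, len(record.intermediate_log))
```

Keying every source by call number means a single lookup retrieves everything needed to replay or inspect one recognition request.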
If the modules/units of the auxiliary diagnosis and analysis system for voice recognition problems are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may also be accomplished by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
An embodiment of the present invention also provides a computer-readable storage medium. The methods according to the embodiments described above may be implemented in hardware or firmware, as software or computer code that can be stored in a storage medium, or as computer code that is originally stored in a remote storage medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local storage medium, so that the methods described herein can be carried out by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid-state disk, or the like; the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor, controller, or programmable hardware includes a storage component that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the methods illustrated by the above embodiments.
The above examples illustrate only a few embodiments of the invention. Although they are described in specific detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (10)

1. An intelligent auxiliary diagnosis and analysis method for voice recognition problems is characterized by comprising the following steps:
acquiring a call number of a voice recognition engine and an intermediate result log corresponding to the call number, wherein the intermediate result log comprises an acoustic intermediate result of an acoustic model in the voice recognition engine and a language intermediate result corresponding to a multi-path language model, the acoustic intermediate result comprises a plurality of acoustic candidates, and the language intermediate result comprises a plurality of language candidates;
performing simulation diagnosis on the acoustic model and the multipath language model to obtain confidence degrees corresponding to the acoustic candidates and confidence degrees corresponding to the language candidates;
comparing the language intermediate results corresponding to the multipath language models to obtain a language fusion result;
and displaying the plurality of acoustic candidates and their corresponding confidence degrees, the plurality of language candidates and their corresponding confidence degrees, and the language intermediate results corresponding to the multi-path language model and the language fusion result.
2. The intelligent auxiliary diagnosis and analysis method for voice recognition problems according to claim 1, wherein comparing the language intermediate results corresponding to the multi-path language model to obtain a language fusion result comprises the following steps:
comparing the confidence degrees corresponding to the plurality of language candidates to obtain the language candidate with the highest confidence degree;
and taking the language candidate with the highest confidence degree as the language intermediate result of the language model to which that candidate corresponds.
3. The intelligent auxiliary diagnosis and analysis method for voice recognition problems according to claim 2, wherein comparing the language intermediate results corresponding to the multi-path language model to obtain a language fusion result further comprises the following steps:
comparing the confidence degrees of the language intermediate results corresponding to the multipath language models to obtain the language intermediate result with the highest confidence degree;
and taking the language intermediate result with the highest confidence as the language fusion result.
4. The intelligent auxiliary diagnosis and analysis method for voice recognition problems according to claim 1, further comprising the following steps:
acquiring the input corresponding to the calling number of the voice recognition engine, wherein the input comprises request parameters from the request process of an upper-layer application programming interface;
and displaying the request parameters.
5. The intelligent auxiliary diagnosis and analysis method for voice recognition problems according to any one of claims 1-4, further comprising the following steps:
acquiring the output corresponding to the calling number of the voice recognition engine and the original audio file of the voice recognition;
and aggregating the intermediate result log, the original audio file of the voice recognition, and the input and output corresponding to the calling number of the voice recognition engine to obtain an aggregation result.
6. The intelligent auxiliary diagnosis and analysis method for voice recognition problems according to claim 1, wherein the multi-path language model comprises a one-way model, a Domain-way model, a two-way model, and a three-way model.
7. An intelligent auxiliary diagnosis and analysis system for voice recognition problems, applying the method according to any one of claims 1-6, characterized in that the system comprises:
a diagnostic data layer configured to store an intermediate result log of a voice recognition engine, the intermediate result log including an acoustic intermediate result of an acoustic model in the voice recognition engine and language intermediate results corresponding to a multi-path language model, the acoustic intermediate result including a plurality of acoustic candidates, and the language intermediate results including a plurality of language candidates;
a diagnostic logic layer configured to perform simulation diagnosis on the acoustic model and the multi-path language model to obtain confidence degrees corresponding to the plurality of acoustic candidates and confidence degrees corresponding to the plurality of language candidates, and to compare the language intermediate results corresponding to the multi-path language model to obtain a language fusion result;
and a diagnostic interface layer configured to display the plurality of acoustic candidates and their corresponding confidence degrees, the plurality of language candidates and their corresponding confidence degrees, and the language intermediate results corresponding to the multi-path language model and the language fusion result.
8. The intelligent auxiliary diagnosis and analysis system for voice recognition problems according to claim 7, wherein:
the diagnostic data layer is further configured to store inputs to the voice recognition engine, the inputs including request parameters from the request process of an upper-layer application programming interface;
and the diagnostic interface layer is further configured to display the request parameters.
9. An apparatus, comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the intelligent auxiliary diagnosis and analysis method for voice recognition problems according to any one of claims 1-6.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the steps of the intelligent auxiliary diagnostic analysis method for speech recognition problems as claimed in any one of claims 1 to 6.
CN202410749014.7A 2024-06-11 2024-06-11 Intelligent auxiliary diagnosis and analysis method, system and storage medium for voice recognition problem Pending CN118588059A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410749014.7A CN118588059A (en) 2024-06-11 2024-06-11 Intelligent auxiliary diagnosis and analysis method, system and storage medium for voice recognition problem

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410749014.7A CN118588059A (en) 2024-06-11 2024-06-11 Intelligent auxiliary diagnosis and analysis method, system and storage medium for voice recognition problem

Publications (1)

Publication Number Publication Date
CN118588059A true CN118588059A (en) 2024-09-03

Family

ID=92534885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410749014.7A Pending CN118588059A (en) 2024-06-11 2024-06-11 Intelligent auxiliary diagnosis and analysis method, system and storage medium for voice recognition problem

Country Status (1)

Country Link
CN (1) CN118588059A (en)

Similar Documents

Publication Publication Date Title
US11392752B2 (en) Visualized editing method, device and apparatus, and storage medium
CN108595583B (en) Dynamic graph page data crawling method, device, terminal and storage medium
EP3620988B1 (en) Method, device for optimizing simulation data, and computer-readable storage medium
US10078502B2 (en) Verification of a model of a GUI-based application
JP2010002370A (en) Pattern extraction program, technique, and apparatus
US11288845B2 (en) Information processing apparatus for coloring an image, an information processing program for coloring an image, and an information processing method for coloring an image
US10311053B2 (en) Efficient processing of data extents
CN108874665A (en) A kind of test result method of calibration, device, equipment and medium
TW202121206A (en) Method and system for automatically identifying valid data acquisition module
CN107562710B (en) Chart processing device and method
CN110377692B (en) Method and device for training robot to imitate learning manual customer service
CN115344805A (en) Material auditing method, computing equipment and storage medium
US11599743B2 (en) Method and apparatus for obtaining product training images, and non-transitory computer-readable storage medium
CN117609060A (en) Recording script generation method, device, computer equipment and storage medium
US20160292174A1 (en) File scanning method and device
JP6508327B2 (en) Text visualization system, text visualization method, and program
CN116701215A (en) Interface test case generation method, system, equipment and storage medium
CN118588059A (en) Intelligent auxiliary diagnosis and analysis method, system and storage medium for voice recognition problem
US11762939B2 (en) Measure GUI response time
CN115757174A (en) Database difference detection method and device
KR101870658B1 (en) System and method for distributed realtime processing of linguistic intelligence moduel
CN108920695B (en) A kind of data query method, apparatus, equipment and storage medium
CN112035513A (en) SQL statement performance optimization method, device, terminal and storage medium
CN111027196A (en) Simulation analysis task processing method and device for power equipment and storage medium
CN111027667A (en) Intention category identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination