CN113782001A - Specific field voice recognition method and device, electronic equipment and storage medium - Google Patents

Specific field voice recognition method and device, electronic equipment and storage medium

Info

Publication number
CN113782001A
CN113782001A (application number CN202111341004.2A)
Authority
CN
China
Prior art keywords
domain
language model
field
model
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111341004.2A
Other languages
Chinese (zh)
Other versions
CN113782001B (en)
Inventor
蒋志燕
黄石磊
陈诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Raisound Technology Co ltd
Original Assignee
Shenzhen Raisound Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Raisound Technology Co ltd filed Critical Shenzhen Raisound Technology Co ltd
Priority to CN202111341004.2A priority Critical patent/CN113782001B/en
Publication of CN113782001A publication Critical patent/CN113782001A/en
Application granted granted Critical
Publication of CN113782001B publication Critical patent/CN113782001B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Abstract

The invention relates to artificial intelligence technology and discloses a domain-specific speech recognition method comprising the following steps: acquiring a general language model and a language model set comprising a plurality of domain language models, wherein the language model set further comprises first domain information in one-to-one correspondence with the domain language models; acquiring speech to be recognized and corresponding second domain information, selecting from the language model set the first domain information similar to the second domain information, and taking the corresponding domain language model as the target domain language model; combining the general language model and the target domain language model into a target recognition model; and performing speech recognition on the speech to be recognized with the target recognition model to obtain a recognition result. The invention also provides a domain-specific speech recognition apparatus, an electronic device, and a storage medium. The invention can improve the accuracy of speech recognition.

Description

Specific field voice recognition method and device, electronic equipment and storage medium
Technical Field
The present invention relates to artificial intelligence technology, and in particular, to a method and an apparatus for speech recognition in a specific field, an electronic device, and a storage medium.
Background
Speech recognition is built on a language model, and most current speech recognition schemes are based on a general language model.
A recognition scheme based on a general language model can accurately recognize general vocabulary. However, when recognizing speech from a specific domain (for example, the minutes of a professional medical conference), technical terms from that domain are easily misrecognized as general words with similar pronunciation. In other words, existing general-language-model schemes recognize domain-specific technical vocabulary poorly, which lowers the overall accuracy of speech recognition.
Disclosure of Invention
The invention provides a domain-specific speech recognition method and apparatus, an electronic device, and a storage medium, with the main aim of improving the accuracy of speech recognition.
In order to achieve the above object, the present invention provides a domain-specific speech recognition method, comprising:
acquiring a general language model and a language model set comprising a plurality of domain language models, wherein the language model set further comprises first domain information in one-to-one correspondence with the domain language models;
acquiring speech to be recognized and corresponding second domain information, selecting from the language model set the first domain information similar to the second domain information, and taking the corresponding domain language model as a target domain language model; combining the general language model and the target domain language model to obtain a target recognition model;
and performing speech recognition on the speech to be recognized with the target recognition model to obtain a recognition result.
Optionally, the selecting, from the language model set, first domain information similar to the second domain information, and acquiring a corresponding domain language model as a target domain language model includes:
vectorizing the first domain information respectively to obtain corresponding first domain vectors, and vectorizing the second domain information to obtain second domain vectors;
respectively calculating the similarity of the second domain vector and each first domain vector;
and selecting the first domain information corresponding to the maximum similarity, and acquiring a domain language model corresponding to the first domain information as the target domain language model.
Optionally, the vectorizing the second domain information to obtain a second domain vector includes:
converting each character in the second domain information into a vector to obtain the corresponding character vectors;
and splicing the character vectors in the order of the corresponding characters in the second domain information to obtain the second domain vector.
Optionally, the combining the general language model and the target domain language model to obtain a target recognition model includes:
acquiring the n-gram entries that appear in the target domain language model but not in the general language model, and performing probability interpolation on the acquired n-gram entries;
converting the interpolated n-gram entries into a weighted finite-state machine to obtain a domain decoding network;
converting the general language model into a weighted finite-state machine to obtain a general decoding network;
and concatenating the domain decoding network and the general decoding network to obtain the target recognition model.
Optionally, the concatenating the domain decoding network and the general decoding network to obtain the target recognition model includes:
adding virtual nodes to the general decoding network and the domain decoding network respectively, wherein the virtual nodes comprise a start node and an end node;
and connecting the general decoding network and the domain decoding network in series through the start node and the end node to obtain the target recognition model.
Optionally, the connecting the general decoding network and the domain decoding network in series through the start node and the end node to obtain the target recognition model includes:
making a directed connection from the end node added to the general decoding network to the start node added to the domain decoding network, in the direction from the end node to the start node; or
making a directed connection from the end node added to the domain decoding network to the start node added to the general decoding network, in the direction from the end node to the start node.
Optionally, the performing speech recognition on the speech to be recognized with the target recognition model to obtain a recognition result includes:
framing the speech to be recognized to obtain a plurality of speech frames;
and sequentially inputting all the speech frames into the target recognition model in chronological order to obtain the recognition result.
In order to solve the above problem, the present invention also provides a domain-specific speech recognition apparatus, including:
a model screening module, configured to acquire a general language model and a language model set comprising a plurality of domain language models, wherein the language model set further comprises first domain information in one-to-one correspondence with the domain language models; and to acquire speech to be recognized and corresponding second domain information, select from the language model set the first domain information similar to the second domain information, and take the corresponding domain language model as a target domain language model;
a model combination module, configured to combine the general language model and the target domain language model to obtain a target recognition model;
and a speech recognition module, configured to perform speech recognition on the speech to be recognized with the target recognition model to obtain a recognition result.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one computer program; and
a processor executing the computer program stored in the memory to implement the domain-specific speech recognition method described above.
In order to solve the above problem, the present invention also provides a computer-readable storage medium, in which at least one computer program is stored, the at least one computer program being executed by a processor in an electronic device to implement the domain-specific speech recognition method described above.
In the domain-specific speech recognition method provided by the embodiment of the invention, the first domain information similar to the second domain information is selected from the language model set, and the corresponding domain language model is taken as the target domain language model; the general language model and the target domain language model are then combined into a target recognition model. Because the general language model, which recognizes general vocabulary, is combined with a domain language model from the same domain as the speech to be recognized, both general vocabulary and domain-specific technical vocabulary can be recognized, which improves the accuracy of speech recognition.
Drawings
FIG. 1 is a flowchart illustrating a domain-specific speech recognition method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a domain-specific speech recognition apparatus according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an internal structure of an electronic device implementing a domain-specific speech recognition method according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the invention provides a domain-specific speech recognition method. The execution subject of the method includes, but is not limited to, at least one electronic device, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the domain-specific speech recognition method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, an independent server or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
Referring to fig. 1, a flow diagram of a specific field speech recognition method according to an embodiment of the present invention is shown, where in the embodiment of the present invention, the specific field speech recognition method includes:
s1, obtaining a general language model and a language model set comprising a plurality of domain language models, wherein the language model set further comprises first domain information corresponding to the domain language models one by one.
In detail, in the embodiment of the present invention, the general language model is a model for speech recognition obtained by training on a general corpus, and a domain language model is a model for speech recognition obtained by training on a corpus from a specific domain. For example, if the first domain information corresponding to a domain language model is medical domain information, then that domain language model was trained on a medical-domain corpus.
The first domain information describes the domain background of the corpus used to train the corresponding domain language model, for example: the medical domain.
In the embodiment of the invention, the language model set contains a plurality of domain language models, and each domain language model corresponds to unique first domain information.
In another embodiment of the present invention, the language model set may contain only one domain language model and its corresponding first domain information.
Optionally, the general language model and the target domain language model in the embodiment of the present invention may be n-gram models.
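As a rough, non-patent illustration of the n-gram form mentioned above, the following Python sketch estimates bigram probabilities from a tiny corpus; the example sentences, the add-one smoothing, and the function name train_bigram_lm are illustrative assumptions rather than anything disclosed in this application.

```python
from collections import Counter

def train_bigram_lm(sentences):
    # Estimate bigram probabilities P(w2 | w1) with add-one smoothing (placeholder corpus and smoothing).
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)
    return lambda w1, w2: (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

# A "general" model trained on everyday text versus a "domain" model trained on medical text.
general_lm = train_bigram_lm(["we start the meeting now", "please send the report"])
medical_lm = train_bigram_lm(["the patient shows myocardial ischemia", "order a computed tomography scan"])

print(general_lm("the", "meeting"))           # a bigram seen in the general corpus
print(medical_lm("myocardial", "ischemia"))   # a domain bigram the general corpus never sees
```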
S2, obtaining the speech to be recognized and the corresponding second domain information, selecting the first domain information similar to the second domain information from the language model set, and obtaining the corresponding domain language model as the target domain language model.
In the embodiment of the present invention, the speech to be recognized is the speech on which speech recognition needs to be performed, and the second domain information is text describing the domain background of that speech. For example, if the speech to be recognized is the discussion at a medical seminar, the second domain information is the medical domain.
Furthermore, because the first domain information and the second domain information may express the same meaning with different characters, in order to more accurately find the first domain information similar to the second domain information, each piece of first domain information is vectorized to obtain a first domain vector, and the second domain information is vectorized to obtain a second domain vector.
Further, in the embodiment of the present invention, the similarity between the second domain vector and each first domain vector is calculated; the first domain information with the maximum similarity is selected, and the domain language model corresponding to that first domain information is taken as the target domain language model.
In detail, in the embodiment of the present invention, vectorizing the second domain information to obtain the second domain vector includes:
converting each character in the second domain information into a vector to obtain the corresponding character vectors;
and splicing the character vectors in the order of the corresponding characters in the second domain information to obtain the second domain vector.
In another embodiment of the present invention, a vector average is computed over all the character vectors to obtain the second domain vector.
For example, if there are two character vectors v1 and v2, the corresponding second domain vector is (v1 + v2) / 2.
In another embodiment of the invention, the elements of each character vector are averaged to obtain a vector feature value; the character vectors are sorted in the order of the corresponding characters in the second domain information; and all the vector feature values are combined in the sorted order of their character vectors to obtain the second domain vector.
In another embodiment of the present invention, the median (50th percentile), maximum, mode, or a similar statistic of the elements of each character vector may be taken as the vector feature value.
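The selection of the target domain language model described in this step can be sketched in Python as follows. This is a minimal illustration only: the hashed character embedding stands in for whatever character vectorization is actually used, and names such as select_target_domain_model are hypothetical.

```python
import hashlib
import math

def char_vector(ch, dim=8):
    # Toy character embedding: a fixed pseudo-random vector derived from the character
    # (a placeholder for a trained character embedding).
    digest = hashlib.md5(ch.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def domain_vector(text, dim=8):
    # Average the character vectors, as in the vector-average variant described above.
    vectors = [char_vector(ch, dim) for ch in text if not ch.isspace()]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_target_domain_model(second_domain_info, model_set):
    # model_set maps first-domain information (text) to its domain language model;
    # return the entry whose first-domain vector is most similar to the second-domain vector.
    query = domain_vector(second_domain_info)
    best_info = max(model_set, key=lambda info: cosine(query, domain_vector(info)))
    return best_info, model_set[best_info]

# Hypothetical usage: compare the speech's domain description against each first-domain description.
models = {"medical domain": "medical_domain_lm", "legal domain": "legal_domain_lm"}
print(select_target_domain_model("medical seminar discussion", models))
```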
S3, combining the general language model and the target domain language model to obtain a target recognition model.
The embodiment of the application combines the general language model and the target domain language model, so that the resulting target recognition model can accurately recognize both general vocabulary and the technical vocabulary of the specific domain. Compared with a general language model alone, the target recognition model constructed in the embodiment of the application improves the speech recognition accuracy for domain-specific technical vocabulary.
In detail, the embodiment of the present invention performs model combination on the general language model and the target domain language model, and includes:
and A, acquiring n-gram entries which appear in the target domain language model but do not appear in the general language model, and performing probability interpolation on the acquired n-gram entries.
Optionally, the general language model and the target domain language model may be n-gram models, and based on this, when the target domain language model and the general language model are interpolated according to this embodiment, the interpolation part is n-gram entries that only appear in the target domain language model and the general language model, that is, the interpolation part only includes n-gram entries that appear in the target domain language model, and does not include n-gram entries that do not appear in the target domain language model, and performs probability interpolation. Wherein the n-gram item is a word or a word combination in the model corpus.
In detail, in the embodiment of the present invention, probability interpolation is performed on the n-gram entries to change the probability corresponding to the n-gram entries.
Optionally, in the embodiment of the present invention, a preset weighting coefficient and a probability corresponding to the n-gram entry are used for performing weighted calculation, so as to obtain a modified probability.
For example, suppose an n-gram entry would have probability p1 if it were added to the general language model, and has probability p2 in the domain language model; the interpolated probability is then a*p1 + (1 - a)*p2, where a is a preset interpolation coefficient.
In the embodiment of the invention, interpolation places the n-gram entries and the general language model on the same probability scale, which facilitates the subsequent model combination.
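A minimal Python sketch of this interpolation step, under the assumption that each language model is represented as a plain mapping from n-gram tuples to probabilities and that p1 is a small back-off estimate for entries the general model has not seen; the function name and constants are illustrative, not taken from the application.

```python
def interpolate_domain_entries(domain_lm, general_lm, a=0.3, backoff=1e-7):
    # Only n-gram entries that appear in the domain model but not in the general model
    # are interpolated: p = a * p1 + (1 - a) * p2.
    interpolated = {}
    for entry, p2 in domain_lm.items():
        if entry in general_lm:
            continue  # entries already covered by the general model are left alone
        p1 = backoff  # assumed back-off probability the general model would assign
        interpolated[entry] = a * p1 + (1 - a) * p2
    return interpolated

general_lm = {("we", "start"): 0.0010, ("speech",): 0.0020}
domain_lm = {("we", "start"): 0.0030, ("speech", "recognition"): 0.0200}
print(interpolate_domain_entries(domain_lm, general_lm))
# Only ('speech', 'recognition') is interpolated and later converted into the domain decoding network.
```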
Step B: converting the n-gram entries after probability interpolation into a weighted finite-state machine to obtain a domain decoding network.
Since the number of n-gram entries of the interpolation section (i.e., n-gram entries appearing only in the target domain language model) is small, the generation of the domain decoding network from the interpolation section takes little time and occupies little memory resources.
In detail, in the embodiment of the present invention, each n-gram entry is converted into a weighted finite state machine, so as to obtain a corresponding decoding path, and all the decoding paths are summarized to obtain the domain decoding network.
Step C: converting the general language model into a weighted finite-state machine to obtain a general decoding network.
Step D: concatenating the domain decoding network and the general decoding network to obtain the target recognition model.
Optionally, the target recognition model obtained by connecting the domain decoding network and the general decoding network in series in the embodiment of the present invention can accurately recognize speech containing both general vocabulary and technical vocabulary, and is suitable for continuous speech recognition with a large vocabulary.
In detail, the embodiment of the present invention concatenates the domain decoding network and the general decoding network to obtain the target recognition model as follows:
Step I: adding virtual nodes to the general decoding network and the domain decoding network, respectively.
The virtual nodes comprise a starting node and an end node.
Optionally, in the embodiment of the present invention, a start node is added at the beginning of each decoding path in the general decoding network and the domain decoding network, and an end node is added at the end of each decoding path.
Step II: connecting the general decoding network and the domain decoding network in series through the start node and the end node to obtain the target recognition model.
Optionally, in the embodiment of the present invention, the general decoding network and the domain decoding network can be connected in series in two ways: with the general decoding network before the domain decoding network, or with the domain decoding network before the general decoding network.
Specifically, in the embodiment of the present invention, when the domain decoding network is connected in series after the general decoding network, the process of connecting the two networks in series through the start node and the end node includes: making a directed connection from the end node added to the general decoding network to the start node added to the domain decoding network, in the direction from the end node to the start node.
For example, if the general decoding network has start node A and end node B, and the domain decoding network has start node C and end node D, then the end node B of the general decoding network is connected to the start node C of the domain decoding network in the direction from B to C; the nodes in the resulting target recognition model are therefore ordered A, B, C, D.
In another embodiment of the present invention, when the general decoding network is connected in series after the domain decoding network, the process of connecting the two networks in series through the start node and the end node includes:
making a directed connection from the end node added to the domain decoding network to the start node added to the general decoding network, in the direction from the end node to the start node.
For example, if the general decoding network has start node A and end node B, and the domain decoding network has start node C and end node D, then the end node D of the domain decoding network is connected to the start node A of the general decoding network in the direction from D to A; the nodes in the resulting target recognition model are therefore ordered C, D, A, B.
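The series connection with virtual nodes can be sketched as follows. A real system would use a WFST toolkit; here each decoding network is simply a list of node-label paths, and names such as add_virtual_nodes and concat_networks are hypothetical.

```python
def add_virtual_nodes(paths, prefix):
    # Wrap every decoding path of a network with a shared virtual start node and end node.
    start, end = f"{prefix}_start", f"{prefix}_end"
    arcs = [(start, path[0]) for path in paths]
    arcs += [(u, v) for path in paths for u, v in zip(path, path[1:])]
    arcs += [(path[-1], end) for path in paths]
    return start, end, arcs

def concat_networks(general_paths, domain_paths):
    # Series connection with the general network first: end node B -> start node C,
    # giving the node order A, B, C, D described above.
    _, b, general_arcs = add_virtual_nodes(general_paths, "general")   # A ... B
    c, _, domain_arcs = add_virtual_nodes(domain_paths, "domain")      # C ... D
    return general_arcs + [(b, c)] + domain_arcs

# Toy decoding paths (each path is a list of node labels).
general_paths = [["we", "start", "to", "perform"]]
domain_paths = [["speech", "recognition"]]
for arc in concat_networks(general_paths, domain_paths):
    print(arc)
```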
When a target recognition model obtained by connecting a domain decoding network and a general decoding network in series is used to decode speech that contains both general vocabulary and technical vocabulary, the final decoding path is composed of a path from the general decoding network and a path from the domain decoding network. For example, if the speech to be recognized is "we start to perform speech recognition", the general decoding network contains the decoding path for "we start to perform", the domain decoding network contains the decoding path for "speech recognition", and the final decoding path is the combination of the two.
S4, performing speech recognition on the speech to be recognized with the target recognition model to obtain a recognition result.
In detail, the decoding paths of the target recognition model in the embodiment of the present invention may be represented by a lattice, which is a weighted directed graph. Each node in the lattice represents an acoustic unit, and each arc carries two weights, an acoustic weight and a language weight. Any left-to-right decoding path through the lattice constitutes a speech recognition result: the acoustic weights of the arcs along the path are summed and added to the language weight of the path to form the score of the whole decoding path, and the path with the highest score is taken as the recognition result.
Specifically, the process of decoding the speech to be recognized with the target recognition model to obtain its decoding path may include: framing the speech to be recognized to obtain a plurality of speech frames; and sequentially inputting all the speech frames into the target recognition model in chronological order for decoding to obtain the recognition result, that is, the decoding path with the highest score in the target recognition model.
For example, when the target recognition model is obtained by placing the general decoding network before the domain decoding network, each speech frame of the speech to be recognized enters the general decoding network through its start node for decoding; after a decoding path in the general decoding network ends, decoding continues in the domain decoding network through the start node connected to the end node of the general network, and so on until the speech frames are exhausted. Further, the embodiment of the invention determines the speech recognition result of the speech to be recognized according to its decoding path.
Optionally, in the embodiment of the present invention, the decoding paths of the speech to be recognized in the target recognition model and the weight score of each decoding path are obtained, and the decoding path with the highest weight score is taken as the speech recognition result of the speech to be recognized.
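A minimal Python sketch of this final step: framing the audio and then choosing the highest-scoring candidate path by summing acoustic weights and adding the language weight. The 25 ms / 10 ms framing parameters and all scores below are illustrative assumptions, not values taken from the application.

```python
def frame_signal(samples, sample_rate=16000, frame_ms=25, shift_ms=10):
    # Split the audio samples into overlapping frames; 25 ms windows with a 10 ms shift
    # are common defaults, not values specified in this application.
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, max(len(samples) - frame_len + 1, 1), shift)]

def best_path(candidates):
    # Each candidate: (words, per-arc acoustic weights, language weight).
    # The path score is the sum of its acoustic weights plus its language weight.
    def score(candidate):
        _, acoustic_weights, language_weight = candidate
        return sum(acoustic_weights) + language_weight
    return max(candidates, key=score)[0]

candidates = [
    (["we", "start", "to", "perform", "speech", "recognition"],
     [-1.2, -0.9, -0.7, -1.0, -1.1, -0.8], -4.0),
    (["we", "start", "to", "perform", "speech", "cognition"],
     [-1.2, -0.9, -0.7, -1.0, -1.1, -2.5], -6.0),
]
print(best_path(candidates))  # the higher-scoring hypothesis is returned
```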
In the domain-specific speech recognition method provided by the embodiment of the invention, the domain language model whose first domain information is similar to the second domain information is selected from the language model set as the target domain language model, and the general language model is combined with the target domain language model to obtain the target recognition model. Because the general language model, which recognizes general vocabulary, is combined with a domain language model from the same domain as the speech to be recognized, both general vocabulary and domain-specific technical vocabulary can be recognized, which improves the accuracy of speech recognition.
Fig. 2 is a functional block diagram of a speech recognition apparatus according to a specific embodiment of the present invention.
The domain-specific speech recognition apparatus 100 according to the present invention may be installed in an electronic device. Depending on the implemented functions, the apparatus may include a model screening module 101, a model combination module 102, and a speech recognition module 103. A module, which may also be referred to as a unit, is a series of computer program segments that can be executed by a processor of an electronic device, can perform a fixed function, and are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the model screening module 101 is configured to acquire a general language model and a language model set comprising a plurality of domain language models, wherein the language model set further comprises first domain information in one-to-one correspondence with the domain language models; and to acquire speech to be recognized and corresponding second domain information, select from the language model set the first domain information similar to the second domain information, and take the corresponding domain language model as a target domain language model;
the model combination module 102 is configured to combine the general language model and the target domain language model to obtain a target recognition model;
the speech recognition module 103 is configured to perform speech recognition on the speech to be recognized with the target recognition model to obtain a recognition result.
In detail, in the embodiment of the present invention, when the modules in the specific-domain speech recognition apparatus 100 are used, the same technical means as the specific-domain speech recognition method described in fig. 1 are adopted, and the same technical effects can be produced, which is not described herein again.
Fig. 3 is a schematic structural diagram of an electronic device for implementing the domain-specific speech recognition method according to the present invention.
The electronic device may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program, such as a domain specific speech recognition program, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory 11 may also be an external storage device of the electronic device in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only to store application software installed in the electronic device and various types of data, such as codes of a voice recognition program, etc., but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device by running or executing programs or modules (e.g., voice recognition programs, etc.) stored in the memory 11 and calling data stored in the memory 11.
The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The bus may be divided into an address bus, a data bus, a control bus, etc. The communication bus 12 is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
Fig. 3 shows only an electronic device having components, and those skilled in the art will appreciate that the structure shown in fig. 3 does not constitute a limitation of the electronic device, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management and the like are realized through the power management device. The power source may also include any component of one or more dc or ac power sources, recharging devices, power failure classification circuits, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Optionally, the communication interface 13 may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), which is generally used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the communication interface 13 may further include a user interface, which may be a Display (Display), an input unit (such as a Keyboard (Keyboard)), and optionally, a standard wired interface, or a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable, among other things, for displaying information processed in the electronic device and for displaying a visualized user interface.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The speech recognition program stored in the memory 11 of the electronic device is a combination of a plurality of computer programs, which when run in the processor 10, enable:
acquiring a general language model and a language model set comprising a plurality of domain language models, wherein the language model set further comprises first domain information in one-to-one correspondence with the domain language models;
acquiring speech to be recognized and corresponding second domain information, selecting from the language model set the first domain information similar to the second domain information, and taking the corresponding domain language model as a target domain language model; combining the general language model and the target domain language model to obtain a target recognition model;
and performing speech recognition on the speech to be recognized with the target recognition model to obtain a recognition result.
Specifically, the processor 10 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the computer program, which is not described herein again.
Further, the electronic device integrated module/unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer readable storage medium. The computer readable medium may be non-volatile or volatile. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
Embodiments of the present invention may also provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor of an electronic device, the computer program may implement:
acquiring a general language model and a language model set comprising a plurality of domain language models, wherein the language model set further comprises first domain information in one-to-one correspondence with the domain language models;
acquiring speech to be recognized and corresponding second domain information, selecting from the language model set the first domain information similar to the second domain information, and taking the corresponding domain language model as a target domain language model; combining the general language model and the target domain language model to obtain a target recognition model;
and performing speech recognition on the speech to be recognized with the target recognition model to obtain a recognition result.
Further, the computer usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not imply any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A domain-specific speech recognition method, the method comprising:
acquiring a general language model and a language model set comprising a plurality of domain language models, wherein the language model set further comprises first domain information which is in one-to-one correspondence with the domain language models;
acquiring speech to be recognized and corresponding second domain information, selecting from the language model set the first domain information similar to the second domain information, and taking the corresponding domain language model as a target domain language model;
performing model combination on the general language model and the target domain language model to obtain a target recognition model;
and performing speech recognition on the speech to be recognized with the target recognition model to obtain a recognition result.
2. The method of claim 1, wherein the selecting first domain information similar to the second domain information from the language model set and obtaining a corresponding domain language model as a target domain language model comprises:
vectorizing the first domain information respectively to obtain corresponding first domain vectors, and vectorizing the second domain information to obtain second domain vectors;
respectively calculating the similarity of the second domain vector and each first domain vector;
and selecting the first domain information corresponding to the maximum similarity, and acquiring a domain language model corresponding to the first domain information as the target domain language model.
3. The method of claim 2, wherein the vectorizing the second domain information to obtain a second domain vector comprises:
converting each character in the second domain information into a vector to obtain the corresponding character vectors;
and splicing the character vectors in the order of the corresponding characters in the second domain information to obtain the second domain vector.
4. The domain-specific speech recognition method of claim 1, wherein the model combination of the general language model and the target domain language model to obtain a target recognition model comprises:
acquiring n-gram entries which appear in the target domain language model but do not appear in the general language model, and performing probability interpolation on the acquired n-gram entries;
converting the n-gram items after the probability interpolation into a weighted finite state machine to obtain a domain decoding network;
converting the universal language model into a weighted finite state machine to obtain a universal decoding network;
and concatenating the domain decoding network and the general decoding network to obtain the target recognition model.
5. The method of claim 4, wherein the concatenating of the domain decoding network and the general decoding network to obtain the target recognition model comprises:
adding virtual nodes to the general decoding network and the domain decoding network respectively, wherein the virtual nodes comprise a start node and an end node;
and connecting the general decoding network and the domain decoding network in series through the start node and the end node to obtain the target recognition model.
6. The method of claim 5, wherein the connecting of the general decoding network and the domain decoding network in series through the start node and the end node to obtain the target recognition model comprises:
making a directed connection from the end node added to the general decoding network to the start node added to the domain decoding network, in the direction from the end node to the start node; or
making a directed connection from the end node added to the domain decoding network to the start node added to the general decoding network, in the direction from the end node to the start node.
7. The method according to any one of claims 1 to 6, wherein the performing speech recognition on the speech to be recognized by using the target recognition model to obtain a recognition result comprises:
framing the speech to be recognized to obtain a plurality of speech frames;
and sequentially inputting all the speech frames into the target recognition model in chronological order to obtain the recognition result.
8. A domain-specific speech recognition apparatus, comprising:
a model screening module, configured to acquire a general language model and a language model set comprising a plurality of domain language models, wherein the language model set further comprises first domain information in one-to-one correspondence with the domain language models; and to acquire speech to be recognized and corresponding second domain information, select from the language model set the first domain information similar to the second domain information, and take the corresponding domain language model as a target domain language model;
a model combination module, configured to combine the general language model and the target domain language model to obtain a target recognition model;
and a speech recognition module, configured to perform speech recognition on the speech to be recognized with the target recognition model to obtain a recognition result.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and the number of the first and second groups,
a memory communicatively coupled to the at least one processor;
wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the domain-specific speech recognition method of any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a domain-specific speech recognition method according to any one of claims 1 to 7.
CN202111341004.2A 2021-11-12 2021-11-12 Specific field voice recognition method and device, electronic equipment and storage medium Active CN113782001B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111341004.2A CN113782001B (en) 2021-11-12 2021-11-12 Specific field voice recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111341004.2A CN113782001B (en) 2021-11-12 2021-11-12 Specific field voice recognition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113782001A true CN113782001A (en) 2021-12-10
CN113782001B CN113782001B (en) 2022-03-08

Family

ID=78873864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111341004.2A Active CN113782001B (en) 2021-11-12 2021-11-12 Specific field voice recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113782001B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106328147A (en) * 2016-08-31 2017-01-11 中国科学技术大学 Speech recognition method and device
US20170125013A1 (en) * 2015-10-29 2017-05-04 Le Holdings (Beijing) Co., Ltd. Language model training method and device
US20180053502A1 (en) * 2016-08-19 2018-02-22 Google Inc. Language models using domain-specific model components
CN110111780A (en) * 2018-01-31 2019-08-09 阿里巴巴集团控股有限公司 Data processing method and server
CN110610700A (en) * 2019-10-16 2019-12-24 科大讯飞股份有限公司 Decoding network construction method, voice recognition method, device, equipment and storage medium
CN111428485A (en) * 2020-04-22 2020-07-17 深圳市华云中盛科技股份有限公司 Method and device for classifying judicial literature paragraphs, computer equipment and storage medium
CN112002310A (en) * 2020-07-13 2020-11-27 苏宁云计算有限公司 Domain language model construction method and device, computer equipment and storage medium
CN112017645A (en) * 2020-08-31 2020-12-01 广州市百果园信息技术有限公司 Voice recognition method and device
CN112712804A (en) * 2020-12-23 2021-04-27 哈尔滨工业大学(威海) Speech recognition method, system, medium, computer device, terminal and application
CN112951206A (en) * 2021-02-08 2021-06-11 天津大学 Tibetan Tibet dialect spoken language identification method based on deep time delay neural network

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170125013A1 (en) * 2015-10-29 2017-05-04 Le Holdings (Beijing) Co., Ltd. Language model training method and device
US20180053502A1 (en) * 2016-08-19 2018-02-22 Google Inc. Language models using domain-specific model components
CN106328147A (en) * 2016-08-31 2017-01-11 中国科学技术大学 Speech recognition method and device
CN110111780A (en) * 2018-01-31 2019-08-09 阿里巴巴集团控股有限公司 Data processing method and server
CN110610700A (en) * 2019-10-16 2019-12-24 科大讯飞股份有限公司 Decoding network construction method, voice recognition method, device, equipment and storage medium
CN111428485A (en) * 2020-04-22 2020-07-17 深圳市华云中盛科技股份有限公司 Method and device for classifying judicial literature paragraphs, computer equipment and storage medium
CN112002310A (en) * 2020-07-13 2020-11-27 苏宁云计算有限公司 Domain language model construction method and device, computer equipment and storage medium
CN112017645A (en) * 2020-08-31 2020-12-01 广州市百果园信息技术有限公司 Voice recognition method and device
CN112712804A (en) * 2020-12-23 2021-04-27 哈尔滨工业大学(威海) Speech recognition method, system, medium, computer device, terminal and application
CN112951206A (en) * 2021-02-08 2021-06-11 天津大学 Tibetan Tibet dialect spoken language identification method based on deep time delay neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
韩程程 et al., "语义文本相似度计算方法" [Semantic text similarity calculation methods], 《华东师范大学学报(自然科学版)》 [Journal of East China Normal University (Natural Science)] *

Also Published As

Publication number Publication date
CN113782001B (en) 2022-03-08

Similar Documents

Publication Publication Date Title
CN110287479B (en) Named entity recognition method, electronic device and storage medium
CN112667800A (en) Keyword generation method and device, electronic equipment and computer storage medium
CN111681681A (en) Voice emotion recognition method and device, electronic equipment and storage medium
CN112988963A (en) User intention prediction method, device, equipment and medium based on multi-process node
CN113064994A (en) Conference quality evaluation method, device, equipment and storage medium
CN113722483A (en) Topic classification method, device, equipment and storage medium
CN113378970A (en) Sentence similarity detection method and device, electronic equipment and storage medium
CN113360654B (en) Text classification method, apparatus, electronic device and readable storage medium
CN113344125B (en) Long text matching recognition method and device, electronic equipment and storage medium
CN114020886A (en) Speech intention recognition method, device, equipment and storage medium
CN113887941A (en) Business process generation method and device, electronic equipment and medium
CN113869456A (en) Sampling monitoring method and device, electronic equipment and storage medium
CN112579781A (en) Text classification method and device, electronic equipment and medium
CN114548114B (en) Text emotion recognition method, device, equipment and storage medium
CN113782001B (en) Specific field voice recognition method and device, electronic equipment and storage medium
CN115510188A (en) Text keyword association method, device, equipment and storage medium
CN115346095A (en) Visual question answering method, device, equipment and storage medium
CN114219367A (en) User scoring method, device, equipment and storage medium
CN114186028A (en) Consult complaint work order processing method, device, equipment and storage medium
CN113626605A (en) Information classification method and device, electronic equipment and readable storage medium
CN112632264A (en) Intelligent question and answer method and device, electronic equipment and storage medium
CN113515591A (en) Text bad information identification method and device, electronic equipment and storage medium
CN111414452A (en) Search word matching method and device, electronic equipment and readable storage medium
CN111680513B (en) Feature information identification method and device and computer readable storage medium
CN115203374A (en) Text abstract generating method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant