CN112287112A - Method, device and storage medium for constructing special pronunciation dictionary - Google Patents

Publication number
CN112287112A
Authority
CN
China
Prior art keywords
words
special
dictionary
word
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910678978.6A
Other languages
Chinese (zh)
Inventor
高星
赵立军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongguancun Kejin Technology Co Ltd
Original Assignee
Beijing Zhongguancun Kejin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongguancun Kejin Technology Co Ltd
Priority to CN201910678978.6A
Publication of CN112287112A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a method, a device, and a storage medium for constructing a proprietary pronunciation dictionary. The method comprises: acquiring voice information, wherein the voice information records voice audio related to a specific professional field; determining text information corresponding to the voice information according to a predetermined algorithm; matching the text information against a general dictionary, and determining from the text information the words that are not included in the general dictionary (unrecorded words); determining, from the unrecorded words, proprietary words for constructing the proprietary pronunciation dictionary; and adding the proprietary words to the general dictionary to construct the proprietary pronunciation dictionary.

Description

Method, device and storage medium for constructing special pronunciation dictionary
Technical Field
The present application relates to the fields of speech recognition and finance, and more particularly, to a method, an apparatus, and a storage medium for constructing a proprietary pronunciation dictionary.
Background
Speech is the most natural way for humans to interact. Since the invention of the computer, people have aimed to enable machines to understand human language, grasp its inherent meaning, and respond correctly. Automatic speech recognition (ASR) is the first and a critical step toward this goal. The decoder is a core module of a speech recognition system, and the pronunciation dictionary is an essential part of the decoder. The broader the coverage of the pronunciation dictionary, the higher the recognition accuracy for words that are likely to occur in a spoken dialog. For a speech recognition system in a specific domain in particular, a personalized pronunciation dictionary that contains as many of the domain's proper nouns as possible is needed in order to obtain highly accurate recognition results.
Most pronunciation dictionaries used in current speech recognition systems are general dictionaries. The vocabulary of a general pronunciation dictionary has wide coverage, including common words in fields such as economy, news, finance, and literature, but it is not sufficient to cover all the proper nouns of each field. Take the internet finance field as an example: the field provides financial services to a large customer base, and a call center based on interactive voice response (IVR) is the main channel for providing consulting services to customers. Over a long period of service interaction between agents and customers, a large number of proprietary words accumulate. If these proprietary words are not in the general pronunciation dictionary, the accuracy of the text that ASR recognizes from the voice data is low.
In view of the above technical problem that the universal pronunciation dictionary in the voice recognition system in the prior art cannot cover proper nouns in various fields, which affects the accuracy of the recognition result, no effective solution has been proposed at present.
Disclosure of Invention
Embodiments of the present disclosure provide a method, an apparatus, and a storage medium for constructing a proprietary pronunciation dictionary, so as to at least solve the technical problem that a general pronunciation dictionary in a prior-art speech recognition system cannot cover the proper nouns of various fields, which affects the accuracy of recognition results.
According to an aspect of an embodiment of the present disclosure, there is provided a method of constructing a proprietary pronunciation dictionary, including: acquiring voice information, wherein the voice information records voice audio related to a specific professional field; determining text information corresponding to the voice information according to a predetermined algorithm; matching the text information against a general dictionary, and determining from the text information the words that are not included in the general dictionary (unrecorded words); determining, from the unrecorded words, proprietary words for constructing the proprietary pronunciation dictionary; and adding the proprietary words to the general dictionary to construct the proprietary pronunciation dictionary.
According to another aspect of the embodiments of the present disclosure, there is also provided a storage medium including a stored program, wherein the method of any one of the above is performed by a processor when the program is run.
According to another aspect of the embodiments of the present disclosure, there is also provided an apparatus for constructing a proprietary pronunciation dictionary, including: an acquisition module for acquiring voice information, wherein the voice information records voice audio related to a specific professional field; a determining module for determining text information corresponding to the voice information according to a predetermined algorithm; a matching module for matching the text information against a general dictionary and determining from the text information the words that are not included in the general dictionary (unrecorded words); a screening module for determining, from the unrecorded words, proprietary words for constructing the proprietary pronunciation dictionary; and a construction module for adding the proprietary words to the general dictionary to construct the proprietary pronunciation dictionary.
According to another aspect of the embodiments of the present disclosure, there is also provided an apparatus for constructing a proprietary pronunciation dictionary, including: a processor; and a memory coupled to the processor and providing the processor with instructions for the following processing steps: acquiring voice information, wherein the voice information records voice audio related to a specific professional field; determining text information corresponding to the voice information according to a predetermined algorithm; matching the text information against a general dictionary, and determining from the text information the words that are not included in the general dictionary (unrecorded words); determining, from the unrecorded words, proprietary words for constructing the proprietary pronunciation dictionary; and adding the proprietary words to the general dictionary to construct the proprietary pronunciation dictionary.
In the embodiments of the disclosure, the corresponding text information can be determined from the voice information of a specific professional field, the text information is then matched against the general dictionary to determine the proprietary words, and finally the proprietary words are added to the general dictionary to complete the construction of the proprietary pronunciation dictionary. Speech recognition in the specific professional field can therefore be performed using the proprietary pronunciation dictionary, which achieves the technical effects of accelerating recognition and improving recognition accuracy. This solves the technical problem that the general pronunciation dictionary in a prior-art speech recognition system cannot cover the proper nouns of various fields, which affects the accuracy of recognition results.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure. In the drawings:
FIG. 1 is a hardware configuration block diagram of a computer terminal (or mobile device) for implementing the method according to embodiment 1 of the present disclosure;
FIG. 2 is a flowchart illustrating a method for constructing a lexicon of proprietary pronunciations according to a first aspect of embodiment 1 of the present disclosure;
FIG. 3 is a schematic overall flow chart of constructing a proprietary pronunciation dictionary according to the first aspect of embodiment 1 of the present disclosure;
FIG. 4 is a schematic diagram of an apparatus for constructing a proprietary pronunciation dictionary according to embodiment 2 of the present disclosure; and
FIG. 5 is a schematic diagram of an apparatus for constructing a proprietary pronunciation dictionary according to embodiment 3 of the present disclosure.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure. It is to be understood that the described embodiments are merely exemplary of some, and not all, of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
According to the present embodiment, there is provided an embodiment of a method of constructing a lexicon of proprietary pronunciations, it being noted that the steps illustrated in the flow chart of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flow chart, in some cases the steps illustrated or described may be performed in an order different than here.
The method provided by this embodiment can be executed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing the method of constructing a proprietary pronunciation dictionary. As shown in FIG. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the computer terminal may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the disclosed embodiments, the data processing circuit acts as a processor control (e.g., selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the method for constructing the proprietary pronunciation dictionary in the embodiment of the present disclosure, and the processor 102 executes various functional applications and data processing by executing the software programs and modules stored in the memory 104, that is, implementing the method for constructing the proprietary pronunciation dictionary of the application program. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of such a network may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network interface controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 106 can be a radio frequency (RF) module, which is used to communicate with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
It should be noted here that in some alternative embodiments, the computer device (or mobile device) shown in FIG. 1 may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should also be noted that FIG. 1 is only one specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
In the operating environment, according to a first aspect of the present embodiment, there is provided a method for constructing a proprietary pronunciation dictionary, and fig. 2 shows a flowchart of the method, and with reference to fig. 2, the method includes:
S202: acquiring voice information, wherein the voice information records voice audio related to a specific professional field;
S204: determining text information corresponding to the voice information according to a predetermined algorithm;
S206: matching the text information against a general dictionary, and determining from the text information the words that are not included in the general dictionary (unrecorded words);
S208: determining, from the unrecorded words, proprietary words for constructing the proprietary pronunciation dictionary; and
S210: adding the proprietary words to the general dictionary to construct the proprietary pronunciation dictionary.
As described in the background, the pronunciation dictionaries used in most current speech recognition systems are general dictionaries. Their vocabulary has wide coverage, including common words in fields such as economy, news, finance, and literature, but it is not sufficient to cover all the proper nouns of each field. Take the internet finance field as an example: the field provides financial services to a large customer base, and a call center based on interactive voice response (IVR) is the main channel for providing consulting services to customers. Over a long period of service interaction between agents and customers, a large number of proprietary words accumulate. If these proprietary words are not in the general pronunciation dictionary, the accuracy of the text that ASR recognizes from the voice data is low.
For the technical problems in the background art, the method for constructing the special pronunciation dictionary provided by the technical solution of the embodiment can be applied to the field of internet finance, for example, but not limited to. Specifically, referring to fig. 2, in actual practice, the method first obtains voice information for recording voice audio associated with a particular specialty. Taking the internet financial profession as an example, customer service personnel of the internet financial company may generate some voice records (i.e., voice audio) related to the financial profession during the process of providing the customer with the telephone sales, the hotline service and the return visit service.
Further, text information corresponding to the voice information is determined according to a predetermined algorithm. For example, the voice information may be converted into text by a prior-art automatic speech recognition (ASR) algorithm, a word segmentation operation is then performed on the text to obtain a plurality of words, and the text information is composed of these words.
Further, the text information is matched against the general dictionary, and the words that are not included in the general dictionary are determined from the text information. The general dictionary may be, for example, the dictionary in the Tsinghua open-source project THCHS-30. Its vocabulary has wide coverage, but it contains only the common words of each field and not the proprietary words of a particular field or organization. The unrecorded words may be, for example, words specific to an internet finance company. Matching the text information against the general dictionary yields the words that are contained in the text information but not in the general dictionary, that is, the unrecorded words. For example, the unrecorded words are "ease flower", "immediately finance", and "immediately staging". Then, the proprietary words used to construct the proprietary pronunciation dictionary are determined from the unrecorded words; for example, the proprietary word is "ease flower". Finally, the proprietary words are added to the general dictionary, thereby completing the construction of the proprietary pronunciation dictionary.
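The matching step described above can be sketched as a simple membership test. This is a minimal illustration and not the implementation of the application: the function name find_unrecorded_words and the sample data are hypothetical, and the general dictionary is reduced to a set of its headwords.

```python
def find_unrecorded_words(segmented_words, general_words):
    """Return the distinct words absent from the general dictionary, in first-seen order."""
    seen = set()
    unrecorded = []
    for word in segmented_words:
        if word not in general_words and word not in seen:
            seen.add(word)
            unrecorded.append(word)
    return unrecorded


# Hypothetical segmented transcript and general dictionary headwords.
segmented = ["hello", "ease flower", "loan", "immediately finance",
             "ease flower", "immediately staging"]
general_words = {"hello", "loan"}

print(find_unrecorded_words(segmented, general_words))
```

Keeping the result in first-seen order is only a convenience for inspection; a plain set difference would serve equally well if order does not matter.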
Therefore, by the mode, the corresponding text information can be determined according to the specific professional voice information, then the text information is matched with the general dictionary to determine the special words, and finally the special words are added to the general dictionary to complete the construction of the special pronunciation dictionary. Therefore, the purpose of performing voice recognition work by using the special pronunciation dictionary in the specific field can be achieved, and the technical effects of accelerating recognition speed and improving recognition accuracy are achieved. And the technical problem that the accuracy of a recognition result is influenced because the universal pronunciation dictionary in the voice recognition system in the prior art cannot cover all proper nouns in each field is solved.
Optionally, the operation of determining proprietary words for constructing the proprietary pronunciation dictionary from the unrecorded words comprises: calculating the word frequency of the words contained in the text information; selecting words whose word frequency is greater than a first threshold as candidate words; and selecting, from the candidate words, the words whose length (number of characters) is less than a second threshold as the proprietary words.
Specifically, in the operation of determining proprietary words from the unrecorded words, the word frequency of the words contained in the text information is first calculated. For example, the unrecorded words and their word frequencies are: 900 for "ease flower", 800 for "immediately finance", and 600 for "immediately staging". Next, the words whose word frequency is greater than a first threshold are screened from the unrecorded words as candidate words. For example, with a first threshold of 700, the candidate words are "ease flower" and "immediately finance". Then, the candidate words whose length (number of characters) is less than a second threshold are screened as the proprietary words. For example, with a second threshold of 4, the proprietary word finally determined is "ease flower". In this way, the proprietary words are screened according to the word frequency and length of the unrecorded words, so that the selected proprietary words are both important and accurate, and the constructed proprietary pronunciation dictionary is more rigorous.
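The two-stage screening can be reproduced with the numbers from the example above. The sketch below is illustrative only. The character counts are assumptions: the translated labels do not preserve the length of the original Chinese words, so the counts 3, 4, and 4 are hypothetical values chosen to be consistent with the outcome stated in the text.

```python
def select_proprietary_words(word_freq, char_count, first_threshold, second_threshold):
    """Keep words with frequency > first_threshold, then length < second_threshold."""
    candidates = [w for w, freq in word_freq.items() if freq > first_threshold]
    return [w for w in candidates if char_count[w] < second_threshold]


# Word frequencies as given in the description.
word_freq = {"ease flower": 900, "immediately finance": 800, "immediately staging": 600}
# ASSUMED character counts of the original (untranslated) words.
char_count = {"ease flower": 3, "immediately finance": 4, "immediately staging": 4}

print(select_proprietary_words(word_freq, char_count, 700, 4))
```

Both comparisons are strict, following the wording "greater than" and "less than" in the description.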
Optionally, the operation of adding the proprietary word to the universal dictionary and constructing the proprietary pronunciation dictionary includes: determining phonemes corresponding to the proprietary words; and adding the special words and the corresponding phonemes to the general dictionary to construct a special pronunciation dictionary.
Specifically, in the operation of adding the proprietary words to the general dictionary to construct the proprietary pronunciation dictionary, the phonemes corresponding to each proprietary word are first determined (for example, the phonemes corresponding to the proprietary word "ease flower"), and the proprietary word and its phonemes are then added to the general dictionary to construct the proprietary pronunciation dictionary. It should be noted that each proprietary word added to the general dictionary is mapped to its corresponding phonemes.
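The word-to-phoneme mapping can be pictured as a dictionary whose keys are words and whose values are phoneme sequences. The sketch below rests on several assumptions: the function names are hypothetical, the phoneme symbols are toneless placeholders, and the one-entry-per-line text layout (word, tab, space-separated phonemes) is an assumed serialization rather than a format specified by the application.

```python
def add_proprietary_entries(general_lexicon, proprietary_entries):
    """Return a new lexicon: the general entries plus any proprietary words not already present."""
    lexicon = {word: list(phonemes) for word, phonemes in general_lexicon.items()}
    for word, phonemes in proprietary_entries.items():
        lexicon.setdefault(word, list(phonemes))
    return lexicon


def format_lexicon(lexicon):
    """Serialize as one 'word<TAB>phoneme phoneme ...' line per entry (assumed layout)."""
    return "\n".join(f"{word}\t{' '.join(phonemes)}"
                     for word, phonemes in sorted(lexicon.items()))


# Hypothetical entries; phoneme symbols are placeholders.
general_lexicon = {"loan": ["l", "oan"]}
proprietary_entries = {"ease flower": ["an", "y", "i", "h", "ua"]}

proprietary_lexicon = add_proprietary_entries(general_lexicon, proprietary_entries)
print(format_lexicon(proprietary_lexicon))
```

Copying the general lexicon instead of mutating it keeps the general dictionary reusable for other domains; that is a design choice of this sketch, not a requirement stated in the description.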
Optionally, the operation of generating phonemes corresponding to the proprietary word includes: determining pinyin corresponding to the special words; and determining the phoneme corresponding to the special word according to the initial consonant and the final sound contained in the pinyin corresponding to the special word.
Specifically, in the operation of generating the phonemes corresponding to a proprietary word, the pinyin corresponding to the proprietary word ("ease flower") is first determined to be anyihua, and the phonemes corresponding to the proprietary word are then determined from the initials and finals contained in the pinyin (anyihua).
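A rough sketch of the initial/final split follows. It assumes the pinyin has already been divided into syllables (an, yi, hua), ignores tones, and treats y and w as initials for simplicity; real phoneme inventories may handle these cases differently. The names INITIALS, split_syllable, and pinyin_to_phonemes are hypothetical.

```python
# Two-letter initials come first so that "zh" is tried before "z", and so on.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split one toneless pinyin syllable into (initial, final); the initial may be empty."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def pinyin_to_phonemes(syllables):
    """Flatten the initials and finals of each syllable into one phoneme sequence."""
    phonemes = []
    for syllable in syllables:
        initial, final = split_syllable(syllable)
        if initial:
            phonemes.append(initial)
        phonemes.append(final)
    return phonemes


print(pinyin_to_phonemes(["an", "yi", "hua"]))
```

Dividing a run such as anyihua into syllables is itself ambiguous in general, which is why the sketch takes the syllable list as input.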
Furthermore, FIG. 3 shows the overall flow of constructing a proprietary pronunciation dictionary. First, proprietary voice data (corresponding to the voice information mentioned above) is acquired, and recognized text data is then obtained using an ASR module (a prior-art speech recognition algorithm); that is, the proprietary voice data is converted into text data (corresponding to the text information). Next, a word segmentation operation is performed on the text data, and each segmentation result is checked to see whether it is a new word that does not exist in the dictionary; that is, the text data is matched against the general dictionary to determine the unrecorded words. The OOV (out-of-vocabulary) file in the figure records the words that are not included in the general dictionary: the unrecorded words found while matching the text information against the general dictionary are added to the OOV file. When no new word is found (there is no unrecorded word), the word segmentation operation is performed on the text data again. Then, the proprietary words used for constructing the proprietary pronunciation dictionary are screened from the OOV file, phonemes corresponding to the proprietary words are generated, and finally the proprietary words and their phonemes are added to the general dictionary to construct the proprietary pronunciation dictionary (corresponding to the personalized dictionary in FIG. 3).
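The flow of FIG. 3 can be outlined end to end with each stage passed in as a function. Everything in this sketch is an assumption made for illustration: the stage functions (transcribe, segment, keep, to_phonemes) are stand-ins for the ASR module, the word segmenter, the screening rule, and the phoneme generator, and a Counter plays the role of the OOV file.

```python
from collections import Counter


def build_proprietary_lexicon(audio_items, general_lexicon,
                              transcribe, segment, keep, to_phonemes):
    """Run the FIG. 3 flow: ASR, segmentation, OOV collection, screening, phoneme generation."""
    oov_counts = Counter()  # stands in for the OOV file
    for audio in audio_items:
        text = transcribe(audio)
        for word in segment(text):
            if word not in general_lexicon:
                oov_counts[word] += 1
    lexicon = dict(general_lexicon)
    for word, count in oov_counts.items():
        if keep(word, count):
            lexicon[word] = to_phonemes(word)
    return lexicon, oov_counts


# Toy stand-ins: the "audio" items are already text, and words are space-delimited.
audio_items = ["loan abc now", "abc loan xyzw", "xyzw abc"]
general_lexicon = {"loan": ["l", "oan"], "now": ["n", "ow"]}

lexicon, oov_counts = build_proprietary_lexicon(
    audio_items,
    general_lexicon,
    transcribe=lambda audio: audio,               # placeholder for a real ASR module
    segment=str.split,                            # placeholder for a real word segmenter
    keep=lambda word, count: count > 1 and len(word) < 4,
    to_phonemes=list,                             # placeholder: one symbol per character
)
print(sorted(set(lexicon) - set(general_lexicon)))
```

Passing the stages as parameters keeps the outline independent of any particular ASR engine or segmenter, which the description leaves open.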
Further, referring to fig. 1, according to a second aspect of the present embodiment, a storage medium 104 is provided. The storage medium 104 comprises a stored program, wherein the method of any of the above is performed by a processor when the program is run.
Therefore, according to this embodiment, the corresponding text information can be determined from the voice information of a specific professional field, the text information is then matched against the general dictionary to determine the proprietary words, and finally the proprietary words are added to the general dictionary to complete the construction of the proprietary pronunciation dictionary. Speech recognition in the specific professional field can therefore be performed using the proprietary pronunciation dictionary, which achieves the technical effects of accelerating recognition and improving recognition accuracy. This solves the technical problem that the general pronunciation dictionary in a prior-art speech recognition system cannot cover all the proper nouns of each field, which affects the accuracy of recognition results.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
Fig. 4 shows an apparatus 400 for constructing a lexicon of proprietary pronunciation according to the present embodiment, the apparatus 400 corresponding to the method according to the first aspect of embodiment 1. Referring to fig. 4, the apparatus 400 includes: an obtaining module 410, configured to obtain voice information, where the voice information is used to record voice audio related to a specific specialty; a determining module 420, configured to determine text information corresponding to the voice information according to a predetermined algorithm; the matching module 430 is configured to match the text information with the general dictionary, and determine words that are not included in the general dictionary from the text information; a screening module 440 for determining a proper word for constructing a proper pronunciation dictionary from the unrecorded words; and a construction module 450 for adding the special words to the universal dictionary to construct a special pronunciation dictionary.
Optionally, the screening module 440 includes: a word frequency calculation submodule for calculating the word frequency of the words contained in the text information; a first screening submodule for selecting words whose word frequency is greater than a first threshold as candidate words; and a second screening submodule for selecting, from the candidate words, the words whose length (number of characters) is less than a second threshold as the proprietary words.
Optionally, the construction module 450 includes: a phoneme calculation submodule for determining the phonemes corresponding to the proprietary words; and a construction submodule for adding the proprietary words and the corresponding phonemes to the general dictionary to construct the proprietary pronunciation dictionary.
Optionally, the phoneme computation submodule includes: the first determining unit is used for determining pinyin corresponding to the special words; and the second determining unit is used for determining the phoneme corresponding to the special word according to the initial consonant and the final sound contained in the pinyin corresponding to the special word.
Therefore, with the apparatus 400 for constructing a special pronunciation dictionary according to this embodiment, the text information corresponding to the voice information of a specific specialty can be determined, the text information can then be matched against the general dictionary to determine the special words, and finally the special words can be added to the general dictionary to complete the construction of the special pronunciation dictionary. Speech recognition for a specific specialty can thus use the special pronunciation dictionary, which achieves the technical effects of faster recognition and higher recognition accuracy. This solves the technical problem in the prior art that the general pronunciation dictionary in a speech recognition system cannot cover all the proper nouns of every field, which degrades the accuracy of the recognition result.
Example 3
Fig. 5 shows an apparatus 500 for constructing a special pronunciation dictionary according to the present embodiment; the apparatus 500 corresponds to the method according to the first aspect of embodiment 1. Referring to fig. 5, the apparatus 500 includes: a processor 510; and a memory 520 coupled to the processor 510 and configured to provide the processor 510 with instructions for the following processing steps: acquiring voice information, wherein the voice information records voice audio related to a specific specialty; determining text information corresponding to the voice information according to a predetermined algorithm; matching the text information with a general dictionary, and determining, from the text information, words which are not included in the general dictionary; determining, from the words not included, special words for constructing a special pronunciation dictionary; and adding the special words to the general dictionary to construct the special pronunciation dictionary.
Optionally, the operation of determining, from the words not included, special words for constructing a special pronunciation dictionary comprises: calculating the word frequency of the words contained in the text information; selecting, from the text information, words whose word frequency is greater than a first threshold as candidate words; and selecting, from the candidate words, words whose word count is less than a second threshold as the special words.
Optionally, the operation of adding the special words to the general dictionary to construct the special pronunciation dictionary includes: determining the phonemes corresponding to the special words; and adding the special words and the corresponding phonemes to the general dictionary to construct the special pronunciation dictionary.
Optionally, the operation of determining the phonemes corresponding to the special words includes: determining the pinyin corresponding to the special words; and determining the phonemes corresponding to the special words according to the initials and finals contained in the pinyin corresponding to the special words.
Therefore, with the apparatus 500 for constructing a special pronunciation dictionary according to this embodiment, the text information corresponding to the voice information of a specific specialty can be determined, the text information can then be matched against the general dictionary to determine the special words, and finally the special words can be added to the general dictionary to complete the construction of the special pronunciation dictionary. Speech recognition for a specific specialty can thus use the special pronunciation dictionary, which achieves the technical effects of faster recognition and higher recognition accuracy. This solves the technical problem in the prior art that the general pronunciation dictionary in a speech recognition system cannot cover all the proper nouns of every field, which degrades the accuracy of the recognition result.
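The processing steps carried out by the processor can be chained as in the sketch below. The speech recognizer and the phoneme lookup are passed in as callables because the embodiment only refers to "a predetermined algorithm"; the function `construct_special_dictionary`, its parameters, the default thresholds, and the stand-in recognizer are all hypothetical, and "word count" is again assumed to mean the number of characters.

```python
from collections import Counter


def construct_special_dictionary(audio, transcribe, general_dictionary, to_phonemes,
                                 first_threshold=1, second_threshold=5):
    """Run the steps in order: transcribe, match, screen, add."""
    words = transcribe(audio)                      # voice information -> text (word list)
    frequency = Counter(words)                     # word frequency over the text
    unlisted = [w for w in frequency if w not in general_dictionary]
    special = [w for w in unlisted
               if frequency[w] > first_threshold and len(w) < second_threshold]
    result = dict(general_dictionary)              # copy, then append special words
    for word in special:
        result[word] = to_phonemes(word)
    return result


def fake_transcribe(audio):
    """Stand-in for a real recognizer; returns a fixed, pre-segmented word list."""
    return ["办理", "网贷", "业务", "网贷"]


general = {"办理": ["b", "an", "l", "i"], "业务": ["y", "e", "w", "u"]}
phoneme_table = {"网贷": ["w", "ang", "d", "ai"]}
special_dictionary = construct_special_dictionary(
    b"", fake_transcribe, general, phoneme_table.__getitem__)
```

Passing the recognizer and the phoneme lookup as parameters keeps the sketch independent of any particular speech recognition toolkit.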
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method of constructing a special pronunciation dictionary, comprising:
acquiring voice information, wherein the voice information records voice audio related to a specific specialty;
determining text information corresponding to the voice information according to a preset algorithm;
matching the text information with a general dictionary, and determining, from the text information, words which are not included in the general dictionary;
determining, from the words not included, special words for constructing a special pronunciation dictionary; and
adding the special words to the general dictionary to construct the special pronunciation dictionary.
2. The method of claim 1, wherein determining, from the words not included, special words for constructing a special pronunciation dictionary comprises:
calculating the word frequency of the words contained in the text information;
selecting, from the text information, words whose word frequency is greater than a first threshold as candidate words; and
selecting, from the candidate words, words whose word count is less than a second threshold as the special words.
3. The method of claim 1, wherein adding the special words to the general dictionary to construct the special pronunciation dictionary comprises:
determining the phonemes corresponding to the special words; and
adding the special words and the corresponding phonemes to the general dictionary to construct the special pronunciation dictionary.
4. The method of claim 3, wherein determining the phonemes corresponding to the special words comprises:
determining the pinyin corresponding to the special words; and
determining the phonemes corresponding to the special words according to the initials and finals contained in the pinyin corresponding to the special words.
5. A storage medium comprising a stored program, wherein the method of any one of claims 1 to 4 is performed by a processor when the program is run.
6. An apparatus for constructing a special pronunciation dictionary, comprising:
an acquisition module for acquiring voice information, wherein the voice information records voice audio related to a specific specialty;
a determining module for determining text information corresponding to the voice information according to a preset algorithm;
a matching module for matching the text information with a general dictionary and determining, from the text information, words which are not included in the general dictionary;
a screening module for determining, from the words not included, special words for constructing a special pronunciation dictionary; and
a construction module for adding the special words to the general dictionary to construct the special pronunciation dictionary.
7. The apparatus of claim 6, wherein the screening module comprises:
a word frequency calculation submodule for calculating the word frequency of the words contained in the text information;
a first screening submodule for selecting, from the text information, words whose word frequency is greater than a first threshold as candidate words; and
a second screening submodule for selecting, from the candidate words, words whose word count is less than a second threshold as the special words.
8. The apparatus of claim 6, wherein the construction module comprises:
a phoneme calculation submodule for determining the phonemes corresponding to the special words; and
a construction submodule for adding the special words and the corresponding phonemes to the general dictionary to construct the special pronunciation dictionary.
9. The apparatus of claim 8, wherein the phoneme calculation submodule comprises:
a first determining unit for determining the pinyin corresponding to the special words; and
a second determining unit for determining the phonemes corresponding to the special words according to the initials and finals contained in the pinyin corresponding to the special words.
10. An apparatus for constructing a special pronunciation dictionary, comprising:
a processor; and
a memory coupled to the processor and configured to provide the processor with instructions for the following processing steps:
acquiring voice information, wherein the voice information records voice audio related to a specific specialty;
determining text information corresponding to the voice information according to a preset algorithm;
matching the text information with a general dictionary, and determining, from the text information, words which are not included in the general dictionary;
determining, from the words not included, special words for constructing a special pronunciation dictionary; and
adding the special words to the general dictionary to construct the special pronunciation dictionary.
CN201910678978.6A 2019-07-25 2019-07-25 Method, device and storage medium for constructing special pronunciation dictionary Pending CN112287112A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910678978.6A CN112287112A (en) 2019-07-25 2019-07-25 Method, device and storage medium for constructing special pronunciation dictionary

Publications (1)

Publication Number Publication Date
CN112287112A true CN112287112A (en) 2021-01-29

Family

ID=74419631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910678978.6A Pending CN112287112A (en) 2019-07-25 2019-07-25 Method, device and storage medium for constructing special pronunciation dictionary

Country Status (1)

Country Link
CN (1) CN112287112A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103680498A (en) * 2012-09-26 2014-03-26 华为技术有限公司 Speech recognition method and speech recognition equipment
CN104462071A (en) * 2013-09-19 2015-03-25 株式会社东芝 SPEECH TRANSLATION APPARATUS and SPEECH TRANSLATION METHOD
US20150127326A1 (en) * 2013-11-05 2015-05-07 GM Global Technology Operations LLC System for adapting speech recognition vocabulary
CN105389349A (en) * 2015-10-27 2016-03-09 上海智臻智能网络科技股份有限公司 Dictionary updating method and apparatus
US20170018268A1 (en) * 2015-07-14 2017-01-19 Nuance Communications, Inc. Systems and methods for updating a language model based on user input
CN107660303A (en) * 2015-06-26 2018-02-02 英特尔公司 The language model of local speech recognition system is changed using remote source
CN109036377A (en) * 2018-07-26 2018-12-18 中国银联股份有限公司 A kind of phoneme synthesizing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination