CN112905024B - Syllable recording method and device for word - Google Patents

Info

Publication number
CN112905024B
Authority
CN
China
Prior art keywords
letter
word
syllable
target
string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110079369.6A
Other languages
Chinese (zh)
Other versions
CN112905024A (en)
Inventor
李博林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The application is suitable for the technical field of letter input methods, and provides a syllable recording method of words, which comprises the steps of obtaining at least one word from a plurality of words to be processed, splitting the word according to a preset word spelling rule to obtain a training set, wherein the training set comprises syllable indexes of the preset word spelling rule and position information of each letter in the syllable indexes, splitting the training set to determine the conditional probability of the position information corresponding to at least two letters in the word and character strings of the two letter combination, determining target syllable indexes corresponding to target words and target syllables based on the conditional probability, and the target syllable indexes meet the preset word spelling rule. The application also provides a syllable recording device of the words, which ensures that the words and syllables of the words are effectively managed in the database, thereby greatly improving the speed of word recording.

Description

Syllable recording method and device for word
Technical Field
The application belongs to the technical field of letter input methods, and particularly relates to a syllable recording method and device of words.
Background
Disclosure of Invention
The embodiment of the application provides a syllable recording method and device for words, which can solve the problem of users only.
In a first aspect, an embodiment of the present application provides a syllable recording method for a word, including:
acquiring at least one word from a plurality of words to be processed, wherein the word comprises at least three letters;
splitting the words according to a preset word spelling rule to obtain a training set, wherein the training set comprises syllable indexes of the preset word spelling rule and position information of each letter in the syllable indexes;
the training set is segmented to determine the conditional probability of the character strings of at least two letters in the word and the two letter combination corresponding to the position information;
and determining a target word and a target syllable index corresponding to the target syllable based on the conditional probability, wherein the target syllable index meets the preset word spelling rule.
As a further improvement of the above technical solution, the determining, based on the conditional probability, a target word and a target syllable index corresponding to the target syllable includes:
acquiring first letters in the character string and first position information corresponding to the first letters;
determining a second letter associated with the first letter according to the first letter and the first position information;
and calculating the conditional probability that the first letter, the first position information and the second letter are simultaneously present in the target word.
As a further improvement of the above technical solution, the calculating the conditional probability that the first letter, the first location information, and the second letter exist in the target word at the same time includes:
segmenting the target word according to the preset spelling rule to obtain a segmentation result, wherein the segmentation result comprises the target word and target syllables corresponding to the target word;
judging whether the segmentation result exists in a database or not;
if not, storing the target syllable, the target word and the target syllable index into the database.
As a further improvement of the above technical solution, the syllable index includes a beginning, a middle, and an ending of a syllable, the beginning, the middle, and the ending of the syllable corresponding to at least one letter in the word.
As a further improvement of the above technical solution, when the number of syllable indexes corresponding to the preset word spelling rule of the word is three, traversing each letter of the word to obtain second position information of each letter, and generating a first training set corresponding to the word;
obtaining a first character string, a second character string and a third character string and letter values in the first character string, the second character string and the third character string according to the second position information and the syllable index, wherein each character string comprises at least two letters;
and marking the first letter string, the second letter string and the third letter string as a beginning syllable, a middle syllable and an ending syllable respectively, and marking each letter correspondingly according to the letter numerical value in each letter string and the second position information.
As a further improvement of the above solution, after generating the first training set of words, it comprises:
and automatically segmenting the first training set by adopting a hidden Markov model, wherein the conditional probability corresponding to the first training set is expressed as P(A|B) = P(AB) / P(B);
taking the first letter of the first letter string in the first training set as an example, events A and B are expressed as:
A: the letter is the initial letter of a syllable;
B: the letter is the first letter, and the letter immediately after it is the second letter;
the probability that the letter is the first letter, is the initial letter of a syllable, and is immediately followed by the second letter is counted and recorded as P(AB), and the probability that the letter is the first letter in any position and is immediately followed by the second letter is counted and recorded as P(B).
In a second aspect, an embodiment of the present application provides a syllable recording apparatus for a word, including:
an acquisition module, configured to acquire at least one word from a plurality of words to be processed, wherein the word comprises at least three letters;
the splitting module is used for splitting the words according to a preset word spelling rule to obtain a training set, wherein the training set comprises syllable indexes of the preset word spelling rule and position information of each letter in the syllable indexes;
the calculation module is used for segmenting the training set to determine the conditional probability of the position information corresponding to at least two letters in the word and the character strings combined by the two letters;
and the recording module is used for determining a target word and a target syllable index corresponding to the target syllable based on the conditional probability, wherein the target syllable index meets the preset word spelling rule.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the above method.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
In the method, at least one word comprising at least three letters is acquired from a plurality of words to be processed; the word is split according to a preset word spelling rule to obtain a training set, which comprises the syllable indexes of the preset word spelling rule and the position information of each letter in the syllable indexes; the training set is segmented to determine the conditional probability that at least two letters in the word, and the character string formed by combining the two letters, correspond to the position information; and a target word and a target syllable index corresponding to the target syllable are determined based on the conditional probability, the target syllable index meeting the preset word spelling rule. According to the application, multi-syllable words can be split according to the preset word spelling rule to obtain a training set, and a large number of words can be segmented automatically using the training set. Because the position information is obtained by marking each letter in advance according to the syllable index, each letter in a word can be located precisely, which improves word segmentation efficiency. Given any new word, the position information (that is, the state) of every letter in the word can be judged, and the word can be divided into syllables according to the letter states and recorded in a database. This ensures that words and their syllables are managed effectively in the database and greatly improves the speed of word recording.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a syllable recording method of a word according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a syllable recording method of a word according to a second embodiment of the present application;
FIG. 3 is a flowchart illustrating a syllable recording method of a word according to another embodiment of the present application;
FIG. 4 is a schematic diagram of a syllable recording device for words according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Description of main reference numerals:
300-syllable recording device for words; 310-acquisition module; 320-splitting module; 330-calculation module; 340-recording module; 400-terminal device; 410-memory; 420-processor; 430-computer program.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if a described condition or event is detected" may be interpreted, depending on the context, as "upon determining" or "in response to determining" or "upon detecting the described condition or event" or "in response to detecting the described condition or event".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
FIG. 1 shows a schematic implementation process of the syllable recording method of a word according to the first embodiment of the present application. The syllable recording method of a word may include the following steps S101 to S104:
S101: acquiring at least one word from a plurality of words to be processed, wherein the word comprises at least three letters;
In this embodiment, English words are composed of letters that form syllables, and a word generally has as many syllables as it has vowel sounds. Words that include one syllable, two syllables, and three or more syllables are referred to as monosyllabic, disyllabic, and multi-syllable words, respectively. The words to be processed may also contain English phrases, characters, symbols and the like, so that all words can be split quickly.
S102: splitting the words according to a preset word spelling rule to obtain a training set, wherein the training set comprises syllable indexes of the preset word spelling rule and position information of each letter in the syllable indexes;
In this embodiment, letters are the smallest units for writing English words, and phonemes are the smallest phonetic units in the language. Letters are divided into vowels and consonants. The letters in a word do not correspond one-to-one to the phonemes in its pronunciation: sometimes one letter corresponds to one phoneme, and sometimes two or three letters correspond to one phoneme. When a word is divided into syllables, the division is sometimes made according to its phonetic form and sometimes according to its written form. Taking test and nice as examples: in written form there are two vowels in test, but the two vowels ea are combined to produce a single sound, so test counts as only one syllable; there are two vowels in nice, but the e at the end of the word is silent, so nice is also a monosyllabic word. As a further example, the English word different is split into the syllable index "dif-fe-rent": dif serves as the beginning of the syllable index of the word different, fe as the middle, and rent as the ending, and the position information of the letter d indicates that it is the first letter of the beginning of the syllable index. The training set comprises the syllable index "dif-fe-rent" and the position information of each letter in the syllable index. The larger the training set, the more training results are obtained, and the more effective the syllable splitting training process becomes.
S103: the training set is segmented to determine the conditional probability that at least two letters in the word and the character strings of the two letter combinations correspond to the position information;
optionally, when the number of syllable indexes corresponding to the preset word spelling rule of the word is three, traversing each letter of the word to obtain second position information of each letter, and generating a first training set corresponding to the word;
obtaining a first character string, a second character string and a third character string and letter values in the first character string, the second character string and the third character string according to the second position information and the syllable index, wherein each character string comprises at least two letters;
and marking the first letter string, the second letter string and the third letter string as a beginning syllable, a middle syllable and an ending syllable respectively, and marking each letter correspondingly according to the letter numerical value in each letter string and the second position information.
In this embodiment, the words and syllable indexes in the training set are segmented. When the word is condition, it is split according to the preset word spelling rule to obtain con-di-tion;
the three letters in con are labeled B, M, E in turn, the two letters in di are labeled B, E in turn, and the four letters in tion are labeled B, M, M, E in turn, generating a first training set for the word condition, where B represents a Begin character, M represents a Middle character, and E represents an End character.
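The labeling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names `label_syllables` and `to_training_example` and the hyphen-separated input format are assumptions, and the S tag for one-letter syllables follows the abandon example given later in the description.

```python
def label_syllables(syllables):
    """Tag every letter with S (single), B (begin), M (middle), or E (end)."""
    tags = []
    for syl in syllables:
        if len(syl) == 1:
            tags.append("S")  # a one-letter syllable is a Single character
        else:
            # first letter B, last letter E, everything in between M
            tags.extend(["B"] + ["M"] * (len(syl) - 2) + ["E"])
    return tags


def to_training_example(split_word, sep="-"):
    """Turn a pre-split word such as 'con-di-tion' into (letter, tag) pairs."""
    syllables = split_word.split(sep)
    word = "".join(syllables)
    return list(zip(word, label_syllables(syllables)))


if __name__ == "__main__":
    for letter, tag in to_training_example("con-di-tion"):
        print(letter, tag)
```

Each pair records a letter together with its position information, which is the form of training data the later probability counts operate on.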
Optionally, the first training set is automatically segmented by using a hidden Markov model, and the conditional probability corresponding to the first training set is expressed as P(A|B) = P(AB) / P(B).
Taking the first letter of the first letter string in the first training set as an example, events A and B are expressed as:
A: the letter is the initial letter of a syllable;
B: the letter is the first letter, and the letter immediately after it is the second letter;
the probability that the letter is the first letter, is the initial letter of a syllable, and is immediately followed by the second letter is counted and recorded as P(AB), and the probability that the letter is the first letter in any position and is immediately followed by the second letter is counted and recorded as P(B).
Taking the letter c in the first training set as an example, events A and B are expressed as:
A: the letter is the initial letter of a syllable;
B: the letter is c, and the letter immediately after it is o;
the probability that the letter c is the initial letter of a syllable and is followed by o is counted and recorded as P(AB), and the probability that the letter c occupies any position and is followed by o is counted and recorded as P(B).
It should be noted that the above judges the state of the letter c using only the one letter that follows it. In actual operation, the letter after o may be used at the same time; or, taking the letter o as an example, the letter preceding o may also be used, and so on, so as to judge the state of a letter in the word more accurately.
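The counting behind P(AB) and P(B) can be sketched as follows, using the standard definition of conditional probability, P(A|B) = P(AB) / P(B), which reduces to a ratio of counts. This is a hedged illustration: the function names, the `(word, tags)` input format, and the two-word toy training set (including the split rec-ord) are assumptions made for the example.

```python
from collections import Counter


def count_bigrams(training_set):
    """training_set: list of (word, tags) with one B/M/E/S tag per letter."""
    joint = Counter()     # (letter, next_letter, tag): events A and B together
    marginal = Counter()  # (letter, next_letter): event B in any state
    for word, tags in training_set:
        for i in range(len(word) - 1):
            pair = (word[i], word[i + 1])
            marginal[pair] += 1
            joint[pair + (tags[i],)] += 1
    return joint, marginal


def conditional_probability(joint, marginal, letter, next_letter, tag="B"):
    """Estimate P(tag | letter followed by next_letter) as count(AB) / count(B)."""
    denom = marginal[(letter, next_letter)]
    if denom == 0:
        return 0.0  # the letter pair never occurs in the training set
    return joint[(letter, next_letter, tag)] / denom


if __name__ == "__main__":
    # Hypothetical toy data: con-di-tion and rec-ord
    data = [("condition", "BMEBEBMME"), ("record", "BMEBME")]
    joint, marginal = count_bigrams(data)
    print(conditional_probability(joint, marginal, "c", "o"))
```

In this toy set the pair "co" occurs twice, once with c as a syllable initial (condition) and once with c as a syllable ending (record), so the estimate for c being a syllable initial given that it is followed by o is one half.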
It will be appreciated that the hidden Markov model (Hidden Markov Model, HMM) is a statistical model with a doubly stochastic process: a hidden Markov chain of states, and a stochastic process that describes the observations corresponding to each state. It is used to describe a Markov process containing hidden unknown parameters: the hidden parameters are determined from the observable parameters and then used for further analysis. The states cannot be observed directly; they can only be inferred from a sequence of observation vectors, each of which is generated by a state according to that state's probability density distribution. The training set can be automatically segmented through the hidden Markov model to obtain various results, and the training set can be refined accordingly, so that the segmentation of the training set, namely syllable training, is more accurate.
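To make the role of the hidden Markov model concrete, the sketch below trains a textbook HMM tagger over the hidden states B, M, E, S and decodes a new word with the Viterbi algorithm. It is an assumption-laden illustration rather than the model claimed in the application: the emission model here conditions only on the current letter (the description conditions on letter pairs), and the add-alpha smoothing, the `unseen` fallback value, and all function names are choices made for this example.

```python
import math
from collections import defaultdict

STATES = "BMES"


def train(examples, alpha=1.0):
    """Estimate log-probabilities from (word, tags) pairs with add-alpha smoothing."""
    start, trans, emit = defaultdict(float), defaultdict(float), defaultdict(float)
    letters = set()
    for word, tags in examples:
        start[tags[0]] += 1
        for i, (ch, tag) in enumerate(zip(word, tags)):
            emit[(tag, ch)] += 1
            letters.add(ch)
            if i > 0:
                trans[(tags[i - 1], tag)] += 1

    def normalise(counts, keys_for):
        table = {}
        for s in STATES:
            keys = keys_for(s)
            total = sum(counts[k] for k in keys) + alpha * len(keys)
            for k in keys:
                table[k] = math.log((counts[k] + alpha) / total)
        return table

    n = sum(start.values()) + alpha * len(STATES)
    log_start = {s: math.log((start[s] + alpha) / n) for s in STATES}
    log_trans = normalise(trans, lambda s: [(s, t) for t in STATES])
    log_emit = normalise(emit, lambda s: [(s, ch) for ch in sorted(letters)])
    return log_start, log_trans, log_emit


def viterbi(word, log_start, log_trans, log_emit, unseen=math.log(1e-6)):
    """Return the most probable tag sequence for a non-empty word."""
    score = {s: log_start[s] + log_emit.get((s, word[0]), unseen) for s in STATES}
    back = []
    for ch in word[1:]:
        new_score, pointers = {}, {}
        for t in STATES:
            prev = max(STATES, key=lambda s: score[s] + log_trans[(s, t)])
            new_score[t] = score[prev] + log_trans[(prev, t)] + log_emit.get((t, ch), unseen)
            pointers[t] = prev
        score = new_score
        back.append(pointers)
    tags = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        tags.append(pointers[tags[-1]])
    return "".join(reversed(tags))


def split_by_tags(word, tags):
    """Cut the word after every letter tagged E or S."""
    syllables, current = [], ""
    for ch, tag in zip(word, tags):
        current += ch
        if tag in "ES":
            syllables.append(current)
            current = ""
    if current:
        syllables.append(current)
    return syllables
```

A typical use would be `model = train(examples)` followed by `split_by_tags(word, viterbi(word, *model))`. Because smoothing gives every transition a non-zero probability, a very small training set can produce sequences that are not well-formed (for example B followed by B); a production system would likely forbid such transitions explicitly.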
S104: and determining a target word and a target syllable index corresponding to the target syllable based on the conditional probability, wherein the target syllable index meets the preset word spelling rule.
In this embodiment, taking the word condition (con-di-tion) as an example again, the first letter of each syllable in the syllable index is selected, giving cdt: the letter c is the first letter of the beginning of the syllable index, the letter d is the first letter of the middle of the syllable index, and the letter t is the first letter of the end of the syllable index, and the letters c, d and t are each marked with the tag B.
Fig. 2 shows a schematic implementation process of the syllable recording method of the word provided in the second embodiment of the present application, further, S104 further includes S1041 to S1043, and specifically includes the following steps:
S1041: acquiring first position information corresponding to a first letter in the character string;
S1042: determining a second letter associated with the first letter according to the first letter and the first position information;
S1043: and calculating the conditional probability that the first letter, the first position information and the second letter coexist in the target word.
Optionally, when the number of syllable indexes corresponding to the preset word spelling rule of the word is three, traversing each letter of the word to obtain second position information of each letter, and generating a first training set corresponding to the word;
obtaining a first character string, a second character string and a third character string and letter values in the first character string, the second character string and the third character string according to the second position information and the syllable index, wherein the letter value of the first character string is one, and the second character string and the third character string comprise at least two letters;
and marking letters in the first letter string as single characters corresponding to single syllables, marking the second letter string and the third letter string as beginning characters, middle characters and ending characters respectively, and marking each letter according to the letter values in the second character string and the third character string and the second position information.
In this embodiment, when the word is abandon, splitting abandon according to the preset word spelling rule to obtain a-ban-don;
the single letter a is labeled S, the three letters in ban are labeled B, M, E in turn, and the three letters in don are labeled B, M, E in turn, generating a second training set for the word abandon, where S represents a Single character, B represents a Begin character, M represents a Middle character, and E represents an End character. For example, dictionary B stores all entries with the Begin identification, and training on the word abandon yields dictionary B as { aba=1, ndn=1 }. The first letter may be located at the beginning, middle or end of the syllable index; the first position information is the state of the first letter in the whole word; and the second letter has an association, i.e. a positional relationship, with the first letter. For example, the letter b follows the first letter a, and the second position information of b in the syllable index is identified as B.
It should be noted that the initial letters of all syllables of each word are concatenated in order into a character string, and the character strings, arranged in order, are integrated into a total syllable-initial index, which is organized alphabetically into 26 groups. The words are arranged into a word index in the same order as their corresponding character strings in the total syllable-initial index, so that every word in the word index corresponds to a character string in the total syllable-initial index. Several words may correspond to the same character string, in which case the ordering of those words in the word index follows the ordering of that character string in the total syllable-initial index. The total syllable-initial index serves as the first-level index and is associated with the word index, and the word index is a second-level index subordinate to the total syllable-initial index, so that multiple words can be conveniently associated and looked up through the syllable index.
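One way to picture the two-level index is a nested mapping: the first letter selects one of the 26 alphabetical groups, the syllable-initial string (such as cdt) selects an entry in that group, and the entry lists the words that share it. The sketch below is only an illustration under those assumptions; the function names, the nested-dictionary layout, and the sample splits passed in (in particular cre-di-tor) are hypothetical inputs, not data from the application.

```python
from collections import defaultdict


def syllable_initials(split_word, sep="-"):
    """'con-di-tion' -> 'cdt': first letter of each syllable, in order."""
    return "".join(s[0] for s in split_word.split(sep) if s)


def build_index(split_words, sep="-"):
    """First level: alphabetical group and syllable-initial string.
    Second level: the words that share that string."""
    index = defaultdict(lambda: defaultdict(list))
    for split_word in split_words:
        key = syllable_initials(split_word, sep)
        word = split_word.replace(sep, "")
        index[key[0]][key].append(word)
    return index


if __name__ == "__main__":
    # Hypothetical pre-split inputs used only for illustration
    index = build_index(["con-di-tion", "a-ban-don", "cre-di-tor"])
    for group in sorted(index):
        for key in sorted(index[group]):
            print(group, key, index[group][key])
```

Looking up `index["c"]["cdt"]` then returns every recorded word whose syllables begin with c, d and t, which is the association between words that the syllable index is meant to provide.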
Fig. 3 is a schematic diagram showing an implementation process of a syllable recording method of a word according to another embodiment of the present application, and further, S1043 further includes E1 to E3, specifically including the following steps:
E1: segmenting the target word according to the preset spelling rule to obtain a segmentation result, wherein the segmentation result comprises the target word and target syllables corresponding to the target word;
E2: judging whether the segmentation result exists in a database or not;
E3: if not, storing the target syllable, the target word and the target syllable index into the database.
In this embodiment, a number of words may first be segmented manually and recorded in the database. Based on the hidden Markov chain principle, the probabilities that two different letters, and the combination of the two letters, occur at the beginning, middle and end of a syllable in a word are calculated, and words are segmented according to these probabilities. If the segmentation result contains syllables not yet recorded in the database, the new syllables, the word, and the syllable index of the word's syllable components are added to the database. If all syllables in the segmentation result already exist in the database, only the word and the syllable components of the word are added, so that repeated syllables are not stored again and database storage space is saved.
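Steps E1 to E3 amount to a check-then-store routine. The following sketch uses Python's built-in `sqlite3` module with an in-memory database to show one possible shape of it. The table layout, column names, and the functions `open_db` and `record` are assumptions for illustration; the application does not specify a database schema.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create the two illustrative tables if they do not exist yet."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS syllables (syllable TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE IF NOT EXISTS words "
        "(word TEXT PRIMARY KEY, split TEXT, syllable_index TEXT)"
    )
    return db


def record(db, word, syllables):
    """Store the word and any syllables not yet recorded; return the new syllables."""
    new = []
    for syl in syllables:
        row = db.execute(
            "SELECT 1 FROM syllables WHERE syllable = ?", (syl,)
        ).fetchone()
        if row is None:  # E2/E3: only store syllables that are not in the database
            db.execute("INSERT INTO syllables (syllable) VALUES (?)", (syl,))
            new.append(syl)
    syllable_index = "".join(s[0] for s in syllables)  # e.g. 'cdt'
    db.execute(
        "INSERT OR IGNORE INTO words (word, split, syllable_index) VALUES (?, ?, ?)",
        (word, "-".join(syllables), syllable_index),
    )
    db.commit()
    return new


if __name__ == "__main__":
    db = open_db()
    print(record(db, "condition", ["con", "di", "tion"]))
    print(record(db, "conform", ["con", "form"]))  # 'con' is already stored
```

Storing each syllable once and referring to it from the word rows is what keeps repeated syllables from taking additional space.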
Referring to fig. 4, the present application also provides a syllable recording apparatus 300 of a word, comprising:
an obtaining module 310, configured to obtain at least one word from a plurality of words to be processed, where the word includes at least three letters;
the splitting module 320 is configured to split the word according to a preset word spelling rule to obtain a training set, where the training set includes syllable indexes of the preset word spelling rule and position information of each letter in the syllable indexes;
a calculation module 330, configured to segment the training set to determine a conditional probability that at least two letters in the word and a character string of the two-letter combination correspond to the location information;
and the recording module 340 is configured to determine a target word and a target syllable index corresponding to the target syllable based on the conditional probability, where the target syllable index meets the preset word spelling rule.
The application provides a syllable recording method and device for words. At least one word comprising at least three letters is acquired from a plurality of words to be processed; the word is split according to a preset word spelling rule to obtain a training set, which comprises the syllable indexes of the preset word spelling rule and the position information of each letter in the syllable indexes; the training set is segmented to determine the conditional probability that at least two letters in the word, and the character string formed by combining the two letters, correspond to the position information; and a target word and a target syllable index corresponding to the target syllable are determined based on the conditional probability, the target syllable index meeting the preset word spelling rule. According to the application, multi-syllable words can be split according to the preset word spelling rule to obtain a training set, and a large number of words can be segmented automatically using the training set. Because the position information is obtained by marking each letter in advance according to the syllable index, each letter in a word can be located accurately, which improves word segmentation efficiency. Given any new word, the position information (that is, the state) of every letter in the word can be judged, and the word can be divided into syllables according to the letter states and recorded in a database. This ensures that words and their syllables are managed effectively in the database and greatly improves the speed of word recording.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application, and as shown in fig. 5, the terminal device 400 includes a memory 410, at least one processor 420, and a computer program 430 stored in the memory 410 and capable of running on the processor 420, wherein the processor 420 implements the syllable recording method of words when executing the computer program 430.
The terminal device 400 may be a desktop computer, a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like. The specific type of the terminal device is not limited in the embodiments of the present application.
The terminal device 400 may include, but is not limited to, the processor 420 and the memory 410. Those skilled in the art will appreciate that fig. 5 is merely an example of the terminal device 400 and does not limit it; the terminal device 400 may include more or fewer components than shown, may combine certain components, or may include different components, and may for example also include input/output devices.
The processor 420 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In some embodiments, the memory 410 may be an internal storage unit of the terminal device 400, such as a hard disk or a memory of the terminal device 400. In other embodiments, the memory 410 may be an external storage device of the terminal device 400, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card provided on the terminal device 400. Further, the memory 410 may include both an internal storage unit and an external storage device of the terminal device 400. The memory 410 is used to store an operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program. The memory 410 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the information interaction and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiments and are not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only used to distinguish them from one another and do not limit the protection scope of the present application. For the specific working process of the units and modules in the above device, refer to the corresponding process in the foregoing method embodiments, which is not repeated here.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform steps that may be performed in the various method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the method embodiments above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to the terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not described or illustrated in detail in one embodiment, refer to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various embodiments described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more, but not all, embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some of the technical features can be replaced equivalently; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments, and are intended to be included in the scope of the present application.

Claims (9)

1. A syllable recording method of a word, comprising:
acquiring at least one word from a plurality of words to be processed, wherein the word comprises at least three letters;
splitting the words according to a preset word spelling rule to obtain a training set, wherein the training set comprises syllable indexes of the preset word spelling rule and position information of each letter in the syllable indexes;
the training set is segmented to determine the conditional probability that at least two letters in the word and the letter strings of the two letter combinations correspond to the position information;
determining a target word and a target syllable index corresponding to the target word based on the conditional probability, wherein the target syllable index meets the preset word spelling rule;
wherein the determining the target word and the target syllable index corresponding to the target word based on the conditional probability includes:
acquiring first letters in the letter strings and first position information corresponding to the first letters;
determining a second letter associated with the first letter according to the first letter and the first position information;
and calculating the conditional probability that the first letter, the first position information and the second letter exist in the target word simultaneously.
2. The method of syllable recording of a word according to claim 1, wherein the calculating a conditional probability that the first letter, the first location information, and the second letter are present in the target word at the same time comprises:
segmenting the target word according to the preset word spelling rule to obtain a segmentation result, wherein the segmentation result comprises the target word and a target syllable corresponding to the target word;
judging whether the segmentation result exists in a database or not;
if not, storing the target syllable, the target word and the target syllable index into the database.
3. The syllable recording method of word according to claim 1, comprising:
the syllable index includes a beginning, a middle, and an ending of a syllable corresponding to at least one letter in the word.
4. The syllable recording method of word according to claim 1, comprising:
when the number of syllable indexes corresponding to the preset word spelling rule of the word is three, traversing each letter of the word to obtain second position information of each letter, and generating a first training set corresponding to the word;
obtaining a first letter string, a second letter string and a third letter string and letter values in the first letter string, the second letter string and the third letter string according to the second position information and the syllable index, wherein each letter string comprises at least two letters;
and marking the first letter string, the second letter string and the third letter string as initial syllables, middle syllables and final syllables respectively, and marking each letter correspondingly according to the letter numerical value in each letter string and the second position information.
5. The method of syllable recording of words according to claim 4, comprising, after generating the first training set of words:
and automatically segmenting the first training set by adopting a hidden Markov model, wherein the conditional probability corresponding to the first training set is expressed as P(A|B) = P(AB) / P(B);
taking a first letter of the first training set located in the first letter string as an example, A and B are expressed as:
A: the letter is the beginning letter of a syllable;
B: the letter is the first letter, and the letter following it is the second letter;
the probability that the first letter is a beginning letter and the letter following it is the second letter is recorded as P(AB), and the probability that the first letter is a letter in any position and the letter following it is the second letter is recorded as P(B).
6. The syllable recording method of word as recited in claim 5, comprising:
obtaining a first letter string, a second letter string and a third letter string and letter values in the first letter string, the second letter string and the third letter string according to the second position information and the syllable index, wherein the letter values of the first letter string are one, and the second letter string and the third letter string contain at least two letters;
and marking the letters in the first letter string as single characters corresponding to single syllables, marking the first letter string, the second letter string and the third letter string as beginning characters, middle characters and ending characters respectively, and marking each letter correspondingly according to the letter values in the second letter string, the third letter string and the second position information.
7. A syllable recording device of a word, comprising:
an acquisition module, configured to acquire at least one word from a plurality of words to be processed, wherein the word comprises at least three letters;
the splitting module is used for splitting the words according to a preset word spelling rule to obtain a training set, wherein the training set comprises syllable indexes of the preset word spelling rule and position information of each letter in the syllable indexes;
the calculation module is used for segmenting the training set to determine the conditional probability that at least two letters in the word and the letter strings of the two letter combinations correspond to the position information;
the recording module is used for determining target words and target syllable indexes corresponding to the target words based on the conditional probability, and the target syllable indexes meet the preset word spelling rule;
wherein the determining the target word and the target syllable index corresponding to the target word based on the conditional probability includes:
acquiring first letters in the letter strings and first position information corresponding to the first letters;
determining a second letter associated with the first letter according to the first letter and the first position information;
and calculating the conditional probability that the first letter, the first position information and the second letter exist in the target word simultaneously.
8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 6 when executing the computer program.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 6.
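The recording step of claim 2 (segment the target word, check whether the result already exists in the database, and store it if not) can be sketched as follows. This is only an illustration under stated assumptions: the B/M/E/S tag names, the `syllables` table layout, the function names, and the use of an in-memory SQLite database are hypothetical and are not specified by the patent.

```python
import sqlite3


def split_by_tags(word, tags):
    """Cut a word into syllables; a syllable closes at each E or S tag."""
    syllables, current = [], ""
    for ch, tag in zip(word, tags):
        current += ch
        if tag in ("E", "S"):
            syllables.append(current)
            current = ""
    if current:  # tolerate a tag sequence that does not end on E or S
        syllables.append(current)
    return syllables


def record_word(conn, word, tags):
    """Store the segmentation result unless it is already in the database.

    Returns True when a new row was inserted, False otherwise.
    """
    split = "-".join(split_by_tags(word, tags))
    exists = conn.execute(
        "SELECT 1 FROM syllables WHERE word = ? AND split = ?", (word, split)
    ).fetchone()
    if exists:
        return False
    conn.execute(
        "INSERT INTO syllables (word, split, tags) VALUES (?, ?, ?)",
        (word, split, tags),
    )
    conn.commit()
    return True


# Hypothetical schema: one row per recorded segmentation result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE syllables (word TEXT, split TEXT, tags TEXT)")

first = record_word(conn, "about", "SBMME")   # new result, stored
second = record_word(conn, "about", "SBMME")  # already present, skipped
```

Checking for an existing row before inserting keeps the database free of duplicate segmentation results, which matches the "if not, storing" condition in the claim.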
CN202110079369.6A 2021-01-21 2021-01-21 Syllable recording method and device for word Active CN112905024B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110079369.6A CN112905024B (en) 2021-01-21 2021-01-21 Syllable recording method and device for word

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110079369.6A CN112905024B (en) 2021-01-21 2021-01-21 Syllable recording method and device for word

Publications (2)

Publication Number Publication Date
CN112905024A CN112905024A (en) 2021-06-04
CN112905024B true CN112905024B (en) 2023-10-27

Family

ID=76117301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110079369.6A Active CN112905024B (en) 2021-01-21 2021-01-21 Syllable recording method and device for word

Country Status (1)

Country Link
CN (1) CN112905024B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239289A (en) * 2013-06-24 2014-12-24 富士通株式会社 Syllabication method and syllabication device
CN107967259A (en) * 2017-11-27 2018-04-27 传神语联网网络科技股份有限公司 The method and device of Thai syllable splitting
CN109754789A (en) * 2017-11-07 2019-05-14 北京国双科技有限公司 The recognition methods of phoneme of speech sound and device
CN109901727A (en) * 2019-03-06 2019-06-18 上海依智医疗技术有限公司 A kind of method and apparatus obtaining text error correction information
CN111862954A (en) * 2020-05-29 2020-10-30 北京捷通华声科技股份有限公司 Method and device for acquiring voice recognition model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030088416A1 (en) * 2001-11-06 2003-05-08 D.S.P.C. Technologies Ltd. HMM-based text-to-phoneme parser and method for training same
CN108008832A (en) * 2016-10-31 2018-05-08 北京搜狗科技发展有限公司 A kind of input method and device, a kind of device for being used to input

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
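Research on a Mongolian speech keyword detection method based on segmentation and recognition; 飞龙; 高光来; 闫学亮; 王炜华; Computer Science (计算机科学), issue 09; full text *
Thai syllable segmentation method based on conditional random fields; 赵世瑜; 线岩团; 郭剑毅; 余正涛; 洪玄贵; 王红斌; Computer Science (计算机科学), issue 03; full text *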

Also Published As

Publication number Publication date
CN112905024A (en) 2021-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant