A syllable segmentation method and apparatus
Technical field
The present invention relates to the technical field of natural language processing, and in particular to a syllable segmentation method and apparatus.
Background
The syllable is the most natural structural unit of speech. More precisely, a syllable is the smallest phonetic structural unit formed by combining phonemes: a syllable is composed of one or several phonemes according to certain rules, and one letter is one phoneme. In Chinese, the pronunciation of one Chinese character is generally one syllable. For example, the syllables beginning with a are the five syllables a, ai (sorrow), ao (endure), an (peace) and ang (dirty). The commonly used pinyin syllable table contains roughly 405 toneless syllables.
At present, in the field of information input, keyboard input remains a mainstream input method on both mobile devices and PCs, and for the vast number of Chinese Internet users, pinyin input is undoubtedly the most popular form of keyboard input. Because the input is a pinyin character string composed of letters while the output is Chinese characters, the input string must be decoded correctly; this decoding process is called syllable segmentation. Syllable segmentation plays a vital role throughout the field of Chinese pinyin input. Chinese pinyin has numerous syllables, which may be entered in abbreviated or full form, and fuzzy pronunciations must also be supported; in addition, applications impose strict requirements on decoding performance and accuracy. Syllable segmentation is therefore a well-known difficult problem.
Several syllable segmentation schemes exist in the prior art:
(1) Forward maximum segmentation. "Forward" means from left to right, and "maximum" means that the longest matching syllable is retained preferentially. For example, forward maximum segmentation of the character string "fangan" yields the syllables "fang'an" and outputs the Chinese word meaning "scheme", but it cannot produce "fan'gan" (the word meaning "dislike"), which the user may have intended.
(2) Reverse maximum segmentation. "Reverse" means from right to left, and "maximum" again means that the longest matching syllable is retained preferentially. For the character string "fangan", reverse maximum segmentation yields the syllables "fan'gan" and outputs the word meaning "dislike", but it cannot produce "scheme", so the same problem arises.
(3) Bidirectional maximum segmentation. "Bidirectional" means that forward maximum segmentation is performed first, then reverse maximum segmentation, and the results of both passes are retained. For the character string "fangan", bidirectional maximum segmentation yields both "scheme" and "dislike", which appears to solve the problem, but exceptions remain. For example, for an input such as "suiyueran", bidirectional maximum segmentation can only yield "sui'yue'ran" and cannot produce the desired result "sui'yu'er'an" (an idiom meaning "making the best of things"). The bidirectional maximum segmentation scheme therefore still produces unreasonable segmentations.
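The behaviour of the forward and reverse schemes on "fangan" can be reproduced in a few lines. The following is a minimal Python sketch rather than anything taken from the prior art being described; the tiny `SYLLABLES` set and the function names `forward_max` and `reverse_max` are assumptions chosen only to cover this example.

```python
# Hypothetical mini syllable table, just large enough for the "fangan" example.
SYLLABLES = {"a", "an", "ang", "fa", "fan", "fang", "ga", "gan"}
MAX_LEN = max(len(s) for s in SYLLABLES)


def forward_max(text):
    """Scan left to right, always taking the longest matching syllable."""
    result, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + MAX_LEN), i, -1):
            if text[i:j] in SYLLABLES:
                break
        else:
            j = i + 1  # no syllable starts here: emit a single letter
        result.append(text[i:j])
        i = j
    return result


def reverse_max(text):
    """Scan right to left, always taking the longest matching syllable."""
    result, j = [], len(text)
    while j > 0:
        for i in range(max(0, j - MAX_LEN), j):
            if text[i:j] in SYLLABLES:
                break
        else:
            i = j - 1  # no syllable ends here: emit a single letter
        result.append(text[i:j])
        j = i
    return result[::-1]


print("'".join(forward_max("fangan")))  # fang'an
print("'".join(reverse_max("fangan")))  # fan'gan
```

Each function commits to exactly one segmentation, which is why neither can offer both readings of the same input.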
In conclusion being combined since Chinese Pinyin syllables are various, between word and word changeable, also it is not spaced between word,
In the case where not influencing result accuracy, how to realize quick, reasonably progress pinyin syllable cutting, be a urgent need to resolve
Technical problem.
Summary of the invention
To solve the problems in the prior art, embodiments of the present invention provide a syllable segmentation method and apparatus that perform pinyin syllable segmentation quickly and reasonably without affecting the accuracy of the result.
The technical solutions provided by the embodiments of the present invention are as follows:
In a first aspect, a syllable segmentation method is provided, the method comprising:
constructing in advance a double-array Trie structure of a syllable table;
matching valid syllables from an input pinyin sequence based on the double-array Trie structure;
segmenting the pinyin sequence, based on the matched valid syllables, according to a strategy of equal syllable weight and syllable priority, to obtain multiple syllable segmentation schemes; and
storing the multiple syllable segmentation schemes.
With reference to the first aspect, in a first possible implementation, the double-array Trie structure comprises a base array and a check array, and constructing in advance the double-array Trie structure of the syllable table comprises:
constructing a Trie structure of the syllable table;
encoding each of the distinct letters contained in the syllable table, to obtain a condition sequence code for each state transition condition of the Trie structure;
initializing the base array and the check array, and computing the base array and the check array according to a preset calculation method, to construct the double-array Trie structure;
wherein the preset calculation method is expressed as follows:
(1) $\mathrm{base}[s] = \min\{\, k \mid \mathrm{base}[s_1 + k] = \mathrm{check}[s_1 + k] = \mathrm{base}[s_2 + k] = \mathrm{check}[s_2 + k] = \dots = 0 \ \text{and}\ k \ge 1 \,\}$;
(2) if $s$ is a terminable node but not a leaf node, then $\mathrm{base}[s] = -\mathrm{base}[s]$;
(3) if $s$ is a leaf node, then $\mathrm{base}[s] = -\infty$;
(4) $\mathrm{check}[t] = s$;
where $s_1, s_2, \dots, s_n$ are the condition sequence codes corresponding to the n child nodes of state $s$.
With reference to the first possible implementation of the first aspect, in a second possible implementation, matching valid syllables from the input pinyin sequence based on the double-array Trie structure comprises:
determining the condition sequence code corresponding to each letter in the pinyin sequence;
computing, based on the condition sequence code of each letter in the pinyin sequence and the preset calculation method, and in the order in which the letters of the pinyin sequence were input, the base array and check array corresponding to the pinyin sequence;
comparing the base array and check array corresponding to the pinyin sequence with the base array and check array in the double-array Trie structure; and
when the comparison succeeds, determining the valid syllables contained in the pinyin sequence.
With reference to the first aspect, in a third possible implementation, segmenting the pinyin sequence, based on the matched valid syllables, according to the strategy of equal syllable weight and syllable priority to obtain multiple syllable segmentation schemes comprises:
denoting the pinyin sequence as $s_n$, and denoting the subsequence of $s_n$ from the i-th position to the (j−1)-th position as s[i, j], where n is the length of $s_n$ and the index positions of $s_n$ run from 0 to n;
if s[i, j] is a valid syllable, retaining s[i, j];
if s[i, j] is not a valid syllable and some s[k, m] (0 ≤ k ≤ i, j ≤ m ≤ n) is a valid syllable, not retaining s[i, j];
if s[i, j] is not a valid syllable and there exist no k and m satisfying 0 ≤ k ≤ i, j ≤ m ≤ n such that s[k, m] is a valid syllable, retaining s[i, j].
With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation, storing the multiple syllable segmentation schemes comprises:
storing the multiple syllable segmentation schemes based on a graph data structure.
In a second aspect, a syllable segmentation apparatus is further provided, the apparatus comprising:
a construction module, configured to construct in advance a double-array Trie structure of a syllable table;
a matching module, configured to match valid syllables from an input pinyin sequence based on the double-array Trie structure;
a segmentation module, configured to segment the pinyin sequence, based on the matched valid syllables, according to a strategy of equal syllable weight and syllable priority, to obtain multiple syllable segmentation schemes; and
a storage module, configured to store the multiple syllable segmentation schemes.
With reference to the second aspect, in a first possible implementation, the double-array Trie structure comprises a base array and a check array, and the construction module comprises:
a first construction submodule, configured to construct a Trie structure of the syllable table;
an encoding submodule, configured to encode each of the distinct letters contained in the syllable table, to obtain a condition sequence code for each state transition condition of the Trie structure;
a second construction submodule, configured to initialize the base array and the check array, and to compute the base array and the check array according to a preset calculation method, to construct the double-array Trie structure;
wherein the preset calculation method is expressed as follows:
(1) $\mathrm{base}[s] = \min\{\, k \mid \mathrm{base}[s_1 + k] = \mathrm{check}[s_1 + k] = \mathrm{base}[s_2 + k] = \mathrm{check}[s_2 + k] = \dots = 0 \ \text{and}\ k \ge 1 \,\}$;
(2) if $s$ is a terminable node but not a leaf node, then $\mathrm{base}[s] = -\mathrm{base}[s]$;
(3) if $s$ is a leaf node, then $\mathrm{base}[s] = -\infty$;
(4) $\mathrm{check}[t] = s$;
where $s_1, s_2, \dots, s_n$ are the condition sequence codes corresponding to the n child nodes of state $s$.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the matching module comprises:
a first determining submodule, configured to determine the condition sequence code corresponding to each letter in the pinyin sequence;
a computing submodule, configured to compute, based on the condition sequence code of each letter in the pinyin sequence and the preset calculation method, and in the order in which the letters of the pinyin sequence were input, the base array and check array corresponding to the pinyin sequence;
a second determining submodule, configured to compare the base array and check array corresponding to the pinyin sequence with the base array and check array in the double-array Trie structure and, when the comparison succeeds, to determine the valid syllables contained in the pinyin sequence.
With reference to the second aspect, in a third possible implementation, the segmentation module is specifically configured to:
denote the pinyin sequence as $s_n$, and denote the subsequence of $s_n$ from the i-th position to the (j−1)-th position as s[i, j], where n is the length of $s_n$ and the index positions of $s_n$ run from 0 to n;
if s[i, j] is a valid syllable, retain s[i, j];
if s[i, j] is not a valid syllable and some s[k, m] (0 ≤ k ≤ i, j ≤ m ≤ n) is a valid syllable, not retain s[i, j];
if s[i, j] is not a valid syllable and there exist no k and m satisfying 0 ≤ k ≤ i, j ≤ m ≤ n such that s[k, m] is a valid syllable, retain s[i, j].
With reference to the second aspect or any one of the first to third possible implementations of the second aspect, in a fourth possible implementation, the storage module is specifically configured to:
store the multiple syllable segmentation schemes based on a graph data structure.
In a third aspect, an electronic device is provided, comprising:
one or more processors; and
a storage device for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the pinyin syllable segmentation method according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the pinyin syllable segmentation method according to the first aspect.
In the syllable segmentation method and apparatus provided by the embodiments of the present invention, a double-array Trie structure of the syllable table is first constructed in advance; valid syllables are then matched from the input pinyin sequence based on the double-array Trie structure; the pinyin sequence is segmented, based on the matched valid syllables, according to the strategy of equal syllable weight and syllable priority, to obtain multiple syllable segmentation schemes; and finally the multiple syllable segmentation schemes are stored. The syllable segmentation of a pinyin character string is thus completed by a method that is efficient in both time and space. This avoids the unusable or unreasonable results that may arise when a pinyin sequence is segmented by forward maximum segmentation, reverse maximum segmentation or bidirectional maximum segmentation, and allows syllable segmentation to be performed quickly and reasonably without affecting the accuracy of the result.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the syllable segmentation method provided by Embodiment one of the present invention;
Fig. 2 is a partial schematic diagram of the Trie structure of the syllable table provided by Embodiment one of the present invention;
Fig. 3 is a schematic diagram of the double-array Trie structure provided by Embodiment one of the present invention;
Figs. 4a to 4c are schematic diagrams of storing multiple syllable segmentation schemes based on a graph data structure, as provided by Embodiment one of the present invention;
Fig. 5 is a block diagram of the syllable segmentation apparatus provided by Embodiment two of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment one
Fig. 1 is a flowchart of the syllable segmentation method provided by Embodiment one of the present invention. Referring to Fig. 1, the method includes the following steps:
Step S1: construct in advance a double-array Trie structure of the syllable table.
Specifically, the process of step S1 may include:
constructing a Trie structure of the syllable table;
encoding each of the distinct letters contained in the syllable table, to obtain a condition sequence code for each state transition condition of the Trie structure;
initializing the base array and the check array, and computing the base array and the check array according to a preset calculation method, to construct the double-array Trie structure.
Specifically, a Trie, also called a dictionary tree, is a kind of search tree. It is essentially a deterministic finite automaton in which each node represents one state of the automaton. In the Trie, the root node represents the initial state, leaf nodes and terminable intermediate nodes each represent a valid syllable, and the transition conditions between nodes are letters.
Fig. 2 is a partial schematic diagram of the Trie structure of the syllable table provided by Embodiment one of the present invention. In Fig. 2, the pinyin syllables beginning with a (a, ai, ao, an, ang) form part of the Trie. A solid circle indicates that the node is, or may be, a valid syllable; the numbers 0, 1, 2, 3, 4, 5 denote the different states; and the letters a, i, o, n, g that make up the pinyin are the transition conditions of the state transitions.
The state transition function depicted in Fig. 2 can be expressed as:
$g(s, c) = t$    (1)
where s is the current state, c is the transition condition, and t is the next state. For example, the transition from state 0 to state 1 in Fig. 2 can be described by the state equation $g(0, \text{'a'}) = 1$.
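As an illustration of the state transition function, the following Python sketch builds an ordinary (pointer-based) Trie for the five syllables named in the text and exposes a function `g`. The class name `TrieNode`, the helper `build_trie`, and the convention of numbering states in order of creation are assumptions made for this sketch, chosen so that the state numbers agree with those the text gives for Fig. 2.

```python
class TrieNode:
    """One automaton state; children are keyed by the transition letter."""

    def __init__(self, state):
        self.state = state
        self.children = {}
        self.is_syllable = False  # True if a valid syllable ends at this node


def build_trie(syllables):
    """Insert the syllables one by one; states are numbered in creation order."""
    root = TrieNode(0)
    nodes = [root]
    for syllable in syllables:
        node = root
        for letter in syllable:
            if letter not in node.children:
                child = TrieNode(len(nodes))
                nodes.append(child)
                node.children[letter] = child
            node = node.children[letter]
        node.is_syllable = True
    return nodes


def g(nodes, s, c):
    """State transition function g(s, c) = t; returns None if no transition exists."""
    child = nodes[s].children.get(c)
    return None if child is None else child.state


nodes = build_trie(["a", "ai", "ao", "an", "ang"])
print(g(nodes, 0, "a"))  # 1
```

A pointer-based Trie like this is easy to build but stores one dictionary per node, which is the overhead that a double-array representation is meant to remove.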
A DAT (Double-Array Trie) represents a Trie using two one-dimensional integer arrays, a base array and a check array. The base array is used to determine state transitions, and the check array is used to verify the correctness of a transition, that is, to verify whether the target state exists. The state transition equation (1) can therefore be expressed by the following two equations:
$t = \mathrm{base}[s] + c$    (2)
$\mathrm{check}[t] = s$    (3)
When constructing the double array, it must be ensured that the array position to be occupied by the next state t is not already in use. Therefore, the distinct letters contained in the syllable table first need to be encoded according to a preset encoding rule, to obtain the condition sequence code of each state transition condition of the Trie structure. One letter corresponds to one transition condition; assuming there are m letters in total, the state transition conditions can be encoded as 1, 2, …, m. Since the syllable table contains 26 English letters, the preset encoding rule may encode the letters in forward, reverse or shuffled English alphabetical order, for example encoding a, b, c, …, z as 1, 2, 3, …, 26 in forward alphabetical order. Alternatively, the preset encoding rule may encode the letters in the forward, reverse or shuffled order of the syllable table, for example encoding a, o, e, …, w as 1, 2, 3, …, 26 in the forward order of the syllable table. The embodiments of the present invention do not limit the specific preset encoding rule.
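A preset encoding rule of this kind amounts to a lookup table from letters to consecutive integers starting at 1. The short Python sketch below shows one way to express it; the function name `condition_codes` and its `order` parameter are assumptions made for illustration.

```python
import string


def condition_codes(order=string.ascii_lowercase):
    """Map each letter to its condition sequence code 1, 2, ..., m in the given order."""
    return {letter: code for code, letter in enumerate(order, start=1)}


alphabetical = condition_codes()      # a -> 1, b -> 2, ..., z -> 26
example = condition_codes("aiong")    # a -> 1, i -> 2, o -> 3, n -> 4, g -> 5
print(alphabetical["z"], example["g"])  # 26 5
```

Any ordering works as long as the same table is used both when the double array is built and when input is matched against it.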
In this embodiment, the base array and the check array are initialized. By convention, when base and check are both 0, the corresponding state is an empty state. The values of base[s] and check[t] are determined according to the following preset calculation method:
(1) $\mathrm{base}[s] = \min\{\, k \mid \mathrm{base}[s_1 + k] = \mathrm{check}[s_1 + k] = \mathrm{base}[s_2 + k] = \mathrm{check}[s_2 + k] = \dots = 0 \ \text{and}\ k \ge 1 \,\}$;
(2) if $s$ is a terminable node but not a leaf node, then $\mathrm{base}[s] = -\mathrm{base}[s]$;
(3) if $s$ is a leaf node, then $\mathrm{base}[s] = -\infty$;
(4) $\mathrm{check}[t] = s$;
where $s_1, s_2, \dots, s_n$ are the condition sequence codes corresponding to the n child nodes of state $s$.
The process of constructing the double-array Trie structure in the embodiment of the present invention is further described below, taking Fig. 2 as an example:
1) Let the condition sequence codes of the transition conditions a, i, o, n, g be 1, 2, 3, 4, 5, and initialize the base and check arrays to 0.
2) For node 0, the accepted transition condition is a, whose condition sequence code is 1. According to the preset calculation method:
$\mathrm{base}[0] = \min\{\, k \mid \mathrm{base}[1 + k] = \mathrm{check}[1 + k] = 0,\ k \ge 1 \,\} = 1$.
3) According to the preset calculation method, $\mathrm{check}[\mathrm{base}[0] + 1] = 0$, that is, $\mathrm{check}[2] = 0$.
4) For node 1, which is stored at array position 2, the accepted transition conditions are i, o, n, whose condition sequence codes are 2, 3, 4. According to the preset calculation method:
$\mathrm{base}[2] = \min\{\, k \mid \mathrm{base}[2 + k] = \mathrm{check}[2 + k] = \mathrm{base}[3 + k] = \mathrm{check}[3 + k] = \mathrm{base}[4 + k] = \mathrm{check}[4 + k] = 0,\ k \ge 1 \,\} = 1$.
5) Since node 1 is a terminable node, $\mathrm{base}[2] = -\mathrm{base}[2] = -1$.
6) By $\mathrm{check}[t] = s$ with $s = 2$: $\mathrm{check}[|\mathrm{base}[2]| + 2] = \mathrm{check}[|\mathrm{base}[2]| + 3] = \mathrm{check}[|\mathrm{base}[2]| + 4] = 2$, that is, $\mathrm{check}[3] = \mathrm{check}[4] = \mathrm{check}[5] = 2$.
7) Since node 2 and node 3 are leaf nodes, $\mathrm{base}[3] = \mathrm{base}[4] = -\infty$.
8) For node 4, which is stored at array position 5, the accepted transition condition is g, whose condition sequence code is 5, so that
$\mathrm{base}[5] = \min\{\, k \mid \mathrm{base}[k + 5] = \mathrm{check}[k + 5] = 0,\ k \ge 1 \,\} = 1$.
9) Since node 4 is a terminable node, $\mathrm{base}[5] = -\mathrm{base}[5] = -1$.
10) $\mathrm{check}[|\mathrm{base}[5]| + 5] = 5$, that is, $\mathrm{check}[6] = 5$.
11) Since node 5 is a leaf node, $\mathrm{base}[6] = -\infty$.
The double array constructed in this way is shown in Fig. 3, in which the first column gives the state value of the double array, the second column gives the value of the base array, and the third column gives the value of the check array.
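The construction steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: the function names are hypothetical, `-math.inf` stands in for the −∞ marker of rule (3), check[t] records the array index s of the parent as in equation (3), and occupancy is tracked in an explicit `used` set instead of relying on the convention that 0 means empty (a deliberate simplification, since a child of state 0 also has check equal to 0 before its own base is assigned).

```python
import math
from collections import deque

LEAF = -math.inf  # stands in for the "-∞" of rule (3)


def build_trie(syllables):
    """Plain nested-dict Trie: {'children': {letter: node}, 'end': bool}."""
    root = {"children": {}, "end": False}
    for syllable in syllables:
        node = root
        for letter in syllable:
            node = node["children"].setdefault(letter, {"children": {}, "end": False})
        node["end"] = True
    return root


def build_double_array(syllables, code):
    """Breadth-first construction of base/check following rules (1) to (4)."""
    base, check = [0], [0]
    used = {0}  # assumption: explicit occupancy set rather than the 0 sentinel
    queue = deque([(build_trie(syllables), 0)])  # (Trie node, array index s)
    while queue:
        node, s = queue.popleft()
        if not node["children"]:
            base[s] = LEAF  # rule (3): leaf node
            continue
        codes = sorted(code[letter] for letter in node["children"])
        k = 1  # rule (1): smallest k >= 1 such that every slot c + k is free
        while any(c + k in used for c in codes):
            k += 1
        while len(base) <= codes[-1] + k:  # grow both arrays on demand
            base.append(0)
            check.append(0)
        base[s] = -k if node["end"] else k  # rule (2): negate for terminable nodes
        for letter, child in sorted(node["children"].items(), key=lambda kv: code[kv[0]]):
            t = k + code[letter]  # equation (2): t = base[s] + c
            check[t] = s          # rule (4): check[t] = s
            used.add(t)
            queue.append((child, t))
    return base, check


code = {"a": 1, "i": 2, "o": 3, "n": 4, "g": 5}
base, check = build_double_array(["a", "ai", "ao", "an", "ang"], code)
print(base)
print(check)
```

Processing the nodes breadth-first mirrors the order of the worked steps: the root first, then node 1 with its three children, then node 4 with its single child.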
It is worth noting that the embodiment of the present invention uses the series of pinyin syllables beginning with a (ai, ao, an, ang) merely as an example to illustrate the process of constructing the double-array Trie structure and the process of matching syllables based on it. For actual pinyin syllable segmentation, the Trie formed from all Chinese pinyin syllables needs to be constructed into DAT form through the process of step S1, after which the retrieval and matching of pinyin syllables is carried out in the subsequent matching step.
It should be noted that, if mixed Chinese-English input needs to be handled, an English word only needs to be treated as a syllable and added to the syllable table. For example, for orange, it is only necessary to add the word orange to the syllable table, so that the Trie is formed from all Chinese pinyin syllables together with the English words and is then constructed into DAT form through the process of step S1.
In the embodiment of the present invention, constructing the double-array Trie structure of the syllable table in advance stores the syllables of the syllable table in the DAT, so that the subsequent retrieval and matching of syllables can be completed efficiently using the DAT data structure.
Step S2: match valid syllables from the input pinyin sequence based on the double-array Trie structure.
Specifically, the process of step S2 may include:
determining the condition sequence code corresponding to each letter in the pinyin sequence;
computing, based on the condition sequence code of each letter in the pinyin sequence and the preset calculation method, and in the order in which the letters of the pinyin sequence were input, the base array and check array corresponding to the pinyin sequence;
comparing the base array and check array corresponding to the pinyin sequence with the base array and check array in the double-array Trie structure;
when the comparison succeeds, determining the valid syllables contained in the pinyin sequence.
In a specific implementation, the condition sequence code corresponding to each letter in the pinyin sequence is determined according to the preset encoding rule. Assuming that the condition sequence codes of a, i, o, n, g are 1, 2, 3, 4, 5, then for the input "ang" the condition sequence codes of its letters are 1, 4, 5 respectively.
The base array and check array corresponding to the pinyin sequence are initialized and, in the order in which the letters of the pinyin sequence were input, the condition sequence code of each letter is substituted into the preset calculation method to compute the base array and check array corresponding to the pinyin sequence. These are compared with the base array and check array in the double-array Trie structure, and when the comparison succeeds, the valid syllables contained in the pinyin sequence are determined.
The preset calculation method in step S2 is the same as that in step S1 and is not described again here.
By way of example, the syllable matching process is further described with reference to the base array and check array shown in Fig. 3, for the input pinyin sequence "agn":
1) First, the state is initialized; according to the preset calculation formula, base[0] = 1. Starting from the initial state, the transition condition read is the first letter a of the pinyin sequence, whose condition sequence code is 1, so the next state is t = base[0] + 1 = 2, and check[t] = check[2] = 0. It can thus be determined that a is in the double-array Trie.
2) The current state is now 2, with base[2] = −1, which indicates that this node is not a leaf node. The next state is therefore computed for the input g, whose condition sequence code is 5: t = |base[2]| + 5 = 1 + 5 = 6. Since check[t] = check[6] = 5 ≠ 2, node a cannot have a child node g, and it can be determined at this point that agn is not in the double-array Trie.
In this embodiment, matching the input pinyin sequence against the double-array Trie structure of the syllable table not only achieves efficient matching with constant-order time complexity, but also allows matching to terminate early.
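The lookup just traced can be sketched directly on top of the two arrays. In the Python sketch below the arrays are hard-coded for the five syllables beginning with a, assuming the codes a, i, o, n, g = 1 to 5 and assuming that check[t] holds the array index of the parent state, as equation (3) requires; `-math.inf` stands in for −∞, and the names `next_state` and `is_valid_syllable` are hypothetical.

```python
import math

LEAF = -math.inf  # stands in for the "-∞" marking a leaf node
CODE = {"a": 1, "i": 2, "o": 3, "n": 4, "g": 5}

# index:  0  1   2     3     4    5     6
BASE = [1, 0, -1, LEAF, LEAF, -1, LEAF]
CHECK = [0, 0, 0, 2, 2, 2, 5]


def next_state(s, letter):
    """Apply t = |base[s]| + c and accept t only if check[t] == s."""
    c = CODE.get(letter)
    if c is None or BASE[s] == LEAF:
        return None  # unknown letter, or a leaf that has no children
    t = abs(BASE[s]) + c
    # BASE[t] != 0 rejects empty slots, whose check value 0 would
    # otherwise look like "child of state 0".
    if t < len(CHECK) and CHECK[t] == s and BASE[t] != 0:
        return t
    return None


def is_valid_syllable(text):
    """Walk the automaton; succeed only if it ends on a terminable or leaf state."""
    s = 0
    for letter in text:
        s = next_state(s, letter)
        if s is None:
            return False  # early termination on the first failed transition
    return BASE[s] < 0  # negative base marks a state where a syllable ends


print(is_valid_syllable("ang"))  # True
print(is_valid_syllable("agn"))  # False
```

Each letter costs one addition and one comparison, and the walk stops at the first failed check, which is the early termination described above.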
Step S3: based on the matched valid syllables, segment the pinyin sequence according to the strategy of equal syllable weight and syllable priority, to obtain multiple syllable segmentation schemes.
In this embodiment, equal syllable weight means that all syllables in the Chinese syllable table are treated equally: the longest syllable is not preferred, and syllables are not filtered in any other way. For example:
xiao can be segmented as [xi'ao, xiao, xi'a'o, xia'o]. Since a, ao, xia, xi and o are all syllables, they are all retained; the other syllables are not discarded merely because xiao is the longest syllable.
Syllable priority means that when a syllable is compared with a non-syllable, the syllable is retained preferentially and the non-syllable is discarded. For example:
long is a Chinese pinyin syllable, so it is not further split into l'o'n'g.
xi is a Chinese pinyin syllable, so it is not further split into x'i.
For xim, xi is a Chinese pinyin syllable and is therefore retained, but xim is not a Chinese pinyin syllable, so m is retained as an abbreviated spelling and the segmentation result is xi'm.
Specifically, the process of step S3 may include:
denoting the pinyin sequence as $s_n$, and denoting the subsequence of $s_n$ from the i-th position to the (j−1)-th position as s[i, j], where n is the length of $s_n$ and the index positions of $s_n$ run from 0 to n;
if s[i, j] is a valid syllable, retaining s[i, j];
if s[i, j] is not a valid syllable and some s[k, m] (0 ≤ k ≤ i, j ≤ m ≤ n) is a valid syllable, not retaining s[i, j];
if s[i, j] is not a valid syllable and there exist no k and m satisfying 0 ≤ k ≤ i, j ≤ m ≤ n such that s[k, m] is a valid syllable, retaining s[i, j].
By way of example, syllable segmentation is performed on the pinyin character string "suiyueran". Since sui, yu, er, an, yue and ran are all valid syllables, they all need to be retained, so two segmentation schemes are obtained, namely "sui, yu, er, an" and "sui, yue, ran".
In this embodiment, the pinyin sequence is segmented, based on the matched valid syllables, according to the strategy of equal syllable weight and syllable priority, to obtain multiple syllable segmentation schemes. This avoids the unusable or unreasonable results that may arise when a pinyin sequence is segmented by forward maximum segmentation, reverse maximum segmentation or bidirectional maximum segmentation, and achieves the purpose of decoding reasonable combinations of Chinese pinyin syllables from the pinyin sequence.
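The strategy of step S3 can be sketched as an enumeration over all retained pieces. The Python sketch below is one possible reading, with two assumptions stated plainly: the syllable sets are small hypothetical subsets limited to the syllables named in the text, and the retention rule for non-syllables is simplified so that only single letters not covered by any valid syllable are kept (as with the m of xim).

```python
def build_edges(text, syllables):
    """edges[i] lists every end position j such that text[i:j] is retained."""
    n = len(text)
    edges = {i: [] for i in range(n)}
    covered = set()  # positions lying inside at least one valid syllable
    for i in range(n):
        for j in range(i + 1, n + 1):
            if text[i:j] in syllables:
                edges[i].append(j)  # equal weight: every valid syllable is kept
                covered.update(range(i, j))
    for i in range(n):
        if i not in covered:
            edges[i].append(i + 1)  # stray letter kept as an abbreviated spelling
    return edges


def segmentations(text, syllables):
    """Enumerate every way of covering text with retained pieces."""
    edges = build_edges(text, syllables)
    n = len(text)
    results = []

    def walk(i, path):
        if i == n:
            results.append(path)
            return
        for j in edges[i]:
            walk(j, path + [text[i:j]])

    walk(0, [])
    return results


for scheme in segmentations("suiyueran", {"sui", "yu", "er", "an", "yue", "ran"}):
    print("'".join(scheme))
# sui'yu'er'an
# sui'yue'ran
```

With the syllables xi, xia, xiao, a, ao and o the same function returns the four segmentations of xiao listed above, and with only xi in the set it returns xi'm for xim.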
Step S4: store the multiple syllable segmentation schemes.
Specifically, the process of step S4 may include:
storing the multiple syllable segmentation schemes based on a graph data structure.
Step S4 in the embodiment of the present invention is further described below with reference to Figs. 4a to 4c.
Because step S3 uses the segmentation strategy of equal syllable weight and syllable priority, a single pinyin character string may give rise to a large number of syllable segmentation schemes, and the longer the input pinyin sequence, the more schemes there are. Take a user input of length 64 as an example. If the pinyin sequence entered by the user is:
xiaoxiaoxiaoxiaoxiaoxiaoxiaoxiaoxiaoxiaoxiaoxiaoxiaoxiaoxiaoxiao,
that is, 16 repetitions of xiao, then each xiao has 4 segmentation schemes [xiao, xia'o, xi'ao, xi'a'o], so the number of combined segmentation schemes for the 16 repetitions is $4^{16}$. Storing these schemes with a chain structure or in some other way would occupy an enormous amount of space. If a graph is used for storage instead, a large number of nodes are shared, so only a very small amount of storage space is needed to store all of these segmentation schemes, which greatly saves storage space. The segmentation result stored as a graph is shown in Fig. 4a. As can be seen from Fig. 4a, with a graph data structure a limited number of nodes suffices to express thousands upon thousands of segmentation schemes.
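The saving can be made concrete with a small Python sketch. It assumes a representation in which node i of the graph is the boundary before the i-th letter and each retained syllable text[i:j] is an edge from i to j; the function names and the six-syllable set are assumptions for illustration, and the layout of Fig. 4a may differ.

```python
def build_graph(text, syllables):
    """Adjacency list: graph[i] holds every j such that text[i:j] is a valid syllable."""
    n = len(text)
    max_len = max(len(s) for s in syllables)
    graph = [[] for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            if text[i:j] in syllables:
                graph[i].append(j)
    return graph


def count_schemes(graph):
    """Count the paths from node 0 to the last node without enumerating them."""
    ways = [0] * len(graph)
    ways[0] = 1
    for i, targets in enumerate(graph):  # edges only point forward, so index order works
        for j in targets:
            ways[j] += ways[i]
    return ways[-1]


text = "xiao" * 16
graph = build_graph(text, {"xi", "xia", "xiao", "a", "ao", "o"})
print(len(graph), sum(len(targets) for targets in graph))  # 65 96
print(count_schemes(graph) == 4 ** 16)                     # True
```

Under these assumptions 65 nodes and 96 edges are enough to encode all $4^{16}$ schemes, because every scheme is a path through the same shared nodes.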
As another example, syllable segmentation of the pinyin character string "suiyueran" yields the two segmentation schemes "sui, yu, er, an" and "sui, yue, ran"; storing these schemes as a graph is shown in Fig. 4b.
As a further example, for mixed Chinese-English input such as juzishiorange, the three segmentation schemes obtained are "ju, zi, shi, o, ran, ge", "ju, zi, shi, o, rang, e" and "ju, zi, shi, orange"; storing these schemes as a graph is shown in Fig. 4c.
In the syllable segmentation method provided by the embodiment of the present invention, a double-array Trie structure of the syllable table is first constructed in advance; valid syllables are then matched from the input pinyin sequence based on the double-array Trie structure; the pinyin sequence is segmented, based on the matched valid syllables, according to the strategy of equal syllable weight and syllable priority, to obtain multiple syllable segmentation schemes; and finally the multiple syllable segmentation schemes are stored. The syllable segmentation of a pinyin character string is thus completed by a method that is efficient in both time and space. This avoids the unusable or unreasonable results that may arise when a pinyin sequence is segmented by forward maximum segmentation, reverse maximum segmentation or bidirectional maximum segmentation, and allows syllable segmentation to be performed quickly and reasonably without affecting the accuracy of the result.
Embodiment two
Fig. 5 is a block diagram of the syllable segmentation apparatus provided by Embodiment two of the present invention. As shown in Fig. 5, the segmentation apparatus includes:
a construction module 51, configured to construct in advance a double-array Trie structure of the syllable table;
a matching module 52, configured to match valid syllables from the input pinyin sequence based on the double-array Trie structure;
a segmentation module 53, configured to segment the pinyin sequence, based on the matched valid syllables, according to the strategy of equal syllable weight and syllable priority, to obtain multiple syllable segmentation schemes;
a storage module 54, configured to store the multiple syllable segmentation schemes.
Further, the double-array Trie structure includes a base array and a check array, and the construction module 51 includes:
a first construction submodule 511, configured to construct the Trie structure of the syllable table;
an encoding submodule 512, configured to encode each of the different letters contained in the syllable table, so as to obtain the condition sequence code of each state transition condition of the Trie structure;
a second construction submodule 513, configured to initialize the base array and the check array, and to calculate the base array and the check array according to a preset calculation method, so as to construct the double-array Trie structure;
wherein the preset calculation method is expressed as follows:
(1) base[s] = min{ k | base[s_1 + k] = check[s_1 + k] = base[s_2 + k] = check[s_2 + k] = ... = 0 and k ≥ 1 };
(2) if s is a terminable node but not a leaf node, then base[s] = -base[s];
(3) if s is a leaf node, then base[s] = -∞;
(4) check[t] = s;
wherein s_1, s_2, ..., s_n are the condition sequence codes corresponding to the n child nodes of state s.
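By way of illustration only, rules (1) to (4) may be sketched as follows. The sketch rests on assumptions that the text does not state: the letters a to z are encoded as 1 to 26, the root is state 1, t is taken to be the slot k + s_i assigned to a child, and a large negative integer LEAF stands in for -∞. The function names build_double_array and contains are hypothetical.

```python
from collections import deque

LEAF = -(1 << 30)  # assumed stand-in for the "-infinity" leaf marker of rule (3)


def code(ch):
    """Assumed condition sequence code of a letter: a=1, ..., z=26."""
    return ord(ch) - ord("a") + 1


def build_double_array(syllables):
    # Ordinary Trie as nested dicts; the key "$" marks a terminable node.
    root = {}
    for syl in syllables:
        node = root
        for ch in syl:
            node = node.setdefault(ch, {})
        node["$"] = True

    base, check = [0, 0], [0, 0]  # index 0 is unused, the root is state 1

    def ensure(size):
        while len(base) < size:
            base.append(0)
            check.append(0)

    queue = deque([(1, root)])
    while queue:
        s, node = queue.popleft()
        codes = sorted(code(ch) for ch in node if ch != "$")
        if not codes:  # rule (3): leaf node
            base[s] = LEAF
            continue
        k = 1  # rule (1): smallest k >= 1 whose child slots are all free
        while True:
            ensure(k + codes[-1] + 1)
            if all(base[k + c] == 0 and check[k + c] == 0 for c in codes):
                break
            k += 1
        for ch, child in node.items():
            if ch == "$":
                continue
            t = k + code(ch)
            check[t] = s  # rule (4): the child slot records its parent
            queue.append((t, child))
        base[s] = -k if "$" in node else k  # rule (2): negate if terminable
    return base, check


def contains(base, check, word):
    """Return True if word is a complete syllable stored in the arrays."""
    s = 1
    for ch in word:
        if base[s] == LEAF:
            return False
        t = abs(base[s]) + code(ch)
        if not 0 < t < len(check) or check[t] != s:
            return False
        s = t
    return base[s] < 0  # terminable nodes and leaves carry a negative base


base, check = build_double_array(
    ["a", "ai", "an", "ang", "ao", "fa", "fan", "fang", "ga", "gan"]
)
```

In this sketch a slot is treated as free only when both base and check are 0, which is why the root is placed at state 1 and the codes start at 1: every occupied slot then carries a non-zero check value from the moment it is allocated.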
Further, the matching module 52 includes:
a first determination submodule 521, configured to determine the condition sequence code corresponding to each letter in the pinyin sequence;
a calculation submodule 522, configured to calculate the base array and check array corresponding to the pinyin sequence according to the condition sequence code corresponding to each letter in the pinyin sequence and the calculation formula;
a second determination submodule 523, configured to compare the base array and check array corresponding to the pinyin sequence with the base array and check array in the double-array Trie structure, and, when the comparison succeeds, to determine the legal syllables contained in the pinyin sequence.
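One plausible reading of this matching step is a state walk over the double array: for each letter, the next state is computed from the base value and the letter's code, and the check array confirms that the transition is valid. The sketch below assumes lowercase input, codes a=1 to z=26, root state 1, and a small hand-built array for the illustrative syllables a, ai, an and ang only. The names match_from and match_all are hypothetical.

```python
LEAF = -(1 << 30)  # assumed stand-in for the "-infinity" leaf marker
SIZE = 16

# Hand-built double array for the syllables a, ai, an, ang (illustrative only).
base = [0] * SIZE
check = [0] * SIZE
base[1], base[2], base[15] = 1, -1, -1   # root, "a", "an"
base[10] = base[8] = LEAF                # "ai" and "ang" are leaves
check[2], check[10], check[15], check[8] = 1, 2, 2, 15


def code(ch):
    """Assumed condition sequence code of a letter: a=1, ..., z=26."""
    return ord(ch) - ord("a") + 1


def match_from(pinyin, start):
    """End positions j such that pinyin[start:j] is a legal syllable."""
    ends = []
    s = 1
    for pos in range(start, len(pinyin)):
        if base[s] == LEAF:          # no outgoing transitions
            break
        t = abs(base[s]) + code(pinyin[pos])
        if not 0 < t < SIZE or check[t] != s:
            break                    # comparison with the check array fails
        s = t
        if base[s] < 0:              # terminable node or leaf
            ends.append(pos + 1)
    return ends


def match_all(pinyin):
    """All (i, j) index pairs whose substring is a legal syllable."""
    return [(i, j) for i in range(len(pinyin)) for j in match_from(pinyin, i)]
```

For the input "angai", match_all returns the spans of a, an, ang, a and ai, in that order.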
Further, the splitting module 53 is specifically configured to:
denote the pinyin sequence as s_n, and denote the subsequence of s_n from position i to position j-1 as s[i, j], where n is the length of s_n and the index positions of s_n run from 0 to n;
if s[i, j] is a legal syllable, retain s[i, j];
if s[i, j] is an illegal syllable and there is a legal syllable s[k, m] (0 ≤ k ≤ i, j ≤ m ≤ n), not retain s[i, j];
if s[i, j] is an illegal syllable and there are no k and m satisfying 0 ≤ k ≤ i and j ≤ m ≤ n such that s[k, m] is a legal syllable, retain s[i, j].
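The three retention rules can be illustrated with a short sketch that applies them literally to every subsequence s[i, j]. A plain Python set stands in for the double-array lookup, and the function name retained_spans and the sample input are assumptions made for illustration.

```python
def retained_spans(pinyin, legal_syllables):
    """Apply the three retention rules to every subsequence s[i, j].

    Returns (i, j, is_legal) triples, where s[i, j] is pinyin[i:j].
    """
    n = len(pinyin)
    spans = [(i, j) for i in range(n) for j in range(i + 1, n + 1)]
    legal = {(i, j) for (i, j) in spans if pinyin[i:j] in legal_syllables}

    kept = []
    for i, j in spans:
        if (i, j) in legal:
            kept.append((i, j, True))    # rule 1: legal syllables are retained
        elif any(k <= i and j <= m for (k, m) in legal):
            continue                     # rule 2: covered by a legal syllable
        else:
            kept.append((i, j, False))   # rule 3: uncovered illegal fragment
    return kept


spans = retained_spans("angx", {"a", "an", "ang"})
```

With the input "angx", the fragments "n", "ng" and "g" are discarded because they lie inside the legal syllable "ang", whereas "x" and the other fragments that extend to the final position are retained, since no legal syllable covers them.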
Further, the storage module 54 is specifically configured to:
store the multiple syllable splitting schemes based on a graph data structure.
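As a minimal sketch of graph-based storage, the positions 0 to n of the pinyin sequence may be treated as nodes and each legal syllable s[i, j] as an edge from i to j, so that every path from 0 to n corresponds to one splitting scheme. This layout is an assumption, as the text only states that a graph data structure is used. The names build_graph and enumerate_schemes and the sample syllable set are hypothetical.

```python
from collections import defaultdict


def build_graph(pinyin, legal_syllables):
    """Adjacency list: an edge i -> j exists when pinyin[i:j] is legal."""
    graph = defaultdict(list)
    n = len(pinyin)
    for i in range(n):
        for j in range(i + 1, n + 1):
            if pinyin[i:j] in legal_syllables:
                graph[i].append(j)
    return graph


def enumerate_schemes(pinyin, graph):
    """Every path from node 0 to node n, returned as a list of syllables."""
    n = len(pinyin)
    schemes = []

    def walk(pos, path):
        if pos == n:
            schemes.append(list(path))
            return
        for nxt in graph.get(pos, []):
            path.append(pinyin[pos:nxt])
            walk(nxt, path)
            path.pop()

    walk(0, [])
    return schemes


syllables = {"a", "an", "ang", "fa", "fan", "fang", "ga", "gan"}
graph = build_graph("fangan", syllables)
schemes = enumerate_schemes("fangan", graph)
```

Storing the edges once lets all schemes share common prefixes and suffixes, so the schemes do not have to be stored as separate lists. For "fangan" the graph encodes both "fan gan" and "fang an".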
The syllable splitting apparatus provided by this embodiment of the present invention belongs to the same inventive concept as the syllable splitting method provided by the embodiments of the present invention. It can execute the syllable splitting method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to executing that method. For technical details not described in detail in this embodiment, reference may be made to the syllable splitting method provided by the embodiments of the present invention; they are not repeated here.
In addition, another embodiment of the present invention further provides an electronic device, comprising:
one or more processors;
a storage device, configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the syllable splitting method described in Embodiment 1.
In addition, another embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the syllable splitting method described in the above embodiments.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include these modifications and variations.