A sequence restoration method and device
Technical field
The present invention relates to the technical field of natural language processing, and in particular to a sequence restoration method and device.
Background
To obtain the optimal word segmentation result for a sentence, one can first use a mechanical segmenter to obtain the segmentation result that the segmenter considers best, and then use the toolkits tool to analyze that result together with a small number of random segmentation results in order to obtain the optimal word segmentation.
At present, a mechanical segmenter produces only two kinds of result: smart segmentation and fine-grained segmentation. For example, for the sentence "marriage and not yet marrying", the segmenter outputs "marriage | Buddhist monk | not | marriage" in smart mode and " | Buddhist monk | of marriage | marriage | not yet | | marriage | of not marrying " in fine-grained mode. This is all that current segmenters provide. Such a scheme gives only the result the segmenter considers best and the set of all words it can cut out. It does not give restored sequences such as "marriage | and | not yet | marriage" and "marry | | and | not yet | marriages | ", which toolkits could then analyze to find the optimal word segmentation.
Existing segmentation schemes therefore have an obvious shortcoming: the output of the mechanical segmenter cannot be restored into sequences for toolkits to analyze, so toolkits has to analyze random combinations of all the words that were cut out, which seriously degrades both the accuracy of the word segmentation result it obtains and its analysis speed.
Summary of the invention
The main object of the present invention is to provide a sequence restoration method and device that address the low accuracy of word segmentation results and the slow analysis speed.
To achieve this object, the technical solution of the present invention is realized as follows:
In a first aspect, an embodiment of the present invention provides a sequence restoration method applied to a sequence restoration device, the method comprising:
The device takes the sequence to be segmented as the input sequence and obtains its sequence length m, where m is a positive integer;
The device combines the first character of the sequence to be segmented with the n consecutive characters that follow it to form a word, obtaining m words that are collectively referred to as the first word group, where n is an integer from 0 to m-1;
The device matches each word in the first word group against the fine-grained segmentation words obtained in advance, and refers to the successfully matched words as the second word group;
The device obtains the sequence length of each word in the second word group and takes the longest word in the second word group as the segment of the sequence to be segmented.
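The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `longest_matching_prefix`, the toy sequence "abcde" and the toy set `FINE_GRAINED` are all hypothetical, and the error raised on an empty match is a defensive choice that the text does not describe.

```python
def longest_matching_prefix(sequence, fine_grained_words):
    """Return the longest prefix of `sequence` that occurs in `fine_grained_words`."""
    m = len(sequence)  # step 1: sequence length m
    if m == 0:
        raise ValueError("the sequence to be segmented must be non-empty")
    # Step 2: first character plus the n following characters, for n = 0 .. m-1.
    first_group = [sequence[: n + 1] for n in range(m)]
    # Step 3: keep only the candidates that appear in the fine-grained result.
    second_group = [word for word in first_group if word in fine_grained_words]
    if not second_group:
        # Defensive check (an assumption, not part of the described method).
        raise ValueError("no candidate matched the fine-grained words")
    # Step 4: the longest matched candidate becomes the segment.
    return max(second_group, key=len)


# Hypothetical fine-grained result for the toy sequence "abcde".
FINE_GRAINED = {"a", "ab", "abc", "c", "cd", "d", "de", "e"}
segment = longest_matching_prefix("abcde", FINE_GRAINED)
print(segment)
```

Because every candidate is a prefix of a different length, the longest match is always unique.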
In the above solution, the step in which the device matches each word in the first word group against the fine-grained segmentation words obtained in advance, and refers to the successfully matched words as the second word group, specifically comprises:
The device compares each word in the first word group with the fine-grained segmentation words obtained in advance;
When the first word group contains words identical to the fine-grained segmentation words obtained in advance, the device obtains those words from the first word group;
The device refers to all words in the first word group that are identical to the fine-grained segmentation words obtained in advance as the second word group.
In the above solution, the step in which the device obtains the sequence length of each word in the second word group and takes the longest word in the second word group as the segment of the sequence to be segmented specifically comprises:
The device obtains the sequence length of each word in the second word group;
The device compares the sequence lengths of the words in the second word group and obtains the longest word in the second word group;
The device takes the longest word in the second word group as the segment of the sequence to be segmented.
In the above solution, after the longest word in the second word group is taken as the segment of the sequence to be segmented, the method further comprises: the device obtains the segments of the sequence that remains after that segment is removed from the sequence to be segmented.
Further, the step in which the device obtains the segments of the remaining sequence specifically comprises:
The device takes the sequence remaining after the segment is removed as the input sequence;
The device checks the sequence length of the input sequence each time and, when the length is not zero, obtains the segment of the corresponding input sequence;
When the device detects that the sequence length of the input sequence is zero, the device assembles the segments of the input sequences, in the order in which they were obtained, into a restored sequence and sends it to toolkits for analysis.
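The repeated procedure described above amounts to a loop that stops when the remaining input has length zero. The sketch below assumes a hypothetical function name `restore_sequence` and toy data. The hand-off to toolkits is not modeled, because the text does not specify its interface. The procedure resembles what is commonly called forward maximum matching, although the source does not use that name.

```python
def restore_sequence(sequence, fine_grained_words):
    """Split `sequence` greedily into the longest matching prefixes, in order."""
    segments = []
    remaining = sequence
    while len(remaining) != 0:  # a zero-length input means segmentation is done
        candidates = [remaining[:k] for k in range(1, len(remaining) + 1)]
        matched = [word for word in candidates if word in fine_grained_words]
        if not matched:
            # Defensive check (an assumption, not part of the described method).
            raise ValueError(f"no fine-grained word starts at {remaining!r}")
        segment = max(matched, key=len)
        segments.append(segment)              # keep the order of acquisition
        remaining = remaining[len(segment):]  # remainder becomes the next input
    return segments


# Hypothetical fine-grained result for the toy sequence "abcde".
fine_grained = {"a", "ab", "abc", "c", "cd", "d", "de", "e"}
restored = restore_sequence("abcde", fine_grained)
print(" | ".join(restored))
```

The restored sequence always concatenates back to the original input, which is a simple sanity check before the result is passed on for analysis.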
In a second aspect, an embodiment of the present invention provides a sequence restoration apparatus, the apparatus comprising a first acquisition module, a second acquisition module, a third acquisition module and a fourth acquisition module, wherein:
The first acquisition module is configured to take the sequence to be segmented as the input sequence and obtain its sequence length m, where m is a positive integer;
The second acquisition module is configured to combine the first character of the sequence to be segmented with the n consecutive characters that follow it to form a word, obtaining m words that are collectively referred to as the first word group, where n is an integer from 0 to m-1;
The third acquisition module is configured to match each word in the first word group against the fine-grained segmentation words obtained in advance, and to refer to the successfully matched words as the second word group;
The fourth acquisition module is configured to obtain the sequence length of each word in the second word group and take the longest word in the second word group as the segment of the sequence to be segmented.
In the above solution, the third acquisition module specifically comprises a comparison submodule, a first acquisition submodule and a second acquisition submodule, wherein:
The comparison submodule is configured to compare each word in the first word group with the fine-grained segmentation words obtained in advance;
The first acquisition submodule is configured to obtain, when the first word group contains words identical to the fine-grained segmentation words obtained in advance, those words from the first word group;
The second acquisition submodule is configured to refer to all words in the first word group that are identical to the fine-grained segmentation words obtained in advance as the second word group.
In the above solution, the fourth acquisition module specifically comprises a third acquisition submodule, a fourth acquisition submodule and a fifth acquisition submodule, wherein:
The third acquisition submodule is configured to obtain the sequence length of each word in the second word group;
The fourth acquisition submodule is configured to compare the sequence lengths of the words in the second word group and obtain the longest word in the second word group;
The fifth acquisition submodule is configured to take the longest word in the second word group as the segment of the sequence to be segmented.
In the above solution, the apparatus further comprises a fifth acquisition module, configured to obtain the segments of the sequence that remains after the segment is removed from the sequence to be segmented.
Further, the fifth acquisition module is specifically configured to:
take the sequence remaining after the segment is removed as the input sequence;
check the sequence length of the input sequence each time and, when the length is not zero, obtain the segment of the corresponding input sequence;
and, when the sequence length of the input sequence is detected to be zero, assemble the segments of the input sequences, in the order in which they were obtained, into a restored sequence and send it to toolkits for analysis.
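One way to picture the module structure is a class whose methods correspond to the five acquisition modules. The sketch below is an illustrative arrangement only: the class name `SequenceRestorationApparatus`, the method names and the toy data are hypothetical, and letting the fifth module drive the whole loop is a simplification chosen for brevity rather than something the text prescribes.

```python
class SequenceRestorationApparatus:
    """Toy model of the apparatus; each method stands in for one module."""

    def __init__(self, fine_grained_words):
        # Fine-grained segmentation words obtained in advance.
        self.fine_grained_words = set(fine_grained_words)

    def first_module(self, sequence):
        """Obtain the sequence length m of the input sequence."""
        return len(sequence)

    def second_module(self, sequence, m):
        """Build the first word group: the m prefixes of the input sequence."""
        return [sequence[: n + 1] for n in range(m)]

    def third_module(self, first_group):
        """Build the second word group: prefixes found in the fine-grained words."""
        return [word for word in first_group if word in self.fine_grained_words]

    def fourth_module(self, second_group):
        """Pick the longest matched word (max raises ValueError if none matched)."""
        return max(second_group, key=len)

    def fifth_module(self, sequence):
        """Repeat on the remainder until its length is zero; return all segments."""
        segments = []
        while self.first_module(sequence) != 0:
            m = self.first_module(sequence)
            first_group = self.second_module(sequence, m)
            second_group = self.third_module(first_group)
            segment = self.fourth_module(second_group)
            segments.append(segment)
            sequence = sequence[len(segment):]
        return segments


apparatus = SequenceRestorationApparatus({"x", "xy", "y", "yz", "z"})
result = apparatus.fifth_module("xyz")
print(result)
```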
Embodiments of the present invention provide a sequence restoration method and device. In the method, the device takes the sequence to be segmented as the input sequence and obtains its sequence length m, where m is a positive integer; the device combines the first character of the sequence to be segmented with the n consecutive characters that follow it to form a word, obtaining m words that are collectively referred to as the first word group, where n is an integer from 0 to m-1; the device matches each word in the first word group against the fine-grained segmentation words obtained in advance and refers to the successfully matched words as the second word group; and the device obtains the sequence length of each word in the second word group and takes the longest word in the second word group as the segment of the sequence to be segmented. This addresses the low accuracy of word segmentation results and the slow analysis speed.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware structure of a mobile terminal provided by an embodiment of the present invention;
Fig. 2 is a flow chart of a sequence restoration method provided by an embodiment of the present invention;
Fig. 3 is a flow chart of a device obtaining the second word group, provided by an embodiment of the present invention;
Fig. 4 is a flow chart of a device obtaining the segment of the sequence to be segmented, provided by an embodiment of the present invention;
Fig. 5 is a detailed flow chart of a sequence restoration method provided by an embodiment of the present invention;
Fig. 6 is a structural block diagram of a sequence restoration apparatus provided by an embodiment of the present invention;
Fig. 7 is a structural block diagram of a third acquisition module provided by an embodiment of the present invention;
Fig. 8 is a structural block diagram of a fourth acquisition module provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments.
A mobile terminal for implementing the embodiments of the present invention is now described with reference to Fig. 1. In the following description, suffixes such as "module", "part" or "unit" used to denote elements serve only to facilitate the explanation of the invention and have no specific meaning in themselves. "Module" and "part" can therefore be used interchangeably.
A mobile terminal can be implemented in various forms. For example, the terminal described in the present invention may include mobile terminals such as mobile phones, smartphones, notebook computers, digital broadcast receivers, personal digital assistants (PDA), tablet computers (PAD), portable multimedia players (PMP) and navigation devices, as well as fixed terminals such as digital TVs and desktop computers. In the following, the terminal is assumed to be a mobile terminal. However, those skilled in the art will understand that, apart from elements used specifically for mobile purposes, the construction according to the embodiments of the present invention can also be applied to fixed terminals.
Fig. 1 is a schematic diagram of the hardware structure of a mobile terminal for implementing the embodiments of the present invention.
The mobile terminal 100 may include a user input unit 130, an output unit 150, a memory 160, an interface unit 170, a controller 180, a power supply unit 190 and so on. Fig. 1 shows a mobile terminal with various components, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead. The elements of the mobile terminal are described in detail below.
The user input unit 130 can generate key input data according to commands entered by the user in order to control the various operations of the mobile terminal. The user input unit 130 allows the user to enter various types of information and may include a keyboard, a metal dome, a touch pad (for example, a touch-sensitive component that detects changes in resistance, pressure, capacitance and the like caused by being touched), a scroll wheel, a joystick and so on. In particular, when the touch pad is superimposed on the display unit 151 as a layer, a touch screen is formed.
The interface unit 170 serves as an interface through which at least one external device can connect to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port and so on. The identification module may store various information for authenticating the user of the mobile terminal 100 and may include a user identity module (UIM), a subscriber identity module (SIM), a universal subscriber identity module (USIM) and so on. In addition, a device having an identification module (hereinafter "identification device") may take the form of a smart card, so the identification device can be connected to the mobile terminal 100 via a port or other connecting means. The interface unit 170 can be used to receive input (for example, data, electric power and so on) from an external device and transfer the received input to one or more elements within the mobile terminal 100, or to transfer data between the mobile terminal and the external device.
In addition, when the mobile terminal 100 is connected to an external cradle, the interface unit 170 can serve as a path through which electric power is supplied from the cradle to the mobile terminal 100, or as a path through which various command signals entered from the cradle are transferred to the mobile terminal. The various command signals or the electric power input from the cradle can serve as signals for recognizing whether the mobile terminal is accurately mounted on the cradle. The output unit 150 is configured to provide output signals (for example, audio signals, video signals, alarm signals, vibration signals and so on) in a visual, audio and/or tactile manner. The output unit 150 may include a display unit 151, an audio output module 152, an alarm unit 153 and so on.
The display unit 151 can display information processed in the mobile terminal 100. For example, when the mobile terminal 100 is in a phone call mode, the display unit 151 can display a user interface (UI) or graphical user interface (GUI) related to the call or to other communication (for example, text messaging, multimedia file downloading and so on). When the mobile terminal 100 is in a video call mode or an image capture mode, the display unit 151 can display captured and/or received images, a UI or GUI showing the video or image and related functions, and so on.
Meanwhile, when the display unit 151 and the touch pad are superimposed on each other as layers to form a touch screen, the display unit 151 can serve as both an input device and an output device. The display unit 151 may include at least one of a liquid crystal display (LCD), a thin film transistor LCD (TFT-LCD), an organic light-emitting diode (OLED) display, a flexible display, a three-dimensional (3D) display and the like. Some of these displays may be constructed to be transparent so that the user can see through them from the outside; these may be called transparent displays, a typical example being a TOLED (transparent organic light-emitting diode) display. Depending on the particular desired implementation, the mobile terminal 100 may include two or more display units (or other display devices); for example, the mobile terminal may include an external display unit (not shown) and an internal display unit (not shown). The touch screen can be used to detect touch input pressure as well as touch input position and touch input area.
When the mobile terminal is in a mode such as a call signal reception mode, a call mode, a recording mode, a speech recognition mode or a broadcast reception mode, the audio output module 152 can convert audio data received by the wireless communication unit 110 or stored in the memory 160 into an audio signal and output it as sound. The audio output module 152 can also provide audio output related to a specific function performed by the mobile terminal 100 (for example, a call signal reception sound, a message reception sound and so on). The audio output module 152 may include a loudspeaker, a buzzer and so on.
The alarm unit 153 can provide output to give notice that an event has occurred in the mobile terminal 100. Typical events include call reception, message reception, key signal input, touch input and so on. In addition to audio or video output, the alarm unit 153 can provide output in other ways to give notice of an event. For example, the alarm unit 153 can provide output in the form of vibration: when a call, a message or some other incoming communication is received, the alarm unit 153 can provide tactile output (for example, vibration) to notify the user. With such tactile output, the user can recognize the occurrence of various events even when the user's mobile phone is in the user's pocket. The alarm unit 153 can also provide output notifying the occurrence of an event via the display unit 151 or the audio output module 152.
The memory 160 can store software programs for the processing and control operations performed by the controller 180, and can temporarily store data that has been output or is about to be output (for example, a phone book, messages, still images, video and so on). The memory 160 can also store data on the various modes of vibration and audio signals that are output when a touch is applied to the touch screen.
The memory 160 may include at least one type of storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc and so on. The mobile terminal 100 can also cooperate with a network storage device that performs the storage function of the memory 160 over a network connection.
The controller 180 generally controls the overall operation of the mobile terminal. For example, the controller 180 performs the control and processing related to voice calls, data communication, video calls and so on. In addition, the controller 180 may include a multimedia module 181 for reproducing (or playing back) multimedia data; the multimedia module 181 may be built into the controller 180 or may be constructed separately from it. The controller 180 can perform pattern recognition processing to recognize handwriting input or drawing input performed on the touch screen as characters or images.
The power supply unit 190 receives external or internal power under the control of the controller 180 and provides the appropriate power needed to operate each element and component.
The various embodiments described herein can be implemented in a computer-readable medium using, for example, computer software, hardware or any combination thereof. For a hardware implementation, the embodiments described herein can be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein; in some cases, such embodiments can be implemented in the controller 180. For a software implementation, embodiments such as processes or functions can be implemented with separate software modules, each of which performs at least one function or operation. The software code can be implemented by a software application (or program) written in any appropriate programming language, stored in the memory 160 and executed by the controller 180.
So far, the mobile terminal has been described in terms of its functions. Below, for the sake of brevity, a slide-type mobile terminal, among the various types of mobile terminal such as folder-type, bar-type, swing-type and slide-type terminals, is described as an example. The present invention can nevertheless be applied to any type of mobile terminal and is not limited to slide-type mobile terminals.
The mobile terminal 100 shown in Fig. 1 may be constructed to operate with wired and wireless communication systems and satellite-based communication systems that transmit data via frames or packets.
The method embodiments of the present invention are proposed on the basis of the mobile terminal hardware structure described above.
Embodiment one
Referring to Fig. 2, which shows the flow of a sequence restoration method applied to a sequence restoration device, the method comprises:
S201: The device takes the sequence to be segmented as the input sequence and obtains its sequence length m, where m is a positive integer;
S202: The device combines the first character of the sequence to be segmented with the n consecutive characters that follow it to form a word, obtaining m words that are collectively referred to as the first word group, where n is an integer from 0 to m-1;
S203: The device matches each word in the first word group against the fine-grained segmentation words obtained in advance, and refers to the successfully matched words as the second word group;
S204: The device obtains the sequence length of each word in the second word group and takes the longest word in the second word group as the segment of the sequence to be segmented.
For step S201, obtaining the sequence length of the sequence to be segmented means obtaining the number of characters the sequence contains.
For step S202, specifically, the device combines the first character of the sequence to be segmented with the zero characters after it to form a word, with the one character after it to form a word, with the two characters after it, with the three characters after it, and so on up to the m-1 characters after it. This yields m words that all begin with the first character of the sequence to be segmented, and these m words are referred to as the first word group.
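Step S202 is simply an enumeration of every prefix of the input. A minimal sketch follows, assuming a hypothetical helper name `build_first_group` and a toy input. The comparison with `itertools.accumulate` is included only as a cross-check, since accumulating the characters of a string by concatenation yields the same prefixes.

```python
from itertools import accumulate


def build_first_group(sequence):
    """Return the m prefixes of `sequence`, from 1 character up to m characters."""
    m = len(sequence)
    # n = 0 gives the first character alone; n = m-1 gives the whole sequence.
    return [sequence[: n + 1] for n in range(m)]


example = "abcd"  # hypothetical toy sequence
first_group = build_first_group(example)
print(first_group)  # ['a', 'ab', 'abc', 'abcd']

# Cross-check: running concatenation of the characters gives the same prefixes.
same_as_accumulate = first_group == list(accumulate(example))
```

The first word group therefore always has exactly m entries, one for each value of n.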
For step S203, referring to Fig. 3, the step in which the device matches each word in the first word group against the fine-grained segmentation words obtained in advance, and refers to the successfully matched words as the second word group, specifically comprises:
S2031: The device compares each word in the first word group with the fine-grained segmentation words obtained in advance;
S2032: When the first word group contains words identical to the fine-grained segmentation words obtained in advance, the device obtains those words from the first word group;
S2033: The device refers to all words in the first word group that are identical to the fine-grained segmentation words obtained in advance as the second word group.
For step S2031, specifically, the device compares each word in the first word group with the fine-grained segmentation words obtained in advance, one by one.
For step S2032, it should be noted that fine-grained segmentation is prior art and is not described further here;
Further, the fine-grained segmentation words obtained in advance are those obtained by performing fine-grained segmentation on the sequence to be segmented.
For step S2033, specifically, because the words in the first word group and the fine-grained segmentation words obtained in advance are both derived from the same sequence to be segmented, the number of words in the second word group is necessarily not zero.
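The matching in S2031 to S2033 is a membership test of each prefix against the fine-grained words. The sketch below assumes a hypothetical helper name `match_second_group` and toy data; converting the fine-grained words to a set is an implementation choice made here so that each comparison is a constant-time lookup rather than a one-by-one scan.

```python
def match_second_group(first_group, fine_grained_words):
    """Return the words of `first_group` that also occur in `fine_grained_words`."""
    lookup = set(fine_grained_words)  # set membership instead of pairwise comparison
    # Iterating over first_group keeps the matches ordered from shortest to longest.
    return [word for word in first_group if word in lookup]


# Hypothetical data: prefixes of "abcd" and a fine-grained result for the same input.
first_group = ["a", "ab", "abc", "abcd"]
fine_grained = ["ab", "a", "bc", "cd", "d"]
second_group = match_second_group(first_group, fine_grained)
print(second_group)  # ['a', 'ab']
```

In this toy case the second word group is non-empty because the fine-grained result contains at least one word that starts at the first character of the input.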
For step S204, referring to Fig. 4, the step in which the device obtains the sequence length of each word in the second word group and takes the longest word in the second word group as the segment of the sequence to be segmented specifically comprises:
S2041: The device obtains the sequence length of each word in the second word group;
S2042: The device compares the sequence lengths of the words in the second word group and obtains the longest word in the second word group;
S2043: The device takes the longest word in the second word group as the segment of the sequence to be segmented.
For step S2041, obtaining the sequence length of each word means obtaining the number of characters in each word.
For step S2042, obtaining the longest word in the second word group means obtaining the word in the second word group that contains the most characters.
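Steps S2041 to S2043 can be written as an explicit length comparison. The helper name `select_longest` and the sample words below are hypothetical. The loop mirrors the pairwise comparison in S2042, and the built-in `max` with `key=len` gives the same answer more compactly.

```python
def select_longest(second_group):
    """Return the word with the most characters in `second_group`."""
    if not second_group:
        # Defensive check (an assumption; the text states the group is non-empty).
        raise ValueError("the second word group is empty")
    longest = second_group[0]
    for word in second_group[1:]:
        # S2041/S2042: obtain each length and compare it with the current longest.
        if len(word) > len(longest):
            longest = word
    return longest  # S2043: this word becomes the segment


sample_group = ["a", "ab", "abc"]  # hypothetical second word group
chosen = select_longest(sample_group)
print(chosen)
```

Since every word in the second word group is a prefix of the same input, no two words share a length, so there are no ties to break.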
For the technical solution shown in Fig. 2, after the longest word in the second word group is taken as the segment of the sequence to be segmented, the method further comprises: the device obtains the segments of the sequence that remains after that segment is removed from the sequence to be segmented;
The step in which the device obtains the segments of the remaining sequence specifically comprises:
The device takes the sequence remaining after the segment is removed as the input sequence;
The device checks the sequence length of the input sequence each time and, when the length is not zero, obtains the segment of the corresponding input sequence;
When the device detects that the sequence length of the input sequence is zero, the device assembles the segments of the input sequences, in the order in which they were obtained, into a restored sequence and sends it to toolkits for analysis.
This embodiment provides a sequence restoration method in which the device takes the sequence to be segmented as the input sequence and obtains its sequence length m, where m is a positive integer; the device combines the first character of the sequence to be segmented with the n consecutive characters that follow it to form a word, obtaining m words that are collectively referred to as the first word group, where n is an integer from 0 to m-1; the device matches each word in the first word group against the fine-grained segmentation words obtained in advance and refers to the successfully matched words as the second word group; and the device obtains the sequence length of each word in the second word group and takes the longest word in the second word group as the segment of the sequence to be segmented. This addresses the low accuracy of word segmentation results and the slow analysis speed.
Embodiment two
Referring to Fig. 5, which shows the detailed flow of a sequence restoration method, the detailed method comprises:
S501: The device takes the sequence to be segmented as the input sequence and obtains its sequence length m, where m is a positive integer;
S502: The device combines the first character of the sequence to be segmented with the n consecutive characters that follow it to form a word, obtaining m words that are collectively referred to as the first word group, where n is an integer from 0 to m-1;
S503: The device compares each word in the first word group with the fine-grained segmentation words obtained in advance;
S504: When the first word group contains words identical to the fine-grained segmentation words obtained in advance, the device obtains those words from the first word group;
S505: The device refers to all words in the first word group that are identical to the fine-grained segmentation words obtained in advance as the second word group;
S506: The device obtains the sequence length of each word in the second word group;
S507: The device compares the sequence lengths of the words in the second word group and obtains the longest word in the second word group;
S508: The device takes the longest word in the second word group as the segment of the sequence to be segmented;
S509: The device takes the sequence remaining after the segment is removed as the input sequence;
S510: The device checks the sequence length of the input sequence each time and, when the length is not zero, obtains the segment of the corresponding input sequence;
S511: When the device detects that the sequence length of the input sequence is zero, the device assembles the segments of the input sequences, in the order in which they were obtained, into a restored sequence and sends it to toolkits for analysis.
For step S501, obtaining the sequence length of the sequence to be segmented means obtaining the number of characters the sequence contains.
For step S502, specifically, the device combines the first character of the sequence to be segmented with the zero characters after it to form a word, with the one character after it to form a word, with the two characters after it, with the three characters after it, and so on up to the m-1 characters after it. This yields m words that all begin with the first character of the sequence to be segmented, and these m words are referred to as the first word group.
For step S502, by way of example, assume that the sequence to be segmented is "marriage and not yet marrying", whose first character is "knot". Combining the first character with the 0 to 8 consecutive characters that follow it gives the first word group {knot, marries, marriage, marry sum, the Buddhist monk of marriage, marriage and not yet, marriage and not yet tie, marriage and not yet marry, marriage with not yet marry}, that is, nine words of increasing length.
For step S503, specifically, the equipment cuts each phrase in the first phrase with the advance fine granularity for obtaining
Participle group is compared one by one.
For step S504, it is necessary to explanation, the fine granularity cutting as a kind of prior art, distribute it is bright herein not
Repeat again;
Further, the fine granularity cutting phrase of the advance acquisition is to carry out fine granularity cutting to the sub-sequence to be cut
The fine granularity cutting phrase of acquisition.
For step S505, specifically, because the words in the first word group and the words in the fine-grained segmentation word group obtained in advance are all derived from the same sequence to be segmented, the number of words in the second word group is necessarily not zero.
For step S506, the device obtains the sequence length of each word, that is, the number of characters in each word.
For step S507, the device obtains the word with the longest sequence length in the second word group, that is, the word in the second word group that contains the most characters.
For steps S503 to S508, by way of example and continuing the example of step S502, the first word group is {结, 结婚, 结婚的, 结婚的和, 结婚的和尚, 结婚的和尚未, 结婚的和尚未结, 结婚的和尚未结婚, 结婚的和尚未结婚的} and the fine-grained segmentation word group is {结婚的, 结婚, 和尚, 尚未, 未结婚, 的}. Comparing each word in the first word group with each word in the fine-grained segmentation word group gives the second word group {结婚的, 结婚}. The device obtains the sequence length of each word in the second word group: "结婚的" has a sequence length of 3 and "结婚" has a sequence length of 2. The word with the longest sequence length in the second word group is therefore "结婚的", which is taken as the segmentation sequence of the sequence to be segmented "结婚的和尚未结婚的".
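Steps S503 to S508 amount to a longest-prefix match against the fine-grained segmentation word group. The sketch below is one possible reading, not the patented implementation: the name split_off and the contents of FINE_GRAINED are assumptions reconstructed from the worked example, and a real fine-grained segmenter would supply the word group.

```python
# Assumed fine-grained segmentation word group for the example sentence.
FINE_GRAINED = {"结婚的", "结婚", "和尚", "尚未", "未结婚", "的"}


def split_off(sequence: str, fine_grained: set[str]) -> str:
    """Return the segmentation sequence of `sequence` (steps S502 to S508)."""
    # S502: first word group = every prefix of the sequence.
    first_group = [sequence[: n + 1] for n in range(len(sequence))]
    # S503 to S505: second word group = prefixes found in the fine-grained group.
    second_group = [word for word in first_group if word in fine_grained]
    # S506 to S508: the longest word of the second word group is the result.
    # max() raises ValueError on an empty list; the specification states the
    # second word group is never empty, so no fallback is provided here.
    return max(second_group, key=len)


segment = split_off("结婚的和尚未结婚的", FINE_GRAINED)
print(segment)  # 结婚的
```

Holding the fine-grained word group in a set keeps each membership check constant time on average, so one call costs roughly one lookup per prefix.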
For step S509, the sequence with the segmentation sequence removed is the part of the sequence to be segmented that remains after the segmentation sequence has been removed.
For step S510, the device checks the sequence length of the input sequence each time; only when the sequence length of the input sequence is not zero does the device obtain the segmentation sequence of the corresponding input sequence through steps S501 to S508.
For step S511, when the device detects that the sequence length of the input sequence is zero, the sequence to be segmented has been completely split. The device then assembles the segmentation sequences of all input sequences, in the order in which they were obtained, into a restored sequence and sends it to toolkits for analysis, so that the optimal complete sequence of the sequence to be segmented is finally obtained.
For steps S509 to S511, by way of example and continuing the example of steps S503 to S508, the segmentation sequence "结婚的" is removed from the sequence to be segmented "结婚的和尚未结婚的", giving the first updated sequence "和尚未结婚的". The first updated sequence is taken as the input sequence, and its sequence length is not zero;
The first character of the first updated sequence is "和". Combining it with the 0 to 5 consecutive characters after it gives the first word group of the first updated sequence, {和, 和尚, 和尚未, 和尚未结, 和尚未结婚, 和尚未结婚的}. Comparing each word in this first word group with each word in the fine-grained segmentation word group gives the second word group of the first updated sequence, {和尚}. The device obtains the sequence length of the word in this second word group: "和尚" ("monk") has a sequence length of 2. Because the second word group of the first updated sequence contains only one word, the word with the longest sequence length in it is "和尚", which is taken as the segmentation sequence of the first updated sequence "和尚未结婚的";
The segmentation sequence "和尚" of the first updated sequence is removed from the first updated sequence "和尚未结婚的", giving the second updated sequence "未结婚的". The second updated sequence is taken as the input sequence, and its sequence length is not zero;
The first character of the second updated sequence is "未". Combining it with the 0 to 3 consecutive characters after it gives the first word group of the second updated sequence, {未, 未结, 未结婚, 未结婚的}. Comparing each word in this first word group with each word in the fine-grained segmentation word group gives the second word group of the second updated sequence, {未结婚}. The device obtains the sequence length of the word in this second word group: "未结婚" ("unmarried") has a sequence length of 3. Because the second word group of the second updated sequence contains only one word, the word with the longest sequence length in it is "未结婚", which is taken as the segmentation sequence of the second updated sequence "未结婚的";
The segmentation sequence "未结婚" of the second updated sequence is removed from the second updated sequence "未结婚的", giving the third updated sequence "的". The third updated sequence is taken as the input sequence, and its sequence length is not zero;
The first character of the third updated sequence is "的". Combining it with the 0 characters after it gives the first word group of the third updated sequence, {的}. Comparing each word in this first word group with each word in the fine-grained segmentation word group gives the second word group of the third updated sequence, {的}. The device obtains the sequence length of the word in this second word group: "的" has a sequence length of 1. Because the second word group of the third updated sequence contains only one word, the word with the longest sequence length in it is "的", which is taken as the segmentation sequence of the third updated sequence "的";
The segmentation sequence "的" of the third updated sequence is removed from the third updated sequence "的", giving the fourth updated sequence "". The fourth updated sequence is taken as the input sequence; its sequence length is zero, so the device stops and does not compute a segmentation sequence for the fourth updated sequence. The device assembles the segmentation sequence "结婚的" of the sequence to be segmented, the segmentation sequence "和尚" of the first updated sequence, the segmentation sequence "未结婚" of the second updated sequence, and the segmentation sequence "的" of the third updated sequence, in the order in which they were obtained, into the restored sequence "结婚的 | 和尚 | 未结婚 | 的" and sends it to toolkits for analysis, finally obtaining the optimal restored sequence of the sequence to be segmented "结婚的和尚未结婚的".
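The whole walk-through of steps S501 to S511 can be reproduced with a short loop. This is a sketch under stated assumptions: restore_sequence and FINE_GRAINED_WORDS are hypothetical names, the word group is reconstructed from the worked example, and the hand-off to toolkits is left out because the specification does not describe its interface.

```python
# Assumed fine-grained segmentation word group for the example sentence.
FINE_GRAINED_WORDS = {"结婚的", "结婚", "和尚", "尚未", "未结婚", "的"}


def restore_sequence(sequence: str, fine_grained: set[str]) -> list[str]:
    """Split `sequence` into segmentation sequences, in the order obtained."""
    pieces = []
    remaining = sequence  # the current input sequence
    # S510 / S511: keep going until the input sequence has length zero.
    while len(remaining) != 0:
        # S502 to S505: prefixes that also occur in the fine-grained group.
        matches = [
            remaining[: n + 1]
            for n in range(len(remaining))
            if remaining[: n + 1] in fine_grained
        ]
        if not matches:
            # Not expected according to the specification; fail loudly
            # rather than loop forever.
            raise ValueError(f"no fine-grained word starts {remaining!r}")
        # S506 to S508: the longest match is the segmentation sequence.
        longest = max(matches, key=len)
        pieces.append(longest)
        # S509: the remainder becomes the next input sequence.
        remaining = remaining[len(longest):]
    return pieces


restored = restore_sequence("结婚的和尚未结婚的", FINE_GRAINED_WORDS)
print(" | ".join(restored))  # 结婚的 | 和尚 | 未结婚 | 的
```

Each pass removes at least one character from the input sequence, so the loop runs at most m times for a sequence of length m.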
This embodiment provides a detailed sequence restoring method. The device takes the sequence to be segmented as the input sequence and obtains the sequence length m of the sequence to be segmented, where m is a positive integer; the device combines the first character of the sequence to be segmented with the n consecutive characters after it to form words, obtaining m words, and calls the m words the first word group, where n is an integer from 0 to m-1; the device compares each word in the first word group with the fine-grained segmentation word group obtained in advance; when the first word group contains words identical to words in the fine-grained segmentation word group obtained in advance, the device obtains those words; the device calls all words in the first word group that are identical to words in the fine-grained segmentation word group obtained in advance the second word group; the device obtains the sequence length of each word in the second word group; the device compares the sequence lengths of the words in the second word group and obtains the word with the longest sequence length; the device takes the word with the longest sequence length in the second word group as the segmentation sequence of the sequence to be segmented; the device takes the sequence with the segmentation sequence removed as the input sequence; the device checks the sequence length of the input sequence each time and, when the length of the input sequence is not zero, obtains the segmentation sequence of the corresponding input sequence; when the device detects that the sequence length of the input sequence is zero, the device assembles the segmentation sequences of the input sequences, in the order in which they were obtained, into a restored sequence and sends it to toolkits for analysis, thereby solving the problems of low accuracy of the word segmentation result and slow analysis speed.
Embodiment three
Based on the same technical concept as the foregoing embodiments, referring to Fig. 6, which shows a structural block diagram of a sequence restoring apparatus 60, the apparatus 60 includes: a first acquisition module 601, a second acquisition module 602, a third acquisition module 603, and a fourth acquisition module 604; wherein,
The first acquisition module 601 is configured to take the sequence to be segmented as the input sequence and obtain the sequence length m of the sequence to be segmented, where m is a positive integer;
The second acquisition module 602 is configured to combine the first character of the sequence to be segmented with the n consecutive characters after it to form words, obtain m words, and call the m words the first word group, where n is an integer from 0 to m-1;
The third acquisition module 603 is configured to match each word in the first word group against the fine-grained segmentation word group obtained in advance, obtain the words that match successfully, and call the successfully matched words the second word group;
The fourth acquisition module 604 is configured to obtain the sequence length of each word in the second word group and take the word with the longest sequence length in the second word group as the segmentation sequence of the sequence to be segmented.
For the first acquisition module 601, obtaining the sequence length of the sequence to be segmented means obtaining the number of characters contained in the sequence to be segmented.
For the second acquisition module 602, specifically, the second acquisition module 602 is configured to combine the first character of the sequence to be segmented with the zero characters after it to form a word, with the one character after it to form a word, with the two characters after it to form a word, with the three characters after it to form a word, and so on up to the m-1 characters after it, obtaining m words that all begin with the first character of the sequence to be segmented, and to call these m words the first word group.
For the third acquisition module 603, referring to Fig. 7, the third acquisition module 603 specifically includes: a comparison submodule 6031, a first acquisition submodule 6032, and a second acquisition submodule 6033; wherein,
The comparison submodule 6031 is configured to compare each word in the first word group with the fine-grained segmentation word group obtained in advance;
The first acquisition submodule 6032 is configured to, when the first word group contains words identical to words in the fine-grained segmentation word group obtained in advance, obtain those words from the first word group;
The second acquisition submodule 6033 is configured to call all words in the first word group that are identical to words in the fine-grained segmentation word group obtained in advance the second word group.
For the comparison submodule 6031, specifically, the comparison submodule 6031 compares each word in the first word group, one by one, with the words of the fine-grained segmentation word group obtained in advance.
For the first acquisition submodule 6032, it should be noted that fine-grained segmentation is prior art and is not described further in the present invention;
Further, the fine-grained segmentation word group obtained in advance is the word group obtained by performing fine-grained segmentation on the sequence to be segmented.
For the second acquisition submodule 6033, specifically, because the words in the first word group and the words in the fine-grained segmentation word group obtained in advance are all derived from the same sequence to be segmented, the number of words in the second word group is necessarily not zero.
For the fourth acquisition module 604, referring to Fig. 8, the fourth acquisition module 604 specifically includes: a third acquisition submodule 6041, a fourth acquisition submodule 6042, and a fifth acquisition submodule 6043; wherein,
The third acquisition submodule 6041 is configured to obtain the sequence length of each word in the second word group;
The fourth acquisition submodule 6042 is configured to compare the sequence lengths of the words in the second word group and obtain the word with the longest sequence length in the second word group;
The fifth acquisition submodule 6043 is configured to take the word with the longest sequence length in the second word group as the segmentation sequence of the sequence to be segmented.
For the third acquisition submodule 6041, obtaining the sequence length of each word means obtaining the number of characters in each word.
For the fourth acquisition submodule 6042, obtaining the word with the longest sequence length in the second word group means obtaining the word in the second word group that contains the most characters.
With respect to the structural block diagram shown in Fig. 6, the apparatus further includes a fifth acquisition module 605; wherein the fifth acquisition module 605 is configured to obtain the segmentation sequence of the sequence that remains after the segmentation sequence of the sequence to be segmented has been removed;
The fifth acquisition module 605 is specifically configured to:
take the sequence with the segmentation sequence removed as the input sequence;
and check the sequence length of the input sequence each time and, when the length of the input sequence is not zero, obtain the segmentation sequence of the corresponding input sequence;
and, when the sequence length of the input sequence is detected to be zero, assemble the segmentation sequences of the input sequences, in the order in which they were obtained, into a restored sequence and send it to toolkits for analysis.
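The division of work among modules 601 to 605 can be sketched as one Python class with one method per module. The class name, the method names, and the example word group are all hypothetical; the specification describes the modules functionally and does not prescribe an interface, and the hand-off to toolkits is omitted.

```python
class SequenceRestorer:
    """Hypothetical stand-in for apparatus 60; one method per module."""

    def __init__(self, fine_grained_words):
        # Fine-grained segmentation word group obtained in advance.
        self.fine_grained_words = set(fine_grained_words)

    def sequence_length(self, sequence):  # module 601
        return len(sequence)

    def first_word_group(self, sequence):  # module 602
        m = self.sequence_length(sequence)
        return [sequence[: n + 1] for n in range(m)]

    def second_word_group(self, first_group):  # module 603
        return [w for w in first_group if w in self.fine_grained_words]

    def segmentation_sequence(self, second_group):  # module 604
        # Raises ValueError if second_group is empty, which the
        # specification says does not happen.
        return max(second_group, key=len)

    def restore(self, sequence):  # module 605
        pieces = []
        remaining = sequence
        while self.sequence_length(remaining) != 0:
            first = self.first_word_group(remaining)
            second = self.second_word_group(first)
            piece = self.segmentation_sequence(second)
            pieces.append(piece)
            remaining = remaining[len(piece):]
        return pieces


# Assumed word group, reconstructed from the worked example.
restorer = SequenceRestorer({"结婚的", "结婚", "和尚", "尚未", "未结婚", "的"})
print(restorer.restore("结婚的和尚未结婚的"))
```

Keeping each module as its own method mirrors the block diagram, so any one stage (for example the matching in module 603) can be replaced without touching the others.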
This embodiment provides a sequence restoring apparatus. The first acquisition module 601 is configured to take the sequence to be segmented as the input sequence and obtain the sequence length m of the sequence to be segmented, where m is a positive integer; the second acquisition module 602 is configured to combine the first character of the sequence to be segmented with the n consecutive characters after it to form words, obtain m words, and call the m words the first word group, where n is an integer from 0 to m-1; the third acquisition module 603 is configured to match each word in the first word group against the fine-grained segmentation word group obtained in advance, obtain the words that match successfully, and call the successfully matched words the second word group; the fourth acquisition module 604 is configured to obtain the sequence length of each word in the second word group and take the word with the longest sequence length in the second word group as the segmentation sequence of the sequence to be segmented, thereby solving the problems of low accuracy of the word segmentation result and slow analysis speed.
It should be noted that, herein, the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
The numbering of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferable implementation. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.