CN106815204A - The segmentation method and device of judgement document - Google Patents
The segmentation method and device of judgement document
- Publication number
- CN106815204A (application number CN201510867898.7A)
- Authority
- CN
- China
- Prior art keywords
- document
- row
- regularity
- document row
- condition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses a segmentation method and device for a judgement document. The method includes: performing line splitting on a target judgement document to obtain a document row set, where the document row set is the set of multiple document rows obtained by splitting the target judgement document into lines; adding a corresponding paragraph identifier to each document row in the document row set according to a preset regular-expression rule set, where the preset regular-expression rule set is a set of rules derived statistically from many judgement documents; and segmenting the target judgement document based on the paragraph identifier corresponding to each document row in the document row set. The application solves the problem in the related art that the paragraph division of judgement documents has low accuracy.
Description
Technical field
The application relates to the technical field of text processing, and in particular to a segmentation method and device for a judgement document.
Background art
A judgement document is the record of a people's court's trial process and its outcome, and is the sole proof of the substantive rights and obligations that the court determines and allocates among the parties. A judgement document that is structurally complete, contains all required elements, and is logically rigorous is both the proof of the rights the parties enjoy and the obligations they bear, and an important basis on which a higher people's court supervises the civil adjudication work of lower people's courts.
In the related art, a judgement document often needs to be divided into paragraphs before the relevant data analysis can be performed. The technique generally used for this division is line-by-line matching. First, the full text is split line by line into a text linked list in which each line is connected head to tail with the next. Second, the text linked list is matched against an existing rule linked list. A linked list is a kind of linear list that does not store its data in linear (sequential) order; instead, each node stores a pointer to the next node. The text linked list and the rule linked list are matched item by item; after a successful match, matching jumps to the next item to be matched, and the content is output to the corresponding paragraph according to the specific item matched. Because the two linked lists are matched in one direction only, moving forward, once a match fails at some earlier point, all subsequent content is matched incorrectly. In other words, paragraph division is prone to a serious cascading error in which one mistake causes mistakes everywhere after it. As a result, the accuracy of paragraph division in judgement documents is low.
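The cascading failure described above can be illustrated with a small sketch. This is only one possible reading of the forward-only matching in the related art, not its actual implementation; the rule chain, the paragraph names, and the function `forward_only_segment` are hypothetical, and a Python list stands in for the linked lists.

```python
import re

# Hypothetical ordered rule chain (assumption): each rule is awaited in turn.
RULE_CHAIN = [
    ("header", re.compile(r"Court$")),
    ("prosecutor", re.compile(r"^Public prosecution")),
    ("defendant", re.compile(r"^Defendant")),
    ("opinion", re.compile(r"^The court holds")),
]

def forward_only_segment(rows, rule_chain=RULE_CHAIN):
    """Assign each row to a paragraph while moving through the rules forward only."""
    labels = []
    rule_idx = 0      # index of the single rule currently being awaited
    current = None    # paragraph that rows are currently written to
    for row in rows:
        if rule_idx < len(rule_chain) and rule_chain[rule_idx][1].search(row):
            current = rule_chain[rule_idx][0]
            rule_idx += 1  # jump to the next rule only after a successful match
        labels.append(current)
    return labels
```

In this sketch, a document that lacks the prosecutor row never advances past the second rule, so every later row is written to the header paragraph. That is the kind of "one mistake, mistakes everywhere" behaviour the background section refers to.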
No effective solution has yet been proposed for the problem in the related art that the paragraph division of judgement documents has low accuracy.
Summary of the invention
The main purpose of the application is to provide a segmentation method and device for a judgement document, so as to solve the problem in the related art that the paragraph division of judgement documents has low accuracy.
To achieve the above purpose, according to one aspect of the application, a segmentation method for a judgement document is provided. The method includes: performing line splitting on a target judgement document to obtain a document row set, where the document row set is the set of multiple document rows obtained by splitting the target judgement document into lines; adding a corresponding paragraph identifier to each document row in the document row set according to a preset regular-expression rule set, where the preset regular-expression rule set is a set of rules derived statistically from many judgement documents; and segmenting the target judgement document based on the paragraph identifier corresponding to each document row in the document row set.
Further, adding a corresponding paragraph identifier to each document row in the document row set according to the preset regular-expression rule set includes: matching the multiple regular-expression rule conditions in the preset regular-expression rule set, one by one, against the content of each document row in the document row set; obtaining the document rows in the document row set that match the multiple rule conditions; and adding the corresponding paragraph identifiers to the document rows that match the multiple rule conditions.
Further, adding a corresponding paragraph identifier to each document row in the document row set according to the preset regular-expression rule set includes: matching the multiple regular-expression rule conditions in the preset regular-expression rule set, one by one, against the content of each document row in the document row set; obtaining the document rows in the document row set that do not match any of the multiple rule conditions, to obtain at least one unrecognized document row; determining, in the document row set, the paragraph identifier corresponding to the document row immediately preceding the at least one unrecognized document row; and using the paragraph identifier of that preceding document row as the paragraph identifier of the at least one unrecognized document row.
Further, the multiple regular-expression rule conditions include a first rule condition and a second rule condition, where the first rule condition is the condition currently being matched against the document row set, and the second rule condition is the next condition, among the multiple rule conditions, to be matched against the document row set when the first rule condition fails to match the document row set. Matching the multiple rule conditions in the preset regular-expression rule set, one by one, against the content of each document row in the document row set includes: matching the first rule condition against the content of each document row in the document row set; judging whether the matching of the first rule condition against the content of each document row in the document row set has finished; if that matching has finished, taking the second rule condition as the condition currently being matched against the content of each document row in the document row set; and matching the second rule condition against the content of each document row in the document row set.
Further, performing line splitting on the target judgement document to obtain the document row set includes: determining the format type of the target judgement document; determining the newline character corresponding to the format type of the target judgement document; and performing line splitting according to that newline character to obtain the document row set.
Further, segmenting the target judgement document based on the paragraph identifier corresponding to each document row in the document row set includes: determining each paragraph identifier among the paragraph identifiers corresponding to the document rows in the document row set; dividing the document rows of the target judgement document into paragraphs based on each paragraph identifier; and merging multiple document rows that have the same paragraph identifier into the same paragraph.
To achieve the above purpose, according to another aspect of the application, a segmentation device for a judgement document is provided. The device includes: a first processing unit, configured to perform line splitting on a target judgement document to obtain a document row set, where the document row set is the set of multiple document rows obtained by splitting the target judgement document into lines; an adding unit, configured to add a corresponding paragraph identifier to each document row in the document row set according to a preset regular-expression rule set, where the preset regular-expression rule set is a set of rules derived statistically from many judgement documents; and a second processing unit, configured to segment the target judgement document based on the paragraph identifier corresponding to each document row in the document row set.
Further, the adding unit includes: a first matching module, configured to match the multiple regular-expression rule conditions in the preset regular-expression rule set, one by one, against the content of each document row in the document row set; a first acquisition module, configured to obtain the document rows in the document row set that match the multiple rule conditions; and an adding module, configured to add the corresponding paragraph identifiers to the document rows that match the multiple rule conditions.
Further, the adding unit includes: a second matching module, configured to match the multiple regular-expression rule conditions in the preset regular-expression rule set, one by one, against the content of each document row in the document row set; a second acquisition module, configured to obtain the document rows in the document row set that do not match any of the multiple rule conditions, to obtain at least one unrecognized document row; a first determining module, configured to determine, in the document row set, the paragraph identifier corresponding to the document row immediately preceding the at least one unrecognized document row; and a second determining module, configured to use the paragraph identifier of that preceding document row as the paragraph identifier of the at least one unrecognized document row.
Further, the first processing unit includes: a third determining module, configured to determine the format type of the target judgement document; a fourth determining module, configured to determine the newline character corresponding to the format type of the target judgement document; and a processing module, configured to perform line splitting according to that newline character to obtain the document row set.
The application uses the following steps: performing line splitting on a target judgement document to obtain a document row set, where the document row set is the set of multiple document rows obtained by splitting the target judgement document into lines; adding a corresponding paragraph identifier to each document row in the document row set according to a preset regular-expression rule set, where the preset regular-expression rule set is a set of rules derived statistically from many judgement documents; and segmenting the target judgement document based on the paragraph identifier corresponding to each document row in the document row set. This solves the problem in the related art that the paragraph division of judgement documents has low accuracy. Because the target judgement document is segmented according to the paragraph identifier of each document row in the document row set, the accuracy of paragraph division of judgement documents is improved.
Brief description of the drawings
The accompanying drawings, which form a part of the application, are provided for a further understanding of the application. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of the segmentation method for a judgement document according to the first embodiment of the application;
Fig. 2 is a flow chart of the segmentation method for a judgement document according to the second embodiment of the application; and
Fig. 3 is a schematic diagram of the segmentation device for a judgement document according to an embodiment of the application.
Detailed description of the embodiments
It should be noted that, provided there is no conflict, the embodiments of the application and the features in the embodiments may be combined with one another. The application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the solution of the application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the application without creative effort shall fall within the scope of protection of the application.
It should be noted that the terms "first", "second", and so on in the description and claims of the application and in the above drawings are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application described herein can be implemented. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
According to an embodiment of the application, a segmentation method for a judgement document is provided.
Fig. 1 is a flow chart of the segmentation method for a judgement document according to the first embodiment of the application. As shown in Fig. 1, the method includes the following steps:
Step S101: line splitting is performed on a target judgement document to obtain a document row set, where the document row set is the set of multiple document rows obtained by splitting the target judgement document into lines.
To improve the accuracy of paragraph division of judgement documents, the segmentation method of the first embodiment first performs line splitting on the target judgement document to obtain multiple document rows, and these document rows constitute the document row set.
Preferably, to improve the accuracy of the line splitting of the target judgement document, performing line splitting on the target judgement document to obtain the document row set may also be implemented by the following steps: determining the format type of the target judgement document; determining the newline character corresponding to the format type of the target judgement document; and performing line splitting according to that newline character to obtain the document row set.
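The three sub-steps of line splitting can be sketched as follows. This is a minimal sketch, not the implementation of the application: the format types, the mapping `NEWLINE_BY_FORMAT`, and the detection heuristic are all assumptions, since the application does not list concrete format types or newline characters.

```python
# Hypothetical mapping from format type to its newline sequence (assumption).
NEWLINE_BY_FORMAT = {"text": "\n", "html": "<br>"}

def detect_format_type(raw: str) -> str:
    """Rough heuristic (assumption): a document containing <br> is treated as HTML."""
    return "html" if "<br>" in raw else "text"

def split_into_rows(raw: str) -> list[str]:
    """Determine the format type, look up its newline, and split into document rows."""
    newline = NEWLINE_BY_FORMAT[detect_format_type(raw)]
    rows = [row.strip() for row in raw.split(newline)]
    return [row for row in rows if row]  # drop empty rows
```

Stripping each row also removes a trailing carriage return, so text with Windows line endings is handled by the same plain-text entry.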
For example, part of the content of a target judgement document is as follows:
Zhenxiong County People's Court, Yunnan Province
Criminal Judgment
(2015) Zhen Xing Chu Zi No. 150
Public prosecution organ: Zhenxiong County People's Procuratorate, Yunnan Province.
Defendant Xu so-and-so, male.
Placed under criminal detention on December 22, 2014 on suspicion of theft, and arrested on January 23, 2015. Currently held in the Zhenxiong County Detention House.
The Zhenxiong County People's Procuratorate of Yunnan Province, by Indictment Zhen Jian Gong Su Xing Su (2015) No. 80, charged defendant Xu so-and-so with theft and filed a public prosecution with this court on March 30, 2015. This court formed a collegiate bench in accordance with the law and tried the case in open court on April 18, 2015. The Zhenxiong County People's Procuratorate assigned acting prosecutor Pan Yong to appear in court in support of the public prosecution, and defendant Xu so-and-so appeared in court to take part in the proceedings. The trial has now concluded.
Based on the above content, the format type of the target judgement document is determined to be the text type, the newline character corresponding to the text type is determined, and the content of the target judgement document is split into lines at that newline character to obtain multiple document rows, for example: first document row: Zhenxiong County People's Court, Yunnan Province; second document row: Criminal Judgment; third document row: (2015) Zhen Xing Chu Zi No. 150; fourth document row: Public prosecution organ: Zhenxiong County People's Procuratorate, Yunnan Province; fifth document row: Defendant Xu so-and-so, male; sixth document row: Placed under criminal detention on December 22, 2014 on suspicion of theft, and arrested on January 23, 2015. Currently held in the Zhenxiong County Detention House; seventh document row: The Zhenxiong County People's Procuratorate of Yunnan Province, by Indictment Zhen Jian Gong Su Xing Su (2015) No. 80, charged defendant Xu so-and-so with theft and filed a public prosecution with this court on March 30, 2015. This court formed a collegiate bench in accordance with the law and tried the case in open court on April 18, 2015. The Zhenxiong County People's Procuratorate assigned acting prosecutor Pan Yong to appear in court in support of the public prosecution, and defendant Xu so-and-so appeared in court to take part in the proceedings. The trial has now concluded.
Step S102: a corresponding paragraph identifier is added to each document row in the document row set according to a preset regular-expression rule set, where the preset regular-expression rule set is a set of rules derived statistically from many judgement documents.
The preset regular-expression rule set contains multiple regular-expression rule conditions. For example, the first rule condition is "^defendant[people]([\u4e00-\u9fa5a][,]){0,5}[men and women]{0,50}$", in which the English words stand in for the Chinese literal text of the pattern. The first rule condition means: the row begins with either form of the word for "defendant", is followed by zero to five short clauses, then possibly by a gender marker, and finally possibly by up to 50 further characters. The first rule condition is used to match the defendant-party paragraph in the target judgement document, and the paragraph identifier of the defendant-party paragraph is added to the document rows that it matches. As another example, the fifth rule condition is "commits[\u4e00-\u9fa5a]{2,20}crime, sentenced to[\u4e00-\u9fa5a]{1,5}punishment". The fifth rule condition is used to match the judgement paragraph in the target judgement document, and the paragraph identifier of the judgement paragraph is added to the document rows that it matches, and so on.
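A rule set of this kind can be sketched as an ordered list of (paragraph identifier, pattern) pairs. The patterns below are hypothetical English stand-ins written for the translated example rows; they are not the rule conditions of the application, and the identifier names and the helper `identifier_for` are likewise assumptions.

```python
import re

# Hypothetical rule set (assumption): each entry pairs a paragraph identifier
# with a compiled regular expression.
RULES = [
    ("defendant_party", re.compile(r"^Defendant\s.{0,50}$")),
    ("judgement", re.compile(r"commits? the crime of .{2,40}, .*sentenced to")),
]

def identifier_for(row: str):
    """Return the identifier of the first rule that matches the row, else None."""
    for identifier, pattern in RULES:
        if pattern.search(row):
            return identifier
    return None
```

Keeping the identifier next to its pattern means that adding a new paragraph type only requires appending one entry to the list.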
Step S103: the target judgement document is segmented based on the paragraph identifier corresponding to each document row in the document row set.
Optionally, in the segmentation method provided by the first embodiment of the application, segmenting the target judgement document based on the paragraph identifier corresponding to each document row in the document row set may also be implemented by the following steps: determining each paragraph identifier among the paragraph identifiers corresponding to the document rows in the document row set; dividing the document rows of the target judgement document into paragraphs based on each paragraph identifier; and merging multiple document rows that have the same paragraph identifier into the same paragraph.
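The merging step can be sketched as a grouping of rows by paragraph identifier. This is a minimal sketch under the assumption that every row with a given identifier belongs to the same paragraph regardless of where it appears; the function name `merge_rows` and the identifiers in the usage are hypothetical.

```python
def merge_rows(labelled_rows):
    """Merge (paragraph_identifier, row_text) pairs into one paragraph per identifier.

    Paragraphs are returned in order of first appearance of their identifier.
    """
    paragraphs = {}
    for identifier, text in labelled_rows:
        paragraphs.setdefault(identifier, []).append(text)
    return {identifier: "\n".join(texts) for identifier, texts in paragraphs.items()}
```

If only adjacent rows should be merged, the same loop could start a new paragraph whenever the identifier changes; the application does not say which of the two behaviours is intended.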
Through steps S101 to S103, a corresponding paragraph identifier is added to each document row, and segmentation is based on the paragraph identifiers and is not affected by other document rows. In other words, each document row finds the paragraph it belongs to relatively independently, which improves the accuracy of paragraph division of judgement documents.
In the segmentation method provided by the first embodiment of the application, line splitting is performed on a target judgement document to obtain a document row set, where the document row set is the set of multiple document rows obtained by splitting the target judgement document into lines; a corresponding paragraph identifier is added to each document row in the document row set according to a preset regular-expression rule set, where the preset regular-expression rule set is a set of rules derived statistically from many judgement documents; and the target judgement document is segmented based on the paragraph identifier corresponding to each document row in the document row set. This solves the problem in the related art that the paragraph division of judgement documents has low accuracy. Because the target judgement document is segmented according to the paragraph identifier of each document row in the document row set, the accuracy of paragraph division of judgement documents is improved.
Fig. 2 is a flow chart of the segmentation method for a judgement document according to the second embodiment of the application. The embodiment of Fig. 2 may serve as a preferred implementation of the embodiment shown in Fig. 1. As shown in Fig. 2, the method includes the following steps:
Step S201: line splitting is performed on a target judgement document to obtain a document row set, where the document row set is the set of multiple document rows obtained by splitting the target judgement document into lines.
Step S201 is the same as step S101 above and is not described again here.
Step S202: the multiple regular-expression rule conditions in the preset regular-expression rule set are matched, one by one, against the content of each document row in the document row set.
For example, the multiple rule conditions are matched one by one against the rows of the document row set: first document row: Zhenxiong County People's Court, Yunnan Province; second document row: Criminal Judgment; third document row: (2015) Zhen Xing Chu Zi No. 150; fourth document row: Public prosecution organ: Zhenxiong County People's Procuratorate, Yunnan Province; fifth document row: Defendant Xu so-and-so, male; and so on.
Optionally, in the segmentation method of the second embodiment of the application, the multiple regular-expression rule conditions include a first rule condition and a second rule condition, where the first rule condition is the condition currently being matched against the document row set, and the second rule condition is the next condition, among the multiple rule conditions, to be matched against the document row set when the first rule condition fails to match the document row set. Matching the multiple rule conditions in the preset regular-expression rule set, one by one, against the content of each document row in the document row set includes: matching the first rule condition against the content of each document row in the document row set; judging whether the matching of the first rule condition against the content of each document row in the document row set has finished; if that matching has finished, taking the second rule condition as the condition currently being matched against the content of each document row in the document row set; and matching the second rule condition against the content of each document row in the document row set.
For example, the first rule condition is "^defendant[people]([\u4e00-\u9fa5a][,]){0,5}[men and women]{0,50}$", and the second rule condition is "^the court holds", in which the English words stand in for the Chinese literal text of the patterns. The second rule condition matches rows that begin with "the court holds". The first rule condition is matched against the content of each document row in the document row set. When the matching of the first rule condition against the content of each document row in the document row set has finished, the second rule condition, "^the court holds", is matched against the content of each document row in the document row set.
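The rule-by-rule order described above (finish one rule condition against every row, then move to the next) can be sketched with a nested loop. This is a sketch under assumptions: the English patterns and identifiers are hypothetical, and keeping the first identifier a row receives is a design choice that the application does not spell out.

```python
import re

# Hypothetical rule conditions (assumption), tried in this order.
RULES = [
    ("defendant_party", re.compile(r"^Defendant\b")),
    ("court_opinion", re.compile(r"^The court holds")),
]

def match_rules_in_turn(rows, rules=RULES):
    """Match each rule against all rows before moving on to the next rule."""
    identifiers = [None] * len(rows)
    for identifier, pattern in rules:       # the rule currently being matched
        for i, row in enumerate(rows):      # match it against every row
            if identifiers[i] is None and pattern.search(row):
                identifiers[i] = identifier
        # the inner loop ending means this rule has finished; take the next one
    return identifiers
```

Rows that no rule matches keep the value None, which marks them as unrecognized for later handling.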
Step S203: the document rows in the document row set that match the multiple regular-expression rule conditions are obtained.
For example, the content "Defendant Xu so-and-so, male" in the document row set matches the first rule condition above; that is, the document row corresponding to "Defendant Xu so-and-so, male" is the fifth document row in the document row set.
Step S204: the corresponding paragraph identifiers are added to the document rows that match the multiple rule conditions.
For example, the fifth document row in the document row set, "Defendant Xu so-and-so, male", matches the first rule condition above, and the first rule condition is used to match the defendant-party paragraph in the target judgement document; therefore, the paragraph identifier of the defendant-party paragraph is added to the fifth document row.
The content of the 20th document row in the document row set, "The court holds", matches the second rule condition, and the second rule condition is used to match the court-opinion paragraph in the target judgement document; therefore, the paragraph identifier of the court-opinion paragraph is added to the 20th document row in the document row set.
Step S205: the target judgement document is segmented based on the paragraph identifier corresponding to each document row in the document row set.
Step S205 is the same as step S103 above and is not described again here.
Optionally, in the segmentation method of the second embodiment of the application, adding a corresponding paragraph identifier to each document row in the document row set according to the preset regular-expression rule set further includes: matching the multiple regular-expression rule conditions in the preset regular-expression rule set, one by one, against the content of each document row in the document row set; obtaining the document rows in the document row set that do not match any of the multiple rule conditions, to obtain at least one unrecognized document row; determining, in the document row set, the paragraph identifier corresponding to the document row immediately preceding the at least one unrecognized document row; and using the paragraph identifier of that preceding document row as the paragraph identifier of the at least one unrecognized document row.
For example, if the content of the 21st document row in the document row set does not match any of the rule conditions, the paragraph identifier of the 20th document row in the document row set is used as the paragraph identifier of the 21st document row.
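The inheritance rule for unrecognized document rows can be sketched as a single forward pass over the identifiers. This is a minimal sketch: None is assumed to mark an unrecognized row, the function name `fill_unrecognized` is hypothetical, and leaving leading unrecognized rows unlabelled is an assumption, since they have no preceding row to inherit from.

```python
def fill_unrecognized(identifiers):
    """Give each unrecognized row (None) the identifier of the row before it."""
    filled = []
    previous = None
    for identifier in identifiers:
        if identifier is None:
            identifier = previous   # inherit from the preceding row
        filled.append(identifier)
        previous = identifier
    return filled
```

Because the preceding row may itself have inherited its identifier, a run of several unrecognized rows all end up in the paragraph of the last recognized row before them.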
In the segmentation method provided by the second embodiment of the application, line splitting is performed on a target judgement document to obtain a document row set, where the document row set is the set of multiple document rows obtained by splitting the target judgement document into lines; the multiple regular-expression rule conditions in the preset regular-expression rule set are matched, one by one, against the content of each document row in the document row set; the document rows in the document row set that match the multiple rule conditions are obtained; the corresponding paragraph identifiers are added to the document rows that match the multiple rule conditions; and the target judgement document is segmented based on the paragraph identifier corresponding to each document row in the document row set. This solves the problem in the related art that the paragraph division of judgement documents has low accuracy. Because the target judgement document is segmented according to the paragraph identifier of each document row in the document row set, the accuracy of paragraph division of judgement documents is improved.
It should be noted that the steps illustrated in the flow charts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from the one given here.
An embodiment of the application also provides a segmentation device for a judgement document. It should be noted that the segmentation device of the embodiment of the application may be used to perform the segmentation method for a judgement document provided by the embodiments of the application. The segmentation device provided by the embodiment of the application is described below.
Fig. 3 is a schematic diagram of the segmentation device for a judgement document according to an embodiment of the application. As shown in Fig. 3, the device includes: a first processing unit 10, an adding unit 20, and a second processing unit 30.
The first processing unit 10 is configured to perform line splitting on a target judgement document to obtain a document row set, where the document row set is the set of multiple document rows obtained by splitting the target judgement document into lines.
The adding unit 20 is configured to add a corresponding paragraph identifier to each document row in the document row set according to a preset regular-expression rule set, where the preset regular-expression rule set is a set of rules derived statistically from many judgement documents.
The second processing unit 30 is configured to segment the target judgement document based on the paragraph identifier corresponding to each document row in the document row set.
In the segmentation device provided by the embodiment of the application, the first processing unit 10 performs line splitting on a target judgement document to obtain a document row set, where the document row set is the set of multiple document rows obtained by splitting the target judgement document into lines; the adding unit 20 adds a corresponding paragraph identifier to each document row in the document row set according to a preset regular-expression rule set, where the preset regular-expression rule set is a set of rules derived statistically from many judgement documents; and the second processing unit 30 segments the target judgement document based on the paragraph identifier corresponding to each document row in the document row set. This solves the problem in the related art that the paragraph division of judgement documents has low accuracy. Because the second processing unit 30 segments the target judgement document according to the paragraph identifier of each document row in the document row set, the accuracy of paragraph division of judgement documents is improved.
Optionally, in the segmentation device for judgement documents provided by the embodiment of the present application, the adding unit 20 includes: a first matching module, configured to match the regular-expression conditions in the preset rule set, one by one, against the content of each document line in the document line set; a first acquisition module, configured to obtain the document lines in the document line set that are matched by the regular-expression conditions; and an adding module, configured to add the corresponding paragraph identifier to each document line matched by the regular-expression conditions.
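The matching performed by these modules can be sketched in Python as follows. This is only an illustration under stated assumptions: the rule list `RULES`, its identifiers and its patterns are invented for the example, whereas the rules in the application are derived from a corpus of judgement documents and are not listed in this section.

```python
import re

# Hypothetical rules: (paragraph identifier, compiled pattern), in priority order.
RULES = [
    ("PARTIES", re.compile(r"^(Plaintiff|Defendant)\b")),
    ("FACTS", re.compile(r"^The court finds\b")),
    ("RULING", re.compile(r"^The court rules\b")),
]


def tag_matching_lines(lines, rules=RULES):
    """Match the rule conditions one by one against every document line.

    Returns a dict {line index: paragraph identifier} for matched lines only.
    """
    tags = {}
    for identifier, pattern in rules:        # take one condition at a time
        for index, line in enumerate(lines):  # and try it on every line
            if index not in tags and pattern.search(line):
                tags[index] = identifier
    return tags


lines = [
    "Plaintiff: Zhang San",
    "Defendant: Li Si",
    "The court finds that the contract was valid.",
    "Payment was due in March.",
    "The court rules as follows:",
]
tags = tag_matching_lines(lines)
```

Lines that no condition matches (index 3 in this sample) are simply absent from the result, so a later step can decide how to label them.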
Optionally, in the segmentation device for judgement documents provided by the embodiment of the present application, the adding unit 20 includes: a second acquisition module, configured to obtain the document lines in the document line set that are not matched by any of the regular-expression conditions, yielding at least one unidentified document line; a first determining module, configured to determine, in the document line set, the paragraph identifier corresponding to the document line immediately preceding the at least one unidentified document line; and a second determining module, configured to use the paragraph identifier of that preceding document line as the paragraph identifier of the at least one unidentified document line.
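A minimal Python sketch of this inheritance rule is given below. The function name `fill_unidentified_lines` and the use of `None` to represent an unidentified line are assumptions made for illustration. The section does not say what happens when the very first line is unidentified, so the sketch leaves it at a caller-supplied default.

```python
def fill_unidentified_lines(identifiers, default=None):
    """Give each unidentified line the identifier of the line before it.

    identifiers holds one entry per document line; None marks a line that
    no rule matched. Leading unidentified lines receive `default`.
    """
    filled = []
    previous = default
    for identifier in identifiers:
        if identifier is None:
            identifier = previous  # inherit from the preceding line
        filled.append(identifier)
        previous = identifier
    return filled


identifiers = ["PARTIES", None, "FACTS", None, None, "RULING"]
filled = fill_unidentified_lines(identifiers)
```

Because the previous identifier is carried forward as the loop advances, a run of several unidentified lines all inherit the identifier of the last matched line above them.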
Optionally, in the segmentation device for judgement documents provided by the embodiment of the present application, the first processing unit 10 includes: a third determining module, configured to determine the format type of the target judgement document; a fourth determining module, configured to determine the line-break character corresponding to that format type; and a processing module, configured to split the document into lines according to the line-break character corresponding to the format type of the target judgement document, obtaining the document line set.
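The three modules can be pictured with the following Python sketch. Everything specific in it is an assumption: the section does not list the supported format types or their line-break characters, so the mapping `NEWLINE_BY_FORMAT`, the extension-based `detect_format_type` and the sample texts are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical mapping from format type to a pattern for its line break.
NEWLINE_BY_FORMAT = {
    "txt": r"\r\n|\r|\n",   # plain text: CRLF, CR or LF
    "html": r"<br\s*/?>",   # HTML: <br>, <br/> or <br />
}


def detect_format_type(filename):
    """Guess the format type from the file extension (an assumed heuristic)."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return "html" if suffix in {"html", "htm"} else "txt"


def split_into_lines(text, format_type):
    """Split text on the line break of its format type, dropping empty lines."""
    pattern = NEWLINE_BY_FORMAT[format_type]
    pieces = re.split(pattern, text, flags=re.IGNORECASE)
    return [piece.strip() for piece in pieces if piece.strip()]


html_lines = split_into_lines(
    "Civil Judgment<br>Plaintiff: Zhang San<BR/>Defendant: Li Si",
    detect_format_type("case_001.htm"),
)
text_lines = split_into_lines(
    "Civil Judgment\r\nPlaintiff: Zhang San\nDefendant: Li Si",
    detect_format_type("case_001.txt"),
)
```

Both calls yield the same three document lines, which shows why the line-break character has to be chosen per format before splitting.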
The segmentation device for judgement documents includes a processor and a memory. The first processing unit, the adding unit, the second processing unit and so on are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided, and judgement documents are segmented accurately by adjusting kernel parameters.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: splitting a target judgement document into lines to obtain a document line set, where the document line set is the set of document lines obtained by splitting the target judgement document; adding a corresponding paragraph identifier to each document line in the document line set according to a preset set of regular-expression rules, where the preset rule set is a set of rules derived statistically from a large number of judgement documents; and segmenting the target judgement document based on the paragraph identifier corresponding to each document line in the document line set.
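As a rough end-to-end illustration of these three steps, the following Python sketch chains them in one function. It is not the program code of the application: the rule list, the identifier names, the `first_identifier` fallback for lines that precede any match, and the assumption that the input is plain text are all hypothetical.

```python
import re

# Hypothetical rules; real rules would be derived from many judgement documents.
RULES = [
    ("PARTIES", re.compile(r"^(Plaintiff|Defendant)\b")),
    ("RULING", re.compile(r"^The court rules\b")),
]


def segment(document, rules=RULES, first_identifier="HEADER"):
    """Split into lines, tag each line, then merge lines by identifier."""
    # Step 1: line splitting (plain text assumed), skipping blank lines.
    lines = [line.strip() for line in document.splitlines() if line.strip()]

    # Step 2: tag each line; unmatched lines keep the previous identifier.
    tagged = []
    current = first_identifier
    for line in lines:
        for identifier, pattern in rules:
            if pattern.search(line):
                current = identifier
                break
        tagged.append((current, line))

    # Step 3: merge lines that share an identifier into one paragraph.
    paragraphs = {}
    for identifier, line in tagged:
        paragraphs.setdefault(identifier, []).append(line)
    return paragraphs


document = (
    "Civil Judgment\n"
    "Plaintiff: Zhang San\n"
    "Defendant: Li Si\n"
    "The court rules as follows:\n"
    "The claim is dismissed.\n"
)
result = segment(document)
```

The design choice worth noting is that tagging is purely line-local and order-dependent: a line is labelled either by the first rule that matches it or by whatever label the line above it received.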
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related description in the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Obviously, those skilled in the art should understand that the modules or steps of the present application described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present application is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principle of the present application shall fall within its scope of protection.
Claims (10)
1. A segmentation method for a judgement document, characterized by comprising:
splitting a target judgement document into lines to obtain a document line set, wherein the document line set is the set of document lines obtained by splitting the target judgement document;
adding a corresponding paragraph identifier to each document line in the document line set according to a preset set of regular-expression rules, wherein the preset rule set is a set of rules derived statistically from a plurality of judgement documents; and
segmenting the target judgement document based on the paragraph identifier corresponding to each document line in the document line set.
2. The method according to claim 1, characterized in that adding a corresponding paragraph identifier to each document line in the document line set according to the preset set of regular-expression rules comprises:
matching a plurality of regular-expression conditions in the preset rule set, one by one, against the content of each document line in the document line set;
obtaining the document lines in the document line set that are matched by the plurality of regular-expression conditions; and
adding the corresponding paragraph identifier to each document line matched by the plurality of regular-expression conditions.
3. The method according to claim 1, characterized in that adding a corresponding paragraph identifier to each document line in the document line set according to the preset set of regular-expression rules comprises:
matching a plurality of regular-expression conditions in the preset rule set, one by one, against the content of each document line in the document line set;
obtaining the document lines in the document line set that are not matched by the plurality of regular-expression conditions, yielding at least one unidentified document line;
determining, in the document line set, the paragraph identifier corresponding to the document line immediately preceding the at least one unidentified document line; and
using the paragraph identifier corresponding to that preceding document line as the paragraph identifier of the at least one unidentified document line.
4. The method according to claim 2 or 3, characterized in that the plurality of regular-expression conditions include a first regular-expression condition and a second regular-expression condition, wherein the first regular-expression condition is the condition currently being matched against the document line set, and the second regular-expression condition is the next condition among the plurality of regular-expression conditions to be matched against the document line set in the case where the first regular-expression condition fails to match the document line set, and matching the plurality of regular-expression conditions in the preset rule set, one by one, against the content of each document line in the document line set comprises:
matching the first regular-expression condition among the plurality of regular-expression conditions against the content of each document line in the document line set;
judging whether the matching of the first regular-expression condition against the content of each document line in the document line set has finished;
if the matching of the first regular-expression condition against the content of each document line in the document line set has finished, taking the second regular-expression condition among the plurality of regular-expression conditions as the condition currently matched against the content of each document line in the document line set; and
matching the second regular-expression condition among the plurality of regular-expression conditions against the content of each document line in the document line set.
5. The method according to claim 1, characterized in that splitting the target judgement document into lines to obtain the document line set comprises:
determining the format type of the target judgement document;
determining the line-break character corresponding to the format type of the target judgement document; and
splitting the document into lines according to the line-break character corresponding to the format type of the target judgement document to obtain the document line set.
6. The method according to claim 1, characterized in that segmenting the target judgement document based on the paragraph identifier corresponding to each document line in the document line set comprises:
determining each paragraph identifier among the paragraph identifiers corresponding to the document lines in the document line set;
dividing the document lines in the target judgement document into paragraphs based on each paragraph identifier; and
merging the document lines that have the same paragraph identifier into the same paragraph.
7. A segmentation device for a judgement document, characterized by comprising:
a first processing unit, configured to split a target judgement document into lines to obtain a document line set, wherein the document line set is the set of document lines obtained by splitting the target judgement document;
an adding unit, configured to add a corresponding paragraph identifier to each document line in the document line set according to a preset set of regular-expression rules, wherein the preset rule set is a set of rules derived statistically from a plurality of judgement documents; and
a second processing unit, configured to segment the target judgement document based on the paragraph identifier corresponding to each document line in the document line set.
8. The device according to claim 7, characterized in that the adding unit comprises:
a first matching module, configured to match a plurality of regular-expression conditions in the preset rule set, one by one, against the content of each document line in the document line set;
a first acquisition module, configured to obtain the document lines in the document line set that are matched by the plurality of regular-expression conditions; and
an adding module, configured to add the corresponding paragraph identifier to each document line matched by the plurality of regular-expression conditions.
9. The device according to claim 7, characterized in that the adding unit comprises:
a second matching module, configured to match a plurality of regular-expression conditions in the preset rule set, one by one, against the content of each document line in the document line set;
a second acquisition module, configured to obtain the document lines in the document line set that are not matched by the plurality of regular-expression conditions, yielding at least one unidentified document line;
a first determining module, configured to determine, in the document line set, the paragraph identifier corresponding to the document line immediately preceding the at least one unidentified document line; and
a second determining module, configured to use the paragraph identifier corresponding to that preceding document line as the paragraph identifier of the at least one unidentified document line.
10. The device according to claim 7, characterized in that the first processing unit comprises:
a third determining module, configured to determine the format type of the target judgement document;
a fourth determining module, configured to determine the line-break character corresponding to the format type of the target judgement document; and
a processing module, configured to split the document into lines according to the line-break character corresponding to the format type of the target judgement document to obtain the document line set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510867898.7A CN106815204A (en) | 2015-12-01 | 2015-12-01 | The segmentation method and device of judgement document |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510867898.7A CN106815204A (en) | 2015-12-01 | 2015-12-01 | The segmentation method and device of judgement document |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106815204A true CN106815204A (en) | 2017-06-09 |
Family
ID=59108088
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510867898.7A Pending CN106815204A (en) | 2015-12-01 | 2015-12-01 | The segmentation method and device of judgement document |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106815204A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108984518A (en) * | 2018-06-11 | 2018-12-11 | 人民法院信息技术服务中心 | A kind of file classification method towards judgement document |
CN109145097A (en) * | 2018-06-11 | 2019-01-04 | 人民法院信息技术服务中心 | A kind of judgement document's classification method based on information extraction |
CN110750974A (en) * | 2019-09-20 | 2020-02-04 | 成都星云律例科技有限责任公司 | Structured processing method and system for referee document |
CN111104798A (en) * | 2018-10-27 | 2020-05-05 | 北京智慧正安科技有限公司 | Analysis method, system and computer readable storage medium for criminal plot in legal document |
CN113239682A (en) * | 2021-05-06 | 2021-08-10 | 吉林大学 | Method and device for correcting errors of referee documents |
CN113673255A (en) * | 2021-08-25 | 2021-11-19 | 北京市律典通科技有限公司 | Text function region splitting method and device, computer equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103246641A (en) * | 2013-05-16 | 2013-08-14 | 李营 | Text semantic information analyzing system and method |
CN104462073A (en) * | 2014-12-26 | 2015-03-25 | 武汉传神信息技术有限公司 | Processing method and system for file coordinated translation |
CN104714944A (en) * | 2015-04-14 | 2015-06-17 | 语联网(武汉)信息技术有限公司 | Document translation method and document translation system |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108984518A (en) * | 2018-06-11 | 2018-12-11 | 人民法院信息技术服务中心 | A kind of file classification method towards judgement document |
CN109145097A (en) * | 2018-06-11 | 2019-01-04 | 人民法院信息技术服务中心 | A kind of judgement document's classification method based on information extraction |
CN111104798A (en) * | 2018-10-27 | 2020-05-05 | 北京智慧正安科技有限公司 | Analysis method, system and computer readable storage medium for criminal plot in legal document |
CN111104798B (en) * | 2018-10-27 | 2023-04-21 | 北京智慧正安科技有限公司 | Resolution method, system and computer readable storage medium for sentencing episodes in legal documents |
CN110750974A (en) * | 2019-09-20 | 2020-02-04 | 成都星云律例科技有限责任公司 | Structured processing method and system for referee document |
CN113239682A (en) * | 2021-05-06 | 2021-08-10 | 吉林大学 | Method and device for correcting errors of referee documents |
CN113239682B (en) * | 2021-05-06 | 2022-11-01 | 吉林大学 | Method and device for correcting errors of referee documents |
CN113673255A (en) * | 2021-08-25 | 2021-11-19 | 北京市律典通科技有限公司 | Text function region splitting method and device, computer equipment and storage medium |
CN113673255B (en) * | 2021-08-25 | 2023-06-30 | 北京市律典通科技有限公司 | Text function area splitting method and device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106815204A (en) | The segmentation method and device of judgement document | |
CN103299304B (en) | Classifying rules generating means and classifying rules generate method | |
CN107562918A (en) | A kind of mathematical problem knowledge point discovery and batch label acquisition method | |
CN108229156A (en) | URL attack detection methods, device and electronic equipment | |
CN109005145A (en) | A kind of malice URL detection system and its method extracted based on automated characterization | |
CN106815263A (en) | The searching method and device of legal provision | |
CN108647732A (en) | A kind of pathological image sorting technique and device based on deep neural network | |
CN110209795A (en) | Comment on recognition methods, device, computer readable storage medium and computer equipment | |
CN105824825B (en) | A kind of sensitive data recognition methods and device | |
CN106919542A (en) | Method and device for rule matching | |
CN111950408B (en) | Finger vein image recognition method and device based on rule diagram and storage medium | |
CN105893484A (en) | Microblog Spammer recognition method based on text characteristics and behavior characteristics | |
CN109858476A (en) | The extending method and electronic equipment of label | |
CN107506350A (en) | A kind of method and apparatus of identification information | |
CN107330009A (en) | Descriptor disaggregated model creation method, creating device and storage medium | |
CN107222511A (en) | Detection method and device, computer installation and the readable storage medium storing program for executing of Malware | |
CN106815265A (en) | The searching method and device of judgement document | |
CN106815205A (en) | The segmentation method and device of judgement document | |
CN106815209A (en) | A kind of Uighur agricultural technology term recognition methods | |
CN112818693A (en) | Automatic extraction method and system for electronic component model words | |
CN102737017B (en) | Method and apparatus for extracting page theme | |
CN106355247A (en) | Method for data processing and device, chip and electronic equipment | |
US8626688B2 (en) | Pattern matching device and method using non-deterministic finite automaton | |
CN106127202A (en) | The method of character recognition and device in a kind of picture | |
CN109472289A (en) | Critical point detection method and apparatus |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing. Applicant after: Beijing Guoshuang Technology Co.,Ltd. Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing. Applicant before: Beijing Guoshuang Technology Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170609 |