CN106815204A - Method and device for segmenting judgment documents - Google Patents

Method and device for segmenting judgment documents

Info

Publication number
CN106815204A
Authority
CN
China
Prior art keywords
document
row
regularity
document row
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510867898.7A
Other languages
Chinese (zh)
Inventor
胡斌
杜宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Abstract

This application discloses a method and device for segmenting a judgment document. The method includes: splitting a target judgment document into lines to obtain a document line set, where the document line set is the set of document lines obtained by splitting the target judgment document; adding a corresponding paragraph identifier to each document line in the document line set according to a preset regex rule set, where the preset regex rule set is a set of rules derived statistically from many judgment documents; and segmenting the target judgment document into paragraphs based on the paragraph identifier corresponding to each document line in the document line set. The application addresses the low accuracy of paragraph division of judgment documents in the related art.

Description

Method and device for segmenting judgment documents
Technical Field
This application relates to the technical field of text processing, and in particular to a method and device for segmenting judgment documents.
Background
A judgment document is the carrier that records the course and outcome of a trial before a people's court, and it is the sole evidence by which the court determines and allocates the parties' substantive rights and obligations. A judgment document with a complete structure, complete elements, and rigorous logic is both the evidence on which the parties enjoy rights and bear obligations and an important basis on which a higher people's court supervises the civil adjudication of lower people's courts.
In the related art, judgment documents often need to be divided into paragraphs before the relevant data analysis can be performed. The usual technique is line-by-line matching. First, the full text is split by line into a text linked list whose lines are connected head to tail. Second, the text linked list is matched against an existing rule linked list. A linked list is a linear list that does not store data in linear order; instead, each node stores a pointer to the next node. The text linked list and the rule linked list are matched item by item; after a successful match, matching jumps to the next item, and the content is output to the corresponding paragraph according to the specific matched item. Because the two linked lists are matched in one direction only, a failed match at any point causes all subsequent content to be matched incorrectly. Paragraph division is therefore prone to cascading errors, in which one mistake makes everything after it wrong. As a result, the accuracy of paragraph division in judgment documents is low.
No effective solution has yet been proposed for the low accuracy of paragraph division of judgment documents in the related art.
Summary of the Invention
The main purpose of this application is to provide a method and device for segmenting judgment documents, so as to solve the problem in the related art that paragraph division of judgment documents has low accuracy.
To achieve the above purpose, according to one aspect of this application, a method for segmenting a judgment document is provided. The method includes: splitting a target judgment document into lines to obtain a document line set, where the document line set is the set of document lines obtained by splitting the target judgment document; adding a corresponding paragraph identifier to each document line in the document line set according to a preset regex rule set, where the preset regex rule set is a set of rules derived statistically from many judgment documents; and segmenting the target judgment document based on the paragraph identifier corresponding to each document line in the document line set.
Further, adding a corresponding paragraph identifier to each document line in the document line set according to the preset regex rule set includes: matching the regex rule conditions in the preset regex rule set, one by one, against the content of each document line in the document line set; obtaining the document lines in the document line set that match the regex rule conditions; and adding the corresponding paragraph identifier to the document lines that match.
Further, adding a corresponding paragraph identifier to each document line in the document line set according to the preset regex rule set includes: matching the regex rule conditions in the preset regex rule set, one by one, against the content of each document line in the document line set; obtaining the document lines in the document line set that match none of the regex rule conditions, which yields at least one unrecognized document line; determining, in the document line set, the paragraph identifier corresponding to the document line that precedes the at least one unrecognized document line; and using that paragraph identifier as the paragraph identifier of the at least one unrecognized document line.
Further, the regex rule conditions include a first regex rule condition and a second regex rule condition. The first regex rule condition is the condition currently being matched against the document line set. The second regex rule condition is the next condition among the regex rule conditions to be matched against the document line set when the first regex rule condition fails to match. Matching the regex rule conditions in the preset regex rule set, one by one, against the content of each document line in the document line set includes: matching the first regex rule condition against the content of each document line in the document line set; judging whether the first regex rule condition has finished matching against the content of every document line in the document line set; if it has, taking the second regex rule condition as the condition currently matched against the content of each document line in the document line set; and matching the second regex rule condition against the content of each document line in the document line set.
Further, splitting the target judgment document into lines to obtain the document line set includes: determining the format type of the target judgment document; determining the line-break character corresponding to that format type; and splitting the document at that line-break character to obtain the document line set.
Further, segmenting the target judgment document based on the paragraph identifier corresponding to each document line in the document line set includes: determining each paragraph identifier among the paragraph identifiers corresponding to the document lines in the document line set; dividing the document lines of the target judgment document into paragraphs based on each paragraph identifier; and merging the document lines that have the same paragraph identifier into the same paragraph.
To achieve the above purpose, according to another aspect of this application, a device for segmenting a judgment document is provided. The device includes: a first processing unit, configured to split a target judgment document into lines to obtain a document line set, where the document line set is the set of document lines obtained by splitting the target judgment document; an adding unit, configured to add a corresponding paragraph identifier to each document line in the document line set according to a preset regex rule set, where the preset regex rule set is a set of rules derived statistically from many judgment documents; and a second processing unit, configured to segment the target judgment document based on the paragraph identifier corresponding to each document line in the document line set.
Further, the adding unit includes: a first matching module, configured to match the regex rule conditions in the preset regex rule set, one by one, against the content of each document line in the document line set; a first obtaining module, configured to obtain the document lines in the document line set that match the regex rule conditions; and an adding module, configured to add the corresponding paragraph identifier to the document lines that match.
Further, the adding unit includes: a second matching module, configured to match the regex rule conditions in the preset regex rule set, one by one, against the content of each document line in the document line set; a second obtaining module, configured to obtain the document lines in the document line set that match none of the regex rule conditions, which yields at least one unrecognized document line; a first determining module, configured to determine, in the document line set, the paragraph identifier corresponding to the document line that precedes the at least one unrecognized document line; and a second determining module, configured to use that paragraph identifier as the paragraph identifier of the at least one unrecognized document line.
Further, the first processing unit includes: a third determining module, configured to determine the format type of the target judgment document; a fourth determining module, configured to determine the line-break character corresponding to that format type; and a processing module, configured to split the document at that line-break character to obtain the document line set.
This application uses the following steps: splitting a target judgment document into lines to obtain a document line set, where the document line set is the set of document lines obtained by splitting the target judgment document; adding a corresponding paragraph identifier to each document line in the document line set according to a preset regex rule set, where the preset regex rule set is a set of rules derived statistically from many judgment documents; and segmenting the target judgment document based on the paragraph identifier corresponding to each document line in the document line set. This solves the problem in the related art that paragraph division of judgment documents has low accuracy. Because the target judgment document is segmented by the paragraph identifier of each document line in the document line set, the accuracy of paragraph division is improved.
Brief Description of the Drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of this application. The illustrative embodiments of this application and their descriptions are used to explain this application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the method for segmenting a judgment document according to a first embodiment of this application;
Fig. 2 is a flowchart of the method for segmenting a judgment document according to a second embodiment of this application; and
Fig. 3 is a schematic diagram of the device for segmenting a judgment document according to an embodiment of this application.
Detailed Description
It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with one another. This application is described in detail below with reference to the drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, shall fall within the scope of protection of this application.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of this application are used to distinguish similar objects and are not used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of this application described herein can be implemented. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
According to an embodiment of this application, a method for segmenting a judgment document is provided.
Fig. 1 is a flowchart of the method for segmenting a judgment document according to the first embodiment of this application. As shown in Fig. 1, the method includes the following steps:
Step S101: split the target judgment document into lines to obtain a document line set, where the document line set is the set of document lines obtained by splitting the target judgment document.
To improve the accuracy of paragraph division, the segmentation method of the first embodiment first splits the target judgment document into lines; the resulting document lines form the document line set.
Preferably, to improve the accuracy of line splitting, splitting the target judgment document into lines to obtain the document line set can also be implemented by the following steps: determining the format type of the target judgment document; determining the line-break character corresponding to that format type; and splitting the document at that line-break character to obtain the document line set.
For example, part of the content of a target judgment document is as follows:
People's Court of Zhenxiong County, Yunnan Province
Criminal Judgment
(2015) Zhen Xing Chu Zi No. 150
Public prosecution organ: People's Procuratorate of Zhenxiong County, Yunnan Province.
Defendant Xu XX, male.
Criminally detained on December 22, 2014 on suspicion of theft and arrested on January 23, 2015. Currently held at the Zhenxiong County Detention Center.
The People's Procuratorate of Zhenxiong County, Yunnan Province, by Indictment Zhen Jian Gong Su Xing Su (2015) No. 80, charged the defendant Xu XX with theft and brought a public prosecution before this court on March 30, 2015. This court formed a collegial panel in accordance with the law and heard the case in open court on April 18, 2015. The People's Procuratorate of Zhenxiong County assigned acting prosecutor Pan Yong to appear in court in support of the prosecution, and the defendant Xu XX appeared in court and took part in the proceedings. The trial has now concluded.
From the partial content of the target judgment document above, the format type of the document is determined to be a text type, the line-break character corresponding to the text type is determined, and the content of the target judgment document is split at that line-break character to obtain multiple document lines, for example:
First document line: People's Court of Zhenxiong County, Yunnan Province
Second document line: Criminal Judgment
Third document line: (2015) Zhen Xing Chu Zi No. 150
Fourth document line: Public prosecution organ: People's Procuratorate of Zhenxiong County, Yunnan Province
Fifth document line: Defendant Xu XX, male
Sixth document line: Criminally detained on December 22, 2014 on suspicion of theft and arrested on January 23, 2015. Currently held at the Zhenxiong County Detention Center.
Seventh document line: The People's Procuratorate of Zhenxiong County, Yunnan Province, by Indictment Zhen Jian Gong Su Xing Su (2015) No. 80, charged the defendant Xu XX with theft and brought a public prosecution before this court on March 30, 2015. This court formed a collegial panel in accordance with the law and heard the case in open court on April 18, 2015. The People's Procuratorate of Zhenxiong County assigned acting prosecutor Pan Yong to appear in court in support of the prosecution, and the defendant Xu XX appeared in court and took part in the proceedings. The trial has now concluded.
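The line-splitting step can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the names LINE_BREAKS and split_into_lines, the two format types, and the choice to drop empty lines are all assumptions made for the example, and the format type is passed in rather than detected.

```python
import re

# Hypothetical mapping from a format type to its line-break pattern.
# The source only says that each format type has a corresponding line break.
LINE_BREAKS = {
    "text": re.compile(r"\r\n|\r|\n"),
    "html": re.compile(r"<br\s*/?>", re.IGNORECASE),
}


def split_into_lines(document: str, format_type: str) -> list[str]:
    """Split a judgment document into its non-empty document lines."""
    pattern = LINE_BREAKS[format_type]
    pieces = pattern.split(document)
    # Dropping blank lines is an assumption; the source does not address them.
    return [piece.strip() for piece in pieces if piece.strip()]
```

Calling split_into_lines(text, "text") on the excerpt above would yield one list entry per document line, in document order.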
Step S102: add a corresponding paragraph identifier to each document line in the document line set according to a preset regex rule set, where the preset regex rule set is a set of rules derived statistically from many judgment documents.
A corresponding paragraph identifier is added to each document line in the document line set according to the preset regex rule set. The preset regex rule set includes multiple regex rule conditions. For example, the first regex rule condition is "^defendant[person]([\u4e00-\u9fa5a][,]){0,5}[male/female]{0,50}$", where "defendant", "person", and "male/female" stand for the corresponding Chinese characters. This condition means: the line begins with either of the two Chinese forms of "defendant", is followed by zero to five short clauses, then by an optional sex marker, and finally by up to 50 further characters. The first regex rule condition matches the defendant-party paragraph of the target judgment document, and the paragraph identifier of the defendant-party paragraph is added to the document lines that match it. As another example, the fifth regex rule condition is "commits [\u4e00-\u9fa5a]{2,20} crime, sentenced to [\u4e00-\u9fa5a]{1,5} punishment", where the English words again stand for Chinese literals. This condition matches the judgment paragraph of the target judgment document, and the paragraph identifier of the judgment paragraph is added to the document lines that match it, and so on.
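The labelling step can be sketched as below. The two patterns are reconstructions and should be read as assumptions: the Chinese literals (被告, 人, 男女, 犯, 罪, 判处, 刑) and the optional quantifiers are inferred from the prose description of the conditions, not confirmed by the source text. The names RULES and label_line and the identifier strings are hypothetical.

```python
import re

# Reconstructed patterns (assumptions, see the lead-in above).
# Each entry pairs a paragraph identifier with one regex rule condition.
RULES = [
    # Starts with 被告 or 被告人, up to five comma-terminated clauses,
    # an optional sex marker, then at most 50 further characters.
    ("defendant_party",
     re.compile(r"^被告人?([\u4e00-\u9fa5a]+，){0,5}[男女]?.{0,50}$")),
    # "commits ... crime, sentenced to ... punishment"
    ("judgment",
     re.compile(r"犯[\u4e00-\u9fa5a]{2,20}罪，判处[\u4e00-\u9fa5a]{1,5}刑")),
]


def label_line(line):
    """Return the identifier of the first rule that matches, or None."""
    for identifier, pattern in RULES:
        if pattern.search(line):
            return identifier
    return None
```

In this sketch the first matching rule wins, so rule order matters: a line that begins with 被告人 is labelled as a defendant-party line even if it also contains sentencing language. How the applicant resolves such overlaps is not stated in the source.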
Step S103: segment the target judgment document based on the paragraph identifier corresponding to each document line in the document line set.
Optionally, in the segmentation method provided by the first embodiment of this application, segmenting the target judgment document based on the paragraph identifier corresponding to each document line can also be implemented by the following steps: determining each paragraph identifier among the paragraph identifiers corresponding to the document lines in the document line set; dividing the document lines of the target judgment document into paragraphs based on each paragraph identifier; and merging the document lines that have the same paragraph identifier into the same paragraph.
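The merging step above can be sketched as a grouping of (identifier, line) pairs. The function name merge_by_identifier and the choice to join lines with a newline are assumptions for illustration.

```python
def merge_by_identifier(labelled_lines):
    """Merge document lines that share a paragraph identifier.

    labelled_lines is an iterable of (identifier, text) pairs in document
    order. The result maps each identifier to its merged paragraph text and
    keeps identifiers in order of first appearance.
    """
    paragraphs = {}
    for identifier, text in labelled_lines:
        paragraphs.setdefault(identifier, []).append(text)
    return {identifier: "\n".join(texts)
            for identifier, texts in paragraphs.items()}
```

Because every line carries its own identifier, the grouping does not depend on whether neighbouring lines were labelled correctly.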
Through steps S101 to S103, a corresponding paragraph identifier is added to each document line, and segmentation is based on these identifiers and is not affected by other document lines. Each document line finds the paragraph it belongs to relatively independently, which improves the accuracy of paragraph division.
In the segmentation method provided by the first embodiment of this application, the target judgment document is split into lines to obtain a document line set, where the document line set is the set of document lines obtained by splitting the target judgment document; a corresponding paragraph identifier is added to each document line in the document line set according to a preset regex rule set, where the preset regex rule set is a set of rules derived statistically from many judgment documents; and the target judgment document is segmented based on the paragraph identifier corresponding to each document line. This solves the problem in the related art that paragraph division of judgment documents has low accuracy, and improves the accuracy of paragraph division.
Fig. 2 is a flowchart of the method for segmenting a judgment document according to the second embodiment of this application. Fig. 2 may serve as a preferred implementation of the embodiment shown in Fig. 1. As shown in Fig. 2, the method includes the following steps:
Step S201: split the target judgment document into lines to obtain a document line set, where the document line set is the set of document lines obtained by splitting the target judgment document.
Step S201 is the same as step S101 above and is not repeated here.
Step S202: match the regex rule conditions in the preset regex rule set, one by one, against the content of each document line in the document line set.
For example, the regex rule conditions are matched one by one against the first document line (People's Court of Zhenxiong County, Yunnan Province), the second document line (Criminal Judgment), the third document line ((2015) Zhen Xing Chu Zi No. 150), the fourth document line (Public prosecution organ: People's Procuratorate of Zhenxiong County, Yunnan Province), the fifth document line (Defendant Xu XX, male), and so on.
Optionally, in the segmentation method of the second embodiment of this application, the regex rule conditions include a first regex rule condition and a second regex rule condition. The first regex rule condition is the condition currently being matched against the document line set. The second regex rule condition is the next condition among the regex rule conditions to be matched against the document line set when the first regex rule condition fails to match. Matching the regex rule conditions in the preset regex rule set, one by one, against the content of each document line includes: matching the first regex rule condition against the content of each document line in the document line set; judging whether the first regex rule condition has finished matching against the content of every document line; if it has, taking the second regex rule condition as the condition currently matched against the content of each document line; and matching the second regex rule condition against the content of each document line in the document line set.
For example, the first regex rule condition is "^defendant[person]([\u4e00-\u9fa5a][,]){0,5}[male/female]{0,50}$" and the second regex rule condition is "^the court holds that", where the English words stand for the corresponding Chinese literals. The second regex rule condition matches lines that begin with "the court holds that". The first regex rule condition is matched against the content of each document line in the document line set. Once the first regex rule condition has finished matching against every document line, the second regex rule condition, "^the court holds that", is matched against the content of each document line in the document line set.
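The rule-by-rule order described above (finish one rule over every line before moving to the next rule) can be sketched as a nested loop with the rules on the outside. The patterns are simplified reconstructions with assumed Chinese literals, and the names EXAMPLE_RULES and label_lines_rule_by_rule are hypothetical. Keeping the first identifier assigned to a line is also an assumption; the source does not say what happens when two rules match the same line.

```python
import re

# Simplified, reconstructed rule conditions (assumed Chinese literals).
EXAMPLE_RULES = [
    ("defendant_party", re.compile(r"^被告人?")),
    ("court_opinion", re.compile(r"^本院认为")),
]


def label_lines_rule_by_rule(lines, rules):
    """Apply each rule to every line before moving on to the next rule."""
    identifiers = [None] * len(lines)
    for identifier, pattern in rules:          # the current rule condition
        for index, line in enumerate(lines):   # matched against every line
            if identifiers[index] is None and pattern.search(line):
                identifiers[index] = identifier
    return identifiers
```

Lines that no rule matches keep the value None, which corresponds to the unrecognized document lines handled later in this embodiment.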
Step S203: obtain the document lines in the document line set that match the regex rule conditions.
For example, the fifth document line in the document line set, "Defendant Xu XX, male", matches the first regex rule condition above; that is, the document line containing "Defendant Xu XX, male" is the fifth document line in the document line set.
Step S204: add the corresponding paragraph identifier to the document lines that match the regex rule conditions.
For example, the fifth document line in the document line set, "Defendant Xu XX, male", matches the first regex rule condition above. The first regex rule condition matches the defendant-party paragraph of the target judgment document, so the paragraph identifier of the defendant-party paragraph is added to the fifth document line.
The content of the 20th document line in the document line set, "The court holds that", matches the second regex rule condition. The second regex rule condition matches the court-opinion paragraph of the target judgment document, so the paragraph identifier of the court-opinion paragraph is added to the 20th document line in the document line set.
Step S205: segment the target judgment document based on the paragraph identifier corresponding to each document line in the document line set.
Step S205 is the same as step S103 above and is not repeated here.
Optionally, in the segmentation method of the second embodiment of this application, adding a corresponding paragraph identifier to each document line in the document line set according to the preset regex rule set further includes: matching the regex rule conditions in the preset regex rule set, one by one, against the content of each document line in the document line set; obtaining the document lines that match none of the regex rule conditions, which yields at least one unrecognized document line; determining, in the document line set, the paragraph identifier corresponding to the document line that precedes the at least one unrecognized document line; and using that paragraph identifier as the paragraph identifier of the at least one unrecognized document line.
For example, if the content of the 21st document line in the document line set matches none of the regex rule conditions, the paragraph identifier of the 20th document line in the document line set is used as the paragraph identifier of the 21st document line.
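The fallback for unrecognized lines can be sketched as a single forward pass that carries the most recent identifier along. The function name fill_unrecognized and the default parameter are assumptions; in particular, the source does not say what identifier an unrecognized line receives when no line precedes it, so the default here is only a placeholder.

```python
def fill_unrecognized(identifiers, default=None):
    """Give each unrecognized line (None) the identifier of the line before it."""
    filled = []
    previous = default  # used only when the very first line is unrecognized
    for identifier in identifiers:
        if identifier is None:
            identifier = previous
        filled.append(identifier)
        previous = identifier
    return filled
```

Because the previous value is updated on every iteration, a run of several unrecognized lines all inherit the identifier of the last recognized line before them.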
In the segmentation method provided by the second embodiment of this application, the target judgment document is split into lines to obtain a document line set, where the document line set is the set of document lines obtained by splitting the target judgment document; the regex rule conditions in the preset regex rule set are matched one by one against the content of each document line in the document line set; the document lines that match the regex rule conditions are obtained; the corresponding paragraph identifier is added to the document lines that match; and the target judgment document is segmented based on the paragraph identifier corresponding to each document line. This solves the problem in the related art that paragraph division of judgment documents has low accuracy, and improves the accuracy of paragraph division.
It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
The embodiment of the present application additionally provides a kind of sectioning of judgement document, it is necessary to explanation, the embodiment of the present application The sectioning of judgement document can be used for performing the segmentation method for judgement document that is provided of the embodiment of the present application.With Under the sectioning of judgement document that the embodiment of the present application is provided is introduced.
Fig. 3 is the schematic diagram of the sectioning of the judgement document according to the embodiment of the present application.As shown in figure 3, the device bag Include:First processing units 10, adding device 20 and second processing unit 30.
The first processing unit 10 is configured to split the target judgement document into lines to obtain a document row set, where the document row set is the set of document rows obtained by splitting the target judgement document into lines.
The adding unit 20 is configured to add a corresponding paragraph mark to each document row in the document row set according to a preset regular-expression rule set, where the preset regular-expression rule set is a set of rules compiled from statistics over many judgement documents.
The second processing unit 30 is configured to segment the target judgement document based on the paragraph mark corresponding to each document row in the document row set.
In the segmentation device for judgement documents provided by the embodiments of the present application, the first processing unit 10 splits the target judgement document into lines to obtain a document row set; the adding unit 20 adds a corresponding paragraph mark to each document row in the set according to the preset regular-expression rule set, which is compiled from statistics over many judgement documents; and the second processing unit 30 segments the target judgement document based on the paragraph mark corresponding to each document row. This addresses the low accuracy of paragraph division for judgement documents in the related art and thereby improves the accuracy of paragraph division.
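The work of the second processing unit, merging document rows that carry the same paragraph mark into one paragraph, might look like the sketch below. The function name `merge_rows` is hypothetical, and the sketch assumes that only consecutive rows with the same mark belong together and that rows are rejoined with a newline; the application does not state either detail.

```python
from itertools import groupby
from typing import List, Tuple


def merge_rows(rows: List[str], marks: List[str]) -> List[Tuple[str, str]]:
    """Merge consecutive rows that share a paragraph mark into one paragraph.

    Returns (mark, paragraph_text) pairs in document order.
    """
    if len(rows) != len(marks):
        raise ValueError("rows and marks must have the same length")
    paragraphs: List[Tuple[str, str]] = []
    # groupby yields one group per run of equal marks.
    for mark, group in groupby(zip(marks, rows), key=lambda pair: pair[0]):
        text = "\n".join(row for _, row in group)
        paragraphs.append((mark, text))
    return paragraphs
```

For example, rows `["a", "b", "c"]` with marks `["x", "x", "y"]` yield two paragraphs, one per run of identical marks.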
Optionally, in the segmentation device for judgement documents provided by the embodiments of the present application, the adding unit 20 includes: a first matching module, configured to match the regular-expression conditions in the preset regular-expression rule set one by one against the content of each document row in the document row set; a first acquisition module, configured to obtain the document rows in the document row set that match the regular-expression conditions; and an adding module, configured to add a corresponding paragraph mark to each document row that matches the regular-expression conditions.
Optionally, in the segmentation device for judgement documents provided by the embodiments of the present application, the adding unit 20 includes: a second acquisition module, configured to obtain the document rows in the document row set that match none of the regular-expression conditions, yielding at least one unidentified document row; a first determining module, configured to determine, in the document row set, the paragraph mark corresponding to the document row preceding the at least one unidentified document row; and a second determining module, configured to use the paragraph mark of that preceding document row as the paragraph mark of the at least one unidentified document row.
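The handling of unidentified document rows amounts to carrying the previous row's paragraph mark forward. A minimal sketch, assuming marks are held in a list with `None` for unidentified rows; the function name `inherit_marks` is hypothetical, and leaving leading unidentified rows as `None` is also an assumption, since the application does not say what happens when there is no preceding row.

```python
from typing import List, Optional


def inherit_marks(marks: List[Optional[str]]) -> List[Optional[str]]:
    """Give each unidentified row (None) the mark of the row before it."""
    filled: List[Optional[str]] = []
    previous: Optional[str] = None
    for mark in marks:
        if mark is not None:
            previous = mark        # an identified row updates the current mark
        filled.append(previous)    # unidentified rows reuse the previous mark
    return filled
```

Because the previous mark is itself carried forward, a run of several unidentified rows all receive the mark of the last identified row above them.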
Optionally, in the segmentation device for judgement documents provided by the embodiments of the present application, the first processing unit 10 includes: a third determining module, configured to determine the format type of the target judgement document; a fourth determining module, configured to determine the line-break character corresponding to that format type; and a processing module, configured to split the document into lines according to the line-break character corresponding to its format type, obtaining the document row set.
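The line-splitting step can be illustrated as a lookup from format type to line-break string followed by a split. The format names and separators in `LINE_BREAKS` are assumptions chosen for illustration, as is the decision to strip whitespace and drop empty rows; the application only states that the line break depends on the format type.

```python
from typing import Dict, List

# Hypothetical mapping from format type to its line-break string.
LINE_BREAKS: Dict[str, str] = {
    "txt": "\n",
    "html": "<br>",
}


def split_rows(text: str, format_type: str) -> List[str]:
    """Split a document into rows using the line break of its format type."""
    try:
        separator = LINE_BREAKS[format_type]
    except KeyError:
        raise ValueError(f"unknown format type: {format_type!r}")
    # Stripping and dropping empty rows is an assumption of this sketch.
    return [row.strip() for row in text.split(separator) if row.strip()]
```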
The segmentation device for judgement documents includes a processor and a memory. The first processing unit, the adding unit, the second processing unit and so on are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided, and judgement documents are segmented accurately by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: splitting a target judgement document into lines to obtain a document row set, where the document row set is the set of document rows obtained by splitting the target judgement document into lines; adding a corresponding paragraph mark to each document row in the document row set according to a preset regular-expression rule set, where the preset regular-expression rule set is a set of rules compiled from statistics over many judgement documents; and segmenting the target judgement document based on the paragraph mark corresponding to each document row in the document row set.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Obviously, those skilled in the art should understand that the modules or steps of the present application described above may be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they may each be made into an individual integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. A segmentation method for a judgement document, characterized by comprising:
splitting a target judgement document into lines to obtain a document row set, wherein the document row set is the set of document rows obtained by splitting the target judgement document into lines;
adding a corresponding paragraph mark to each document row in the document row set according to a preset regular-expression rule set, wherein the preset regular-expression rule set is a set of rules compiled from statistics over many judgement documents; and
segmenting the target judgement document based on the paragraph mark corresponding to each document row in the document row set.
2. The method according to claim 1, characterized in that adding a corresponding paragraph mark to each document row in the document row set according to the preset regular-expression rule set comprises:
matching a plurality of regular-expression conditions in the preset regular-expression rule set one by one against the content of each document row in the document row set;
obtaining the document rows in the document row set that match the plurality of regular-expression conditions; and
adding a corresponding paragraph mark to each document row that matches the plurality of regular-expression conditions.
3. The method according to claim 1, characterized in that adding a corresponding paragraph mark to each document row in the document row set according to the preset regular-expression rule set comprises:
matching a plurality of regular-expression conditions in the preset regular-expression rule set one by one against the content of each document row in the document row set;
obtaining the document rows in the document row set that match none of the plurality of regular-expression conditions, yielding at least one unidentified document row;
determining, in the document row set, the paragraph mark corresponding to the document row preceding the at least one unidentified document row; and
using the paragraph mark corresponding to the document row preceding the at least one unidentified document row as the paragraph mark of the at least one unidentified document row.
4. The method according to claim 2 or 3, characterized in that the plurality of regular-expression conditions include a first regular-expression condition and a second regular-expression condition, wherein the first regular-expression condition is the condition currently being matched against the document row set, and the second regular-expression condition is the next condition, among the plurality of regular-expression conditions, to be matched against the document row set when the first regular-expression condition fails to match the document row set; and matching the plurality of regular-expression conditions in the preset regular-expression rule set one by one against the content of each document row in the document row set comprises:
matching the first regular-expression condition among the plurality of regular-expression conditions against the content of each document row in the document row set;
determining whether matching of the first regular-expression condition against the content of each document row in the document row set has finished;
if matching of the first regular-expression condition against the content of each document row in the document row set has finished, taking the second regular-expression condition among the plurality of regular-expression conditions as the condition currently matched against the content of each document row in the document row set; and
matching the second regular-expression condition among the plurality of regular-expression conditions against the content of each document row in the document row set.
5. The method according to claim 1, characterized in that splitting the target judgement document into lines to obtain the document row set comprises:
determining the format type of the target judgement document;
determining the line-break character corresponding to the format type of the target judgement document; and
splitting the target judgement document into lines according to the line-break character corresponding to its format type, to obtain the document row set.
6. The method according to claim 1, characterized in that segmenting the target judgement document based on the paragraph mark corresponding to each document row in the document row set comprises:
determining each paragraph mark among the paragraph marks corresponding to the document rows in the document row set;
dividing the document rows in the target judgement document into paragraphs based on each paragraph mark; and
merging the document rows that have the same paragraph mark into the same paragraph.
7. A segmentation device for a judgement document, characterized by comprising:
a first processing unit, configured to split a target judgement document into lines to obtain a document row set, wherein the document row set is the set of document rows obtained by splitting the target judgement document into lines;
an adding unit, configured to add a corresponding paragraph mark to each document row in the document row set according to a preset regular-expression rule set, wherein the preset regular-expression rule set is a set of rules compiled from statistics over many judgement documents; and
a second processing unit, configured to segment the target judgement document based on the paragraph mark corresponding to each document row in the document row set.
8. The device according to claim 7, characterized in that the adding unit comprises:
a first matching module, configured to match a plurality of regular-expression conditions in the preset regular-expression rule set one by one against the content of each document row in the document row set;
a first acquisition module, configured to obtain the document rows in the document row set that match the plurality of regular-expression conditions; and
an adding module, configured to add a corresponding paragraph mark to each document row that matches the plurality of regular-expression conditions.
9. The device according to claim 7, characterized in that the adding unit comprises:
a second matching module, configured to match a plurality of regular-expression conditions in the preset regular-expression rule set one by one against the content of each document row in the document row set;
a second acquisition module, configured to obtain the document rows in the document row set that match none of the plurality of regular-expression conditions, yielding at least one unidentified document row;
a first determining module, configured to determine, in the document row set, the paragraph mark corresponding to the document row preceding the at least one unidentified document row; and
a second determining module, configured to use the paragraph mark corresponding to the document row preceding the at least one unidentified document row as the paragraph mark of the at least one unidentified document row.
10. The device according to claim 7, characterized in that the first processing unit comprises:
a third determining module, configured to determine the format type of the target judgement document;
a fourth determining module, configured to determine the line-break character corresponding to the format type of the target judgement document; and
a processing module, configured to split the target judgement document into lines according to the line-break character corresponding to its format type, to obtain the document row set.
CN201510867898.7A 2015-12-01 2015-12-01 The segmentation method and device of judgement document Pending CN106815204A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510867898.7A CN106815204A (en) 2015-12-01 2015-12-01 The segmentation method and device of judgement document

Publications (1)

Publication Number Publication Date
CN106815204A true CN106815204A (en) 2017-06-09

Family

ID=59108088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510867898.7A Pending CN106815204A (en) 2015-12-01 2015-12-01 The segmentation method and device of judgement document

Country Status (1)

Country Link
CN (1) CN106815204A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246641A (en) * 2013-05-16 2013-08-14 李营 Text semantic information analyzing system and method
CN104462073A (en) * 2014-12-26 2015-03-25 武汉传神信息技术有限公司 Processing method and system for file coordinated translation
CN104714944A (en) * 2015-04-14 2015-06-17 语联网(武汉)信息技术有限公司 Document translation method and document translation system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108984518A (en) * 2018-06-11 2018-12-11 人民法院信息技术服务中心 A kind of file classification method towards judgement document
CN109145097A (en) * 2018-06-11 2019-01-04 人民法院信息技术服务中心 A kind of judgement document's classification method based on information extraction
CN111104798A (en) * 2018-10-27 2020-05-05 北京智慧正安科技有限公司 Analysis method, system and computer readable storage medium for criminal plot in legal document
CN111104798B (en) * 2018-10-27 2023-04-21 北京智慧正安科技有限公司 Resolution method, system and computer readable storage medium for sentencing episodes in legal documents
CN110750974A (en) * 2019-09-20 2020-02-04 成都星云律例科技有限责任公司 Structured processing method and system for referee document
CN113239682A (en) * 2021-05-06 2021-08-10 吉林大学 Method and device for correcting errors of referee documents
CN113239682B (en) * 2021-05-06 2022-11-01 吉林大学 Method and device for correcting errors of referee documents
CN113673255A (en) * 2021-08-25 2021-11-19 北京市律典通科技有限公司 Text function region splitting method and device, computer equipment and storage medium
CN113673255B (en) * 2021-08-25 2023-06-30 北京市律典通科技有限公司 Text function area splitting method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106815204A (en) The segmentation method and device of judgement document
CN103299304B (en) Classifying rules generating means and classifying rules generate method
CN107562918A (en) A kind of mathematical problem knowledge point discovery and batch label acquisition method
CN107704625A (en) Fields match method and apparatus
CN106815263A (en) The searching method and device of legal provision
CN108647732A (en) A kind of pathological image sorting technique and device based on deep neural network
CN109165386A (en) A kind of Chinese empty anaphora resolution method and system
CN110209795A (en) Comment on recognition methods, device, computer readable storage medium and computer equipment
CN111950408B (en) Finger vein image recognition method and device based on rule diagram and storage medium
CN105893484A (en) Microblog Spammer recognition method based on text characteristics and behavior characteristics
CN106919542A (en) Method and device for rule matching
CN107330009A (en) Descriptor disaggregated model creation method, creating device and storage medium
CN107222511A (en) Detection method and device, computer installation and the readable storage medium storing program for executing of Malware
CN107506350A (en) A kind of method and apparatus of identification information
CN107506310A (en) A kind of address search, key word storing method and equipment
CN106815205A (en) The segmentation method and device of judgement document
CN105320491A (en) Apparatus and method for efficient division performance
CN110321560A (en) A kind of method, apparatus and electronic equipment determining location information from text information
CN104462322B (en) Character string comparison method and device
CN106557566A (en) A kind of text training method and device
CN112927254A (en) Single word tombstone image binarization method, system, device and storage medium
CN112818693A (en) Automatic extraction method and system for electronic component model words
CN106355247A (en) Method for data processing and device, chip and electronic equipment
CN106127202A (en) The method of character recognition and device in a kind of picture
CN109472289A (en) Critical point detection method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20170609
