WO2017092555A1 - Method and device for parsing amount of money in judgement document - Google Patents

Method and device for parsing amount of money in judgement document

Info

Publication number
WO2017092555A1
Authority
WO
WIPO (PCT)
Prior art keywords
amount
paragraph
judgment
court
clause
Prior art date
Application number
PCT/CN2016/105272
Other languages
French (fr)
Chinese (zh)
Inventor
胡斌
崔维福
Original Assignee
北京国双科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京国双科技有限公司
Publication of WO2017092555A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/216: Parsing using statistical methods

Definitions

  • The invention relates to the field of amount parsing, and in particular to a method and a device for parsing amounts of money in a judgment document.
  • A judgment document is a legally binding judicial document issued by a people's court in exercising the state's judicial power. After the trial of a case concludes, the court rules on the procedural issues of the case and on the parties' substantive rights and obligations, based on its findings on the disputed facts and on the applicable laws, regulations and judicial interpretations.
  • The present invention is proposed to provide a method and a device for parsing amounts in a judgment document that overcome, or at least partially solve, the above problems. They save manpower, automate the extraction of amounts from judgment documents, and improve the accuracy of the extracted amounts.
  • The present invention provides a method for parsing amounts in a judgment document, comprising:
  • The present invention provides a device for parsing amounts in a judgment document, comprising:
  • An extracting unit, configured to extract, according to a predetermined rule, the amounts in each clause of the claim paragraph and the judgment paragraph;
  • A first summing unit, configured to sum separately the amounts extracted from the clauses of the claim paragraph and of the judgment paragraph, obtaining the plaintiff's claimed amount and the amount supported by the court in the judgment document.
  • With the above technical solution, the method and device provided by the present invention first segment the judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph, and then split the claim paragraph and the judgment paragraph into clauses.
  • The forms in which amounts appear in the clauses are unified to simplify later calculation. The amounts in each clause of the claim paragraph and the judgment paragraph are extracted according to a predetermined rule, and during extraction duplicate amounts are eliminated by successive multi-level addition, which further checks the correctness of the extraction.
  • Finally, the amounts parsed from the clauses of the claim paragraph and of the judgment paragraph are summed separately, so that the amounts in the judgment document are parsed accurately.
  • Compared with prior-art methods, the present invention first unifies the many different forms in which amounts are expressed in a judgment document and then extracts them, which saves manpower, automates the extraction of amounts from judgment documents, and improves the accuracy of the extracted amounts.
  • FIG. 1 is a schematic flowchart of a method for parsing amounts in a judgment document according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of another method for parsing amounts in a judgment document according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a device for parsing amounts in a judgment document according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of another device for parsing amounts in a judgment document according to an embodiment of the present invention.
  • An embodiment of the invention provides a method for parsing amounts in a judgment document. As shown in FIG. 1, the method includes:
  • A judgment document records the trial process and outcome of a people's court. It is the carrier of the result of litigation and the sole evidence by which the people's court determines and allocates the parties' substantive rights and obligations.
  • The judgment document is segmented.
  • The plaintiff's claim paragraph is extracted as the text that begins with identifier 1 and ends with identifier 2. Identifier 1 is the keyword that introduces the plaintiff's claim, or a variant of it.
  • Identifier 2 is a line break. The court's judgment paragraph is extracted as the text that begins with identifier 3 and ends with identifier 4. Identifier 3 is "the judgment is as follows" or a variant of it, and identifier 4 is "this case" or a variant of it.
  • The plaintiff's claim paragraph records the amounts claimed by the plaintiff, and the court's judgment paragraph records the amounts supported by the court.
  • The content of the judgment document can thus be divided into two parts: the plaintiff's part and the court's part.
  • Amounts are then extracted from each of the two parts to obtain the relevant amounts.
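The interception by head and tail identifiers described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `extract_paragraph`, the sample text, and the Chinese marker strings used in the usage lines are all hypothetical and chosen only for illustration.

```python
def extract_paragraph(text, head_markers, tail_markers):
    """Return the span that starts at a head marker and ends just before the
    nearest following tail marker. Head markers are tried in list order;
    None is returned if none of them occurs in the text."""
    for head in head_markers:
        start = text.find(head)
        if start == -1:
            continue
        body_start = start + len(head)
        # Positions of every tail marker that occurs after the head marker.
        ends = [pos for pos in (text.find(t, body_start) for t in tail_markers)
                if pos != -1]
        end = min(ends) if ends else len(text)
        return text[start:end]
    return None


# Hypothetical sample document and marker strings (assumptions for illustration).
doc = "原告诉称：要求赔偿1000元。\n本院认为……判决如下：被告赔偿800元。本案受理费50元。"
claim_paragraph = extract_paragraph(doc, ["原告诉称"], ["\n"])
judgment_paragraph = extract_paragraph(doc, ["判决如下"], ["本案"])
```

Passing the markers as lists leaves room for the "variants" the text mentions, since each variant can simply be added to the list.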
  • The claim paragraph and the judgment paragraph may be split into clauses at line breaks, periods or semicolons.
  • Specifically, the claim paragraph and the judgment paragraph are first split at line breaks. If a resulting segment contains a period, it is further split at the period. If a segment obtained by splitting at periods contains a semicolon, it is split again at the semicolon. This continues until both paragraphs are divided into multiple clauses.
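A minimal Python sketch of this clause splitting is given below. It assumes Chinese full-width punctuation (`。` and `；`) plus the ASCII semicolon; the function name `split_clauses` and the sample paragraph are hypothetical. Splitting at all three delimiters in one pass yields the same clauses as the nested line-break, period, semicolon procedure described above.

```python
import re


def split_clauses(paragraph):
    """Split a paragraph into clauses at line breaks, periods and semicolons."""
    # The ASCII "." is deliberately not a delimiter, so decimals such as 12.50 stay intact.
    parts = re.split(r"[\n。；;]", paragraph)
    return [part.strip() for part in parts if part.strip()]


# Hypothetical sample paragraph (an assumption for illustration).
clauses = split_clauses("赔偿医疗费1000元；赔偿误工费2000元。\n合计3000元。")
```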
  • This step is specifically as follows. First, the amounts in each clause are normalized according to a predetermined rule into a preset standard form. This embodiment does not limit the preset standard form, which may be set as required. For example, if Arabic numerals are chosen as the standard form, every normalized amount is expressed in Arabic numerals. Then the normalized amounts in each clause are deduplicated, and the deduplicated amounts are extracted.
  • Deduplicating the normalized amounts in each clause and extracting the deduplicated amounts specifically means removing the duplicate amounts among the amounts normalized in each clause. For example, a clause may state that:
  • the compensation for the medical expenses of the victim, Zhang San, is 1,000 yuan;
  • the labor insurance fee is 2,000 yuan;
  • the total compensation is 3,000 yuan.
  • The first two amounts, 1,000 yuan and 2,000 yuan, sum to 3,000 yuan.
  • They therefore duplicate the third amount, and the first two amounts need to be excluded.
  • The amounts extracted in step 103 from the claim paragraph and from the judgment paragraph are summed separately to obtain the plaintiff's claimed amount and the amount supported by the court in the judgment document, and the corresponding amounts are recorded.
  • The method for parsing amounts in a judgment document provided by this embodiment first segments the judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph, and then splits the claim paragraph and the judgment paragraph into clauses.
  • The amounts in each clause of the claim paragraph and the judgment paragraph are extracted according to the predetermined rule, so that the amounts in the judgment document are parsed accurately.
  • Compared with prior-art methods, the present invention first unifies the many different forms in which amounts are expressed in a judgment document and then extracts them, which saves manpower, automates the extraction of amounts from judgment documents, and improves the accuracy of the extracted amounts.
  • An embodiment of the present invention provides another method for parsing amounts in a judgment document. As shown in FIG. 2, the method includes:
  • The plaintiff's claim paragraph is extracted as the text that begins with identifier 1 and ends with identifier 2.
  • Identifier 1 is the keyword that introduces the plaintiff's claim, or a variant of it, and identifier 2 is a line break.
  • Identifiers 1 and 2 are keywords that can identify the plaintiff's claim paragraph.
  • The embodiment of the present invention does not limit these keywords. Any keyword that, in the actual judgment document, indicates the plaintiff's claim may be used as an identifier.
  • The court's judgment paragraph is extracted as the text that begins with identifier 3 and ends with identifier 4.
  • Identifier 3 is "the judgment is as follows" or a variant of it.
  • Identifier 4 is "this case" or a variant of it.
  • Identifiers 3 and 4 are keywords that can identify the court's judgment paragraph.
  • The embodiment of the present invention does not limit these keywords. Any keyword that, in the actual judgment document, indicates the court's judgment may be used as an identifier.
  • Specifically, this includes:
  • Amounts written in Chinese capital numerals in each clause are normalized into the preset standard form.
  • This step may include:
  • Each clause is processed with a word segmentation technique to obtain a plurality of words.
  • The amount string is divided at its unit words into several amount segments. For example, in the above example "thousand" and "ten" are unit words, so the amount string "one thousand zero ten yuan" is divided into two segments: "one thousand" and "zero ten".
  • The Arabic values corresponding to the amount segments are summed to obtain the amount in each clause.
  • Clauses may also contain other unit words, such as those for hundred million, ten thousand, hundred, and fractions of a yuan. These may be processed in the same way, and this application does not limit them.
  • When the amount string is divided into amount segments at its unit words, the method further checks whether the amount string contains several consecutive unit words. If not, processing proceeds as described above. If so, the amount string is divided at the last unit word in each run of consecutive unit words. For an amount segment that contains several consecutive unit words, the Arabic value is then calculated from the amount value and the amount units as follows.
  • The method described above is applied recursively, from left to right, over the consecutive unit words.
  • The Arabic value corresponding to each unit word is calculated in turn until the value for the last unit word is obtained, and that value is taken as the final Arabic value of the amount segment.
  • In this way, amounts written in Chinese capital numerals are accurately normalized into standard Arabic numerals, which meets the requirements of diversity and accuracy in amount extraction.
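As an illustration of normalizing Chinese numerals to Arabic numerals, the following Python sketch uses a common digit-and-unit accumulation scheme. It is not the segment-splitting procedure described above but a substitute that handles the same cases, including consecutive unit words such as 千万. The function name `chinese_to_int`, the character tables and the sample strings are assumptions; characters that are neither digits nor unit words (such as 零, 元, 整) are simply skipped, and sub-yuan units are not handled.

```python
DIGITS = {
    "一": 1, "壹": 1, "二": 2, "贰": 2, "两": 2, "三": 3, "叁": 3,
    "四": 4, "肆": 4, "五": 5, "伍": 5, "六": 6, "陆": 6,
    "七": 7, "柒": 7, "八": 8, "捌": 8, "九": 9, "玖": 9,
}
SMALL_UNITS = {"十": 10, "拾": 10, "百": 100, "佰": 100, "千": 1000, "仟": 1000}
BIG_UNITS = {"万": 10**4, "亿": 10**8}


def chinese_to_int(text):
    """Convert a Chinese (ordinary or capital) integer numeral to an int."""
    total = 0      # value already closed off by 万 or 亿
    section = 0    # value accumulated since the last 万 or 亿
    digit = None   # digit waiting for its unit word
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 counts as one times the unit.
            section += (1 if digit is None else digit) * SMALL_UNITS[ch]
            digit = None
        elif ch in BIG_UNITS:
            section += digit or 0
            digit = None
            if ch == "亿":
                # 亿 scales everything read so far, e.g. 三万亿.
                total = (total + section) * BIG_UNITS[ch]
            else:
                total += section * BIG_UNITS[ch]
            section = 0
    return total + section + (digit or 0)
```

For example, `chinese_to_int("壹仟零壹拾元整")` accumulates 1000 for 壹仟 and 10 for 壹拾 and returns their sum.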
  • A clause involving a proportional amount can be identified by keyword recognition. For example, for the clause "for the damage of 3,000 yuan, A shall bear 70%", when the keyword "bear" is recognized, the clause is considered to involve a proportional amount, and the amount 2,100 is then derived from the values 3,000 and 70%.
  • A clause involving a deducted amount can likewise be identified by keyword recognition. For example, for the clause "deduct the 1,000 yuan previously paid", when the keyword "deduct" is recognized, the clause is considered to involve a deduction, and the value 1000 is normalized as a negative value, that is, -1000.
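The two keyword rules above (proportion and deduction) can be sketched in Python as follows. This is a hedged illustration only: the function name `normalize_relational_amounts` and the regular expressions are assumptions, the keyword `承担` ("bear") is taken from the example clause in the original Chinese description, and the keyword `扣除` ("deduct") and the deduction sample clause are back-translations assumed for illustration. Amounts are expected to be already in Arabic numerals followed by 元.

```python
import re
from decimal import Decimal

AMOUNT_RE = re.compile(r"(\d+(?:\.\d+)?)元")
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)%")


def normalize_relational_amounts(clause, ratio_keyword="承担", deduct_keyword="扣除"):
    """Return the clause's amounts after applying the proportion or deduction rule."""
    amounts = [Decimal(value) for value in AMOUNT_RE.findall(clause)]
    percent = PERCENT_RE.search(clause)
    if ratio_keyword in clause and percent:
        # Proportional relationship: scale each amount by the stated percentage.
        ratio = Decimal(percent.group(1)) / Decimal(100)
        return [amount * ratio for amount in amounts]
    if deduct_keyword in clause:
        # Deduction relationship: record the amounts as negative values.
        return [-amount for amount in amounts]
    return amounts


share = normalize_relational_amounts("对所造成的损害3000元，A应当承担70%")
deduction = normalize_relational_amounts("扣除此前已支付的1000元")
```

`Decimal` is used instead of `float` so that monetary values are not affected by binary rounding.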
  • The amounts normalized in each clause are added cumulatively, starting from the first amount, and each running sum is compared with the next amount. If the sum of the first two amounts equals the third amount, the first two amounts are cleared and the third amount is retained. The same procedure is then repeated starting from the second amount (the second and third amounts are added and compared with the next), and so on, until the amounts in every clause have been processed. The amounts that remain are kept.
  • As described above, only the total among the amounts in each clause is kept, and the individual amounts that were summed are cleared to zero, so that parsing yields one total per clause. This avoids counting amounts twice and ensures the accuracy of the extracted amounts.
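The successive-addition deduplication described above can be sketched in Python as follows. This is one possible reading of the procedure, offered under assumptions: the function name `remove_subtotal_components` is hypothetical, a run of at least two consecutive amounts is treated as duplicated when its sum equals the amount that immediately follows it, and amounts cleared to zero are dropped from the result.

```python
def remove_subtotal_components(amounts):
    """Clear runs of consecutive amounts whose sum equals the next amount."""
    kept = list(amounts)
    n = len(kept)
    for start in range(n):
        running = 0
        for end in range(start, n - 1):
            running += kept[end]
            # A run needs at least two components to count as a duplicated total.
            if end > start and running != 0 and running == kept[end + 1]:
                for k in range(start, end + 1):
                    kept[k] = 0  # clear the components, keep the total
                break
    return [amount for amount in kept if amount != 0]


deduplicated = remove_subtotal_components([1000, 2000, 3000])
```

With the example from the text, 1,000 and 2,000 are cleared because they sum to the following 3,000, so only the total is summed later.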
  • A group of documents in this step refers to a group of judgment documents of the same type. For each judgment document in the group, the plaintiff's claimed amount and the amount supported by the court are obtained through the above steps.
  • These amounts are then summed across the group to obtain the total claimed amount and the total amount supported by the court.
  • The court support ratio equals the amount supported by the court divided by the plaintiff's claimed amount.
  • The total court-supported amount and the total claimed amount are obtained in step 207, and dividing the former by the latter gives the court support ratio for the group of judgment documents.
  • Alternatively, the court support ratio of each judgment document can be calculated and the ratios of all judgment documents in the group averaged, which gives the average court support ratio of the group as another summary indicator.
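Both summary indicators described above can be computed as in the following Python sketch. The function name `support_ratios` and the input layout (one `(claimed, supported)` pair per judgment document) are assumptions made for illustration.

```python
def support_ratios(documents):
    """Return (overall ratio, average per-document ratio) for a group.

    documents: list of (claimed_amount, supported_amount) pairs,
    one pair per judgment document.
    """
    total_claimed = sum(claimed for claimed, _ in documents)
    total_supported = sum(supported for _, supported in documents)
    if not total_claimed:
        raise ValueError("total claimed amount is zero")
    # Indicator 1: total supported amount divided by total claimed amount.
    overall = total_supported / total_claimed
    # Indicator 2: mean of the per-document ratios (zero-claim documents skipped).
    per_document = [supported / claimed for claimed, supported in documents if claimed]
    average = sum(per_document) / len(per_document)
    return overall, average


overall, average = support_ratios([(1000, 500), (3000, 3000)])
```

The two indicators generally differ: the overall ratio weights each document by its claimed amount, while the average treats every document equally.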
  • The forms in which amounts appear in the clauses of the judgment document are unified, which simplifies later calculation. During extraction, duplicate amounts are eliminated by successive multi-level addition, which further checks the correctness of the extraction, so that the plaintiff's claimed amount and the amount supported by the court in the judgment document are parsed accurately.
  • An embodiment of the present invention provides a device for parsing amounts in a judgment document. The device embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here.
  • The device in this embodiment can implement everything in the foregoing method embodiment.
  • The device includes an obtaining unit 31, a clause unit 32, an extracting unit 33 and a first summing unit 34.
  • The obtaining unit 31 is configured to segment the judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph;
  • The clause unit 32 is configured to split the claim paragraph and the judgment paragraph into clauses;
  • The extracting unit 33 is configured to extract, according to a predetermined rule, the amounts in each clause of the claim paragraph and the judgment paragraph;
  • The first summing unit 34 is configured to sum separately the amounts extracted from the clauses of the claim paragraph and of the judgment paragraph, obtaining the plaintiff's claimed amount and the amount supported by the court in the judgment document.
  • The device for parsing amounts in a judgment document provided by this embodiment first segments the judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph, and then splits the claim paragraph and the judgment paragraph into clauses.
  • The amounts in each clause of the claim paragraph and the judgment paragraph are extracted according to the predetermined rule, so that the amounts in the judgment document are parsed accurately.
  • Compared with prior-art methods, the present invention first unifies the many different forms in which amounts are expressed in a judgment document and then extracts them, which saves manpower, automates the extraction of amounts from judgment documents, and improves the accuracy of the extracted amounts.
  • An embodiment of the present invention provides another device for parsing amounts in a judgment document. The device embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here,
  • but it should be understood that the device in this embodiment can implement everything in the foregoing method embodiment.
  • The device further includes a second summing unit 35 and a ratio calculation unit 36.
  • The second summing unit 35 is configured to traverse the judgment documents in a group of documents and to sum, across the group, the plaintiff's claimed amount and the court-supported amount of each judgment document, obtaining the total claimed amount and the total court-supported amount;
  • The ratio calculation unit 36 is configured to divide the total court-supported amount by the total claimed amount to obtain the court support ratio.
  • The obtaining unit 31 includes:
  • A first intercepting module, configured to extract the plaintiff's claim paragraph as the text that begins with identifier 1 and ends with identifier 2, where identifier 1 is the keyword that introduces the plaintiff's claim, or a variant of it, and identifier 2 is a line break;
  • A second intercepting module, configured to extract the court's judgment paragraph as the text that begins with identifier 3 and ends with identifier 4,
  • where identifier 3 is "the judgment is as follows" or a variant of it, and identifier 4 is "this case" or a variant of it.
  • The clause unit 32 includes:
  • The extracting unit 33 includes:
  • A normalization module, configured to normalize the amounts in each clause according to a predetermined rule into a preset standard form;
  • An extraction module, configured to deduplicate the normalized amounts in each clause and extract the deduplicated amounts.
  • The normalization module is specifically configured as follows:
  • Each clause is processed with a word segmentation technique to obtain a plurality of words.
  • For example, word segmentation of the clause "the court orders compensation to the plaintiff of one thousand zero ten yuan in full" yields the following words: the court, compensation, the plaintiff, one thousand zero ten, yuan, in full;
  • The amount string is divided at its unit words into several amount segments. For example, in the above example "thousand" and "ten" are unit words, so the amount string "one thousand zero ten yuan" is divided into two segments: "one thousand" and "zero ten".
  • Clauses may also contain other unit words, such as those for hundred million, ten thousand, hundred, and fractions of a yuan. These may be processed in the same way, and this application does not limit them.
  • The amount string may also contain amounts such as "trillion", "billion" and "ten million", whose Chinese forms consist of two consecutive unit words. On that basis,
  • when the amount string is divided into amount segments at its unit words, the module further checks whether the amount string contains several consecutive unit words. If not, processing proceeds as described above. If so, the amount string is divided at the last unit word in each run of consecutive unit words.
  • For such a segment, the Arabic value is calculated from the amount value and the amount units by applying the method described above recursively, from left to right, over the consecutive unit words.
  • The Arabic value corresponding to each unit word is calculated in turn until the value for the last unit word is obtained, and that value is taken as the final Arabic value of the amount segment.
  • In this way, amounts written in Chinese capital numerals are accurately normalized into standard Arabic numerals, which meets the requirements of diversity and accuracy in amount extraction.
  • A clause involving a proportional amount can be identified by keyword recognition. For example, for the clause "for the damage of 3,000 yuan, A shall bear 70%", when the keyword "bear" is recognized, the clause is considered to involve a proportional amount, and the amount 2,100 is then derived from the values 3,000 and 70%.
  • The amount deducted in a deduction relationship in each clause is normalized as a negative value in the preset standard form.
  • A clause involving a deducted amount can be identified by keyword recognition. For example, for the clause "deduct the 1,000 yuan previously paid", when the keyword "deduct" is recognized, the clause is considered to involve a deduction, and the value 1000 is normalized as a negative value, that is, -1000.
  • The forms in which amounts appear in the clauses of the judgment document are unified, which simplifies later calculation. During extraction, duplicate amounts are eliminated by successive multi-level addition, which further checks the correctness of the extraction, so that the plaintiff's claimed amount and the amount supported by the court in the judgment document are parsed accurately.
  • The device for parsing amounts in a judgment document includes a processor and a memory. The obtaining unit 31, the clause unit 32, the extracting unit 33, the first summing unit 34 and so on are all stored in the memory as program units, and the processor executes
  • these program units stored in the memory to implement the corresponding functions.
  • The processor contains a kernel, which retrieves the corresponding program unit from the memory.
  • One or more kernels may be provided.
  • The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
  • The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: segmenting a judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph; splitting the claim paragraph and the judgment paragraph into clauses; extracting the amounts in each clause of the claim paragraph and the judgment paragraph according to a predetermined rule; and summing separately the amounts parsed from the clauses of the claim paragraph and of the judgment paragraph to obtain the plaintiff's claimed amount and the amount supported by the court.
  • Embodiments of the present application can be provided as a method, a system, or a computer program product.
  • The present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • The application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device.
  • The instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing.
  • The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • A computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM).
  • Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media.
  • Such media can store information by any method or technology.
  • The information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

Abstract

A method and device for analysing the amount of money in a judgement document, which relate to the field of amount parsing. The method comprises: firstly segmenting a judgement document to acquire an appealing paragraph of an accuser and a sentence paragraph of a court (101); then phrasing the appealing paragraph and the sentence paragraph (102); extracting the amount of money in each clause of the appealing paragraph and the sentence paragraph according to a predetermined rule (103); and totalling the amount of money extracted from the various clauses in the appealing paragraph and the sentence paragraph respectively to obtain the appealing amount of money of the accuser and the supporting amount of money of the court in the judgement document (104). The method is mainly used to extract the amount of money in a judgement document.

Description

Method and device for parsing amounts in a judgment document
This application is based on, and claims priority to, Chinese Patent Application No. 201510867476.X, filed on December 1, 2015, the entire contents of which are incorporated herein by reference.
Technical field
The invention relates to the field of amount parsing, and in particular to a method and a device for parsing amounts of money in a judgment document.
Background
A judgment document is a legally binding judicial document issued by a people's court in exercising the state's judicial power. After the trial of a case concludes, the court rules on the procedural issues of the case and on the parties' substantive rights and obligations, based on its findings on the disputed facts and on the applicable laws, regulations and judicial interpretations.
Amounts in judgment documents are expressed in many different ways. For example, the character forms include Chinese capital numerals, Chinese numerals and Arabic numerals, and the grammatical structures include expressions such as "for the damage of 3,000 yuan, A shall bear 70%", meaning that A shall bear 2,100 yuan. These varied forms of expression make it difficult to extract the amounts involved in judgment documents.
At present, the amounts involved in judgment documents are usually extracted manually. However, because the volume of data in judgment documents is huge, fully manual extraction involves too much work, takes a long time, and is prone to extraction errors.
发明内容Summary of the invention
鉴于上述问题,提出了本发明以便提供一种克服上述问题或者至少部分地解决上述问题的一种裁判文书中的金额解析方法及装置,能够节省人力,实现自动化裁判文书中的金额提取,并且提高了提取金额的正确度。In view of the above problems, the present invention has been made in order to provide an amount analysis method and apparatus in a referee document that overcomes the above problems or at least partially solves the above problems, which can save manpower, realize the withdrawal of the amount in the automated judgment document, and improve The correctness of the withdrawal amount.
一方面,本发明提供了一种裁判文书中的金额解析方法,包括:In one aspect, the present invention provides a method for parsing an amount in a judgment document, comprising:
对裁判文书进行分段获取原告的诉请段落和法院的判决段落;Subdividing the judgment documents to obtain the plaintiff’s petition paragraph and the court’s judgment paragraph;
对所述诉请段落和判决段落进行分句; Clause the petition paragraph and the judgment paragraph;
按照预定规则对所述诉请段落和判决段落的各个分句中的金额进行提取;Extracting the amounts in the respective clauses of the appeal paragraph and the judgment paragraph in accordance with predetermined rules;
分别将所述诉请段落和判决段落中各分句提取出的金额进行加总,得到所述裁判文书中原告的诉请金额和法院的支持金额。The sums of the clauses in the appeal paragraph and the judgment paragraph are summed up respectively, and the amount of the plaintiff's appeal and the court's support amount in the judgment document are obtained.
In another aspect, the present invention provides a device for parsing amounts in a judgment document, comprising:
an acquisition unit, configured to segment the judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph;
a clause unit, configured to split the claim paragraph and the judgment paragraph into clauses;
an extraction unit, configured to extract the amounts in each clause of the claim paragraph and the judgment paragraph according to a predetermined rule;
a first summing unit, configured to sum the amounts extracted from the clauses of the claim paragraph and of the judgment paragraph separately, to obtain the amount claimed by the plaintiff and the amount supported by the court in the judgment document.
With the above technical solution, the method and device for parsing amounts in a judgment document provided by the present invention first segment the judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph, and then split both paragraphs into clauses. Unifying the forms in which amounts appear within the clauses simplifies the subsequent calculation. Amounts are extracted from each clause of the claim paragraph and the judgment paragraph according to a predetermined rule, and during extraction duplicate amounts are removed by successive multi-level summation, which further verifies that the extraction is correct. Finally, the amounts parsed from the clauses of the claim paragraph and of the judgment paragraph are summed separately, so that the amounts in the judgment document are parsed accurately. Compared with prior-art approaches to parsing amounts in judgment documents, the present invention first unifies the many different forms in which amounts are expressed and then extracts them, which saves labor, automates the extraction of amounts from judgment documents, and improves the accuracy of the extracted amounts.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 is a schematic flowchart of a method for parsing amounts in a judgment document according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of another method for parsing amounts in a judgment document according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for parsing amounts in a judgment document according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another device for parsing amounts in a judgment document according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope can be conveyed completely to those skilled in the art.
An embodiment of the present invention provides a method for parsing amounts in a judgment document. As shown in FIG. 1, the method includes:
101. Segment the judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph.
A judgment document records the trial process and outcome of a people's court. It is the carrier of the result of the litigation and the sole instrument by which the people's court determines and allocates the parties' substantive rights and obligations.
In this step the judgment document is segmented. In this embodiment, the plaintiff's claim paragraph is cut out as the text that begins with marker one and ends with marker two, where marker one is "原告诉请" ("the plaintiff claims") or a variant of it, and marker two is a line break. The court's judgment paragraph is cut out as the text that begins with marker three and ends with marker four, where marker three is "判决如下" ("the judgment is as follows") or a variant of it, and marker four is "本案" ("this case") or a variant of it. The claim paragraph records the amount claimed by the plaintiff, and the judgment paragraph records the amount supported by the court.
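As a non-limiting illustration, the segmentation of step 101 might be sketched in Python with regular expressions. The function name `split_paragraphs` and the two pattern names are hypothetical, and only the literal markers named above are matched; variants of the markers, which this embodiment leaves open, would have to be added to the patterns.

```python
import re

# Marker one "原告诉请" up to marker two (the next line break).
CLAIM_RE = re.compile(r"原告诉请[^\n]*")
# Marker three "判决如下" up to, but not including, marker four "本案".
JUDGMENT_RE = re.compile(r"判决如下.*?(?=本案)", re.S)


def split_paragraphs(document):
    """Return (claim_paragraph, judgment_paragraph); None where a marker is missing."""
    claim = CLAIM_RE.search(document)
    judgment = JUDGMENT_RE.search(document)
    return (
        claim.group(0) if claim else None,
        judgment.group(0) if judgment else None,
    )
```

Returning `None` for a missing paragraph, rather than raising an error, is a design choice of this sketch so that a batch of documents can be processed without interruption.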
This segmentation divides the content of the judgment document into two parts, one for the plaintiff and one for the court. Amounts are then extracted from each part to obtain the relevant amounts.
102. Split the claim paragraph and the judgment paragraph into clauses.
In this step, the claim paragraph and the judgment paragraph may be split in turn at line breaks, full stops, and semicolons.
Specifically, the claim paragraph and the judgment paragraph are first split at line breaks. If a segment obtained in this way contains a full stop, its text is further split at the full stop. If a sentence obtained in this way contains a semicolon, its text is split again at the semicolon. This continues until the claim paragraph and the judgment paragraph have each been divided into multiple clauses.
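The three-level splitting of step 102 can be illustrated by the following minimal Python sketch. The function name `split_clauses` is hypothetical, and it is an assumption of the sketch that both the full-width semicolon ";" and the ASCII semicolon ";" count as separators and that empty fragments are discarded.

```python
import re


def split_clauses(paragraph):
    """Split at line breaks, then at full stops (。), then at semicolons."""
    clauses = []
    for line in paragraph.split("\n"):
        for sentence in line.split("。"):
            for clause in re.split(r"[;;]", sentence):
                clause = clause.strip()
                if clause:  # drop empty fragments left by trailing separators
                    clauses.append(clause)
    return clauses
```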
103. Extract the amounts in each clause of the claim paragraph and the judgment paragraph according to a predetermined rule.
Specifically, the amounts in each clause are first normalized according to the predetermined rule into a preset standard form. This embodiment places no restriction on the preset standard form, which can be chosen as required. For example, if Arabic numerals are chosen as the standard form, the resulting amounts are expressed in Arabic numerals. The normalized amounts in each clause are then de-duplicated, and the de-duplicated amounts are extracted.
De-duplication removes the amounts in a clause that repeat other amounts in the same clause. For example, in the clause "应赔偿受害人张三医药费1000元,护工费2000元,总计赔偿3000元" ("the victim Zhang San shall be compensated 1,000 yuan for medical expenses and 2,000 yuan for nursing care, 3,000 yuan in total"), the first two amounts, 1,000 yuan and 2,000 yuan, add up to 3,000 yuan. They therefore duplicate the third amount and must be removed.
104. Sum the amounts extracted from the clauses of the claim paragraph and of the judgment paragraph separately, to obtain the amount claimed by the plaintiff and the amount supported by the court in the judgment document.
Here, the amounts extracted in step 103 from the claim paragraph and from the judgment paragraph are each added up to give the amount claimed by the plaintiff and the amount supported by the court in the judgment document, and the corresponding amounts are recorded.
In the method for parsing amounts in a judgment document provided by this embodiment, the judgment document is first segmented to obtain the plaintiff's claim paragraph and the court's judgment paragraph, both paragraphs are then split into clauses, and the amounts in each clause are extracted according to a predetermined rule, so that the amounts in the judgment document are parsed accurately. Compared with prior-art approaches to parsing amounts in judgment documents, the present invention first unifies the many different forms in which amounts are expressed and then extracts them, which saves labor, automates the extraction of amounts from judgment documents, and improves the accuracy of the extracted amounts.
An embodiment of the present invention provides another method for parsing amounts in a judgment document. As shown in FIG. 2, the method includes:
201. Cut out the plaintiff's claim paragraph as the text that begins with marker one and ends with marker two, where marker one is "原告诉请" ("the plaintiff claims") or a variant of it, and marker two is a line break.
It should be noted that marker one and marker two are keywords by which the paragraph can be recognized as the plaintiff's claim paragraph. This embodiment does not limit these keywords; any keywords that indicate the plaintiff's claims in actual judgment documents may be used as markers.
202. Cut out the court's judgment paragraph as the text that begins with marker three and ends with marker four, where marker three is "判决如下" ("the judgment is as follows") or a variant of it, and marker four is "本案" ("this case") or a variant of it.
Likewise, marker three and marker four are keywords by which the paragraph can be recognized as the court's judgment paragraph. This embodiment does not limit these keywords; any keywords that indicate the court's ruling in actual judgment documents may be used as markers.
203. Split the claim paragraph and the judgment paragraph into clauses, in turn at line breaks, full stops, and semicolons.
204. Normalize the amounts in each clause according to a predetermined rule into a preset standard form.
This step specifically includes:
1) Normalizing the amounts in each clause that are written in Chinese capital or ordinary numerals into the preset standard form. Preferably, this may include:
applying word segmentation to each clause to obtain multiple words; for example, segmenting the clause "被告赔偿原告一千零伍拾元整" ("the defendant shall compensate the plaintiff exactly one thousand and fifty yuan") yields the words 被告, 赔偿, 原告, 一千零伍拾, 元, 整;
combining the words that belong to a Chinese-numeral amount into an amount string; in the example above, "一千零伍拾" is a word denoting a number and "元" is a measure word, so "一千零伍拾元" is the combined amount string;
splitting the amount string into multiple amount segments at the unit words; in the example above, "千" (thousand) and "拾" (ten) are unit words, so the amount string "一千零伍拾元" is split into the two amount segments "一千" and "零伍拾";
computing the Arabic value of each amount segment from its numeric part and its unit; in the example above, the segment "一千" has the value 1*1000=1000 and the segment "零伍拾" has the value 5*10=50;
summing the Arabic values of all amount segments to obtain the amount stated in the clause; in the example above, the final amount is 1000+50=1050 yuan.
In this embodiment, a clause may also contain other unit words such as 亿 (hundred million), 万 (ten thousand), 百 (hundred), 角 (jiao, one tenth of a yuan), and 分 (fen, one hundredth of a yuan). These are processed by the same procedure, and this application places no limitation here.
It should also be noted that an amount string may contain amount segments with two consecutive unit words, such as "万亿" (trillion), "亿亿" (ten quadrillion), or "千万" (ten million). Building on the procedure above, splitting the amount string at the unit words therefore further comprises checking whether the amount string contains several consecutive unit words. If it does not, the string is processed as described above. If it does, the string is split at the last unit word of the run of consecutive unit words. For an amount segment that contains several consecutive unit words, its Arabic value is then computed as follows: taking the consecutive unit words from left to right, the procedure described above is applied recursively to compute the Arabic value corresponding to each unit word in turn, until the value corresponding to the last unit word has been computed, and that value is taken as the final Arabic value of the segment.
The above steps show that, whatever form an amount takes, this embodiment can accurately normalize amounts written in Chinese capital or ordinary numerals into standard Arabic numerals, which meets the requirements of variety and accuracy in amount extraction.
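The conversion in item 1) might be approximated by the following Python sketch. It is not the segment-by-segment procedure described above but a common single-pass equivalent, named here plainly as a substitute: digits are multiplied by the small units 十/百/千 and accumulated into a section, and the large units 万 and 亿 scale what has been accumulated so far. The function name `chinese_amount_to_int` and the lookup tables are hypothetical. The sketch handles whole yuan only, stops at "元" or "圆", and does not cover 角, 分, or "亿亿".

```python
# Ordinary and capital (financial) forms of the digits.
DIGITS = {
    "零": 0, "〇": 0, "一": 1, "壹": 1, "二": 2, "贰": 2, "两": 2,
    "三": 3, "叁": 3, "四": 4, "肆": 4, "五": 5, "伍": 5,
    "六": 6, "陆": 6, "七": 7, "柒": 7, "八": 8, "捌": 8, "九": 9, "玖": 9,
}
SMALL_UNITS = {"十": 10, "拾": 10, "百": 100, "佰": 100, "千": 1000, "仟": 1000}


def chinese_amount_to_int(text):
    """Convert a Chinese-numeral amount string (whole yuan) to an int."""
    total = 0    # value closed off by 万 / 亿
    section = 0  # value accumulated since the last 万 / 亿
    digit = 0    # most recent digit not yet attached to a unit
    for ch in text:
        if ch in "元圆":
            break  # assumption: ignore everything after the yuan marker
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as the "十" in "十五" counts as one unit.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch == "万":
            total += (section + digit) * 10**4
            section = digit = 0
        elif ch == "亿":
            # 亿 scales everything read so far, which covers "万亿".
            total = (total + section + digit) * 10**8
            section = digit = 0
    return total + section + digit
```

For the worked example in the text, `chinese_amount_to_int("一千零伍拾元整")` yields 1000 + 50 = 1050.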
2) Normalizing the amounts in each clause that involve a proportional relationship into the apportioned amount in the preset standard form.
For example, "所造成的损害3000元,A应承担70%" ("of the 3,000 yuan in damage caused, A shall bear 70%") in a clause is normalized to 2,100 yuan. Preferably, clauses that involve a proportional amount can be identified by keyword recognition: in the clause above, when the keyword "承担" ("bear") is recognized, the clause is taken to involve a proportional amount, and the two figures 3000 and 70% are then normalized to 2100.
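A minimal Python sketch of item 2) follows, under the assumptions that amounts are already in Arabic numerals followed by "元", that each clause holds one amount and one percentage, and that "承担" is the only trigger keyword. The function name `apportioned_amount` and the pattern names are hypothetical.

```python
import re
from decimal import Decimal

AMOUNT_RE = re.compile(r"(\d+(?:\.\d+)?)元")
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)[%%]")  # ASCII or full-width percent sign


def apportioned_amount(clause):
    """Return amount * percentage for a clause containing "承担", else None."""
    if "承担" not in clause:
        return None
    amount = AMOUNT_RE.search(clause)
    percent = PERCENT_RE.search(clause)
    if amount is None or percent is None:
        return None
    return Decimal(amount.group(1)) * Decimal(percent.group(1)) / Decimal(100)
```

`Decimal` is used instead of floating point so that monetary results such as 3000 * 70% come out as exactly 2100.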
3) Normalizing the amounts in each clause that are deducted in an offset relationship into the negative of the amount in the preset standard form.
For example, "扣除先前垫付的1000元" ("deducting the 1,000 yuan previously advanced") in a clause is normalized to -1,000 yuan. Preferably, clauses that involve an offset amount can be identified by keyword recognition: in the clause above, when the keyword "扣除" ("deduct") is recognized, the clause is taken to involve an offset amount, and the figure 1000 is then normalized to a negative value, namely -1000.
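Item 3) could be sketched in Python as below, assuming one Arabic-numeral amount per clause and "扣除" as the only deduction keyword. The function name `signed_amount` and the pattern name are hypothetical.

```python
import re
from decimal import Decimal

YUAN_RE = re.compile(r"(\d+(?:\.\d+)?)元")


def signed_amount(clause):
    """Return the clause's amount, negated when the clause contains "扣除"."""
    match = YUAN_RE.search(clause)
    if match is None:
        return None
    value = Decimal(match.group(1))
    return -value if "扣除" in clause else value
```

Storing deductions as negative values means the later summation step needs no special case for offsets.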
It should be noted that text inside full-width parentheses in a judgment document supplements and explains the text that precedes it. Before amounts are extracted, the full-width parentheses in each clause and their contents must therefore be removed, so that amounts are not counted twice and the accuracy of the result is not affected.
205. De-duplicate the normalized amounts in each clause and extract the de-duplicated amounts.
After the amounts have been normalized in step 204, the amounts in each clause are added and compared in sequence, starting from the first amount. If the sum of the first two amounts equals the third amount, the first two amounts are set to zero and the third is kept. In the same way, starting from the second amount, the second and third amounts are added and compared with the fourth, and so on, until the amounts in each clause have been extracted. The amounts that remain are kept.
In this embodiment, the summed value among the amounts in each clause is kept and the individual amounts that make up the sum are set to zero, so that one total amount remains for each clause after parsing. This avoids counting amounts twice and ensures the accuracy of the extracted amounts.
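The sliding comparison of step 205 can be sketched in Python as follows. The function name `remove_subtotal_duplicates` is hypothetical. The text does not say whether later comparisons use the original values or the values after zeroing, so using the running, already-zeroed values is an assumption of this sketch.

```python
def remove_subtotal_duplicates(amounts):
    """Zero any two adjacent amounts whose sum equals the amount that follows."""
    result = list(amounts)  # leave the caller's list untouched
    for i in range(len(result) - 2):
        if result[i] + result[i + 1] == result[i + 2]:
            result[i] = 0
            result[i + 1] = 0
    return result
```

Under this assumption, a clause that states a subtotal and then a grand total, such as 1000, 2000, 3000, 500, 3500, collapses to the grand total of 3500. Amounts are best passed as integers or `Decimal` values so that the equality test is exact.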
206. Sum the amounts extracted from the clauses of the claim paragraph and of the judgment paragraph separately, to obtain the amount claimed by the plaintiff and the amount supported by the court in the judgment document.
207. Traverse every judgment document in a group of documents, and sum the amounts claimed by the plaintiffs and the amounts supported by the courts across the documents separately, to obtain the total claimed amount and the total court-supported amount.
It should be noted that the group of documents in this step is a group of judgment documents of the same type. The above steps give the amount claimed by the plaintiff and the amount supported by the court for each judgment document in the group, and these are summed separately to give the total claimed amount and the total court-supported amount.
208. Divide the total court-supported amount by the total claimed amount to obtain the court support ratio.
The court support ratio equals the court-supported amount divided by the claimed amount. In this embodiment, the total court-supported amount and the total claimed amount are obtained in step 207, and dividing the former by the latter gives the court support ratio for the group of judgment documents.
It should also be noted that this embodiment can alternatively compute the court support ratio of each judgment document and average the ratios over all judgment documents in the group. This gives the average court support ratio of the group, which serves as a second summary indicator.
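The two summary indicators of steps 207 and 208 differ in general, as the following Python sketch shows. The function name `court_support_ratios` is hypothetical, and it is assumed that each case is given as a pair of integers (claimed amount, supported amount) with a non-zero claimed amount.

```python
from fractions import Fraction


def court_support_ratios(cases):
    """Return (overall ratio, mean of per-document ratios) for a group of cases."""
    total_claimed = sum(claimed for claimed, _ in cases)
    total_supported = sum(supported for _, supported in cases)
    overall = Fraction(total_supported, total_claimed)
    mean = sum(Fraction(supported, claimed) for claimed, supported in cases) / len(cases)
    return overall, mean
```

For two documents with (3000, 2100) and (1000, 1000), the overall ratio is 3100/4000 = 31/40, while the mean of the per-document ratios 7/10 and 1 is 17/20. The overall ratio weights each document by the size of its claim, whereas the mean treats every document equally.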
This embodiment unifies the forms in which amounts appear within the clauses of a judgment document, which simplifies the subsequent calculation. During extraction, duplicate amounts are removed by successive multi-level summation, which further verifies that the extraction is correct, and the amount claimed by the plaintiff and the amount supported by the court in the judgment document are parsed accurately.
Further, as a specific implementation of the method shown in FIG. 1, an embodiment of the present invention provides a device for parsing amounts in a judgment document. This device embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here, but it should be clear that the device in this embodiment can implement everything in the foregoing method embodiment. As shown in FIG. 3, the device includes an acquisition unit 31, a clause unit 32, an extraction unit 33, and a first summing unit 34.
The acquisition unit 31 is configured to segment the judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph;
the clause unit 32 is configured to split the claim paragraph and the judgment paragraph into clauses;
the extraction unit 33 is configured to extract the amounts in each clause of the claim paragraph and the judgment paragraph according to a predetermined rule;
the first summing unit 34 is configured to sum the amounts extracted from the clauses of the claim paragraph and of the judgment paragraph separately, to obtain the amount claimed by the plaintiff and the amount supported by the court in the judgment document.
The device for parsing amounts in a judgment document provided by this embodiment first segments the judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph, then splits both paragraphs into clauses, and extracts the amounts in each clause according to a predetermined rule, so that the amounts in the judgment document are parsed accurately. Compared with prior-art approaches to parsing amounts in judgment documents, the present invention first unifies the many different forms in which amounts are expressed and then extracts them, which saves labor, automates the extraction of amounts from judgment documents, and improves the accuracy of the extracted amounts.
Further, as a specific implementation of the method shown in FIG. 2, an embodiment of the present invention provides another device for parsing amounts in a judgment document. This device embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here, but it should be clear that the device in this embodiment can implement everything in the foregoing method embodiment. As shown in FIG. 4, the device further includes a second summing unit 35 and a ratio calculation unit 36.
The second summing unit 35 is configured to traverse every judgment document in a group of documents and sum the amounts claimed by the plaintiffs and the amounts supported by the courts across the documents separately, to obtain the total claimed amount and the total court-supported amount;
the ratio calculation unit 36 is configured to divide the total court-supported amount by the total claimed amount to obtain the court support ratio.
Further, the acquisition unit 31 includes:
a first cutting module, configured to cut out the plaintiff's claim paragraph as the text that begins with marker one and ends with marker two, where marker one is "原告诉请" ("the plaintiff claims") or a variant of it, and marker two is a line break;
a second cutting module, configured to cut out the court's judgment paragraph as the text that begins with marker three and ends with marker four, where marker three is "判决如下" ("the judgment is as follows") or a variant of it, and marker four is "本案" ("this case").
Further, the clause unit 32 includes:
a clause module, configured to split the claim paragraph and the judgment paragraph into clauses, in turn at line breaks, full stops, and semicolons.
Further, the extraction unit 33 includes:
a normalization module, configured to normalize the amounts in each clause according to a predetermined rule into a preset standard form;
an extraction module, configured to de-duplicate the normalized amounts in each clause and extract the de-duplicated amounts.
进一步地,所述整理模块,具体用于:Further, the finishing module is specifically configured to:
将所述各个分句中涉及中文大小写的金额整理为预设标准形式的金额;Arranging the amount of Chinese capitalization in each of the clauses into an amount in a preset standard form;
优选地,利用分词技术对各个分句进行分词处理,得到多个词语,例如,将分句“被告赔偿原告一千零伍拾元整”进行分词处理,得到如下多个词语:被告、赔偿、原告、一千零伍拾、元、整;Preferably, the word segmentation technique is used to perform word segmentation processing on each clause, and a plurality of words are obtained. For example, the clause "the defendant compensates the plaintiff for a thousand yuan and a whole yuan" is processed by word segmentation, and the following words are obtained: the defendant, the compensation, The plaintiff, one thousand and zero, the yuan, the whole;
将所述多个词语中涉及中文大小写金额的词语组合为金额字符串,如在上面的例子中,“一千零伍拾”为表示一个数值的词,“元”表示一个量词,则将“一千零伍拾元”作为组合后的金额字符串;Combining words of the plurality of words involving Chinese capitalization amount into an amount string, as in the above example, "one thousand zero pick" is a word representing a numerical value, and "yuan" is a quantifier, "One thousand and zero yuan" as the combined amount of the string;
按照金额单位词将所述金额字符串切分为多个金额切分词,如在上面的例子中,“千”和“拾”为金额单位词,则可以将金额字符串“一千零伍拾元”切分为“一千”、“零伍拾”两个金额切分词;According to the amount unit word, the amount string is divided into a plurality of amount segmentation words. For example, in the above example, "thousands" and "pick up" are the unit words of the amount, then the amount string "one thousand zeros can be picked up. "Yuan" is divided into two parts: "one thousand" and "zero".
根据每一个金额切分词的金额数值和金额单位计算出每一个金额切分词对应的阿拉伯数值,如在上面的例子中,金额切分词“一千”对应的阿拉伯数值为1*1000=1000,金额切分词“零伍拾”对应的阿拉伯数值为5*10=50;Calculate the Arabic value corresponding to each amount of the segmentation word according to the amount value and the amount unit of each amount of the segmentation word. For example, in the above example, the Arabic value corresponding to the amount segmentation word "one thousand" is 1*1000=1000, the amount The Arabic value corresponding to the segmentation word "zero 伍" is 5*10=50;
对所述每一个金额切分词对应的阿拉伯数值求和,得到所述各个分句中 涉及的金额,如在上面的例子中,最终得到的金额为1000+50=1050元。Sumifying the Arabic values corresponding to each of the amount of segmentation words, and obtaining the respective clauses The amount involved, as in the above example, the final amount is 1000+50=1050 yuan.
在本实施例中,分句中还可以包括亿、万、百、角和分等金额单位词,则可以按照上面描述的过程进行处理,本申请不做限定。In this embodiment, the clauses may also include unit words of billions, ten thousand, one hundred, ones, and cents, and may be processed according to the process described above, which is not limited in this application.
此外,还需要说明的是,在金额字符串中,还可能会出现“万亿”、“亿亿”“千万”等这种包含连续两个金额单位词的金额切分词,因此,基于上面描述的过程,所述按照金额单位词将所述金额字符串切分为多个金额切分词进一步为:查询金额字符串中是否包含连续多个金额单位词,当否时,则按照如上的步骤处理;当是时,则按照连续多个金额单位词中的最后一个单位词将所述金额字符串切分为多个金额切分词。此时,对于包含多个连续金额单位词的金额切分词,根据其金额数值和金额单位计算其阿拉伯数值,进一步为:按照所述连续金额单位词从左到右的顺序,递归使用如上描述的方法依次计算每一个金额单位词对应的阿拉伯数值,直到计算出最后一个金额单位词所对应的阿拉伯数值,并将该值作为该金额切分词最终的阿拉伯数值。In addition, it should be noted that in the amount string, there may also be such a "trillion", "billion", "ten million" and so on, which contain the amount of the two consecutive unit words, so based on the above In the process of describing, the amount string is divided into a plurality of amount segments according to the unit of the amount of the amount, and further: whether the query amount string contains a plurality of consecutive unit words, and if not, the processing is as follows: When it is, the amount string is divided into a plurality of amount segmentation words according to the last unit word in a plurality of consecutive unit words. At this time, for the amount segmentation word containing a plurality of consecutive amount unit words, the Arabic value is calculated according to the amount value and the amount unit, and further: according to the continuous amount unit words from left to right, recursively using the above description The method calculates the Arabic value corresponding to each unit word in turn until the Arabic value corresponding to the last unit word is calculated, and the value is used as the final Arabic value of the amount segmentation word.
As the steps above show, for a variety of ways of expressing an amount, this embodiment can accurately convert amounts written in Chinese numerals, in either ordinary or formal (capital) form, into amounts in standard Arabic numerals, which meets the requirements for both diversity and accuracy of amount extraction.
Amounts in each clause that involve a proportional relationship are converted into the amount, in the preset standard form, that results after the proportional allocation;
For example, the clause "所造成的损害3000元,A应承担70%" ("the damage caused is 3000 yuan, of which A shall bear 70%") is converted to 2100 yuan. Preferably, clauses involving a proportional amount are identified by keyword recognition: in this clause, once the keyword "承担" ("bear") is recognized, the clause is deemed to involve a proportional amount, and the two values 3000 and 70% are then combined to give 2100.
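The proportional rule can be sketched as follows. This is only an illustration under stated assumptions: the function name `proportional_amount` and the two regular expressions are hypothetical, the clause is assumed to contain an Arabic amount followed by 元 and a percentage, and "承担" is the only keyword taken from the text above.

```python
import re
from decimal import Decimal

PROPORTION_KEYWORDS = ("承担",)  # keyword named in the description
AMOUNT_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)元")
PERCENT_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)[%％]")


def proportional_amount(clause):
    """Return amount x percentage for a proportional clause, else None."""
    if not any(keyword in clause for keyword in PROPORTION_KEYWORDS):
        return None
    amount = AMOUNT_RE.search(clause)
    percent = PERCENT_RE.search(clause)
    if amount is None or percent is None:
        return None
    return Decimal(amount.group(1)) * Decimal(percent.group(1)) / Decimal(100)
```

For the example clause, the sketch computes 3000 × 70 / 100 = 2100.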
Amounts in each clause that are deducted under an offset relationship are converted into the negative of the amount in the preset standard form;
For example, the clause "扣除先前垫付的1000元" ("deduct the 1000 yuan previously advanced") is converted to -1000 yuan. Preferably, clauses involving an offset amount are identified by keyword recognition: in this clause, once the keyword "扣除" ("deduct") is recognized, the clause is deemed to involve an offset amount, and the value 1000 is then converted to a negative value, namely -1000.
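A minimal sketch of the offset rule, again under assumptions: `deduction_amount` is a hypothetical name, only the keyword "扣除" from the text is recognized, and the amount is assumed to be written in Arabic digits followed by 元.

```python
import re
from decimal import Decimal

DEDUCTION_KEYWORD = "扣除"  # keyword named in the description
AMOUNT_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)元")


def deduction_amount(clause):
    """Return the deducted amount as a negative value, else None."""
    if DEDUCTION_KEYWORD not in clause:
        return None
    match = AMOUNT_RE.search(clause)
    if match is None:
        return None
    return -Decimal(match.group(1))
```

Because the result is negative, simply adding it to the other amounts of the paragraph subtracts the deducted sum from the total.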
It should further be noted that, in a judgment document, the content inside full-width parentheses supplements and explains the preceding text. Before amounts are extracted, the full-width parentheses and their contents must therefore be removed from each clause, so that amounts are not counted twice and the accuracy of the result is not affected.
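Removing full-width parentheses before extraction could look like the sketch below. The function name is hypothetical, and the repeated substitution for nested parentheses is an assumption; the text above does not say whether nesting occurs.

```python
import re

# Innermost pair of full-width parentheses together with its content.
FULLWIDTH_PARENS_RE = re.compile(r"（[^（）]*）")


def strip_fullwidth_parentheses(clause):
    """Remove full-width parentheses and their content, including nested pairs."""
    previous = None
    while previous != clause:  # repeat until nothing more is removed
        previous = clause
        clause = FULLWIDTH_PARENS_RE.sub("", clause)
    return clause
```

In a made-up clause such as "赔偿医疗费1000元（含挂号费50元）", the parenthetical 50 yuan is dropped, so only the 1000 yuan is counted.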
This embodiment unifies the forms of the amounts appearing in each clause of the judgment document, which simplifies subsequent calculation. During extraction, duplicate amounts are eliminated level by level, which further verifies the correctness of the extraction, so that the plaintiff's claimed amount and the amount supported by the court in the judgment document are parsed accurately.
The device for parsing amounts in a judgment document comprises a processor and a memory. The acquisition unit 31, the clause-splitting unit 32, the extraction unit 33, the first summing unit 34 and the like are all stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided. By adjusting the kernel parameters, manual effort is saved, the extraction of amounts from judgment documents is automated, and the accuracy of the extracted amounts is improved.
The memory may include non-permanent memory in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
This application further provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: segmenting a judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph; splitting the claim paragraph and the judgment paragraph into clauses; extracting the amounts in each clause of the claim paragraph and the judgment paragraph according to predetermined rules; and summing the amounts parsed from the clauses of the claim paragraph and of the judgment paragraph respectively, to obtain the plaintiff's claimed amount and the amount supported by the court in the judgment document.
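The four steps listed above can be strung together as in the sketch below. It is a simplified illustration rather than the claimed implementation: all function names are hypothetical, the paragraph markers ("原告诉请" up to a line break, "判决如下" up to "本案") and the clause delimiters (line break, full stop, semicolon) follow claims 2 and 3, and the "predetermined rules" are reduced to matching Arabic amounts followed by 元, without Chinese-numeral conversion or de-duplication.

```python
import re
from decimal import Decimal

AMOUNT_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)元")


def cut_paragraph(text, start_marker, end_marker):
    """Cut the span that begins at start_marker and ends before end_marker."""
    start = text.find(start_marker)
    if start == -1:
        return ""
    end = text.find(end_marker, start + len(start_marker))
    return text[start:] if end == -1 else text[start:end]


def split_clauses(paragraph):
    """Split a paragraph on line breaks, full stops and semicolons."""
    return [c.strip() for c in re.split(r"[\n。；;]", paragraph) if c.strip()]


def sum_amounts(paragraph):
    """Extract the amounts of every clause and add them up."""
    total = Decimal(0)
    for clause in split_clauses(paragraph):
        for value in AMOUNT_RE.findall(clause):
            total += Decimal(value)
    return total


def parse_judgment(text):
    """Return (claimed amount, court-supported amount) for one document."""
    claim_paragraph = cut_paragraph(text, "原告诉请", "\n")
    judgment_paragraph = cut_paragraph(text, "判决如下", "本案")
    return sum_amounts(claim_paragraph), sum_amounts(judgment_paragraph)


# Made-up sample document, for illustration only.
sample = (
    "原告诉请：赔偿医疗费3000元；赔偿误工费2000元。\n"
    "判决如下：\n"
    "被告赔偿原告医疗费3000元；\n"
    "被告赔偿原告误工费1000元。\n"
    "本案受理费50元，由被告负担。"
)
claimed, supported = parse_judgment(sample)
```

Cutting the judgment paragraph at "本案" keeps the court fee in the last sentence of the sample out of the supported amount.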
Those skilled in the art will appreciate that embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory.
The memory may include non-permanent memory in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The above are merely embodiments of this application and are not intended to limit it. Those skilled in the art may make various modifications and changes to this application. Any modification, equivalent replacement or improvement made within the spirit and principles of this application shall fall within the scope of the claims of this application.

Claims (12)

  1. A method for parsing amounts in a judgment document, characterized by comprising:
    segmenting the judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph;
    splitting the claim paragraph and the judgment paragraph into clauses;
    extracting the amounts in each clause of the claim paragraph and the judgment paragraph according to predetermined rules;
    summing the amounts extracted from the clauses of the claim paragraph and of the judgment paragraph respectively, to obtain the plaintiff's claimed amount and the amount supported by the court in the judgment document.
  2. The method according to claim 1, characterized in that segmenting the judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph comprises:
    cutting out the plaintiff's claim paragraph as the span that starts at a first marker and ends at a second marker, the first marker being "原告诉请" ("the plaintiff claims") or a variant thereof, and the second marker being a line-break character;
    cutting out the court's judgment paragraph as the span that starts at a third marker and ends at a fourth marker, the third marker being "判决如下" ("the judgment is as follows") or a variant thereof, and the fourth marker being "本案" ("this case") or a variant thereof.
  3. The method according to claim 1, characterized in that splitting the claim paragraph and the judgment paragraph into clauses comprises:
    splitting the claim paragraph and the judgment paragraph into clauses by line breaks, full stops and semicolons in turn.
  4. The method according to claim 1, characterized in that extracting the amounts in each clause of the claim paragraph and the judgment paragraph according to predetermined rules comprises:
    converting the amounts in each clause according to predetermined rules to obtain amounts in a preset standard form;
    de-duplicating the amounts in the preset standard form obtained for each clause, and extracting the de-duplicated amounts.
  5. The method according to claim 4, characterized in that converting the amounts in each clause according to predetermined rules to obtain amounts in a preset standard form comprises:
    converting amounts written in Chinese numerals, in ordinary or formal (capital) form, in each clause into amounts in the preset standard form;
    converting amounts in each clause that involve a proportional relationship into the amount, in the preset standard form, that results after the proportional allocation;
    converting amounts in each clause that are deducted under an offset relationship into the negative of the amount in the preset standard form.
  6. The method according to claim 5, characterized in that converting amounts written in Chinese numerals in each clause into amounts in the preset standard form comprises:
    performing word segmentation on each clause to obtain a plurality of words;
    combining those of the words that involve amounts written in Chinese numerals into an amount string;
    splitting the amount string into a plurality of amount segments by amount unit word;
    computing the Arabic value corresponding to each amount segment from the numeric value and the amount unit of that segment;
    summing the Arabic values corresponding to the amount segments to obtain the amount involved in each clause.
  7. The method according to any one of claims 1 to 6, characterized by further comprising:
    traversing each judgment document in a set of documents, and summing the plaintiff's claimed amounts and the court-supported amounts of the judgment documents respectively, to obtain a total claimed amount and a total court-supported amount;
    dividing the total court-supported amount by the total claimed amount to obtain a court support ratio.
  8. A device for parsing amounts in a judgment document, characterized by comprising:
    an acquisition unit, configured to segment the judgment document to obtain the plaintiff's claim paragraph and the court's judgment paragraph;
    a clause-splitting unit, configured to split the claim paragraph and the judgment paragraph into clauses;
    an extraction unit, configured to extract the amounts in each clause of the claim paragraph and the judgment paragraph according to predetermined rules;
    a first summing unit, configured to sum the amounts extracted from the clauses of the claim paragraph and of the judgment paragraph respectively, to obtain the plaintiff's claimed amount and the amount supported by the court in the judgment document.
  9. The device according to claim 8, characterized in that the acquisition unit comprises:
    a first cutting module, configured to cut out the plaintiff's claim paragraph as the span that starts at a first marker and ends at a second marker, the first marker being "原告诉请" ("the plaintiff claims") or a variant thereof, and the second marker being a line-break character;
    a second cutting module, configured to cut out the court's judgment paragraph as the span that starts at a third marker and ends at a fourth marker, the third marker being "判决如下" ("the judgment is as follows") or a variant thereof, and the fourth marker being "本案" ("this case").
  10. The device according to claim 8, characterized in that the extraction unit comprises:
    a conversion module, configured to convert the amounts in each clause according to predetermined rules to obtain amounts in a preset standard form;
    an extraction module, configured to de-duplicate the amounts in the preset standard form obtained for each clause and to extract the de-duplicated amounts.
  11. The device according to claim 10, characterized in that the conversion module is specifically configured to:
    convert amounts written in Chinese numerals, in ordinary or formal (capital) form, in each clause into amounts in the preset standard form;
    convert amounts in each clause that involve a proportional relationship into the amount, in the preset standard form, that results after the proportional allocation;
    convert amounts in each clause that are deducted under an offset relationship into the negative of the amount in the preset standard form.
  12. The device according to any one of claims 8 to 11, characterized by further comprising:
    a second summing unit, configured to traverse each judgment document in a set of documents and to sum the plaintiff's claimed amounts and the court-supported amounts of the judgment documents respectively, to obtain a total claimed amount and a total court-supported amount;
    a ratio calculation unit, configured to divide the total court-supported amount by the total claimed amount to obtain a court support ratio.
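The court support ratio of claims 7 and 12 amounts to two sums and one division, as the sketch below illustrates. The name `support_ratio`, the input format (one `(claimed, supported)` pair per judgment document) and the choice to return `None` when the total claimed amount is zero are assumptions made for this illustration only.

```python
from decimal import Decimal


def support_ratio(per_document_amounts):
    """Total court-supported amount divided by total claimed amount."""
    total_claimed = sum((c for c, _ in per_document_amounts), Decimal(0))
    total_supported = sum((s for _, s in per_document_amounts), Decimal(0))
    if total_claimed == 0:
        return None  # assumption: the ratio is undefined without any claim
    return total_supported / total_claimed


# Two made-up documents: 5000 claimed / 4000 supported, 3000 claimed / 2000 supported.
ratio = support_ratio([
    (Decimal(5000), Decimal(4000)),
    (Decimal(3000), Decimal(2000)),
])
```

Here the totals are 8000 claimed and 6000 supported, which gives a ratio of 0.75.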
PCT/CN2016/105272 2015-12-01 2016-11-10 Method and device for parsing amount of money in judgement document WO2017092555A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510867476.X 2015-12-01
CN201510867476.XA CN106815203B (en) 2015-12-01 2015-12-01 Method and device for analyzing amount of money in referee document

Publications (1)

Publication Number Publication Date
WO2017092555A1 true WO2017092555A1 (en) 2017-06-08

Family

ID=58796238

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/105272 WO2017092555A1 (en) 2015-12-01 2016-11-10 Method and device for parsing amount of money in judgement document

Country Status (2)

Country Link
CN (1) CN106815203B (en)
WO (1) WO2017092555A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046345A (en) * 2019-03-12 2019-07-23 同盾控股有限公司 A kind of data extraction method and device
CN110765889A (en) * 2019-09-29 2020-02-07 平安直通咨询有限公司上海分公司 Legal document feature extraction method, related device and storage medium
CN111144095A (en) * 2019-11-26 2020-05-12 方正璞华软件(武汉)股份有限公司 Method and device for generating work damage case sanction book
CN111507095A (en) * 2019-01-29 2020-08-07 阿里巴巴集团控股有限公司 Method and device for generating referee document, storage medium and processor
CN111798344A (en) * 2020-07-01 2020-10-20 北京金堤科技有限公司 Method and device for determining subject name, electronic equipment and storage medium
CN112307726A (en) * 2020-11-09 2021-02-02 浙江大学 Automatic court opinion generation method guided by causal deviation removal model
CN112632941A (en) * 2019-09-23 2021-04-09 北京国双科技有限公司 Method, device, equipment and storage medium for generating PDF format public security document
CN113010684A (en) * 2020-12-31 2021-06-22 北京法意科技有限公司 Construction method and system of civil complaint and judgment map
CN113343661A (en) * 2021-06-28 2021-09-03 福建师范大学 Automatic generation method and device for criminal reduction and parole document
CN114239561A (en) * 2021-12-10 2022-03-25 北京天眼查科技有限公司 Supply relation obtaining method and device, storage medium and electronic equipment

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197099A (en) * 2017-12-01 2018-06-22 厦门快商通信息技术有限公司 A kind of text message extracting method and computer readable storage medium
CN108287818A (en) * 2018-01-03 2018-07-17 小草数语(北京)科技有限公司 The extracting method of the amount of money, device and electronic equipment in judgement document
CN108984500B (en) * 2018-06-19 2022-04-29 平安科技(深圳)有限公司 Method for extracting amount information, terminal device and medium
CN110633458A (en) * 2018-06-25 2019-12-31 阿里巴巴集团控股有限公司 Method and device for generating referee document
CN109446511B (en) * 2018-09-10 2022-07-08 平安科技(深圳)有限公司 Referee document processing method, referee document processing device, computer equipment and storage medium
CN110378784A (en) * 2019-07-24 2019-10-25 中国工商银行股份有限公司 Amount of money input method and device
CN110851591A (en) * 2019-09-17 2020-02-28 河北省讯飞人工智能研究院 Judgment document quality evaluation method, device, equipment and storage medium
CN112541344A (en) * 2019-09-23 2021-03-23 北京国双科技有限公司 Method and device for determining target paragraph, storage medium and equipment
CN111008523A (en) * 2019-11-21 2020-04-14 中科鼎富(北京)科技发展有限公司 Information extraction method and device and server
CN111177332B (en) * 2019-11-27 2023-11-24 中证信用增进股份有限公司 Method and device for automatically extracting judge document case-related label and judge result
CN112651853A (en) * 2020-11-17 2021-04-13 四川大学 Judgment and opinion mining method and system based on referee document

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120084228A1 (en) * 2003-10-15 2012-04-05 Rao Srinivasan N System and method for processing partially unstructured data
CN103077164A (en) * 2012-12-27 2013-05-01 新浪网技术(中国)有限公司 Text analysis method and text analyzer

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101312559A (en) * 2007-05-23 2008-11-26 乐金电子(中国)研究开发中心有限公司 Consumer short message management method based on mobile communication terminal and mobile communication terminal thereof
CN102682109B (en) * 2012-05-09 2014-07-16 北京彼速信息技术有限公司 Patent information analysis method and device
CN102866990B (en) * 2012-08-20 2016-08-03 北京搜狗信息服务有限公司 A kind of theme dialogue method and device
CN103778200B (en) * 2014-01-09 2017-08-08 中国科学院计算技术研究所 A kind of message information source abstracting method and its system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GAO, XIAOYUN ET AL.: "Chinese Time Words and Numerals Automatic Segmentation Method Based on Rules", NEW TECHNOLOGY OF LIBRARY AND INFORMATION SERVICE, vol. 3, 25 March 2007 (2007-03-25), pages 46 - 50 *
WU, FEI.: "Study on Extraction Method of Value Information", ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE , CHINA MASTER'S THESES FULL-TEXT DATABASE, 15 March 2011 (2011-03-15), pages 1138 - 1541, ISSN: 1674-0246 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507095A (en) * 2019-01-29 2020-08-07 阿里巴巴集团控股有限公司 Method and device for generating referee document, storage medium and processor
CN111507095B (en) * 2019-01-29 2023-05-02 阿里云计算有限公司 Method and device for generating referee document, storage medium and processor
CN110046345A (en) * 2019-03-12 2019-07-23 同盾控股有限公司 A kind of data extraction method and device
CN112632941A (en) * 2019-09-23 2021-04-09 北京国双科技有限公司 Method, device, equipment and storage medium for generating PDF format public security document
CN110765889A (en) * 2019-09-29 2020-02-07 平安直通咨询有限公司上海分公司 Legal document feature extraction method, related device and storage medium
CN111144095A (en) * 2019-11-26 2020-05-12 方正璞华软件(武汉)股份有限公司 Method and device for generating work damage case sanction book
CN111144095B (en) * 2019-11-26 2024-04-05 方正璞华软件(武汉)股份有限公司 Method and device for generating work case judgment
CN111798344A (en) * 2020-07-01 2020-10-20 北京金堤科技有限公司 Method and device for determining subject name, electronic equipment and storage medium
CN111798344B (en) * 2020-07-01 2023-09-22 北京金堤科技有限公司 Principal name determining method and apparatus, electronic device, and storage medium
CN112307726A (en) * 2020-11-09 2021-02-02 浙江大学 Automatic court opinion generation method guided by causal deviation removal model
CN112307726B (en) * 2020-11-09 2023-08-04 浙江大学 Automatic court view generation method guided by causal deviation removal model
CN113010684A (en) * 2020-12-31 2021-06-22 北京法意科技有限公司 Construction method and system of civil complaint and judgment map
CN113010684B (en) * 2020-12-31 2024-02-09 北京法意科技有限公司 Construction method and system of civil complaint judging map
CN113343661A (en) * 2021-06-28 2021-09-03 福建师范大学 Automatic generation method and device for criminal reduction and parole document
CN114239561A (en) * 2021-12-10 2022-03-25 北京天眼查科技有限公司 Supply relation obtaining method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN106815203B (en) 2021-03-30
CN106815203A (en) 2017-06-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16869865

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16869865

Country of ref document: EP

Kind code of ref document: A1