CN106844554A - Automatic contract classification identification method and system - Google Patents

Info

Publication number
CN106844554A
Authority
CN
China
Prior art keywords
contract
keyword
classification
weight
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611265396.8A
Other languages
Chinese (zh)
Inventor
许林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuansba Technology Tianjin Co ltd
Original Assignee
Xuansba Technology Tianjin Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-12-30
Filing date
2016-12-30
Publication date
2017-06-13
Application filed by Xuansba Technology Tianjin Co ltd
Priority to CN201611265396.8A
Publication of CN106844554A
Legal status: Pending

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/35 Clustering; Classification
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/95 Retrieval from the web
                • G06F16/951 Indexing; Web crawling techniques
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/24 Classification techniques
        • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q10/00 Administration; Management
            • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
              • G06Q10/063 Operations research, analysis or management
                • G06Q10/0635 Risk analysis of enterprise or organisation activities
          • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
            • G06Q50/10 Services
              • G06Q50/18 Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Databases & Information Systems (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Technology Law (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an automatic contract classification identification method and system. The method includes: using a deep learning algorithm with manual intervention to exclude from the samples the keywords that cannot reflect the intrinsic properties of a contract; learning from a large number of samples to configure the weights of the keywords; and finding the differences between contract categories to determine the contract type. The method uses artificial intelligence technology to automatically identify and classify Chinese contracts with a high recognition rate that meets the requirements of practical application, and it solves the current problem that legal risk prompts are inaccurate because users must select the contract category themselves.

Description

Automatic contract classification identification method and system
Technical field
The invention belongs to the technical field of information services, and more particularly relates to an automatic contract classification identification method and system.
Background
In a market economy, contracts play an increasingly important role in the day-to-day operations of companies, so preventing contract risk in those operations is particularly important. Contract risk comprises management risk and legal risk: management risk is approached from the management function of the contract, whereas legal risk starts from the completeness and validity of the contract text and requires in-depth study of how the individual clauses are designed. With the rapid development of artificial intelligence, research abroad on automatically identifying contract legal risk with computer-based AI has already produced initial results. In China, however, automatic semantic identification of Chinese differs greatly from that of English, so research on identifying contract legal risk with AI is still at an early stage.
Because there are many kinds of contracts and the legal risks of different kinds differ, automatic contract classification with a high recognition rate is the key problem to solve before computers can prompt contract legal risks automatically.
Shortcomings of the prior art:
1. Current domestic research on automatic identification of contract legal risk has the user select the contract category, after which the computer gives targeted risk prompts. Because there are many kinds of contracts and the boundaries between categories are blurred, ordinary users without legal training find it difficult to classify their own contracts accurately. Wrong category selections therefore make the legal risk prompts very inaccurate, falling short of the requirements of practical application.
2. Because the language structure of Chinese differs from that of English, successful foreign experience cannot currently be applied directly.
Summary of the invention
In view of this, the present invention aims to provide an automatic contract classification identification method that solves the prior-art problem of a low recognition rate when computers automatically identify contract categories.
To achieve the above purpose, the technical solution of the invention is realized as follows:
An automatic contract classification identification method, comprising the following steps:
(1) using a deep learning algorithm with manual intervention, excluding from the samples the keywords that cannot reflect the intrinsic properties of a contract;
(2) learning from a large number of samples to configure the weights of the keywords;
(3) finding the differences between contract categories and determining the contract type.
Further, step (1) specifically includes the following steps:
(1) collecting contract samples of all kinds by searching the Internet; the current sample size is 1,000;
(2) using a dictionary-based reverse maximum matching segmentation method, finding the most frequent keywords in each class of contract; according to the rules of the Chinese language, keeping words of 3 to 6 Chinese characters, removing adjective keywords, and retaining at most 30 keywords per class.
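The keyword-screening sub-step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a caller-supplied segmentation dictionary and a caller-supplied set of adjectives to drop (the source does not say how adjectives are recognized), and it counts raw term occurrences across the samples of one class. The names `reverse_max_match` and `screen_keywords` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def reverse_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Segment text right-to-left with dictionary-based reverse maximum matching."""
    max_len = max((len(w) for w in dictionary), default=1)
    words: list[str] = []
    end = len(text)
    while end > 0:
        # Try the longest window ending at `end`, shrinking until a dictionary hit;
        # a single character is always accepted as a fallback token.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # tokens were collected right-to-left
    return words


def screen_keywords(texts: Iterable[str], dictionary: set[str],
                    adjectives: frozenset[str] = frozenset(),
                    min_chars: int = 3, max_chars: int = 6,
                    top_k: int = 30) -> list[str]:
    """Return up to `top_k` most frequent non-adjective words of 3-6 characters."""
    counts: Counter[str] = Counter()
    for text in texts:
        for word in reverse_max_match(text, dictionary):
            if min_chars <= len(word) <= max_chars and word not in adjectives:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_k)]


# Hypothetical dictionary and loan-contract snippets, for illustration only.
dictionary = {"借款合同", "借款利率", "借款期限", "保证合同", "借款", "合同", "利率"}
loan_texts = ["本借款合同的借款利率为百分之五", "借款合同约定借款期限为一年"]
print(reverse_max_match(loan_texts[0], dictionary))
# ['本', '借款合同', '的', '借款利率', '为', '百', '分', '之', '五']
print(screen_keywords(loan_texts, dictionary))
# ['借款合同', '借款利率', '借款期限']
```

Running `screen_keywords` once per contract class yields that class's keyword list. The segmentation window is bounded by the longest dictionary entry rather than by the 6-character filter, so longer dictionary terms stay whole and are then dropped by the length filter.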
Further, step (2) specifically includes the following steps:
(1) setting different weights for different keywords: keywords that clearly determine the contract type receive higher weights, and keywords that cannot clearly determine the contract category receive lower weights;
(2) assigning keyword weights in the manner above, ranging from 1 to 50 in increments of 10; for each setting, checking in turn whether the category of each of the 1,000 sample contracts is identified correctly, computing the corresponding recognition rate, and taking the setting with the highest recognition rate as the optimal weights.
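The weight sweep in sub-step (2) can be illustrated with a toy keyword-scoring classifier. The sketch assumes details the source leaves open: a contract is scored per class by summing the weights of that class's keywords found in the text (plain substring matching, not segmentation), ties go to the first class listed, and the sweep varies one weight shared by the "decisive" keywords while all other keywords stay at weight 1. The candidates `range(1, 51, 10)`, that is 1, 11, 21, 31, 41, are one reading of "from 1 to 50 in increments of 10". All function names, class labels, and Chinese sample keywords are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Mapping, Sequence


def classify(text: str, class_keywords: Mapping[str, Sequence[str]],
             weights: Mapping[str, int]) -> str:
    """Pick the class whose keywords found in `text` have the largest total weight."""
    scores = {
        label: sum(weights.get(kw, 1) for kw in keywords if kw in text)
        for label, keywords in class_keywords.items()
    }
    # max() returns the first maximal key, so ties go to the earlier class.
    return max(scores, key=scores.get)


def recognition_rate(samples: Sequence[tuple[str, str]],
                     class_keywords: Mapping[str, Sequence[str]],
                     weights: Mapping[str, int]) -> float:
    """Fraction of (text, label) samples whose predicted class equals the label."""
    correct = sum(classify(text, class_keywords, weights) == label
                  for text, label in samples)
    return correct / len(samples)


def sweep_decisive_weight(samples, class_keywords, decisive: Iterable[str],
                          candidates: Iterable[int] = range(1, 51, 10)):
    """Try each candidate weight for the decisive keywords; keep the best rate."""
    decisive = list(decisive)
    best_weight, best_rate = None, -1.0
    for w in candidates:
        weights = {kw: w for kw in decisive}  # non-decisive keywords default to 1
        rate = recognition_rate(samples, class_keywords, weights)
        if rate > best_rate:  # strict '>' keeps the smallest weight among ties
            best_weight, best_rate = w, rate
    return best_weight, best_rate


class_keywords = {
    "loan": ["借款合同", "借款人", "借款利率"],
    "guarantee": ["保证合同", "保证人", "借款合同", "保证期间"],
}
samples = [
    ("借款合同 借款利率 保证人 保证合同", "loan"),
    ("保证合同 保证期间 借款合同 借款人", "guarantee"),
]
print(sweep_decisive_weight(samples, class_keywords, ["借款利率", "保证期间"]))
# (11, 1.0)
```

At weight 1 the loan sample is misread as a guarantee because more guarantee keywords appear in it; raising the decisive weight to 11 fixes both samples, so the sweep stops improving there.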
Further, step (3) specifically includes assigning relatively high weights to characteristic keywords, so as to distinguish closely related kinds of contract.
Different weights are assigned to the characteristic keywords of different kinds of contract and combined with AOI logic operations to achieve a high recognition rate. In the specific implementation, an automatic identification program is written that optimizes the weight values automatically to find the optimal weights, while the permissible range of each keyword weight is drafted from the professional knowledge of domain experts.
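One possible reading of "optimizing the weight values within ranges drafted by experts" is a per-keyword coordinate search: each keyword's weight is tried across its expert-supplied range while the others are held fixed, and a change is kept only if it raises the recognition rate on the labelled samples. This is an assumed technique, not one the source specifies; `optimise_weights`, the ranges, the step, and the sample data are hypothetical, and the AOI logic mentioned above is not modelled.

```python
from __future__ import annotations

from typing import Mapping


def recognition_rate(samples, class_keywords, weights) -> float:
    """Share of (text, label) samples classified correctly by weighted keyword scores."""
    def classify(text: str) -> str:
        scores = {label: sum(weights.get(kw, 1) for kw in kws if kw in text)
                  for label, kws in class_keywords.items()}
        return max(scores, key=scores.get)  # ties go to the first class listed
    return sum(classify(t) == label for t, label in samples) / len(samples)


def optimise_weights(samples, class_keywords,
                     ranges: Mapping[str, tuple[int, int]],
                     step: int = 10, rounds: int = 2):
    """Coordinate search: vary one keyword's weight at a time within its range."""
    weights = {kw: lo for kw, (lo, _hi) in ranges.items()}  # start at lower bounds
    best = recognition_rate(samples, class_keywords, weights)
    for _ in range(rounds):
        for kw, (lo, hi) in ranges.items():
            for w in range(lo, hi + 1, step):
                trial = {**weights, kw: w}
                rate = recognition_rate(samples, class_keywords, trial)
                if rate > best:  # keep only strict improvements
                    weights, best = trial, rate
    return weights, best


class_keywords = {
    "loan": ["借款合同", "借款人", "借款利率"],
    "guarantee": ["保证合同", "保证人", "借款合同", "保证期间"],
}
samples = [
    ("借款合同 借款利率 保证人 保证合同", "loan"),
    ("保证合同 保证期间 借款合同 借款人", "guarantee"),
]
expert_ranges = {"借款利率": (1, 21), "保证期间": (1, 21)}
print(optimise_weights(samples, class_keywords, expert_ranges))
# ({'借款利率': 11, '保证期间': 1}, 1.0)
```

Unlike a single shared sweep, this lets each keyword settle at its own weight, at the cost of re-scoring the samples once per candidate value.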
Compared with the prior art, the automatic contract classification identification method of the present invention has the following advantage: it uses artificial intelligence technology to automatically identify and classify Chinese contracts with a high recognition rate that meets the requirements of practical application, solving the current problem that legal risk prompts are inaccurate because users select the contract category themselves.
Another object of the present invention is to provide an automatic contract classification identification system that solves the prior-art problem of a low recognition rate when computers automatically identify contract categories.
To achieve the above purpose, the technical solution of the invention is realized as follows:
An automatic contract classification identification system, comprising:
an extraction module, configured to use a deep learning algorithm with manual intervention to exclude from the samples the keywords that cannot reflect the intrinsic properties of a contract;
a setting module, configured to learn from a large number of samples and configure the weights of the keywords;
a discrimination module, configured to find the differences between contract categories and determine the contract type.
Further, the extraction module includes:
a search module, configured to collect contract samples of all kinds by searching the Internet, the current sample size being 1,000;
a screening module, configured to use a dictionary-based reverse maximum matching segmentation method to find the most frequent keywords in each class of contract, keep words of 3 to 6 Chinese characters according to the rules of the Chinese language, remove adjective keywords, and retain at most 30 keywords per class.
Further, the setting module includes:
a differentiated weight setting module, configured to set different weights for different keywords, giving higher weights to keywords that clearly determine the contract type and lower weights to keywords that cannot clearly determine the contract category;
an optimal weight setting module, configured to assign keyword weights in the manner above from 1 to 50 in increments of 10, check in turn whether the category of each of the 1,000 sample contracts is identified correctly, compute the corresponding recognition rate, and take the setting with the highest recognition rate as the optimal weights.
The automatic contract classification identification system of the present invention has the same beneficial effects as the method described above, which are not repeated here.
Brief description of the drawings
The accompanying drawing, which forms part of the invention, is provided for a further understanding of the present invention. The schematic embodiment of the invention and its description serve to explain the invention and do not constitute an improper limitation of it. In the drawing:
Fig. 1 is a flowchart of the automatic contract classification identification method according to an embodiment of the present invention.
Specific embodiment
It should be noted that, where they do not conflict, the embodiments of the present invention and the features in the embodiments may be combined with one another.
The present invention is described in detail below with reference to the accompanying drawing and in conjunction with the embodiment.
As shown in Fig. 1, the automatic contract classification identification method specifically includes the following steps:
(1) Collect contract samples of all kinds by searching the Internet; the current sample size is 1,000.
(2) Using a dictionary-based reverse maximum matching segmentation method, find the most frequent keywords in each class of contract; according to the rules of the Chinese language, keep words of 3 to 6 Chinese characters, remove adjective keywords, and retain at most 30 keywords per class.
(3) Set different weights for different keywords. Keywords that clearly determine the contract type receive higher weights, for example "foreign-related contract for the sale of goods" or "export sales contract"; keywords that cannot clearly determine the contract category receive lower weights, for example "importer", "exporting country", and "FOB".
(4) Assign keyword weights in the manner above, ranging from 1 to 50 in increments of 10; for each setting, check in turn whether the category of each of the 1,000 sample contracts is identified correctly, compute the corresponding recognition rate, and take the setting with the highest recognition rate as the optimal weights.
(5) The same keyword may appear in different contract categories. For example, the keywords "loan contract" and "guarantee contract" both appear in loan contracts and in guarantee contracts, and both occur very frequently in these two classes, so their weights cannot separate the two categories. By contrast, keywords such as "loan interest rate" and "loan term" occur more frequently in loan contracts, so these characteristic keywords are given relatively high weights. Raising the weights of the characteristic keywords of each kind of contract in this way effectively distinguishes two closely related kinds of contract.
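Step (5) relies on expert judgment to spot characteristic keywords. As a hypothetical aid that the source does not describe, the sketch below flags a keyword as characteristic of a class when its document frequency there is positive and at least `ratio` times its frequency in every other class, so shared terms such as the two contract names fail the test. The Chinese strings 借款合同, 保证合同, 借款利率, and 借款期限 are assumed renderings of the source's "loan contract", "guarantee contract", "loan interest rate", and "loan term"; 保证期间 ("guarantee period"), the sample texts, and `characteristic_keywords` are invented for illustration.

```python
from __future__ import annotations

from typing import Mapping, Sequence


def characteristic_keywords(class_texts: Mapping[str, Sequence[str]],
                            keywords: Sequence[str],
                            ratio: float = 2.0) -> dict[str, list[str]]:
    """Per class, keywords whose document frequency dominates every other class."""
    # Document frequency: share of a class's texts that contain the keyword.
    doc_freq = {
        label: {kw: sum(kw in t for t in texts) / len(texts) for kw in keywords}
        for label, texts in class_texts.items()
    }
    result: dict[str, list[str]] = {}
    for label, freqs in doc_freq.items():
        others = [doc_freq[o] for o in doc_freq if o != label]
        result[label] = [
            kw for kw in keywords
            if freqs[kw] > 0 and all(freqs[kw] >= ratio * other[kw] for other in others)
        ]
    return result


class_texts = {
    "loan": ["借款合同 保证合同 借款利率 借款期限", "借款合同 保证合同 借款利率"],
    "guarantee": ["保证合同 借款合同 保证期间", "保证合同 借款合同 保证期间"],
}
keywords = ["借款合同", "保证合同", "借款利率", "借款期限", "保证期间"]
print(characteristic_keywords(class_texts, keywords))
# {'loan': ['借款利率', '借款期限'], 'guarantee': ['保证期间']}
```

The flagged keywords could then be handed to a weight search as the ones that receive the higher weights, mirroring the manual choice described in step (5).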
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (7)

1. An automatic contract classification identification method, characterized by comprising the following steps:
(1) using a deep learning algorithm with manual intervention, excluding from the samples the keywords that cannot reflect the intrinsic properties of a contract;
(2) learning from a large number of samples to configure the weights of the keywords;
(3) finding the differences between contract categories and determining the contract type.
2. The automatic contract classification identification method according to claim 1, characterized in that step (1) specifically comprises the following steps:
(101) collecting contract samples of all kinds by searching the Internet, the current sample size being 1,000;
(102) using a dictionary-based reverse maximum matching segmentation method, finding the most frequent keywords in each class of contract; according to the rules of the Chinese language, keeping words of 3 to 6 Chinese characters, removing adjective keywords, and retaining at most 30 keywords per class.
3. The automatic contract classification identification method according to claim 1, characterized in that step (2) specifically comprises the following steps:
(201) setting different weights for different keywords, giving higher weights to keywords that clearly determine the contract type and lower weights to keywords that cannot clearly determine the contract category;
(202) assigning keyword weights in the manner above, ranging from 1 to 50 in increments of 10; checking in turn whether the category of each of the 1,000 sample contracts is identified correctly, computing the corresponding recognition rate, and taking the setting with the highest recognition rate as the optimal weights.
4. The automatic contract classification identification method according to claim 1, characterized in that step (3) specifically comprises assigning relatively high weights to characteristic keywords, so as to distinguish closely related kinds of contract.
5. An automatic contract classification identification system, characterized by comprising:
an extraction module, configured to use a deep learning algorithm with manual intervention to exclude from the samples the keywords that cannot reflect the intrinsic properties of a contract;
a setting module, configured to learn from a large number of samples and configure the weights of the keywords;
a discrimination module, configured to find the differences between contract categories and determine the contract type.
6. The automatic contract classification identification system according to claim 5, characterized in that the extraction module comprises:
a search module, configured to collect contract samples of all kinds by searching the Internet, the current sample size being 1,000;
a screening module, configured to use a dictionary-based reverse maximum matching segmentation method to find the most frequent keywords in each class of contract, keep words of 3 to 6 Chinese characters according to the rules of the Chinese language, remove adjective keywords, and retain at most 30 keywords per class.
7. The automatic contract classification identification system according to claim 5, characterized in that the setting module comprises:
a differentiated weight setting module, configured to set different weights for different keywords, giving higher weights to keywords that clearly determine the contract type and lower weights to keywords that cannot clearly determine the contract category;
an optimal weight setting module, configured to assign keyword weights in the manner above from 1 to 50 in increments of 10, check in turn whether the category of each of the 1,000 sample contracts is identified correctly, compute the corresponding recognition rate, and take the setting with the highest recognition rate as the optimal weights.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611265396.8A CN106844554A (en) 2016-12-30 2016-12-30 Automatic contract classification identification method and system

Publications (1)

Publication Number Publication Date
CN106844554A 2017-06-13

Family

ID=59117393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611265396.8A Pending CN106844554A (en) 2016-12-30 2016-12-30 Automatic contract classification identification method and system

Country Status (1)

Country Link
CN (1) CN106844554A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1185797A (en) * 1997-09-01 1999-03-30 Canon Inc Automatic document classification device, learning device, classification device, automatic document classification method, learning method, classification method and storage medium
JP2000305948A (en) * 1999-04-26 2000-11-02 Ricoh Co Ltd Sorting device for group of documents and sorting method of group of documents
CN101196904A (en) * 2007-11-09 2008-06-11 清华大学 News keyword abstraction method based on word frequency and multi-component grammar
CN101308498A (en) * 2008-07-03 2008-11-19 上海交通大学 Text collection visualized system
CN101315624A (en) * 2007-05-29 2008-12-03 阿里巴巴集团控股有限公司 Text subject recommending method and device
CN102955812A (en) * 2011-08-29 2013-03-06 阿里巴巴集团控股有限公司 Method and device for building index database as well as method and device for querying
CN104361037A (en) * 2014-10-29 2015-02-18 国家计算机网络与信息安全管理中心 Microblog classifying method and device
CN105320778A (en) * 2015-11-25 2016-02-10 焦点科技股份有限公司 Commodity labeling method suitable for electronic commerce Chinese website

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019819A (en) * 2019-03-26 2019-07-16 方正株式(武汉)科技开发有限公司 Method of generating classification model, electronic contract automatic content classification method and device
CN110119464A (en) * 2019-05-09 2019-08-13 韶关市启之信息技术有限公司 The intelligent recommendation method and device of numerical value in a kind of contract
CN110119464B (en) * 2019-05-09 2021-03-23 韶关市启之信息技术有限公司 Intelligent recommendation method and device for numerical values in contract
CN110826321A (en) * 2019-09-19 2020-02-21 平安科技(深圳)有限公司 Contract file risk checking method and device, computer equipment and storage medium
CN111950891A (en) * 2020-08-10 2020-11-17 云南电网有限责任公司信息中心 Contract legal risk management and control method based on artificial intelligence NLP technology
CN113936289A (en) * 2021-12-17 2022-01-14 中航金网(北京)电子商务有限公司 Cutter contract identification method and device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
CN106844554A (en) Automatic contract classification identification method and system
CN104331498B (en) A kind of method that web page contents to internet user access are classified automatically
RU2648946C2 (en) Image object category recognition method and device
CN109189767B (en) Data processing method and device, electronic equipment and storage medium
CN101876987A (en) Overlapped-between-clusters-oriented method for classifying two types of texts
CN108460421A (en) The sorting technique of unbalanced data
Zhang et al. Development of a supervised software tool for automated determination of optimal segmentation parameters for ecognition
CN113255340B (en) Theme extraction method and device for scientific and technological requirements and storage medium
CN104820724A (en) Method for obtaining prediction model of knowledge points of text-type education resources and model application method
CN105306296A (en) Data filter processing method based on LTE (Long Term Evolution) signaling
CN103246655A (en) Text categorizing method, device and system
CN108737290A (en) Non-encrypted method for recognizing flux based on load mapping and random forest
CN102521402B (en) Text filtering system and method
CN115146062A (en) Intelligent event analysis method and system fusing expert recommendation and text clustering
Siudek et al. Shaping physical properties of galaxy subtypes in the VIPERS survey: Environment matters
CN110765285A (en) Multimedia information content control method and system based on visual characteristics
Ferreira et al. A fuzzy c-means algorithm for fingerprint segmentation
CN110765266A (en) Method and system for merging similar dispute focuses of referee documents
CN112348360B (en) Chinese medicine production process parameter analysis system based on big data technology
CN107729877B (en) Face detection method and device based on cascade classifier
CN109002561A (en) Automatic document classification method, system and medium based on sample keyword learning
CN111598116B (en) Data classification method, device, electronic equipment and readable storage medium
CN110750712A (en) Software security requirement recommendation method based on data driving
CN116204647A (en) Method and device for establishing target comparison learning model and text clustering
CN115794803A (en) Engineering audit problem monitoring method and system based on big data AI technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170613
