CN109992752B - Label marking method, device, computer device and storage medium for contract file - Google Patents

Label marking method, device, computer device and storage medium for contract file

Info

Publication number
CN109992752B
CN109992752B CN201910173513.5A CN201910173513A CN109992752B CN 109992752 B CN109992752 B CN 109992752B CN 201910173513 A CN201910173513 A CN 201910173513A CN 109992752 B CN109992752 B CN 109992752B
Authority
CN
China
Prior art keywords
contract
predefined
tag
file
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910173513.5A
Other languages
Chinese (zh)
Other versions
CN109992752A (en
Inventor
刘玉强
方俊波
鄢真
杨昊燃
李雯
叶素兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910173513.5A
Publication of CN109992752A
Application granted
Publication of CN109992752B
Active (current legal status)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/117Tagging; Marking up; Designating a block; Setting of attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The label marking method for a contract file comprises the following steps: judging the type of the contract file, wherein each type corresponds to at least one file component that needs to be labeled; determining the tag set corresponding to each file component that needs to be labeled; judging whether the tag set is a predefined tag set or a custom tag set; when the tag set is a predefined tag set, locating the position of each predefined tag of the set in the contract file and marking the predefined tag on the paragraph at that position; otherwise, identifying core keywords in the file component, marking each core keyword as a custom tag on the paragraph where it is located, and adding the core keywords to the custom tag set; and merging the predefined and custom tag sets into a new tag set. The application also provides a label marking device, a computer device and a storage medium for contract files, which help ensure objective and accurate label output and improve sample processing efficiency.

Description

Label marking method, device, computer device and storage medium for contract file
Technical Field
The present application relates to the field of computer technology, and in particular, to a method for labeling a contract document, a device for labeling a contract document, a computer device, and a computer readable storage medium.
Background
Currently, more and more transaction activities require contracts. Some transactions involve a large amount of item information at signing, and this information differs from one transaction to another, so classifying and labeling contracts requires a great deal of manpower, material and financial resources. Manual labeling requires the participation of business personnel in the field, and business personnel with different educational backgrounds understand the same problem differently, so manual labeling is strongly influenced by personal subjective factors and lacks a unified standard. Furthermore, manual labeling takes a lot of time, and its accuracy is difficult to ensure.
Disclosure of Invention
In view of the above, there is a need for a method and apparatus for labeling a contract document, a computer apparatus, and a computer-readable storage medium, which solve the above problems.
A first aspect of the present application provides a method for labeling a contract document, applied to a computer device, the method comprising:
dividing the contract file into a plurality of preset file components;
judging the types of the contract files, wherein each type corresponds to at least one file component part needing to be marked by a label;
determining a label set corresponding to each file component part needing label marking, wherein the label set is one of a predefined label set and a custom label set, and the predefined label set comprises a plurality of predefined labels;
judging whether the label set corresponding to each file component is a predefined label set or a custom label set;
when a tag set corresponding to one file component is a predefined tag set, positioning the position of each predefined tag in the predefined tag set from the contract file, and marking the predefined tag to a paragraph corresponding to the position;
when a tag set corresponding to one file component is a custom tag set, identifying a core keyword from the file component, marking the core keyword as a custom tag to a paragraph where the core keyword is located, and adding the core keyword into the custom tag set; and
combining the predefined tag set and the custom tag set into a new tag set, the new tag set corresponding to one type of contract file, such that the computer device can use the new tag set to tag other contract files of the same type.
A second aspect of the present application provides a tag marking apparatus for a contract document, the apparatus comprising:
the division module is used for dividing the contract file into a plurality of preset file components;
the first judging module is used for judging the types of the contract files, and each type corresponds to at least one file component part needing to be marked with a label;
the determining module is used for determining a label set corresponding to each file component part needing label marking, wherein the label set is one of a predefined label set and a custom label set, and the predefined label set comprises a plurality of predefined labels;
the second judging module is used for judging whether the label set corresponding to each file component is a predefined label set or a custom label set;
the positioning and marking module is used for positioning the position of each predefined tag in the predefined tag set from the contract file when the tag set corresponding to one of the file components is the predefined tag set, marking the predefined tag to a paragraph corresponding to the position, identifying a core keyword from the file component when the tag set corresponding to one of the file components is the custom tag set, marking the core keyword as a custom tag to the paragraph where the core keyword is located, and adding the core keyword into the custom tag set; and
the merging module is used for merging the predefined tag set and the custom tag set into a new tag set, wherein the new tag set corresponds to one type of contract file, so that the tag marking device can use the new tag set to mark other contract files of the same type.
A third aspect of the application provides a computer apparatus comprising a processor for implementing a method of labelling a contract document as hereinbefore described when executing a computer program stored in a memory.
A fourth aspect of the application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method of tagging a contract document as described above.
The embodiment of the application determines a tag set according to the type of the contract file. When the tag set is a predefined tag set, each predefined tag in the set is located in the contract file and marked as a tag; when the tag set is a custom tag set, core keywords are extracted and marked as custom tags. Because the standard is unified, objective and accurate tag output can be ensured; furthermore, automatic labeling improves work efficiency and avoids wasted manpower.
Drawings
Fig. 1 is a flowchart of a method for labeling a contract document according to a first embodiment of the present application.
Fig. 2 is a schematic structural diagram of a label marking device for a contract document according to a second embodiment of the present application.
Fig. 3 is a schematic diagram of a computer device according to a third embodiment of the present application.
The application will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, and the described embodiments are merely some, rather than all, embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
Example 1
Referring to fig. 1, a flowchart of a method for labeling a contract document according to a first embodiment of the present application is shown. The label marking method of the contract file is applied to a computer device. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs.
Step S11, dividing the contract file into a plurality of preset file components.
In this embodiment, the file components include a head portion, a body portion, and a tail portion. The head portion is located at the beginning of the contract file and includes the name of the contract file, the contract number (date of contract, place of contract), the names and addresses of the buyer and the seller, and so on. The body portion defines the rights and obligations of both parties and includes the transaction terms of the contract, such as commodity name, quality specifications, quantity, packaging, unit price and total value, delivery terms, payment terms, insurance, inspection, claims, force majeure and arbitration terms. The tail portion is located at the end of the contract file and includes the validity of the contract text, the number of copies, the time and place of contracting and the effective time, the validity of the attachments, the signatures of both parties, and so on.
In this embodiment, the format of the contract file is a text format.
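The text does not say how the three components are separated. As a minimal sketch only, the following assumes that paragraphs are separated by blank lines and that body clauses start with "Article N" or "N."; the `CLAUSE` pattern and the `split_components` helper are hypothetical and not part of the described method.

```python
import re

# Assumption: a body clause starts with "Article N" or "N." (hypothetical heuristic).
CLAUSE = re.compile(r"^(Article\s+\d+|\d+\.)")

def split_components(text):
    """Split contract text into head, body and tail paragraph lists."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    clause_idx = [i for i, p in enumerate(paragraphs) if CLAUSE.match(p)]
    if not clause_idx:
        # No recognizable clause: treat everything as body.
        return {"head": [], "body": paragraphs, "tail": []}
    first, last = clause_idx[0], clause_idx[-1]
    return {
        "head": paragraphs[:first],          # everything before the first clause
        "body": paragraphs[first:last + 1],  # first clause through last clause
        "tail": paragraphs[last + 1:],       # everything after the last clause
    }

sample = (
    "House Mortgage Contract\n\n"
    "Mortgagor (hereinafter referred to as Party B): Zhang San\n\n"
    "Article 1 Mortgage rate: 70%\n\n"
    "Article 2 Mortgage guarantee range: principal and interest\n\n"
    "Signed at: Shenzhen"
)
components = split_components(sample)
```

A real contract would need more robust boundary detection; the point here is only that each component ends up as a list of paragraphs that later steps can mark.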
Step S12, judging the type of the contract file.
The type of the contract file is one of a mortgage contract, an order (goods) contract, a lease contract and a borrowing contract. In this embodiment, the type of the contract file is determined from the name of the contract file in the head portion. When the name of the contract file in the head portion is "mortgage guarantee contract", "house mortgage contract", "mortgage loan contract" or the like, the contract file is judged to be a mortgage contract; when the name is "order contract" or the like, the contract file is judged to be an order contract; and when the name is "house lease contract", "land lease contract", "factory building lease contract" or the like, the contract file is judged to be a lease contract.
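One way to picture this title-based judgment is a keyword lookup. The sketch below is illustrative only: the `TYPE_KEYWORDS` table and the `classify_contract` function are assumptions, and plain substring matching stands in for whatever matching rule an actual implementation uses.

```python
# Hypothetical keyword table built from the example titles in the text.
TYPE_KEYWORDS = {
    "mortgage contract": ["mortgage"],
    "order contract": ["order"],
    "lease contract": ["lease"],
}

def classify_contract(title):
    """Return the first contract type whose keyword occurs in the title, else None."""
    lowered = title.lower()
    for contract_type, keywords in TYPE_KEYWORDS.items():
        # Substring matching is crude (e.g. "border" contains "order"); it is only a sketch.
        if any(keyword in lowered for keyword in keywords):
            return contract_type
    return None
```

Because the table is checked in insertion order, a title such as "mortgage loan contract" is classified as a mortgage contract, which matches the example in the text.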
In this embodiment, the computer device stores the correspondence among the types of contract file, the file components to be labeled, and the tag sets. Each type corresponds to at least one file component that needs to be labeled, and each such file component corresponds to one tag set. Each tag set is either a custom tag set or a predefined tag set; a custom tag set is initially empty, whereas a predefined tag set comprises a plurality of predefined tags. The predefined tag set may be compiled by a professional, and the predefined tags may be keywords that typically occur in contract files of the type. The predefined tags may also be set and changed according to the important information of the corresponding file component. For example, for a mortgage contract to be labeled, the head portion corresponds to a predefined tag set that may include "mortgagor" and "legal representative" (typically, every mortgage contract contains these two keywords); the body portion corresponds to another predefined tag set, which may include "mortgage", "mortgage rate" and "mortgage guarantee range" (typically, every mortgage contract contains these keywords); and the tail portion corresponds to a custom tag set.
Step S13, determining a label set corresponding to each file component part needing label marking according to the type.
The tag set is determined according to the type and the corresponding relation.
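The stored correspondence can be pictured as a nested lookup table. The following is a minimal sketch; the `TagSet` class, the `CORRESPONDENCE` table and `tag_sets_for` are hypothetical names, and the tag values are the mortgage-contract examples from the text.

```python
from dataclasses import dataclass, field

@dataclass
class TagSet:
    kind: str                                  # "predefined" or "custom"
    tags: list = field(default_factory=list)   # a custom set starts empty

# Hypothetical correspondence: contract type -> file component -> tag set.
CORRESPONDENCE = {
    "mortgage contract": {
        "head": TagSet("predefined", ["mortgagor", "legal representative"]),
        "body": TagSet("predefined",
                       ["mortgage", "mortgage rate", "mortgage guarantee range"]),
        "tail": TagSet("custom"),
    },
}

def tag_sets_for(contract_type):
    """Step S13 as a lookup: return the tag set of each component to be labeled."""
    return CORRESPONDENCE.get(contract_type, {})
```

With this layout, the check in step S14 reduces to reading `kind` on the tag set returned for each component.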
Step S14, judging whether the tag set corresponding to each file component is a predefined tag set, if so, performing step S15; otherwise, step S16 is performed.
Step S15, locating the position of each predefined tag in the predefined tag set from the contract file, and marking the predefined tag to a paragraph corresponding to the position.
For example, if the head portion of the mortgage contract corresponds to a predefined tag set whose predefined tags include "mortgagor" and "legal representative", the computer device locates "mortgagor" and "legal representative" in the head portion and marks each tag on its corresponding paragraph. If the body portion of the mortgage contract corresponds to a predefined tag set whose predefined tags include "mortgage", "mortgage rate" and "mortgage guarantee range", the computer device locates these tags in the body portion and marks each on its corresponding paragraph.
After the predefined tag is marked on the paragraph corresponding to the position, the method may further comprise the following step: extracting the text content that corresponds to each predefined tag at the located position, and associating the extracted text content with the predefined tag. The extracted text content reflects the key information corresponding to the predefined tag.
For example, when the head portion names mortgagor A and legal representative B, the extracted text content is A and B, respectively. When, in the body portion, the mortgage is C, the mortgage rate is D and the mortgage guarantee range is E, the extracted text content is C, D and E, respectively.
Because the text content corresponding to a predefined tag is usually located in the same paragraph as the tag, and in order to ensure that the corresponding text content is extracted accurately, in this embodiment, after locating the position of each predefined tag of the determined tag set in the contract file, the computer device further identifies the contract content that follows the predefined tag in the same paragraph, performs language logic relationship analysis on that content so as to split it into at least one word unit, and then extracts the word unit that best matches the semantics of the predefined tag as the text content corresponding to the predefined tag.
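Locating and marking can be sketched as a search over the paragraphs of one component. In this illustrative version, `mark_predefined` is a hypothetical helper that uses case-insensitive substring search; longer tags are checked first so that, in English, "mortgage rate" is not also reported as "mortgage".

```python
def mark_predefined(paragraphs, predefined_tags):
    """Return {paragraph index: [predefined tags located in that paragraph]}."""
    marks = {}
    # Longest tags first, so a long tag is consumed before any shorter tag it contains.
    ordered = sorted(predefined_tags, key=len, reverse=True)
    for index, paragraph in enumerate(paragraphs):
        remaining = paragraph.lower()
        for tag in ordered:
            if tag.lower() in remaining:
                marks.setdefault(index, []).append(tag)
                # Blank out the match so shorter tags do not hit the same text.
                remaining = remaining.replace(tag.lower(), " ")
    return marks

body = [
    "Article 1 Mortgage: the house at No. 1 Example Road",
    "Article 2 Mortgage rate: 70%",
    "Article 3 Mortgage guarantee range: principal and interest",
    "Article 4 Insurance",
]
marks = mark_predefined(body, ["mortgage", "mortgage rate", "mortgage guarantee range"])
```

Paragraphs without any predefined tag, such as the fourth one here, simply receive no mark.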
For example, for a mortgage contract, the contract content following "mortgagor" typically has the format:
Mortgagor (hereinafter referred to as Party B): Zhang San
Identification card number: xxxxxxxxxxxxxxx
Contact phone: xxxxxxxxxx
Thus, when locating the position of "mortgagor" in the contract file, the computer device identifies the contract content in the same paragraph, "(hereinafter referred to as Party B): Zhang San". According to the language logic relationship analysis, the contract content is then split into "(", "hereinafter", "referred to as", "Party B", ")", ":" and "Zhang San". Since "Zhang San" best matches the semantics of "mortgagor", the computer device takes "Zhang San" as the text content corresponding to the predefined tag "mortgagor". In this way, the accuracy of text content extraction can be improved.
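The text does not specify the language logic relationship analysis. As a stand-in, the sketch below uses a simple rule: split the text after the tag into parenthetical asides, colons and plain text, then keep the first plain-text unit. `extract_value` is a hypothetical helper and is not the analysis the application describes.

```python
import re

def extract_value(paragraph, tag):
    """Return the text associated with `tag` in `paragraph`, or None if the tag is absent."""
    position = paragraph.lower().find(tag.lower())
    if position < 0:
        return None
    rest = paragraph[position + len(tag):]
    # Rough word units: "(...)" asides, ASCII or full-width colons, and runs of other text.
    units = re.findall(r"\([^)]*\)|[:：]|[^():：]+", rest)
    candidates = [
        unit.strip() for unit in units
        if unit.strip() and not unit.startswith("(") and unit not in (":", "：")
    ]
    return candidates[0] if candidates else None

line = "Mortgagor (hereinafter referred to as Party B): Zhang San"
value = extract_value(line, "mortgagor")
```

The parenthetical "(hereinafter referred to as Party B)" and the colon are discarded, which leaves the name as the text content associated with the tag.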
Step S16, identifying a core keyword from the file component part, marking the core keyword as a custom tag to a paragraph where the core keyword is located, and adding the core keyword into the custom tag set.
In this embodiment, identifying the core keyword from the file component includes: filtering out stop words, including punctuation and special symbols, from the file component; performing language logic relationship analysis on the filtered file component so as to split it into a plurality of word units; and taking at least one word unit that reflects the main idea and meaning of the file component as the core keyword. For example, because the tail portion of a mortgage contract typically includes the validity of the contract text, the number of copies, the time and place of contracting, the effective time, the validity of the attachments and the signatures of both parties, the computer device may take "contracting time", "contracting place" and "effective time" as the core keywords and then mark them as custom tags on the corresponding paragraphs. Further, before adding the core keywords to the custom tag set, the computer device performs synonym expansion on them and then adds the expanded core keywords to the custom tag set.
After the core keyword is marked as a custom tag on the paragraph where it is located, the method may further include the following step: extracting the text content corresponding to the position of each core keyword, and associating the extracted text content with the custom tag. The extracted text content reflects the key information corresponding to the core keyword.
Step S17, combining the predefined tag set and the custom tag set into a new tag set, wherein the new tag set corresponds to one type of contract files, so that the computer device can use the new tag set to mark other contract files of the same type.
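The filtering, splitting and synonym-expansion steps can be illustrated as follows. This sketch swaps in word frequency as a plain stand-in for choosing the word units that reflect the meaning of the component; the stop-word list, the `SYNONYMS` table and both function names are hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "in", "is", "this", "at", "on", "a", "to"}  # tiny illustrative list
SYNONYMS = {"signing": ["execution"]}  # hypothetical synonym table

def core_keywords(text, top_n=2):
    """Drop punctuation and stop words, then keep the top_n most frequent word units."""
    words = re.findall(r"[a-z]+", text.lower())      # digits, punctuation and symbols are discarded
    kept = [word for word in words if word not in STOP_WORDS]
    return [word for word, _ in Counter(kept).most_common(top_n)]

def expand(keywords):
    """Synonym expansion: each keyword is followed by its known synonyms."""
    expanded = []
    for keyword in keywords:
        expanded.append(keyword)
        expanded.extend(SYNONYMS.get(keyword, []))
    return expanded

tail = "Signing time: 2019-03-07. Signing place: Shenzhen. Effective time: upon signing."
keywords = core_keywords(tail)
custom_tag_set = expand(keywords)
```

The expanded list is what would be added to the custom tag set, so that a later contract using a synonym of a keyword can still be matched.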
In this embodiment, the marking the other contract files of the same type using the new tag set includes:
(a) Judging the type of the contract file needing to be marked currently;
(b) Judging whether the type is consistent with the type corresponding to the new tag set;
(c) When the type is consistent with the type corresponding to the new tag set, marking the contract file using the new tag set. Specifically, the contract file is divided into a plurality of preset file components, the tag set corresponding to each file component is determined according to the type, the position of each tag of the tag set is located in the contract file, and each tag is marked on the paragraph where it is located, thereby labeling the contract file.
After the predefined tag set and the custom tag set are merged, the new tag set is applied to new contract files of the same type for automatic labeling, which further enriches the existing tag set.
The method for labeling a contract document according to the present application is described in detail above with reference to fig. 1; the program modules of the software device and the hardware device architecture that implement the method are described below with reference to figs. 2 to 3.
It should be understood that the described embodiments are for illustrative purposes only and do not limit the scope of the patent application to this configuration.
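Merging and reuse, as described in step S17 and sub-steps (a) to (c), might look like the sketch below. The registry `NEW_TAG_SETS` and the three function names are hypothetical; the merge is an order-preserving union that drops duplicates.

```python
def merge_tag_sets(predefined, custom):
    """Union of both tag lists, keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys([*predefined, *custom]))

NEW_TAG_SETS = {}  # hypothetical registry: contract type -> merged tag set

def register(contract_type, predefined, custom):
    """Store the merged tag set under its contract type."""
    NEW_TAG_SETS[contract_type] = merge_tag_sets(predefined, custom)

def tags_for_new_file(contract_type):
    """Sub-steps (a) to (c): reuse the merged set only when the types match."""
    return NEW_TAG_SETS.get(contract_type)

register(
    "mortgage contract",
    ["mortgagor", "mortgage rate"],
    ["signing time", "mortgagor"],
)
```

A new mortgage contract would then be labeled with the merged set, while a contract of another type gets `None` and falls back to the full procedure.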
Example 2
FIG. 2 is a block diagram of a preferred embodiment of a tag marking apparatus for a contract document according to the present application.
In some embodiments, the tag marking device 10 of the contract document runs in a computer device. The tag marking device 10 may include a plurality of program modules composed of program code segments. The program code of each program segment in the tag marking device 10 may be stored in a memory of the computer device and executed by at least one processor to perform the tag marking function for the contract document.
In this embodiment, the tag marking device 10 of the contract document may be divided into a plurality of program modules according to the functions it performs. Referring to fig. 2, the program modules may include: a dividing module 101, a first judging module 102, a determining module 103, a second judging module 104, a positioning and marking module 105 and a merging module 107. A module in the present application refers to a series of computer program segments that are stored in a memory, can be executed by at least one processor and perform a fixed function. The functions of the respective modules are described in detail below.
The dividing module 101 is configured to divide the contract file into a plurality of preset file components.
In this embodiment, the file components include a head portion, a body portion, and a tail portion. The head portion is located at the beginning of the contract file and includes the name of the contract file, the contract number (date of contract, place of contract), the names and addresses of the buyer and the seller, and so on. The body portion defines the rights and obligations of both parties and includes the transaction terms of the contract, such as commodity name, quality specifications, quantity, packaging, unit price and total value, delivery terms, payment terms, insurance, inspection, claims, force majeure and arbitration terms. The tail portion is located at the end of the contract file and includes the validity of the contract text, the number of copies, the time and place of contracting and the effective time, the validity of the attachments, the signatures of both parties, and so on.
In this embodiment, the format of the contract file is a text format.
The first determining module 102 is configured to determine a type of the contract document.
The type of the contract file is one of a mortgage contract, an order (goods) contract, a lease contract and a borrowing contract. In this embodiment, the type of the contract file is determined from the name of the contract file in the head portion. For example, when the name of the contract file in the head portion is "mortgage guarantee contract", "house mortgage contract", "mortgage loan contract" or the like, the first judging module 102 judges that the contract file is a mortgage contract; when the name is "order contract" or the like, the first judging module 102 judges that the contract file is an order contract; and when the name is "house lease contract", "land lease contract", "factory building lease contract" or the like, the first judging module 102 judges that the contract file is a lease contract.
In this embodiment, the computer device stores the correspondence among the types of contract file, the file components to be labeled, and the tag sets. Each type corresponds to at least one file component that needs to be labeled, and each such file component corresponds to one tag set. Each tag set is either a custom tag set or a predefined tag set; a custom tag set is initially empty, whereas a predefined tag set comprises a plurality of predefined tags. The predefined tag set may be compiled by a professional, and the predefined tags may be keywords that typically occur in contract files of the type. The predefined tags may also be set and changed according to the important information of the corresponding file component. For example, for a mortgage contract to be labeled, the head portion corresponds to a predefined tag set that may include "mortgagor" and "legal representative" (typically, every mortgage contract contains these two keywords); the body portion corresponds to another predefined tag set, which may include "mortgage", "mortgage rate" and "mortgage guarantee range" (typically, every mortgage contract contains these keywords); and the tail portion corresponds to a custom tag set.
The determining module 103 is configured to determine, according to the type, a tag set corresponding to each of the file components to be tagged.
The tag set is determined according to the type and the corresponding relation.
The second determining module 104 is configured to determine whether the tag set corresponding to each file component is a predefined tag set.
When the tag set corresponding to one of the file components is a predefined tag set, the positioning and marking module 105 is configured to position each predefined tag in the predefined tag set from the contract file, and mark the predefined tag to a paragraph corresponding to the position.
For example, if the head portion of the mortgage contract corresponds to a predefined tag set whose predefined tags include "mortgagor" and "legal representative", the positioning and marking module 105 locates "mortgagor" and "legal representative" in the head portion and marks each tag on its corresponding paragraph. If the body portion of the mortgage contract corresponds to a predefined tag set whose predefined tags include "mortgage", "mortgage rate" and "mortgage guarantee range", the positioning and marking module 105 locates these tags in the body portion and marks each on its corresponding paragraph.
In this embodiment, the program modules of the tag marking apparatus 10 of the contract document may further include an association module 106. After the positioning and marking module 105 marks the predefined tags to paragraphs corresponding to the locations, the associating module 106 is configured to extract text content corresponding to each predefined tag at a location and associate the extracted text content with the predefined tag. Wherein the extracted text content is used for reflecting the corresponding key information of the predefined tag.
For example, when the head portion names mortgagor A and legal representative B, the text content extracted by the association module 106 is A and B, respectively. When, in the body portion, the mortgage is C, the mortgage rate is D and the mortgage guarantee range is E, the text content extracted by the association module 106 is C, D and E, respectively.
Because the text content corresponding to a predefined tag is usually located in the same paragraph as the tag, and in order to ensure that the corresponding text content is extracted accurately, in this embodiment, after the positioning and marking module 105 locates the position of each predefined tag of the determined tag set in the contract file, the association module 106 further identifies the contract content that follows the predefined tag in the same paragraph, performs language logic relationship analysis on that content so as to split it into at least one word unit, and then extracts the word unit that best matches the semantics of the predefined tag as the text content corresponding to the predefined tag.
For example, for a mortgage contract, the contract content following "mortgagor" typically has the format:
Mortgagor (hereinafter referred to as Party B): Zhang San
Identification card number: xxxxxxxxxxxxxxx
Contact phone: xxxxxxxxxx
Thus, when the position of "mortgagor" is located in the contract file, the association module 106 identifies the contract content in the same paragraph, "(hereinafter referred to as Party B): Zhang San". According to the language logic relationship analysis, the association module 106 then splits the contract content into "(", "hereinafter", "referred to as", "Party B", ")", ":" and "Zhang San". Since "Zhang San" best matches the semantics of "mortgagor", the association module 106 takes "Zhang San" as the text content corresponding to the predefined tag "mortgagor". In this way, the accuracy of text content extraction can be improved.
When the tag set corresponding to one of the file components is a custom tag set, the positioning and marking module 105 is further configured to identify a core keyword from the file component, mark the core keyword as a custom tag to a paragraph where the core keyword is located, and add the core keyword into the custom tag set.
In this embodiment, the positioning and marking module 105 filters out stop words, including punctuation and special symbols, from the file component, and then performs language logic relationship analysis on the filtered file component so as to split it into a plurality of word units. At least one word unit capable of reflecting the conclusion and meaning of the file component is used as the core keyword. For example, the tail portion of a mortgage contract typically includes the validity of the contract, the number of copies, the time and place of signing, the effective time, the validity of the attachments, and the signatures of both parties. The positioning and marking module 105 may therefore take the signing time, the signing place, and the effective time as the core keywords and mark them as custom tags to the corresponding paragraphs. Further, before adding a core keyword to the custom tag set, the positioning and marking module 105 performs synonym expansion on the core keyword and then adds the expanded core keyword to the custom tag set.
Subsequently, after the positioning and marking module 105 marks the core keywords as custom tags to the paragraphs where they are located, the association module 106 is further configured to extract the text content corresponding to the position of each core keyword and to associate the extracted text content with the custom tag. The extracted text content reflects the key information corresponding to the core keyword.
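The core-keyword step for a custom tag set could be approximated as in the following Python sketch. The description does not say how word units are scored, so term frequency is used here purely as a stand-in for "reflecting the conclusion and meaning of the file component". The stop-word list, the synonym table, and the function names are all hypothetical placeholders.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# Hypothetical resources; a real system would load much larger ones.
STOP_WORDS = {"the", "of", "and", "in", "is", "this", "shall", "be", "on", "a", "to"}
SYNONYMS = {"signing": ["execution"], "effective": ["validity"]}


def core_keywords(component_text: str, top_n: int = 3) -> List[str]:
    """Pick candidate core keywords from one file component."""
    # Keeping alphabetic runs only discards punctuation, digits and symbols.
    words = re.findall(r"[a-z]+", component_text.lower())
    # Filter out stop words.
    candidates = [w for w in words if w not in STOP_WORDS]
    # Assumption: frequency is a crude proxy for importance.
    return [word for word, _ in Counter(candidates).most_common(top_n)]


def expand_with_synonyms(keywords: Iterable[str]) -> Set[str]:
    """Return the keywords together with their known synonyms."""
    expanded = set(keywords)
    for keyword in list(expanded):
        expanded.update(SYNONYMS.get(keyword, []))
    return expanded
```

The expanded set returned by `expand_with_synonyms` is what would be added to the custom tag set, so that later contracts using a synonym of a keyword can still be matched.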
The merging module 107 is configured to merge the predefined tag set and the custom tag set into a new tag set, where the new tag set corresponds to one type of contract file, so that the computer device can use the new tag set to mark other contract files of the same type.
In this embodiment, marking other contract files of the same type using the new tag set includes:
(a) The first judging module 102 judges the type of the contract file to be labeled currently;
(b) The first judging module 102 judges whether the type is consistent with the type corresponding to the new tag set;
(c) When the type is consistent with the type corresponding to the new tag set, the dividing module 101 divides the contract file into a plurality of preset file components, the determining module 103 determines the tag set corresponding to each file component according to the type, and the positioning and marking module 105 locates the position of each predefined tag of the tag set in the contract file and marks the predefined tag, as a tag, to the paragraph where it is located, thereby marking the contract file.
By merging the predefined tag set and the custom tag set and then applying the new tag set to new contract files of the same type for automatic marking, the existing tag set is further enriched.
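The merging and reuse described above can be sketched in Python as follows. This is a simplified illustration rather than the patented implementation: tags are matched by plain substring search, a contract file is modeled as a list of paragraph strings, and the names `TagSet`, `merge_tag_sets`, and `mark_with_tag_set` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class TagSet:
    contract_type: str                      # the contract type this set applies to
    tags: Set[str] = field(default_factory=set)


def merge_tag_sets(contract_type: str,
                   predefined: Iterable[str],
                   custom: Iterable[str]) -> TagSet:
    """Merge predefined and custom tags into one new tag set for a type."""
    return TagSet(contract_type, set(predefined) | set(custom))


def mark_with_tag_set(paragraphs: List[str],
                      contract_type: str,
                      tag_set: TagSet) -> Dict[int, List[str]]:
    """Map paragraph index to the tags found in that paragraph.

    The new tag set is used only when the contract type matches the type
    the set was built for; otherwise nothing is marked.
    """
    if contract_type != tag_set.contract_type:
        return {}
    marks: Dict[int, List[str]] = {}
    for index, paragraph in enumerate(paragraphs):
        found = sorted(tag for tag in tag_set.tags if tag in paragraph)
        if found:
            marks[index] = found
    return marks
```

Keying the merged set by contract type keeps the type check of steps (a) and (b) to a single comparison before any paragraph is scanned.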
As described above, the tag marking device for a contract file in this embodiment of the present application determines a tag set according to the type of the contract file. When the tag set is a predefined tag set, the device locates the position of each predefined tag of the tag set and marks the predefined tag as a tag; when the tag set is a custom tag set, the device extracts core keywords and marks them as custom tags. Because the marking standard is unified, the tags can be output objectively and accurately. Furthermore, automatic marking improves work efficiency and avoids wasted manpower.
Example III
FIG. 3 is a schematic diagram of a computer device according to a preferred embodiment of the application.
The computer device 1 comprises a memory 20, a processor 30 and a computer program 40, such as a label marking program of a contract document, stored in the memory 20 and executable on the processor 30. The processor 30 implements the steps of the label marking method embodiment of the contract file described above, such as steps S11 to S17 shown in fig. 1, when executing the computer program 40. Alternatively, the processor 30, when executing the computer program 40, performs the functions of the modules/units of the tag marking apparatus embodiment of the contract document described above, such as modules 101-107 in FIG. 2.
Illustratively, the computer program 40 may be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 30 to implement the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specified functions, the instruction segments describing the execution of the computer program 40 in the computer device 1. For example, the computer program 40 may be partitioned into the dividing module 101, the first judging module 102, the determining module 103, the second judging module 104, the positioning and marking module 105, the association module 106, and the merging module 107 shown in FIG. 2. For the specific functions of each module, see embodiment two.
The computer device 1 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. It will be appreciated by a person skilled in the art that the schematic diagram is only an example of the computer device 1 and does not limit it; the computer device 1 may include more or fewer components than shown, combine certain components, or use different components. For example, the computer device 1 may further include input and output devices, network access devices, buses, and the like.
The processor 30 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 30 is the control center of the computer device 1 and connects the various parts of the entire computer device 1 through various interfaces and lines.
The memory 20 may be used to store the computer program 40 and/or the modules/units. The processor 30 performs the various functions of the computer device 1 by running or executing the computer program and/or modules/units stored in the memory 20 and by invoking data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required for at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created according to the use of the computer device 1 (such as audio data or a phonebook), and the like. In addition, the memory 20 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The modules/units integrated in the computer device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments by means of a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
In the several embodiments provided by the present application, it should be understood that the disclosed computer device and method may be implemented in other ways. For example, the above-described embodiments of the computer device are merely illustrative: the division of the units is merely a division by logical function, and other manners of division are possible in an actual implementation.
In addition, each functional unit in the embodiments of the present application may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit. The integrated units may be implemented in hardware or in hardware plus software program modules.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or computer devices recited in the computer device claims may also be implemented by the same unit or computer device through software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present application without departing from the spirit and scope of the technical solution of the present application.

Claims (8)

1. A label marking method for a contract file, applied to a computer device, characterized in that the method comprises the following steps:
dividing the contract file into a plurality of preset file components;
judging the types of the contract files, wherein each type corresponds to at least one file component part needing to be marked by a label;
determining a label set corresponding to each file component part needing label marking, wherein the label set is one of a predefined label set and a custom label set, and the predefined label set comprises a plurality of predefined labels;
judging whether the label set corresponding to each file component is a predefined label set or a custom label set;
when a tag set corresponding to one file component is a predefined tag set, positioning the position of each predefined tag in the predefined tag set from the contract file, marking the predefined tag to a paragraph corresponding to the position, extracting text content corresponding to each predefined tag after the position is located, and associating the extracted text content with the predefined tag;
when a tag set corresponding to one file component is a custom tag set, identifying a core keyword from the file component, marking the core keyword as a custom tag to a paragraph where the core keyword is located, adding the core keyword into the custom tag set, extracting text content corresponding to each position where the core keyword is located, and associating the extracted text content with the custom tag; and
merging the predefined tag set and the custom tag set into a new tag set, the new tag set corresponding to one type of contract file, such that the computer device can use the new tag set to mark other contract files of the same type.
2. The method for labeling a contract document according to claim 1, wherein extracting text content corresponding to each predefined label after the position of the predefined label is extracted specifically includes:
after locating the position of each predefined tag of the determined tag set in the contract file, identifying the contract content that follows the predefined tag in the same paragraph;
performing language logic relation analysis on the contract content so as to split the contract content after the predefined label into at least one word unit; and
extracting the word unit that best matches the semantics of the predefined label as the text content corresponding to the predefined label.
3. The method for tagging a contract document according to claim 1, wherein said identifying core keywords from said document parts specifically includes:
filtering out stop words, including punctuation and special symbols, from the file component;
carrying out language logic relation analysis on the filtered file component parts so as to split the file component parts into a plurality of word units; and
taking at least one word unit capable of reflecting the conclusion and meaning of the file component as the core keyword.
4. The method of tagging contract documents according to claim 1, wherein the tagging other contract documents of the same type with the new tag set includes:
judging the type of the contract file needing to be marked currently;
judging whether the type is consistent with the type corresponding to the new tag set; and
when the type is consistent with the type corresponding to the new tag set, marking the contract file by using the new tag set; and when the type is inconsistent with the type corresponding to the new tag set, marking the contract file without using the new tag set.
5. The method for labeling a contract document according to claim 1, wherein the computer device stores correspondence between different types of the contract document, at least one document component to be labeled, and a label set, wherein each type corresponds to at least one document component to be labeled, each document component corresponds to a label set, and the label set corresponding to each document component to be labeled is determined according to the correspondence.
6. A tag marking apparatus for a contract document, said apparatus comprising:
the division module is used for dividing the contract file into a plurality of preset file components;
the first judging module is used for judging the types of the contract files, and each type corresponds to at least one file component part needing to be marked with a label;
the determining module is used for determining a label set corresponding to each file component part needing label marking, wherein the label set is one of a predefined label set and a custom label set, and the predefined label set comprises a plurality of predefined labels;
the second judging module is used for judging whether the label set corresponding to each file component is a predefined label set or a custom label set;
the positioning and marking module is used for positioning the position of each predefined tag in the predefined tag set from the contract file when the tag set corresponding to one file component is the predefined tag set, marking the predefined tag to a paragraph corresponding to the position, extracting text content corresponding to each predefined tag after the position is located, and associating the extracted text content with the predefined tag; the positioning and marking module is further used for identifying a core keyword from the file component part when the tag set corresponding to the file component part is a custom tag set, marking the core keyword as a custom tag to a paragraph where the core keyword is located, adding the core keyword into the custom tag set, extracting text content corresponding to each position where the core keyword is located, and associating the extracted text content with the custom tag; and
the merging module is used for merging the predefined tag set and the custom tag set into a new tag set, wherein the new tag set corresponds to one type of the contract files, so that the tag marking device of the contract files can use the new tag set to mark other contract files of the same type.
7. A computer device, characterized in that: the computer device comprises a processor for implementing the label marking method for a contract file according to any one of claims 1-5 when executing a computer program stored in a memory.
8. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program, when executed by a processor, implements a method of tagging a contract document according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910173513.5A CN109992752B (en) 2019-03-07 2019-03-07 Label marking method, device, computer device and storage medium for contract file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910173513.5A CN109992752B (en) 2019-03-07 2019-03-07 Label marking method, device, computer device and storage medium for contract file

Publications (2)

Publication Number Publication Date
CN109992752A CN109992752A (en) 2019-07-09
CN109992752B true CN109992752B (en) 2023-10-20

Family

ID=67130301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910173513.5A Active CN109992752B (en) 2019-03-07 2019-03-07 Label marking method, device, computer device and storage medium for contract file

Country Status (1)

Country Link
CN (1) CN109992752B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532563B (en) * 2019-09-02 2023-06-20 苏州美能华智能科技有限公司 Method and device for detecting key paragraphs in text
CN110569370B (en) * 2019-09-16 2022-09-02 北京百度网讯科技有限公司 Knowledge graph construction method and device, electronic equipment and storage medium
US11275934B2 (en) * 2019-11-20 2022-03-15 Sap Se Positional embeddings for document processing
CN112199466B (en) * 2020-09-08 2024-04-12 深圳价值在线信息科技股份有限公司 Method and device for identifying associated rule of mail
CN113360459A (en) * 2021-07-08 2021-09-07 国网能源研究院有限公司 Method, system and device for semi-automatically marking and storing files

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2867786A1 (en) * 2012-03-23 2013-09-26 Blackberry Limited Systems and methods for presenting content relevant to text
CN106528506A (en) * 2016-10-20 2017-03-22 广东小天才科技有限公司 XML tag-based data processing method and apparatus, and terminal device
CN108536676A (en) * 2018-03-28 2018-09-14 广州华多网络科技有限公司 Data processing method, device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10235374B2 (en) * 2016-03-08 2019-03-19 International Business Machines Corporation Key-value store for managing user files based on pairs of key-value pairs

Also Published As

Publication number Publication date
CN109992752A (en) 2019-07-09

Similar Documents

Publication Publication Date Title
CN109992752B (en) Label marking method, device, computer device and storage medium for contract file
CN110781299B (en) Asset information identification method, device, computer equipment and storage medium
CN110163478B (en) Risk examination method and device for contract clauses
CN111737499B (en) Data searching method based on natural language processing and related equipment
US10019535B1 (en) Template-free extraction of data from documents
RU2613846C2 (en) Method and system for extracting data from images of semistructured documents
CN110909123B (en) Data extraction method and device, terminal equipment and storage medium
CN112732897A (en) Document processing method and device, electronic equipment and storage medium
US11392774B2 (en) Extracting relevant sentences from text corpus
US20220092878A1 (en) Method and apparatus for document management
CN112434884A (en) Method and device for establishing supplier classified portrait
CN115599885A (en) Document full-text retrieval method and device, computer equipment, storage medium and product
CN110837727A (en) Document template generation method and device, terminal equipment and medium
US11120074B2 (en) Streamlining citations and references
EP4300445A1 (en) Generalizable key-value set extraction from documents using machine learning models
CN112765965A (en) Text multi-label classification method, device, equipment and storage medium
CN111639250A (en) Enterprise description information acquisition method and device, electronic equipment and storage medium
CN111428497A (en) Method, device and equipment for automatically extracting financing information
CN110544467A (en) Voice data auditing method, device, equipment and storage medium
CN115544256A (en) Automatic data classification and classification method and system based on NLP algorithm model
CN110909538B (en) Question and answer content identification method and device, terminal equipment and medium
CN110909112B (en) Data extraction method, device, terminal equipment and medium
CN110069595B (en) Corpus label determining method and device, electronic equipment and storage medium
CN112199466B (en) Method and device for identifying associated rule of mail
CN115357688B (en) Enterprise list information acquisition method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant