CN113033197A - Building construction contract rule query method and device - Google Patents

Building construction contract rule query method and device

Info

Publication number
CN113033197A
Authority
CN
China
Prior art keywords
word
text
construction contract
words
frequency
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110315094.1A
Other languages
Chinese (zh)
Inventor
邓逸川
邓晖
苏成
王煜
宋建炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sino Singapore International Joint Research Institute
Original Assignee
Sino Singapore International Joint Research Institute
Application filed by Sino Singapore International Joint Research Institute
Priority to CN202110315094.1A
Publication of CN113033197A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Tourism & Hospitality (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Technology Law (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for querying building construction contract laws and regulations. The method comprises the following steps: collecting building construction contract regulations, digitizing them, and establishing a construction contract regulation library; performing word segmentation and stop-word removal on the regulation text based on natural language processing, and extracting feature words with the word frequency-inverse text frequency (TF-IDF) algorithm; performing a synonym-expanded query on the feature words using a self-built lexicon of common construction contract regulation terms and a continuous bag-of-words (CBOW) model; calculating the similarity of contract regulations based on a vector space model and an improved cosine function to obtain the corresponding legal clauses in the "Construction Contract Conditions"; and integrating the whole database and query system into a local server or smart device. The method can greatly improve construction contract management, avoid harm to the interests of each party, save query time, and improve query efficiency.

Description

Building construction contract rule query method and device
Technical Field
The invention relates to the technical field of building construction contract regulation management, and in particular to a method and device for querying building construction contract regulations based on natural language processing.
Background
An engineering project is a comprehensive production activity spanning multiple disciplines; its construction period is long, and the construction process involves many uncertain factors. Typically, the parties to a project specify their roles and responsibilities in a contract to prevent claims and disputes. In the process, however, terms favorable to the contractor are often revised or even deleted, which exposes the contractor to significant potential risk. The relevant construction contract regulatory terms therefore need to be queried efficiently to avoid future risks. An automatic query system for building construction contract laws and regulations, which retrieves the corresponding regulations from input keywords, can greatly improve construction contract management and avoid harm to the interests of each party.
At present, in construction contract management, each contracting party may draft the contract in its own favor and present the text in an unstructured form, so checking it against the corresponding laws and regulations is time-consuming and inefficient. The querying of construction contract regulations is therefore a major gap in the contract management field.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a building construction contract regulation query method and device that can greatly improve construction contract management, avoid harm to the interests of all parties, save query time, and improve query efficiency.
To achieve this purpose, the invention provides a method for querying building construction contract regulations, comprising the following steps:
step S1, collecting construction contract laws and regulations, digitizing them, and establishing a construction contract regulation library;
step S2, performing word segmentation and stop-word removal on the construction contract regulations based on natural language processing, and extracting feature words with the word frequency-inverse text frequency (TF-IDF) algorithm;
step S3, performing a synonym-expanded query on the feature words using a self-built lexicon of common construction contract regulation terms and a continuous bag-of-words (CBOW) model;
step S4, calculating the similarity of contract regulations based on a vector space model and an improved cosine function to obtain the corresponding legal clauses in the "Construction Contract Conditions";
step S5, integrating the whole database and query system into a local server or smart device.
Preferably, step S2 comprises the following steps:
step S21, segmenting the construction contract regulations into words with jieba, which organizes its dictionary in a prefix tree to improve lookup efficiency;
step S22, removing function words from the regulation text using a self-built stop-word list; stop words are extremely common words that contribute little to text similarity, so deleting them greatly reduces the size of the library and improves retrieval efficiency;
and step S23, selecting the TF-IDF algorithm, after comparing candidate algorithms, to calculate the weights of the words and extract the feature words of the construction contract regulations.
Preferably, step S23 comprises the following steps:
step S231, on the basis of the word frequency, calculating a weight that reflects the importance of the word; this weight is called the "inverse text frequency", and it decreases as the word becomes more common;
step S232, assigning different weights to different words: a larger weight to less common words, a smaller weight to more common words, and the smallest weight to the most common words; and multiplying the word frequency by the inverse text frequency to obtain the TF-IDF value of each word;
and step S233, since a word's TF-IDF value grows with its importance to the text, extracting the feature words of the text by sorting the TF-IDF values in descending order.
Preferably, the word frequency, the inverse text frequency, and the word frequency-inverse text frequency are calculated as follows:
word frequency TF: the number of times a feature item appears in the text, i.e. if $t_{i,k}$ appears $n_{i,k}$ times in the text $d_i$, then
$TF_{i,k} = n_{i,k}$
In practical applications, to avoid statistical bias caused by overly long texts, the count is generally normalized by $\sum_m n_{m,k}$, the total number of words in the text:
$TF_{i,k} = \dfrac{n_{i,k}}{\sum_m n_{m,k}}$
inverse text frequency IDF: a measure of how often the feature item appears in the total text set $D$, i.e. if the total text set contains $M$ texts and the feature item $t_{i,k}$ appears in $m_{i,k}$ of them, then
$IDF_{i,k} = \log \dfrac{M}{m_{i,k} + \alpha}$
where $\alpha$ is an empirical constant, generally 0.01; the more common the word, the larger the denominator and the smaller the inverse text frequency; $\alpha$ is added to the denominator to keep it from being 0 when no text contains the word;
word frequency-inverse text frequency TF-IDF: the TF-IDF value is the word frequency multiplied by the inverse text frequency:
$w_{i,k} = TF_{i,k} \times IDF_{i,k}$
The TF-IDF value of a word decreases as the word appears in more texts of the whole text library and increases with the number of times the word occurs in a specific text; the TF-IDF value of each word is therefore calculated, and the feature words are extracted in descending order of that value.
Preferably, step S3 comprises the following steps:
step S31, given the training text, namely the building construction contract regulation library and Chinese Wikipedia, using one-hot codes as the input of the CBOW model, with the word-vector dimension set to 100, the window to 5, the minimum occurrence count to 5, and the number of training threads to 9; embedding the words through the CBOW model, summing the input word vectors, and finally completing the vectorized representation of the words through a binary classifier;
and step S32, reading the feature words extracted in step S2, looking up their word vectors among the trained word vectors, and using cosine distance to find the 5 words most similar to each feature word for synonym expansion.
Preferably, the CBOW model is a three-layer neural network model;
the first layer of the CBOW model is the input layer, which takes the word vectors of the known context;
the middle layer of the CBOW model is a linear hidden layer, which sums all the input word vectors;
the third layer of the CBOW model is a binary-classifier softmax layer, and training it yields the near-synonym expansion of the corresponding word.
Preferably, step S4 comprises: after the feature words and their synonyms are obtained, calculating the similarity between building construction contract regulations using a vector space model (VSM) with an improved cosine function; the cosine coefficient gives accurate results and is the most common similarity measure in the VSM; the similarity between the input case and each text is calculated with the similarity model of gensim, a third-party Python library, the texts are sorted by similarity in descending order, and the corresponding legal clause in the "Construction Contract Conditions" is output as the result;
[Equation image not reproduced in the source text: the improved similarity, which combines Sim(t_1, t_0) and Sim(t_1, t_k) using the weight λ.]
where Sim(t_1, t_0) is the similarity for the original query and Sim(t_1, t_k) is the similarity for the expanded query; λ takes a value with 0 < λ < 1 and, after repeated verification, is set to 0.7.
The invention also provides a building construction contract regulation query device, comprising:
a construction contract regulation acquisition and processing module, configured to collect construction contract laws and regulations, digitize them, and establish a construction contract regulation library, and to perform word segmentation and stop-word removal on the regulations based on natural language processing and extract feature words with the TF-IDF algorithm;
a synonym expansion query module, connected to the acquisition and processing module and configured to perform a synonym-expanded query on the feature words using a self-built lexicon of common construction contract regulation terms and a continuous bag-of-words (CBOW) model;
a contract regulation retrieval module, connected to the synonym expansion query module and configured to calculate the similarity of contract regulations based on a vector space model and an improved cosine function to obtain the corresponding legal clauses in the "Construction Contract Conditions";
wherein the acquisition and processing module comprises a crawler algorithm, word segmentation, and stop-word removal; the synonym expansion query module comprises text vectorization and the continuous bag-of-words model; and the contract regulation retrieval module comprises the vector space model and cosine similarity calculation.
Preferably, the device further comprises a local server or smart device in which the whole database and query system are stored.
Compared with the prior art, the invention has the following beneficial effects:
1. The query method and device provided by the invention, built on natural language processing, can look up the construction contract regulations corresponding to a project contract at any time. They help contracting parties avoid unfavorable contract risks and harm to their interests, reduce the time spent querying laws and regulations, and improve query efficiency, which benefits both construction contract management and regulation querying as a whole.
2. The invention can be used from a mobile phone or tablet: the user inputs a project construction contract and directly obtains the corresponding legal clauses in the "Construction Contract Conditions", which helps avoid claims and disputes arising from contract risk, improves construction contract management, and avoids harm to the interests of all parties.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the steps of the building construction contract regulation query method provided by the present invention;
FIG. 2 is a schematic diagram of an example analysis using the building construction contract regulation query method provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Example one
Referring to FIG. 1 and FIG. 2, an embodiment of the present invention provides a method for querying building construction contract regulations, comprising the following steps:
Step S1, collecting the laws and regulations governing construction contracts by web search, collecting the construction contracts of individual projects through contact with construction sites, and digitizing these data to establish a construction contract database.
Step S2, performing word segmentation and stop-word removal on the construction contract regulations based on natural language processing, and extracting feature words with the word frequency-inverse text frequency (TF-IDF) algorithm.
Specifically, the step S2 includes the following steps:
Step S21, segmenting the construction contract regulations into words with jieba, which organizes its dictionary in a prefix tree (also called a trie or dictionary tree) to improve lookup efficiency.
Suppose a computer searches for the word "supplier". Without an index it would scan every character string in the text, which is inefficient. A prefix tree is instead searched from the root downward, matching one Chinese character per level; as soon as no child of the current node matches the next character, the search stops, which greatly improves efficiency. In addition, the prefix tree can be combined with a directed acyclic graph to resolve ambiguous segmentations efficiently.
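The top-down lookup described above can be sketched with a small hand-written prefix tree. This is only an illustration, not jieba's actual implementation: the class names, the `prefixes_of` helper, and the three-word dictionary (including 供应商, assumed here as the Chinese for "supplier") are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps one character to the next node
        self.is_word = False  # True if the path from the root spells a dictionary word


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False  # stop as soon as no child matches the next character
        return node.is_word

    def prefixes_of(self, text, start=0):
        """Return every dictionary word that begins at text[start]."""
        found = []
        node = self.root
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:
                break
            if node.is_word:
                found.append(text[start:end + 1])
        return found


# Hypothetical mini-dictionary, for illustration only.
trie = Trie(["供应", "供应商", "商品"])
candidates_at_0 = trie.prefixes_of("供应商品")     # words starting at position 0
candidates_at_2 = trie.prefixes_of("供应商品", 2)  # words starting at position 2
```

Collecting the candidate words at every start position yields the edges of a directed acyclic graph over the sentence. Here the string admits competing segmentations, which is the kind of ambiguity the graph is meant to resolve.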
Step S22, removing function words from the regulation text using a self-built stop-word list. Stop words are extremely common words that contribute little to text similarity, so deleting them greatly reduces the size of the library and improves retrieval efficiency.
Because current NLP technology still has limitations, word segmentation also produces meaningless tokens such as underscores; deleting the most frequent of these effectively reduces the data volume. Stop-word removal is implemented by importing the stop-word list and then removing every token that appears in it.
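A minimal sketch of this filtering step follows, assuming the text has already been segmented into tokens. The helper names, the stop-word entries, and the sample clause are hypothetical and stand in for the self-built list described above.

```python
def load_stop_words(lines):
    """Build a stop-word set from an iterable of lines (one entry per line)."""
    return {line.strip() for line in lines if line.strip()}


def remove_stop_words(tokens, stop_words):
    """Drop tokens that are blank or that appear in the stop-word set."""
    return [tok for tok in tokens if tok.strip() and tok not in stop_words]


# Hypothetical stop-word list and pre-segmented clause, for illustration only.
STOP_WORDS = load_stop_words(["的", "是", "在", "_", "，", "。"])
tokens = ["承包商", "的", "工期", "是", "_", "延误", "。"]
filtered = remove_stop_words(tokens, STOP_WORDS)
```

Storing the list as a set keeps each membership test at roughly constant cost, which matters when the same list is applied to every token in the regulation library.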
Step S23, selecting the TF-IDF algorithm, after comparing candidate algorithms, to calculate the weights of the words and extract the feature words of the construction contract regulations.
For example, in a single accident report, the three words "unit", "delay" and "fine" may occur the same number of times (have the same word frequency) yet differ in importance. "Delay" and "fine" represent the text better than "unit", so they should rank ahead of "unit" when keywords are ranked.
One way to solve this problem is TF-IDF (word frequency-inverse text frequency): on the basis of the word frequency, a weight reflecting the importance of the word is calculated. This weight is called the "inverse text frequency", and it decreases as the word becomes more common.
Different words receive different weights: less common words (e.g., "delay", "fine") receive a larger weight, more common words (e.g., "unit") a smaller weight, and the most common words (e.g., "yes") the smallest weight.
The word frequency (TF) is multiplied by the inverse text frequency (IDF) to obtain the TF-IDF value of a word. The more important a word is to a text, the larger its TF-IDF value, so the feature words of the text can be extracted by sorting the TF-IDF values in descending order.
The word frequency, the inverse text frequency, and the word frequency-inverse text frequency are calculated as follows:
word frequency (TF): the number of times a feature item appears in the text, i.e. if $t_{i,k}$ appears $n_{i,k}$ times in the text $d_i$, then
$TF_{i,k} = n_{i,k}$
In practical applications, to avoid statistical bias caused by overly long texts, the count is generally normalized by $\sum_m n_{m,k}$, the total number of words in the text:
$TF_{i,k} = \dfrac{n_{i,k}}{\sum_m n_{m,k}}$
inverse text frequency (IDF): a measure of how often the feature item appears in the total text set $D$, i.e. if the total text set contains $M$ texts and the feature item $t_{i,k}$ appears in $m_{i,k}$ of them, then
$IDF_{i,k} = \log \dfrac{M}{m_{i,k} + \alpha}$
where $\alpha$ is an empirical constant, generally 0.01; the more common the word, the larger the denominator and the smaller the inverse text frequency; $\alpha$ is added to the denominator to keep it from being 0 when no text contains the word;
word frequency-inverse text frequency (TF-IDF): the TF-IDF value is the word frequency multiplied by the inverse text frequency:
$w_{i,k} = TF_{i,k} \times IDF_{i,k}$
The TF-IDF value of a word decreases as the word appears in more texts of the whole text library and increases with the number of times the word occurs in a specific text; the TF-IDF value of each word is therefore calculated, and the feature words are extracted in descending order of that value.
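The calculation can be sketched in plain Python as follows. The sketch assumes that the IDF takes the common logarithmic form log(M / (m + α)) with α = 0.01 in the denominator, as the surrounding description suggests. The function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter

ALPHA = 0.01  # empirical constant mentioned in the text


def tf_idf(docs, alpha=ALPHA):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed form: TF = n / (total tokens in the document),
                  IDF = log(M / (m + alpha)), weight = TF * IDF.
    """
    num_docs = len(docs)
    doc_freq = Counter()          # m: number of documents containing each term
    for doc in docs:
        doc_freq.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)     # n: occurrences of each term in this document
        total = len(doc)
        weights.append({
            term: (n / total) * math.log(num_docs / (doc_freq[term] + alpha))
            for term, n in counts.items()
        })
    return weights


def top_features(weight, k):
    """Return the k terms with the largest TF-IDF weight, in descending order."""
    ranked = sorted(weight.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:k]]


# Toy corpus: "unit" occurs in every document, "delay" and "fine" in only one.
docs = [["unit", "delay", "fine"], ["unit", "payment"], ["unit", "quality"]]
w = tf_idf(docs)
features = top_features(w[0], 2)
```

In the first document all three words have the same word frequency, but "unit" appears in every document and so receives a much smaller weight than "delay" and "fine". Under this assumed form, a word present in every text gets a slightly negative weight because of the added α.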
Step S3, performing a synonym-expanded query on the feature words using a self-built lexicon of common construction contract regulation terms and a continuous bag-of-words (CBOW) model.
The CBOW model is a three-layer neural network model;
the first layer of the CBOW model is the input layer, which takes the word vectors of the known context;
the middle layer of the CBOW model is a linear hidden layer, which sums all the input word vectors;
the third layer of the CBOW model is a binary-classifier softmax layer, and training it yields the near-synonym expansion of the corresponding word. For example, several distinct Chinese words that all translate as "postponement" are near-synonyms of one another.
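The three layers can be illustrated with a toy forward pass. This sketch uses a full softmax over a five-word vocabulary for readability; practical word2vec implementations usually replace it with hierarchical softmax or negative sampling, which are built from binary decisions. The vocabulary, dimensions, and random weights are hypothetical, and no training is performed.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_forward(context_ids, w_in, w_out):
    """Forward pass of a toy CBOW model.

    Input layer:  each one-hot context word selects one row of w_in.
    Hidden layer: the selected rows are summed (linear, no activation).
    Output layer: softmax over the vocabulary predicts the centre word.
    """
    dim = len(w_in[0])
    hidden = [0.0] * dim
    for idx in context_ids:
        for j in range(dim):
            hidden[j] += w_in[idx][j]
    vocab_size = len(w_out[0])
    scores = [sum(hidden[j] * w_out[j][v] for j in range(dim))
              for v in range(vocab_size)]
    return softmax(scores)


# Hypothetical vocabulary and untrained random weights, for illustration only.
random.seed(0)
VOCAB = ["contractor", "shall", "pay", "fine", "delay"]
DIM = 4
w_in = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in VOCAB]
w_out = [[random.uniform(-0.5, 0.5) for _ in VOCAB] for _ in range(DIM)]

# Context words "contractor", "shall", "fine" predict a distribution over VOCAB.
probs = cbow_forward([0, 1, 3], w_in, w_out)
```

After training, the rows of `w_in` serve as the word vectors, and words that occur in similar contexts end up with similar rows.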
Specifically, the step S3 includes the following steps:
Step S31, given the training text, namely the building construction contract regulation library and Chinese Wikipedia, using one-hot codes as the input of the CBOW model, with the word-vector dimension set to 100, the window to 5, the minimum occurrence count to 5, and the number of training threads to 9; embedding the words through the CBOW model, summing the input word vectors, and finally completing the vectorized representation of the words through a binary classifier.
Step S32, reading the feature words extracted in step S2, looking up their word vectors among the trained word vectors, and using cosine distance to find the 5 words most similar to each feature word for synonym expansion. For example, several distinct Chinese words that all translate as "postponement" are near-synonyms of one another.
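The top-5 lookup in step S32 can be sketched with hand-made vectors. The three-dimensional vectors and English words below are hypothetical stand-ins for the trained 100-dimensional vectors. If the vectors were trained with gensim (an assumption; the text names gensim only for the similarity step), the settings in step S31 would presumably map to the `vector_size`, `window`, `min_count`, and `workers` parameters of its `Word2Vec` class, with `sg=0` selecting CBOW.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(word, vectors, topn=5):
    """Return the topn (word, similarity) pairs closest to `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy vectors, for illustration only.
VECTORS = {
    "delay": [0.9, 0.1, 0.0],
    "postponement": [0.8, 0.2, 0.1],
    "extension": [0.7, 0.3, 0.0],
    "fine": [0.1, 0.9, 0.1],
    "penalty": [0.2, 0.8, 0.0],
    "unit": [0.0, 0.1, 0.9],
    "concrete": [0.1, 0.0, 0.8],
}
neighbours = most_similar("delay", VECTORS, topn=5)
expansion = [word for word, _ in neighbours]
```

The words in `expansion` would then be added to the query alongside the original feature word.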
Step S4, calculating the similarity of contract regulations based on a vector space model and an improved cosine function to obtain the corresponding legal clauses in the "Construction Contract Conditions".
Specifically, after the feature words and their synonyms are obtained, the similarity between building construction contract regulations is calculated using a vector space model (VSM) with an improved cosine function. The cosine coefficient gives accurate results and is the most common similarity measure in the VSM, so it is used here. The similarity between the input case and each text is calculated with the similarity model of gensim, a third-party Python library; the texts are sorted by similarity in descending order, and the corresponding legal clause in the "Construction Contract Conditions" is output as the result;
[Equation image not reproduced in the source text: the improved similarity, which combines Sim(t_1, t_0) and Sim(t_1, t_k) using the weight λ.]
where Sim(t_1, t_0) is the similarity for the original query and Sim(t_1, t_k) is the similarity for the expanded query; λ takes a value with 0 < λ < 1 and, after repeated verification, is set to 0.7.
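Because the improved formula itself is not reproduced in the text, the following sketch assumes one plausible form: a convex combination in which the original-query similarity is weighted by λ and the mean similarity over the expanded queries by 1 − λ. That form, the function names, and the toy clause vectors are all assumptions; the sketch uses plain Python rather than gensim.

```python
import math

LAMBDA = 0.7  # value reported in the text


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(text_vec, query_vec, expanded_vecs, lam=LAMBDA):
    """ASSUMED form: lam * Sim(t_1, t_0) + (1 - lam) * mean_k Sim(t_1, t_k)."""
    original = cosine(text_vec, query_vec)
    if not expanded_vecs:
        return original
    expanded = sum(cosine(text_vec, e) for e in expanded_vecs) / len(expanded_vecs)
    return lam * original + (1 - lam) * expanded


def rank_clauses(clauses, query_vec, expanded_vecs, lam=LAMBDA):
    """Score every clause and sort from most to least similar."""
    scored = [(name, combined_similarity(vec, query_vec, expanded_vecs, lam))
              for name, vec in clauses.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical clause vectors and queries, for illustration only.
CLAUSES = {
    "clause_delay": {"delay": 1.0, "fine": 0.5},
    "clause_extension": {"extension": 1.0, "time": 0.5},
    "clause_payment": {"payment": 1.0, "invoice": 0.5},
}
query = {"delay": 1.0}
expanded = [{"extension": 1.0}, {"postponement": 1.0}]
ranking = rank_clauses(CLAUSES, query, expanded)
```

With λ = 0.7 the clause that matches the original keyword still ranks first, while a clause that matches only a synonym receives a smaller but non-zero score instead of being missed entirely.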
Step S5, integrating the whole database and query system into a local server or smart device.
For example, construction contract legal clauses can be queried from a mobile phone or tablet. Because the database and query system are local, queries work with or without a network connection, so the required clauses can be looked up in real time even on projects in remote mountain areas.
More specifically, the user inputs a project construction contract on a mobile phone or tablet and directly obtains the corresponding legal clauses in the "Construction Contract Conditions", which helps avoid claims and disputes arising from contract risk, improves construction contract management, and avoids harm to the interests of all parties.
Example two
The second embodiment of the present invention provides a building construction contract regulation query device, comprising:
a construction contract regulation acquisition and processing module, configured to collect construction contract laws and regulations, digitize them, and establish a construction contract regulation library, and to perform word segmentation and stop-word removal on the regulations based on natural language processing and extract feature words with the TF-IDF algorithm;
a synonym expansion query module, connected to the acquisition and processing module and configured to perform a synonym-expanded query on the feature words using a self-built lexicon of common construction contract regulation terms and a continuous bag-of-words (CBOW) model;
a contract regulation retrieval module, connected to the synonym expansion query module and configured to calculate the similarity of contract regulations based on a vector space model and an improved cosine function to obtain the corresponding legal clauses in the "Construction Contract Conditions";
wherein the acquisition and processing module comprises a crawler algorithm, word segmentation, and stop-word removal; the synonym expansion query module comprises text vectorization and the continuous bag-of-words model; and the contract regulation retrieval module comprises the vector space model and cosine similarity calculation.
The device further comprises a local server or smart device in which the whole database and query system are stored.
For example, construction contract legal clauses can be queried from a mobile phone or tablet. Because the database and query system are local, queries work with or without a network connection, so the required clauses can be looked up in real time even on projects in remote mountain areas.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (9)

1. A method for querying building construction contract laws and regulations, characterized by comprising the following steps:
step S1, collecting building construction contract laws and regulations, digitizing them, and establishing a construction contract law and regulation library;
step S2, performing text word segmentation and stop-word removal on the building construction contract laws and regulations using natural language processing, and calculating feature words with the word frequency-inverse text frequency (TF-IDF) algorithm;
step S3, performing synonym expansion queries on the feature words using a self-built lexicon of common construction contract terms and a continuous bag-of-words (CBOW) model;
step S4, calculating the similarity of contract rules using a vector space model and an improved cosine function, to obtain the corresponding legal provisions for the construction contract conditions;
step S5, integrating the entire database and query system into a local server or smart device.
2. The method for querying building construction contract laws and regulations according to claim 1, wherein step S2 comprises the following steps:
step S21, segmenting the construction contract laws and regulations into words with jieba, wherein jieba organizes its words in a prefix tree to improve lookup efficiency;
step S22, removing the stop words present in the text of the construction contract laws and regulations using a self-built stop-word library, wherein stop words are extremely common words that contribute little to calculating text similarity, and deleting these meaningless words greatly reduces the size of the library and improves retrieval efficiency;
and step S23, selecting, after comparing candidate algorithms, the word frequency-inverse text frequency (TF-IDF) algorithm to calculate word weights and extract the feature words from the construction contract laws and regulations.
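Steps S21 and S22 can be illustrated with a minimal sketch. It does not use jieba and is not jieba's actual algorithm; it assumes a simple forward maximum-matching segmenter over a hand-written prefix tree, and the dictionary, stop-word list, and function names are hypothetical examples chosen only for illustration.

```python
class TrieNode:
    """One node of a prefix tree over dictionary words."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Insert every dictionary word into a prefix tree and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def segment(text, root):
    """Forward maximum matching: take the longest dictionary word at each position.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        node = root
        longest = 0
        j = i
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
            if node.is_word:
                longest = j - i
        step = longest if longest > 0 else 1
        tokens.append(text[i:i + step])
        i += step
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop-word library."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical dictionary and stop-word library (assumptions for illustration).
DICTIONARY = ["建筑", "施工", "合同", "法规"]
STOP_WORDS = {"的"}

tokens = segment("建筑施工合同的法规", build_trie(DICTIONARY))
filtered = remove_stop_words(tokens, STOP_WORDS)
```

The prefix tree lets the segmenter test all dictionary words that share a prefix in a single pass over the characters, which is the lookup benefit step S21 refers to.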
3. The method for querying building construction contract laws and regulations according to claim 2, wherein step S23 comprises the following steps:
step S231, calculating, on the basis of the word frequency, a weight reflecting the importance of each word, wherein the weight is called the 'inverse text frequency' and is inversely proportional to how common the word is;
step S232, assigning different weights to different words: a larger weight to less common words, a smaller weight to more common words, and the smallest weight to the most common words; and multiplying the word frequency by the inverse text frequency to obtain the TF-IDF value of each word;
and step S233, since the more important a word is to the text the larger its TF-IDF value, completing feature extraction for the text by sorting the words in descending order of TF-IDF value.
4. The method for querying building construction contract laws and regulations as claimed in claim 3, wherein the word frequency, the inverse text frequency, and the word frequency-inverse text frequency are calculated as follows:
word frequency TF: the number of times a feature item appears in the text, i.e. if t_{i,k} appears n_{i,k} times in the text d_i, then
TF_{i,k} = n_{i,k}
In practical applications, to avoid statistical bias caused by overly long texts, normalization is generally required, where Σ_m n_{m,k} is the total number of words in the text:
Figure FDA0002991280460000021
inverse text frequency IDF: a measure of how often a feature item appears across the total text set D; if the total text set contains M texts and the feature item t_{i,k} appears in m_{i,k} of them, then
Figure FDA0002991280460000022
where α is an empirical constant, generally 0.01; the more common a word is, the larger the denominator and the smaller the inverse text frequency; α is added to the denominator to prevent it from being 0, which would occur if no text contained the word;
word frequency-inverse text frequency TF-IDF: the TF-IDF value is the word frequency multiplied by the inverse text frequency
w_{i,k} = TF_{i,k} × IDF_{i,k}
The word frequency-inverse text frequency is inversely proportional to the number of occurrences of a word across the whole text library and directly proportional to the number of occurrences of the word in a specific text; the TF-IDF value of each word is therefore calculated, and feature items are extracted in descending order of that value.
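The calculation in claim 4 can be sketched as follows. The two formula images are not reproduced in this text, so the sketch assumes the normalized word frequency is n / Σ n and the inverse text frequency is log(M / (m + α)) with a natural logarithm; both forms are assumptions consistent with the surrounding definitions, not confirmed by the source. The function names and sample texts are hypothetical.

```python
import math
from collections import Counter

ALPHA = 0.01  # empirical constant named in the claim


def tf_idf(docs, alpha=ALPHA):
    """Return one {word: weight} dict per text.

    docs is a list of token lists. Assumed formulas:
      TF  = n / (total words in the text)
      IDF = log(M / (m + alpha))
    where M is the number of texts and m the number of texts containing the word.
    """
    total_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per text

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        n_total = len(tokens)
        weights.append({
            word: (count / n_total) * math.log(total_docs / (doc_freq[word] + alpha))
            for word, count in counts.items()
        })
    return weights


def top_features(weight, k):
    """Feature words in descending order of TF-IDF (ties broken alphabetically)."""
    ranked = sorted(weight.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical, already segmented texts.
DOCS = [
    ["contract", "schedule", "schedule"],
    ["contract", "quality"],
    ["contract", "payment"],
]
WEIGHTS = tf_idf(DOCS)
```

Under these assumed formulas, a word that occurs in every text ("contract" here) receives a weight close to zero, while a word concentrated in one text ("schedule") ranks first for that text.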
5. The method for querying building construction contract laws and regulations according to claim 1, wherein step S3 comprises the following steps:
step S31, given training texts, namely the building construction contract law and regulation library and Chinese Wikipedia, using one-hot codes as the input of a CBOW model, with the word vector dimension set to 100, the window set to 5, the minimum occurrence count set to 5, and the number of threads used for training word vectors set to 9; embedding words through the CBOW model, accumulating the input word vectors, and finally completing the vectorized representation of the words through a binary classifier;
and step S32, reading the feature words extracted in step S2, obtaining their word vectors from the trained word vectors, calculating the 5 words most similar to each feature word by cosine distance, and using them for synonym expansion.
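Step S32 can be sketched in plain Python as a nearest-neighbour lookup by cosine similarity. The toy two-dimensional vectors and the function names are hypothetical; real vectors would come from the trained CBOW model. The claim does not name the training library; if gensim 4.x were used, the settings of step S31 would roughly correspond to `Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=9, sg=0)`, which is stated here as an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(word, vectors, top_n=5):
    """Return the top_n words whose vectors are most similar to that of `word`."""
    if word not in vectors:
        return []
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]


# Hypothetical toy word vectors (assumption: real ones would have 100 dimensions).
VECTORS = {
    "delay": [1.0, 0.1],
    "postponement": [0.9, 0.2],
    "overrun": [0.8, 0.3],
    "payment": [0.0, 1.0],
}
synonyms = expand("delay", VECTORS, top_n=2)
```

With `top_n=5`, as in the claim, each feature word contributes up to five expansion terms to the query.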
6. The method as claimed in claim 5, wherein the CBOW model is a three-layer neural network model;
the first layer of the CBOW model is an input layer, which takes the word vectors of the known context as input;
the middle layer of the CBOW model, called the linear hidden layer, accumulates all input word vectors;
the third layer of the CBOW model is a binary classifier (softmax), and training yields the near-synonym expansion of the corresponding word.
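The three layers of claim 6 can be illustrated with a single forward pass. This is a sketch under stated assumptions: it uses a full softmax over a toy vocabulary rather than the binary classifier named in the claim (word2vec implementations typically use hierarchical softmax or negative sampling, which reduce the output to binary decisions), it omits training, and the weights and names are hypothetical.

```python
import math


def cbow_forward(context_ids, embeddings, output_weights):
    """One CBOW forward pass: context word ids -> probability of each vocabulary word.

    embeddings[i] is the input vector of word i; multiplying a one-hot code by the
    embedding matrix is equivalent to this row lookup.
    """
    dim = len(embeddings[0])

    # Linear hidden layer: accumulate (sum) the input word vectors.
    hidden = [0.0] * dim
    for idx in context_ids:
        for d in range(dim):
            hidden[d] += embeddings[idx][d]

    # Output layer: one score per vocabulary word.
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in output_weights]

    # Softmax, shifted by the maximum score for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-word vocabulary with 2-dimensional vectors.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
OUTPUT_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = cbow_forward([0, 1], EMBEDDINGS, OUTPUT_WEIGHTS)
```

Here the context words 0 and 1 sum to the hidden vector [1, 1], so word 2 receives the highest probability.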
7. The method for querying building construction contract laws and regulations according to claim 1, wherein step S4 comprises: after the feature words and synonyms are obtained, calculating the similarity between building construction contract laws and regulations using a vector space model (VSM) with an improved cosine function, the cosine coefficient giving accurate results and being the most commonly used calculation method in the VSM; calculating the similarity between the input case and each text using the similarity module of gensim, a third-party Python tool; sorting the texts in descending order of similarity; and finally outputting the corresponding legal provisions for the construction contract conditions;
Figure FDA0002991280460000041
where Sim(t_1, t_0) is the similarity for the original query and Sim(t_1, t_k) is the similarity for the expanded query; λ takes a value in the range 0 < λ < 1 and, after repeated verification, is set to 0.7.
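The ranking in claim 7 can be sketched as follows. The formula image is not reproduced in this text, so the way the two similarities are combined is an assumption: the sketch uses λ·Sim(original) + (1 − λ)·Sim(expanded), treats all expansion terms as one bag of words, and uses raw term counts instead of TF-IDF weights. It does not use gensim; all names and sample clauses are hypothetical.

```python
import math
from collections import Counter

LAMBDA = 0.7  # value stated in the claim


def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def combined_similarity(doc_tokens, query_tokens, expanded_tokens, lam=LAMBDA):
    """Assumed combination: lam * Sim(original) + (1 - lam) * Sim(expanded)."""
    doc = Counter(doc_tokens)
    original = cosine(doc, Counter(query_tokens))
    expanded = cosine(doc, Counter(expanded_tokens))
    return lam * original + (1 - lam) * expanded


def rank(docs, query_tokens, expanded_tokens, lam=LAMBDA):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, combined_similarity(tokens, query_tokens, expanded_tokens, lam))
        for name, tokens in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical clause texts, already segmented.
CLAUSES = {
    "clause_a": ["delay", "compensation"],
    "clause_b": ["postponement", "notice"],
    "clause_c": ["payment", "invoice"],
}
ranked = rank(CLAUSES, ["delay"], ["postponement"])
```

Because λ = 0.7 weights the original query more heavily, a clause matching the original term outranks one matching only an expansion term.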
8. A building construction contract law and regulation query device, characterized by comprising:
the construction contract rule acquisition and processing module, which collects building construction contract laws and regulations, digitizes them, and establishes a construction contract law and regulation library; and which performs text word segmentation and stop-word removal on the building construction contract laws and regulations using natural language processing, and calculates feature words with the word frequency-inverse text frequency (TF-IDF) algorithm;
the synonym expansion query module, which is connected to the construction contract rule acquisition and processing module and performs synonym expansion queries on the feature words using a self-built lexicon of common construction contract terms and a continuous bag-of-words (CBOW) model;
the contract rule retrieval module, which is connected to the synonym expansion query module and calculates the similarity of contract rules using a vector space model and an improved cosine function, to obtain the corresponding legal provisions for the construction contract conditions;
the construction contract rule acquisition and processing module comprises a crawler algorithm, word segmentation, and stop-word removal; the synonym expansion query module comprises text vectorization and a continuous bag-of-words model; the contract rule retrieval module comprises a vector space model and cosine similarity calculation.
9. The building construction contract law and regulation query device as claimed in claim 8, further comprising a local server or smart device, wherein the entire database and query system are stored on the local server or smart device.
CN202110315094.1A 2021-03-24 2021-03-24 Building construction contract rule query method and device Pending CN113033197A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110315094.1A CN113033197A (en) 2021-03-24 2021-03-24 Building construction contract rule query method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110315094.1A CN113033197A (en) 2021-03-24 2021-03-24 Building construction contract rule query method and device

Publications (1)

Publication Number Publication Date
CN113033197A true CN113033197A (en) 2021-06-25

Family

ID=76473927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110315094.1A Pending CN113033197A (en) 2021-03-24 2021-03-24 Building construction contract rule query method and device

Country Status (1)

Country Link
CN (1) CN113033197A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101655857A (en) * 2009-09-18 2010-02-24 西安建筑科技大学 Method for mining data in construction regulation field based on associative regulation mining technology
CN106156272A (en) * 2016-06-21 2016-11-23 北京工业大学 A kind of information retrieval method based on multi-source semantic analysis
CN106844350A (en) * 2017-02-15 2017-06-13 广州索答信息科技有限公司 A kind of computational methods of short text semantic similarity
CN108491462A (en) * 2018-03-05 2018-09-04 昆明理工大学 A kind of semantic query expansion method and device based on word2vec
CN108804421A (en) * 2018-05-28 2018-11-13 中国科学技术信息研究所 Text similarity analysis method, device, electronic equipment and computer storage media
CN110275936A (en) * 2019-05-09 2019-09-24 浙江工业大学 A kind of similar law case retrieving method based on from coding neural network
CN110597949A (en) * 2019-08-01 2019-12-20 湖北工业大学 Court similar case recommendation model based on word vectors and word frequency

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination