CN113033197A - Building construction contract rule query method and device - Google Patents
- Publication number
- CN113033197A; CN202110315094.1A
- Authority
- CN
- China
- Prior art keywords
- word
- text
- construction contract
- words
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
Abstract
The invention discloses a method and a device for querying building construction contract laws and regulations. The method comprises the following steps: collecting building construction contract regulations, digitizing them, and establishing a building construction contract regulation library; performing text word segmentation and stop-word removal on the regulations based on natural language processing technology, and computing feature words with the word frequency-inverse text frequency (TF-IDF) algorithm; performing synonym expansion of the feature words for querying, using a self-built lexicon of common construction contract regulation terms and a continuous bag-of-words (CBOW) model; computing the similarity of contract regulations based on a vector space model and an improved cosine function, to obtain the corresponding legal provisions in the construction contract conditions; and integrating the whole database and query system into a local server or smart device. The method can greatly improve the level of construction contract management, protect the interests of all parties, save query time, and improve query efficiency.
Description
Technical Field
The invention relates to the technical field of building construction contract regulation management, and in particular to a method and a device for querying building construction contract regulations based on natural language processing technology.
Background
The construction of an engineering project is a comprehensive production activity spanning many disciplines. Construction periods are long, and many uncertain factors arise during construction. Typically, the parties to a project define their roles and responsibilities by contract in order to prevent claims and disputes. In the process, however, terms that would favor the contractor are often revised or even deleted, which exposes the contractor to significant potential risk. The relevant construction contract regulations therefore need to be queried efficiently to avoid future risks. If an automatic query system for building construction contract laws and regulations is established, so that the corresponding regulations can be retrieved by entering keywords, the level of construction contract management can be greatly improved and damage to the interests of any party can be avoided.
At present, in construction contract management, contractors and the other parties each tend to draft terms in their own favor, and the resulting text is unstructured. Checking the corresponding laws and regulations takes the parties a great deal of time and is inefficient, so the querying of construction contract regulations remains a significant gap in the field of contract management.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a building construction contract regulation query method and device that can greatly improve the level of construction contract management, protect the interests of all parties, save query time, and improve query efficiency.
To achieve this purpose, the invention provides a method for querying building construction contract regulations, comprising the following steps:
Step S1, collecting the construction contract laws and regulations, digitizing them, and establishing a construction contract regulation library;
Step S2, performing text word segmentation and stop-word removal on the building construction contract regulations based on natural language processing technology, and computing feature words with the word frequency-inverse text frequency (TF-IDF) algorithm;
Step S3, performing synonym expansion of the feature words for querying, using a self-built lexicon of common construction contract regulation terms and a continuous bag-of-words (CBOW) model;
Step S4, computing the similarity of contract regulations based on a vector space model and an improved cosine function, to obtain the corresponding legal provisions in the construction contract conditions;
Step S5, integrating the whole database and query system into a local server or smart device.
Preferably, step S2 comprises the following steps:
Step S21, segmenting the construction contract regulations into words with jieba, which organizes its dictionary in a prefix tree to make word lookup more efficient;
Step S22, removing function words from the text of the construction contract regulations by means of a self-built stop-word library; stop words are extremely common words that contribute little to computing text similarity, and deleting these meaningless words greatly reduces the size of the library and improves retrieval efficiency;
Step S23, after comparing candidate algorithms, selecting the word frequency-inverse text frequency (TF-IDF) algorithm, calculating the weight of each word, and extracting the feature words of the construction contract regulations.
Preferably, step S23 comprises the following steps:
Step S231, on top of the word frequency, assigning each word a weight according to its importance; this weight is called the "inverse text frequency", and its size is inversely related to how common the word is;
Step S232, giving different words different weights: less common words receive a larger weight, more common words a smaller weight, and the most common words the smallest weight; the word frequency is multiplied by the inverse text frequency to obtain the TF-IDF value of the word;
Step S233, since a word that is more important to the text has a larger TF-IDF value, completing the feature extraction of the text by sorting the TF-IDF values in descending order.
Preferably, the word frequency, the inverse text frequency, and the word frequency-inverse text frequency are calculated as follows:
Word frequency (TF): the number of times a feature item appears in a text. If the feature item $t_{i,k}$ appears $n_{i,k}$ times in the text $d_i$, then
$$TF_{i,k} = n_{i,k}$$
In practical applications, to avoid statistical bias caused by overly long texts, the word frequency is generally normalized by $\sum_m n_{m,k}$, the total number of words in the text:
$$TF_{i,k} = \frac{n_{i,k}}{\sum_m n_{m,k}}$$
Inverse text frequency (IDF): a measure of how widely the feature item appears across the total text set $D$. If the total text set contains $M$ texts and the feature item $t_{i,k}$ appears in $m_{i,k}$ of them, then
$$IDF_{i,k} = \log\frac{M}{m_{i,k} + \alpha}$$
where $\alpha$ is an empirical constant, generally 0.01. The more common the word, the larger the denominator and the smaller the inverse text frequency; $\alpha$ is added to the denominator so that it is never 0, which would otherwise happen when no text contains the word.
Word frequency-inverse text frequency (TF-IDF): the word frequency multiplied by the inverse text frequency:
$$w_{i,k} = TF_{i,k} \cdot IDF_{i,k}$$
The TF-IDF value increases with the number of occurrences of a word in a specific text and decreases with the number of texts in the whole text library that contain it. The TF-IDF value of each word is therefore calculated, and the feature values are extracted in descending order.
Preferably, step S3 comprises the following steps:
Step S31, given the training text, namely the building construction contract regulation library and Chinese Wikipedia, using one-hot codes as the input of the CBOW model; the word-vector dimension is set to 100, the window to 5, the minimum occurrence count to 5, and the number of threads used for training word vectors to 9; words are embedded by the CBOW model, the input word vectors are accumulated, and the vectorized representation of the words is finally obtained through a binary classifier;
Step S32, reading the feature words extracted in step S2, looking up their word vectors among the trained word vectors, finding the 5 words most similar to each feature word by cosine distance, and using them for synonym expansion.
Preferably, the CBOW model is a three-layer neural network model:
the first layer of the CBOW model is the input layer, which takes the word vectors of the known context words as input;
the middle layer of the CBOW model, called the linear hidden layer, accumulates all input word vectors;
the third layer of the CBOW model is a binary classifier (softmax), and training it yields the near-synonym expansion of the corresponding word.
Preferably, step S4 comprises: after the feature words and their synonyms are obtained, calculating the similarity between building construction contract regulations with a vector space model and an improved cosine function. The cosine coefficient gives accurate results and is the most commonly used similarity measure in the vector space model (VSM). The similarity between an input case and each text is calculated with the similarity module of gensim, a third-party Python library; the texts are sorted by similarity in descending order; and the corresponding legal provision in the construction contract conditions is finally output as the result.
In the improved cosine function, $\mathrm{Sim}(t_1, t_0)$ is the similarity for the original query and $\mathrm{Sim}(t_1, t_k)$ is the similarity for the expanded query; the weight $\lambda$ takes a value in $0 < \lambda < 1$ and, after repeated verification, is set to 0.7.
The invention also provides a building construction contract regulation query device, characterized by comprising:
a construction contract regulation acquisition and processing module, configured to collect the construction contract laws and regulations, digitize them, and establish a construction contract regulation library, and further configured to perform text word segmentation and stop-word removal on the regulations based on natural language processing technology and compute feature words with the word frequency-inverse text frequency (TF-IDF) algorithm;
a synonym expansion query module, connected to the acquisition and processing module and configured to perform synonym expansion of the feature words for querying, using a self-built lexicon of common construction contract regulation terms and a continuous bag-of-words (CBOW) model;
a contract regulation retrieval module, connected to the synonym expansion query module and configured to compute the similarity of contract regulations based on a vector space model and an improved cosine function, to obtain the corresponding legal provisions in the construction contract conditions;
wherein the acquisition and processing module comprises a crawler algorithm, word segmentation, and stop-word removal; the synonym expansion query module comprises text vectorization and the continuous bag-of-words model; and the contract regulation retrieval module comprises the vector space model and cosine similarity calculation.
Preferably, the device further comprises a local server or smart device, in which the whole database and query system are stored.
Compared with the prior art, the invention has the beneficial effects that:
1. The query method and device provided by the invention allow the construction contract regulations corresponding to a project contract to be queried at any time. Being based on natural language processing technology, they help contractors and the other parties avoid unfavorable risks in construction contracts, protect the interests of all parties, reduce the time spent querying laws and regulations, and improve query efficiency, which is significant for raising the overall level of construction contract management and regulation querying.
2. The invention can be queried from a mobile phone or tablet: the user enters a construction contract and obtains the corresponding legal provisions in the "construction contract conditions". Because these provisions are output directly once the project construction contract is entered, claims and disputes arising from contract risk are effectively avoided, the management level of construction contracts is improved, and the interests of all parties are protected.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram illustrating steps of a method for inquiring a building construction contract rule provided by the present invention;
fig. 2 is a schematic diagram illustrating an example analysis of a method for querying a building construction contract rule provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative work fall within the protection scope of the present invention.
Example one
Referring to fig. 1 and fig. 2, an embodiment of the present invention provides a method for inquiring a building construction contract rule, including the following steps:
Step S1: the laws and regulations on construction contracts are collected by web search, the construction contracts of individual projects are collected by contacting the construction sites, and these data are digitized to establish a database for the construction contract field.
Step S2: text word segmentation and stop-word removal are performed on the building construction contract regulations based on natural language processing technology, and feature words are computed with the word frequency-inverse text frequency (TF-IDF) algorithm.
Specifically, the step S2 includes the following steps:
Step S21: the construction contract regulations are segmented into words with jieba, which organizes its dictionary in a prefix tree (also called a dictionary tree, or trie) to make word lookup more efficient.
Suppose a computer searches for the word "supplier". A naive search scans every Chinese character string in the text, which is inefficient. A prefix tree is instead searched from the top down, fixing one Chinese character at each level; as soon as no child of the current node matches the next character, the search stops, which greatly improves efficiency. In addition, the prefix tree can be combined with a directed acyclic graph to resolve ambiguous segmentations efficiently.
Step S22: function words are removed from the text of the construction contract regulations by means of a self-built stop-word library. Stop words are extremely common words that contribute little to computing text similarity, and deleting these meaningless words greatly reduces the size of the library and improves retrieval efficiency.
Because current NLP technology still has limitations, word segmentation also produces some meaningless tokens, such as underscores. Deleting the most frequent of these meaningless tokens effectively reduces the amount of data. Stop-word removal is implemented by importing a stop-word list and then removing every token that appears in the list.
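The top-down lookup described above can be sketched as follows. This is a minimal illustration, not jieba's actual implementation; the class names and the four-word dictionary are hypothetical.

```python
class TrieNode:
    """One node of the prefix tree; each edge is a single character."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False  # stop as soon as no child matches
        return node.is_word

    def prefixes_of(self, text, start=0):
        """Return every dictionary word that begins at text[start]."""
        found, node = [], self.root
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:
                break
            if node.is_word:
                found.append(text[start:end + 1])
        return found


# Hypothetical mini-dictionary; "供应商" means "supplier".
trie = Trie(["供应", "供应商", "供货", "合同"])
print(trie.contains("供应商"))
print(trie.prefixes_of("供应商合同"))
```

`prefixes_of` returns all candidate words starting at a given position. Collecting these candidates for every position is what yields the edges of a directed acyclic graph over the sentence, from which a segmenter can then choose one path.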
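A minimal sketch of this filtering step is shown below, assuming the text has already been segmented into tokens. The stop-word set and the sample tokens are hypothetical; in practice the list would be loaded from the self-built stop-word library.

```python
def remove_stop_words(tokens, stop_words):
    """Drop stop words and tokens made up only of whitespace or symbols."""
    kept = []
    for token in tokens:
        stripped = token.strip()
        if not stripped or stripped in stop_words:
            continue
        if not any(ch.isalnum() for ch in stripped):
            continue  # pure symbols such as "_" or "，"
        kept.append(stripped)
    return kept


# Hypothetical stop-word list and segmented sentence.
STOP_WORDS = {"的", "是", "应", "在"}
tokens = ["承包商", "应", "在", "_", "工期", "的", "延误", "，", "罚款"]
filtered = remove_stop_words(tokens, STOP_WORDS)
print(filtered)
```

Keeping the stop words in a set makes each membership check constant time, which matters when the whole regulation library is filtered.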
Step S23: after comparing candidate algorithms, the word frequency-inverse text frequency (TF-IDF) algorithm is selected; the weight of each word is calculated, and the feature words of the construction contract regulations are extracted.
For example, in a single accident report, the three words "unit", "delay" and "fine" may occur the same number of times (that is, have the same word frequency), yet their importance differs. "Delay" and "fine" are more representative of the text than "unit", so they should be ranked ahead of "unit" when keywords are ranked.
One way to solve this problem is to use TF-IDF (word frequency-inverse text frequency): on top of the word frequency, each word is assigned a weight according to its importance. This weight is called the "inverse text frequency", and its size is inversely related to how common the word is.
Different words are given different weights: less common words (e.g., "delay", "fine") receive a larger weight, more common words (e.g., "unit") a smaller weight, and the most common words (e.g., "yes") the smallest weight.
The word frequency (TF) is multiplied by the inverse text frequency (IDF) to obtain the TF-IDF value of the word. The more important a word is to a text, the larger its TF-IDF value, so the feature extraction of the text can be completed by sorting the TF-IDF values in descending order.
The calculation method of the word frequency, the inverse text frequency and the word frequency-inverse text frequency is as follows:
Word frequency (TF): the number of times a feature item appears in a text. If the feature item $t_{i,k}$ appears $n_{i,k}$ times in the text $d_i$, then
$$TF_{i,k} = n_{i,k}$$
In practical applications, to avoid statistical bias caused by overly long texts, the word frequency is generally normalized by $\sum_m n_{m,k}$, the total number of words in the text:
$$TF_{i,k} = \frac{n_{i,k}}{\sum_m n_{m,k}}$$
Inverse text frequency (IDF): a measure of how widely the feature item appears across the total text set $D$. If the total text set contains $M$ texts and the feature item $t_{i,k}$ appears in $m_{i,k}$ of them, then
$$IDF_{i,k} = \log\frac{M}{m_{i,k} + \alpha}$$
where $\alpha$ is an empirical constant, generally 0.01. The more common the word, the larger the denominator and the smaller the inverse text frequency; $\alpha$ is added to the denominator so that it is never 0, which would otherwise happen when no text contains the word.
Word frequency-inverse text frequency (TF-IDF): the word frequency multiplied by the inverse text frequency:
$$w_{i,k} = TF_{i,k} \cdot IDF_{i,k}$$
The TF-IDF value increases with the number of occurrences of a word in a specific text and decreases with the number of texts in the whole text library that contain it. The TF-IDF value of each word is therefore calculated, and the feature values are extracted in descending order.
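The calculation can be sketched in a few lines of Python. This is an illustrative reading of the description, assuming the normalized word frequency and an inverse text frequency of log(M / (m + α)) with α = 0.01 in the denominator; the function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter

ALPHA = 0.01  # empirical constant named in the text


def tf_idf(documents, alpha=ALPHA):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    total_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (n / length) * math.log(total_docs / (doc_freq[term] + alpha))
            for term, n in counts.items()
        })
    return weights


def top_features(weight, k=3):
    """Terms sorted by descending weight (ties broken by the term itself)."""
    ranked = sorted(weight.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical segmented documents: "单位" (unit) occurs in every document.
docs = [
    ["单位", "延误", "罚款", "单位"],
    ["单位", "工期", "付款"],
    ["单位", "质量", "验收"],
]
weights = tf_idf(docs)
print(top_features(weights[0], 2))
```

In this toy corpus "单位" appears in every document, so its weight in the first document falls below those of "延误" (delay) and "罚款" (fine) even though it occurs more often there. Under this reading, a term found in all M texts gets a slightly negative weight, because M / (M + α) is just below 1.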
Step S3: synonym expansion of the feature words is performed for querying, using a self-built lexicon of common construction contract regulation terms and a Continuous Bag-of-Words (CBOW) model.
The CBOW model is a three-layer neural network model:
the first layer of the CBOW model is the input layer, which takes the word vectors of the known context words as input;
the middle layer of the CBOW model, called the linear hidden layer, accumulates all input word vectors;
the third layer of the CBOW model is a binary classifier (softmax), and training it yields the near-synonym expansion of the corresponding word. For example, different words that all express "postponement" are near-synonyms of one another.
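The three layers can be illustrated with a small forward pass. This is a sketch under stated assumptions rather than the model used in the invention: the output layer here is a plain softmax over the whole vocabulary, whereas the text describes it as a binary classifier; the vocabulary, dimension, and random weights are hypothetical and untrained.

```python
import math
import random


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_forward(context_ids, input_vectors, output_vectors):
    dim = len(input_vectors[0])
    # Input layer: picking row i is equivalent to multiplying a one-hot code
    # by the input weight matrix.
    # Hidden layer: accumulate (sum) the context word vectors.
    hidden = [sum(input_vectors[i][d] for i in context_ids) for d in range(dim)]
    # Output layer: one score per vocabulary word, turned into probabilities.
    scores = [sum(h * w for h, w in zip(hidden, out)) for out in output_vectors]
    return softmax(scores)


vocab = ["工期", "延误", "罚款", "承包商", "支付"]  # hypothetical vocabulary
rng = random.Random(0)
DIM = 4  # the text sets the dimension to 100; 4 keeps the example small
input_vectors = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in vocab]
output_vectors = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in vocab]

# Predict the centre word from the context words with ids 0, 2 and 3.
probs = cbow_forward([0, 2, 3], input_vectors, output_vectors)
print(len(probs), round(sum(probs), 6))
```

Training adjusts both weight matrices so that the true centre word receives a high probability; the rows of the input matrix are then used as the word vectors.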
Specifically, the step S3 includes the following steps:
Step S31: given the training text, namely the building construction contract regulation library and Chinese Wikipedia, one-hot codes are used as the input of the CBOW model. The word-vector dimension is set to 100, the window to 5, the minimum occurrence count to 5, and the number of threads used for training word vectors to 9. Words are embedded by the CBOW model: the input word vectors are accumulated, and the vectorized representation of the words is finally obtained through a binary classifier.
Step S32: the feature words extracted in step S2 are read, their word vectors are looked up among the trained word vectors, and the 5 words most similar to each feature word are found by cosine distance and used for synonym expansion. For example, different words that all express "postponement" are near-synonyms of one another.
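The lookup in step S32 can be sketched as a nearest-neighbour search over word vectors. The three-dimensional vectors and the Chinese words below are hypothetical stand-ins for trained 100-dimensional embeddings, and `expand` is an illustrative helper, not a function from the source.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(word, vectors, top_n=5):
    """Return the top_n words whose vectors are most similar to that of `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]


# Hypothetical embeddings: three near-synonyms for "delay" plus two unrelated words.
vectors = {
    "延误": [0.9, 0.1, 0.0],
    "延期": [0.8, 0.2, 0.0],
    "推迟": [0.85, 0.05, 0.1],
    "罚款": [0.1, 0.9, 0.0],
    "单位": [0.0, 0.1, 0.9],
}
synonyms = expand("延误", vectors, top_n=2)
print(synonyms)
```

If the embeddings were trained with gensim's `Word2Vec`, which is an assumption since the text does not name the training tool, `model.wv.most_similar(word, topn=5)` would perform a comparable lookup.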
Step S4: the similarity of contract regulations is computed based on a vector space model and an improved cosine function, to obtain the corresponding legal provisions in the construction contract conditions.
Specifically, after the feature words and their synonyms are obtained, the similarity between building construction contract regulations is calculated with a vector space model and an improved cosine function. The cosine coefficient gives accurate results and is the most commonly used similarity measure in the vector space model (VSM), so it is used here. The similarity between an input case and each text is calculated with the similarity module of gensim, a third-party Python library; the texts are sorted by similarity in descending order; and the corresponding legal provision in the construction contract conditions is finally output as the result.
In the improved cosine function, $\mathrm{Sim}(t_1, t_0)$ is the similarity for the original query and $\mathrm{Sim}(t_1, t_k)$ is the similarity for the expanded query; the weight $\lambda$ takes a value in $0 < \lambda < 1$ and, after repeated verification, is set to 0.7.
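The exact form of the improved cosine function is not reproduced in the text above. One plausible reading, shown here purely as an assumption, is a weighted combination λ·Sim(t_1, t_0) + (1 − λ)·Sim(t_1, t_k) with λ = 0.7. The clause names and vectors are hypothetical, and the sketch uses plain Python rather than gensim.

```python
import math

LAMBDA = 0.7  # value reported in the text


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combined_similarity(doc_vec, query_vec, expanded_vec, lam=LAMBDA):
    # ASSUMPTION: convex combination of the original-query similarity and the
    # expanded-query similarity; the source does not show the formula itself.
    return lam * cosine(doc_vec, query_vec) + (1 - lam) * cosine(doc_vec, expanded_vec)


def rank(documents, query_vec, expanded_vec, lam=LAMBDA):
    """Sort texts by combined similarity, largest first."""
    scored = [(name, combined_similarity(vec, query_vec, expanded_vec, lam))
              for name, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical TF-IDF style vectors over a three-term vocabulary.
documents = {
    "clause_a": [1.0, 0.0, 0.0],
    "clause_b": [0.0, 1.0, 0.0],
    "clause_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]      # original query terms
expanded = [1.0, 1.0, 0.0]   # query after synonym expansion
ranking = rank(documents, query, expanded)
print([name for name, _ in ranking])
```

With λ = 0.7 the original query dominates the score, while the expanded query still lifts texts that match only a synonym ("clause_b" here) above texts that match nothing.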
Step S5, the whole database and query system is integrated into a local server or an intelligent device.
For example, construction contract legal provisions can be queried from a mobile phone or tablet. Because the database and query system are local, queries work with or without a network connection, so the required provisions can be looked up in real time even on projects in remote mountain areas.
More specifically, when querying from a mobile phone or tablet, the user enters a construction contract and obtains the corresponding legal provisions in the construction contract conditions. Because these provisions are output directly once the project construction contract is entered, claims and disputes arising from contract risk are effectively avoided, the management level of construction contracts is improved, and the interests of all parties are protected.
Example two
The second embodiment of the present invention provides a building construction contract rule inquiry device, including:
a construction contract regulation acquisition and processing module, configured to collect the construction contract laws and regulations, digitize them, and establish a construction contract regulation library, and further configured to perform text word segmentation and stop-word removal on the regulations based on natural language processing technology and compute feature words with the word frequency-inverse text frequency (TF-IDF) algorithm;
a synonym expansion query module, connected to the acquisition and processing module and configured to perform synonym expansion of the feature words for querying, using a self-built lexicon of common construction contract regulation terms and a continuous bag-of-words (CBOW) model;
a contract regulation retrieval module, connected to the synonym expansion query module and configured to compute the similarity of contract regulations based on a vector space model and an improved cosine function, to obtain the corresponding legal provisions in the construction contract conditions;
wherein the acquisition and processing module comprises a crawler algorithm, word segmentation, and stop-word removal; the synonym expansion query module comprises text vectorization and the continuous bag-of-words model; and the contract regulation retrieval module comprises the vector space model and cosine similarity calculation.
The device further comprises a local server or smart device, in which the whole database and query system are stored.
For example, construction contract legal provisions can be queried from a mobile phone or tablet. Because the database and query system are local, queries work with or without a network connection, so the required provisions can be looked up in real time even on projects in remote mountain areas.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (9)
1. A building construction contract rule query method, characterized by comprising the following steps:
step S1, collecting construction contract laws and regulations, digitizing them, and establishing a construction contract rule library;
step S2, performing text word segmentation and stop-word removal on the building construction contract rules based on natural language processing technology, and calculating feature words through the word frequency-inverse text frequency (TF-IDF) algorithm;
step S3, performing synonym expansion queries on the feature words through a self-built lexicon of common terms and a continuous bag-of-words (CBOW) model of the construction contract rules;
step S4, calculating the similarity of contract rules based on a vector space model and an improved cosine function to obtain the corresponding legal provisions for the construction contract conditions;
step S5, integrating the whole database and query system into a local server or an intelligent device.
2. The building construction contract rule query method according to claim 1, wherein said step S2 includes the steps of:
step S21, performing word segmentation on the construction contract rules with jieba, wherein jieba organizes its words in a prefix tree to improve lookup efficiency;
step S22, removing the function words present in the text of the construction contract rules by means of a self-built stop-word library, wherein stop words are extremely common words that contribute little to calculating text similarity, so that deleting these meaningless words greatly reduces the size of the library and improves retrieval efficiency;
and step S23, selecting, after comparing candidate algorithms, the word frequency-inverse text frequency (TF-IDF) algorithm, calculating the weights of the feature words, and extracting the feature words from the construction contract rules.
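As an illustration of steps S21 and S22, the following is a minimal sketch of dictionary-based segmentation over a prefix tree followed by stop-word removal. It is not jieba's actual algorithm: forward maximum matching is swapped in as a simple stand-in, and the toy dictionary, the stop-word list, and all function names are hypothetical.

```python
class TrieNode:
    """One node of the prefix tree; children are keyed by character."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Insert every dictionary word into a prefix tree and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def segment(text, root):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        node, longest = root, i + 1  # fall back to a single character
        for j in range(i, len(text)):
            node = node.children.get(text[j])
            if node is None:
                break
            if node.is_word:
                longest = j + 1
        tokens.append(text[i:longest])
        i = longest
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop-word library."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical toy dictionary and stop-word library (assumptions, not from the patent).
DICTIONARY = ["施工", "合同", "施工合同", "工期", "延误"]
STOP_WORDS = {"的", "了"}

trie = build_trie(DICTIONARY)
tokens = segment("施工合同的工期延误了", trie)
filtered = remove_stop_words(tokens, STOP_WORDS)
```

The prefix tree lets the segmenter abandon a candidate as soon as no dictionary word starts with the characters read so far, which is the lookup saving the claim refers to.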
3. The building construction contract rule query method according to claim 2, wherein said step S23 includes the steps of:
step S231, on the basis of the word frequency, calculating a weight that reflects the importance of each word, wherein the weight is called the "inverse text frequency" and is inversely related to how common the word is;
step S232, assigning different weights to different words, namely a larger weight to less common words, a smaller weight to more common words, and the smallest weight to the most common words, and multiplying the word frequency by the inverse text frequency to obtain the TF-IDF value of each word;
and step S233, since the more important a word is to the text, the larger its TF-IDF value, completing the feature value extraction of the text by sorting the words in descending order of TF-IDF value.
4. The building construction contract rule query method according to claim 3, wherein the word frequency, the inverse text frequency, and the word frequency-inverse text frequency are calculated as follows:
word frequency TF: the number of times a feature item appears in a text, i.e. if the feature item $t_{i,k}$ appears $n_{i,k}$ times in the text $d_k$, then
$TF_{i,k} = n_{i,k}$
In practical applications, to avoid statistical bias caused by overly long texts, normalization is generally required, where $\sum_m n_{m,k}$ is the total number of words in the text:
$TF_{i,k} = \dfrac{n_{i,k}}{\sum_m n_{m,k}}$
inverse text frequency IDF: reflects how frequently the feature item appears in the total text set D, i.e. if the total text set contains M texts and the feature item $t_{i,k}$ appears in $m_{i,k}$ of them, then
$IDF_{i,k} = \log \dfrac{M}{m_{i,k} + \alpha}$
where $\alpha$ is an empirical constant, generally 0.01; the more common a word is, the larger the denominator and the smaller the inverse text frequency; $\alpha$ is added to the denominator to keep it from being 0, which would otherwise occur when no text contains the word;
word frequency-inverse text frequency TF-IDF: the TF-IDF value is the word frequency multiplied by the inverse text frequency:
$w_{i,k} = TF_{i,k} \times IDF_{i,k}$
The word frequency-inverse text frequency decreases as a word becomes more common across the total text set and increases with the number of times the word appears in a specific text; the TF-IDF value of each word is therefore calculated, and the feature values are extracted in descending order.
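The calculation in this claim can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the logarithmic form log(M / (m + α)) of the inverse text frequency is an assumption based on the description of α in the denominator, and the toy corpus and function names are hypothetical.

```python
import math
from collections import Counter

ALPHA = 0.01  # empirical constant stated in the claim


def tf_idf(corpus, alpha=ALPHA):
    """corpus: list of token lists. Returns one {word: weight} dict per text."""
    total_texts = len(corpus)
    # Number of texts that contain each word (m in the claim's notation).
    doc_freq = Counter(word for tokens in corpus for word in set(tokens))
    weights = []
    for tokens in corpus:
        counts = Counter(tokens)
        total_words = len(tokens)
        weights.append({
            # Normalised word frequency times the assumed log-form IDF.
            word: (n / total_words) * math.log(total_texts / (doc_freq[word] + alpha))
            for word, n in counts.items()
        })
    return weights


def top_features(weight_map, k=3):
    """Feature words in descending order of TF-IDF value."""
    return sorted(weight_map, key=weight_map.get, reverse=True)[:k]


# Hypothetical, already segmented toy corpus of three texts.
CORPUS = [
    ["工期", "延误", "索赔", "工期"],
    ["质量", "验收", "工期"],
    ["索赔", "付款", "质量"],
]

weights = tf_idf(CORPUS)
features = top_features(weights[0], k=2)
```

In the first text, "延误" occurs once but in no other text, so it outranks "工期", which occurs twice but also appears in a second text.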
5. The building construction contract rule query method according to claim 1, wherein said step S3 includes the steps of:
step S31, taking the building construction contract rule library and Chinese Wikipedia as the training text, using one-hot encodings as the input of the CBOW model, setting the word vector dimension to 100, the window size to 5, the minimum word occurrence count to 5, and the number of threads used for training the word vectors to 9, embedding the words through the CBOW model, accumulating the input word vectors, and finally completing the vectorized representation of the words through a binary classifier;
and step S32, reading the feature words extracted in step S2, obtaining the word vectors of the feature words from the trained word vectors, finding the 5 words most similar to each feature word by cosine distance, and performing synonym expansion with them.
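Step S32 can be sketched as a nearest-neighbour lookup over trained word vectors. The snippet below is a minimal illustration using only the standard library; the 3-dimensional vectors, the vocabulary, and the function names are hypothetical stand-ins for the 100-dimensional vectors the claim describes.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, top_n=5):
    """Return the top_n words closest to `word` by cosine similarity."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]


# Hypothetical toy vectors; real ones would come from the trained CBOW model.
VECTORS = {
    "delay": [0.9, 0.1, 0.0],
    "postponement": [0.8, 0.2, 0.0],
    "extension": [0.7, 0.3, 0.1],
    "payment": [0.0, 0.9, 0.4],
    "invoice": [0.1, 0.8, 0.5],
    "quality": [0.1, 0.1, 0.9],
    "inspection": [0.0, 0.2, 0.9],
}

expansion = most_similar("delay", VECTORS, top_n=5)
```

If the vectors were trained with gensim's `Word2Vec`, the settings in step S31 would presumably map to `vector_size=100`, `window=5`, `min_count=5`, `workers=9`, and `sg=0`; that mapping is an assumption, since the claim does not name the training library.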
6. The building construction contract rule query method according to claim 5, wherein the CBOW model is a three-layer neural network model;
the first layer of the CBOW model is an input layer, which takes the word vectors of the known context as input;
the middle layer of the CBOW model is a linear hidden layer, which accumulates all input word vectors;
the third layer of the CBOW model is a binary classifier (softmax), and the near-synonym expansion of the corresponding word is obtained through training.
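The three layers described in this claim can be sketched as a single forward pass. This is a simplified illustration under stated assumptions: a plain softmax over the whole vocabulary stands in for the classifier-based output layer, training is omitted, and the tiny weight matrices and names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_forward(context_ids, input_embeddings, output_weights):
    """Input layer -> summed linear hidden layer -> softmax over the vocabulary."""
    dim = len(input_embeddings[0])
    # Hidden layer: accumulate (sum) the vectors of the context words.
    hidden = [sum(input_embeddings[i][d] for i in context_ids) for d in range(dim)]
    # Output layer: one score per vocabulary word, normalised by softmax.
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in output_weights]
    return softmax(scores)


# Hypothetical weights for a 4-word vocabulary and 2-dimensional vectors.
INPUT_EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
OUTPUT_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]

probs = cbow_forward([0, 1], INPUT_EMBEDDINGS, OUTPUT_WEIGHTS)
predicted = probs.index(max(probs))  # index of the most probable centre word
```

Looking up a row of the embedding matrix by word index is equivalent to multiplying that matrix by the word's one-hot vector, which is why the sketch takes indices rather than one-hot inputs.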
7. The building construction contract rule query method according to claim 1, wherein said step S4 comprises: after the feature words and the synonyms are obtained, calculating the similarity between building construction contract laws and regulations using a vector space model (VSM) with an improved cosine function, the cosine coefficient being accurate and the most commonly used calculation method in the VSM; calculating the similarity between the input case and each text using the similarity model in gensim, a third-party Python tool; sorting the texts in descending order of similarity value; and finally obtaining the corresponding legal provisions for the construction contract conditions as the output result;
where Sim(t_1, t_0) is the similarity for the original query and Sim(t_1, t_k) is the similarity for the expanded query; λ takes a value in the range 0 < λ < 1 and, after multiple verifications, is set to 0.7.
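The combining formula itself is not reproduced in this text, so the following sketch rests on a loudly stated assumption: the improved similarity is taken to be a convex combination, λ times the original-query similarity plus (1 − λ) times the mean expanded-query similarity. The vectors, document names, and function names are hypothetical, and plain Python replaces the gensim similarity model named in the claim.

```python
import math

LAMBDA = 0.7  # value stated in the claim


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combined_similarity(text_vec, query_vec, expanded_vecs, lam=LAMBDA):
    """ASSUMED form: lam * Sim(t_1, t_0) + (1 - lam) * mean_k Sim(t_1, t_k)."""
    original = cosine(text_vec, query_vec)
    if not expanded_vecs:
        return original
    expanded = sum(cosine(text_vec, v) for v in expanded_vecs) / len(expanded_vecs)
    return lam * original + (1 - lam) * expanded


def rank(texts, query_vec, expanded_vecs):
    """Sort text names in descending order of combined similarity."""
    scores = {name: combined_similarity(vec, query_vec, expanded_vecs)
              for name, vec in texts.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical term-weight vectors for three legal provisions.
TEXTS = {
    "Article A": [1.0, 0.0, 1.0],
    "Article B": [0.0, 1.0, 0.0],
    "Article C": [1.0, 1.0, 0.0],
}
QUERY = [1.0, 0.0, 0.0]
EXPANDED = [[1.0, 0.0, 1.0]]

ranking = rank(TEXTS, QUERY, EXPANDED)
```

Under this assumed form, "Article A" and "Article C" tie on the original query, and the expanded query breaks the tie in favour of "Article A".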
8. A building construction contract rule query device, characterized by comprising:
the construction contract rule acquisition and processing module is used for collecting construction contract laws and regulations, digitizing them, and establishing a construction contract rule library; it performs text word segmentation and stop-word removal on the building construction contract rules based on natural language processing technology, and calculates feature words through the word frequency-inverse text frequency (TF-IDF) algorithm;
the synonym expansion query module is connected with the construction contract rule acquisition and processing module and is used for performing synonym expansion queries on the feature words through a self-built lexicon of common terms and a continuous bag-of-words (CBOW) model of the construction contract rules;
the contract rule retrieval module is connected with the synonym expansion query module, and is used for calculating the similarity of the contract rules based on a vector space model and an improved cosine function to obtain the corresponding legal provisions for the construction contract conditions;
the construction contract rule acquisition and processing module comprises a crawler algorithm, word segmentation, and stop-word removal; the synonym expansion query module comprises text vectorization and a continuous bag-of-words model; the contract rule retrieval module comprises a vector space model and cosine similarity calculation.
9. The building construction contract rule query device according to claim 8, further comprising a local server or an intelligent device, wherein the whole database and the query system are stored in the local server or the intelligent device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110315094.1A CN113033197A (en) | 2021-03-24 | 2021-03-24 | Building construction contract rule query method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110315094.1A CN113033197A (en) | 2021-03-24 | 2021-03-24 | Building construction contract rule query method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113033197A (en) | 2021-06-25 |
Family
ID=76473927
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110315094.1A (Pending) CN113033197A (en) | 2021-03-24 | 2021-03-24 | Building construction contract rule query method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113033197A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101655857A (en) * | 2009-09-18 | 2010-02-24 | 西安建筑科技大学 | Method for mining data in construction regulation field based on associative regulation mining technology |
CN106156272A (en) * | 2016-06-21 | 2016-11-23 | 北京工业大学 | A kind of information retrieval method based on multi-source semantic analysis |
CN106844350A (en) * | 2017-02-15 | 2017-06-13 | 广州索答信息科技有限公司 | A kind of computational methods of short text semantic similarity |
CN108491462A (en) * | 2018-03-05 | 2018-09-04 | 昆明理工大学 | A kind of semantic query expansion method and device based on word2vec |
CN108804421A (en) * | 2018-05-28 | 2018-11-13 | 中国科学技术信息研究所 | Text similarity analysis method, device, electronic equipment and computer storage media |
CN110275936A (en) * | 2019-05-09 | 2019-09-24 | 浙江工业大学 | A kind of similar law case retrieving method based on from coding neural network |
CN110597949A (en) * | 2019-08-01 | 2019-12-20 | 湖北工业大学 | Court similar case recommendation model based on word vectors and word frequency |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110222160B (en) | Intelligent semantic document recommendation method and device and computer readable storage medium | |
CN103605665B (en) | Keyword based evaluation expert intelligent search and recommendation method | |
CN108038096A (en) | Knowledge database documents method for quickly retrieving, application server computer readable storage medium storing program for executing | |
WO2020237856A1 (en) | Smart question and answer method and apparatus based on knowledge graph, and computer storage medium | |
CN104899322A (en) | Search engine and implementation method thereof | |
CN106708929B (en) | Video program searching method and device | |
CN104392006B (en) | A kind of event query processing method and processing device | |
CN101097570A (en) | Advertisement classification method capable of automatic recognizing classified advertisement type | |
CN103577416A (en) | Query expansion method and system | |
CN113660541B (en) | Method and device for generating abstract of news video | |
CN106844482B (en) | Search engine-based retrieval information matching method and device | |
CN102789452A (en) | Similar content extraction method | |
CN113190702A (en) | Method and apparatus for generating information | |
CN112328805A (en) | Entity mapping method of vulnerability description information and database table based on NLP | |
CN106570196B (en) | Video program searching method and device | |
CN114328800A (en) | Text processing method and device, electronic equipment and computer readable storage medium | |
Senthilkumar et al. | A Survey On Feature Selection Method For Product Review | |
CN112949304A (en) | Construction case knowledge reuse query method and device | |
CN113704494B (en) | Entity retrieval method, device, equipment and storage medium based on knowledge graph | |
CN113033197A (en) | Building construction contract rule query method and device | |
CN106777191B (en) | Search engine-based retrieval mode generation method and device | |
CN116662633A (en) | Search method, model training method, device, electronic equipment and storage medium | |
CN112926297B (en) | Method, apparatus, device and storage medium for processing information | |
CN114691835A (en) | Audit plan data generation method, device and equipment based on text mining | |
CN108345605B (en) | Text search method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||