CN112417835A - Intelligent inspection method and system for purchase file based on natural language processing technology - Google Patents
- Publication number
- CN112417835A (CN202011299881.3A)
- Authority
- CN
- China
- Prior art keywords
- book
- similarity
- data
- technical specification
- core field
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06F16/986—Document structures and storage, e.g. HTML extensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention relates to a method and a system for intelligently reviewing purchase files based on natural language processing technology, and belongs to the technical field of intelligent text review of project purchase data. First, web technology and a framework are used to solidify the technical specification and the feasibility study estimate as online document templates. Next, the core field data of the work-item sections of the solidified technical specification and feasibility study estimate are exported and preprocessed. Finally, a similarity algorithm analyzes the preprocessed core field data of the two documents to obtain an examination report. The invention reduces repetitive and tedious manual examination work, avoids the detail errors caused by high-load manual examination, and is easy to popularize and apply.
Description
Technical Field
The invention belongs to the technical field of intelligent text review of project purchase data, and particularly relates to a purchase file intelligent review method and system based on a natural language processing technology.
Background
With the digital transformation of the power grid, the information center serves as the main body of project construction, and the number of informatization projects increases year by year: the company is expected to issue 275 informatization projects to the center in 2020, with a total investment of nearly 300 million. The whole informatization project process involves many templates and requirements. The planning and construction department, as the functional management department for project construction and bidding procurement, examines the project construction process templates and purchase files manually, which is inefficient and error-prone. With growing audit awareness and the improvement of lean project management, project managers need to examine the technical specification point by point against the work items of the feasibility study estimate, to ensure that the technical specification stays within the scope of the feasibility study with nothing missing and to avoid audit risk; they also need to examine the key points of the purchasing element list against the technical specification, to ensure that the purchase files are complete and reasonable. However, because the number of projects is growing rapidly and bidding work is time-critical, project managers have had to examine as many as 59 sub-packages within two days, and the conflict between the quality of manual examination and the time available is increasingly prominent. Any lapse in examination quality affects project purchasing and subsequent project construction. Therefore, how to overcome these defects of the prior art is a problem that urgently needs to be solved in the technical field of intelligent text review of procurement data.
Disclosure of Invention
The invention aims to solve the defects of the prior art and provides a purchase file intelligent examination method and a purchase file intelligent examination system based on a natural language processing technology.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
the intelligent inspection method of the purchase file based on the natural language processing technology comprises the following steps:
step (1), solidifying the technical specification and the feasibility study estimate as online document templates by using web technology and a framework;
step (2), exporting the core field data of the work-item sections of the solidified technical specification and feasibility study estimate, and carrying out data preprocessing;
the core fields in the technical specification comprise preliminary project preparation, project development, and project rollout and implementation; the core fields in the feasibility study estimate comprise construction cost and equipment purchase cost;
step (3), analyzing the core field data of the technical specification and the core field data of the feasibility study estimate processed in step (2) with a similarity algorithm to obtain an examination report.
Further, preferably, both the technical specification and the feasibility study estimate are standard document templates; the documents are solidified by using web technology and a control, so that their contents can only be copied and recognized but not modified, and serve as the standard for document comparison.
Further, preferably, the specific method for solidifying the documents by using web technology and a control is as follows: for the project file templates of the technical specification and the feasibility study estimate, the corresponding form pages are written with an element component library; and the data in the forms are exported to the corresponding Word and Excel files by using an ActiveXObject control.
Further, preferably, the construction cost field includes project development, project implementation, integrated development, project testing, and technical consultation; the equipment purchase cost field includes hardware equipment purchase and system software purchase.
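The core fields named above can be pictured as a small nested structure. The following Python sketch is illustrative only: the dictionary layout, the key names `technical_specification` and `estimate`, and the helper `flatten_estimate_items` are assumptions made for this example and are not part of the disclosed method.

```python
# Hypothetical schema of the core fields listed in the text.
# Key names and layout are assumptions made for illustration.
CORE_FIELDS = {
    "technical_specification": [
        "preliminary project preparation",
        "project development",
        "project rollout and implementation",
    ],
    "estimate": {
        "construction cost": [
            "project development",
            "project implementation",
            "integrated development",
            "project testing",
            "technical consultation",
        ],
        "equipment purchase cost": [
            "hardware equipment purchase",
            "system software purchase",
        ],
    },
}


def flatten_estimate_items(schema):
    """Return every leaf work item listed under the estimate's core fields."""
    return [item
            for items in schema["estimate"].values()
            for item in items]
```

A flat list of work items such as the one returned by `flatten_estimate_items(CORE_FIELDS)` is a convenient input for the later comparison step.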
Further, preferably, in step (2), the data preprocessing includes text word segmentation, regular-expression matching, stop word processing, character string processing and data dimensionality reduction.
Further, preferably, the text word segmentation adopts a BiLSTM + CRF word segmentation method.
Further, preferably, after the text word segmentation is completed, the text character strings are cleaned by regular-expression matching, and special symbols and stop words are filtered out to obtain a dictionary library.
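The cleaning step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: whitespace splitting stands in for the BiLSTM + CRF segmenter, which is not implemented here, and the stop-word list and the function names `preprocess` and `build_dictionary` are hypothetical.

```python
import re

# Hypothetical stop-word list; a real one would be domain-specific.
STOP_WORDS = {"the", "a", "an", "of", "and", "for"}


def preprocess(sentence, stop_words=STOP_WORDS):
    """Clean a sentence and return its tokens without stop words."""
    # Replace every character that is neither a word character nor
    # whitespace (i.e. special symbols) with a space.
    cleaned = re.sub(r"[^\w\s]", " ", sentence.lower())
    # Assumption: whitespace splitting stands in for the segmenter.
    tokens = cleaned.split()
    return [t for t in tokens if t not in stop_words]


def build_dictionary(sentences):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for token in preprocess(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab
```

For example, `preprocess("Purchase of the hardware-equipment (servers)!")` drops the punctuation and the stop words and keeps the four content words.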
Further, preferably, the data dimensionality reduction uses a principal component analysis algorithm, specifically as follows:
the original data $X = \{x_1, x_2, x_3, \ldots, x_n\}$ needs to be reduced to $k$ dimensions, where $x_1$ to $x_n$ represent the extracted word vector matrix;
1) de-centering: subtract from each feature value the mean of the respective feature;
2) calculate the covariance matrix $\frac{1}{n} X X^{\mathsf{T}}$;
3) calculate the eigenvalues and eigenvectors of the covariance matrix $\frac{1}{n} X X^{\mathsf{T}}$ by singular value decomposition;
4) sort the eigenvalues from largest to smallest, select the largest $k$, and use the corresponding $k$ eigenvectors as row vectors to form an eigenvector matrix $P$;
5) convert the data into the new space constructed by the $k$ eigenvectors, i.e. $Y = PX$, where $Y$ is the result of reducing the data from $n$ dimensions to $k$ dimensions.
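The reduction steps can be sketched in dependency-free Python. Several assumptions apply: rows of `X` are taken as features and columns as samples (so that $Y = PX$ has $k$ rows), and the eigenvectors are found by power iteration with deflation, which is a swapped-in stand-in for the decomposition named in the text rather than the same procedure. The function name `pca_reduce` is hypothetical.

```python
import math


def _matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def pca_reduce(X, k, iters=200):
    """Project X (d feature rows x n sample columns) onto its top-k axes.

    Returns (P, Y): P holds k eigenvectors as rows, Y = P * centred X.
    """
    d, n = len(X), len(X[0])
    # 1) de-centre: subtract each feature's mean
    means = [sum(row) / n for row in X]
    Xc = [[v - m for v in row] for row, m in zip(X, means)]
    # 2) covariance matrix C = (1/n) * Xc * Xc^T
    C = [[sum(a * b for a, b in zip(Xc[i], Xc[j])) / n for j in range(d)]
         for i in range(d)]
    # 3)-4) find the k largest eigenvectors one at a time, largest first
    P = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]   # arbitrary start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = _matvec(C, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                    # no variance left
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, _matvec(C, v)))
        P.append(v)
        # deflation: remove the component just found from C
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)]
             for i in range(d)]
    # 5) Y = P * Xc
    Y = [[sum(p * Xc[i][t] for i, p in enumerate(row)) for t in range(n)]
         for row in P]
    return P, Y
```

Power iteration can converge slowly when two eigenvalues are close, and the sign of each eigenvector is arbitrary; a library eigen-solver would normally be preferred in practice.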
Further, preferably, in step (3), the similarity algorithm is a comprehensive similarity algorithm: three different similarity algorithms are each used to calculate the similarity of the core field data, and the comprehensive similarity is then obtained as a weighted average of the three similarities. The specific procedure is as follows:
edit distance similarity of characters:
adding operation:
$d_1 = ED(A_{i-1}, B_j) + 1$
deleting operation:
$d_2 = ED(A_i, B_{j-1}) + 1$
modifying operation:
$d_3 = ED(A_{i-1}, B_{j-1}) + c_{ij}$, where $c_{ij} = 0$ if $A_i = B_j$ and $c_{ij} = 1$ otherwise
taking the minimum of the above three as the minimum edit distance gives the state transition equation:
$ED(A_i, B_j) = \min(d_1, d_2, d_3)$
in the above formulas, $d_1$, $d_2$ and $d_3$ are the edit distances obtained through the adding, deleting and modifying operations respectively; $A$ and $B$ are the two strings to be compared; $ED$ is the edit distance function; $ED(A_i, B_j)$ is the minimum edit distance; $L_A$ and $L_B$ are the lengths of $A$ and $B$ respectively; $A_i$ denotes the $i$-th character of $A$ (and, as an argument of $ED$, the prefix of $A$ ending at that character); $B_j$ denotes the $j$-th character of $B$ in the same way;
Jaccard coefficient similarity:
$J(A, B) = \dfrac{M_{11}}{M_{01} + M_{10} + M_{11}}$
in the above formula, $M_{00}$ is the number of attributes that are 0 in both A and B; $M_{01}$ is the number of attributes that are 0 in A and 1 in B; $M_{10}$ is the number of attributes that are 1 in A and 0 in B; $M_{11}$ is the number of attributes that are 1 in both A and B;
cosine similarity:
$\cos\alpha = \dfrac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2}\,\sqrt{\sum_{i} y_i^2}}$
where $\cos\alpha$ is the cosine similarity between the two strings, and $x_i$ and $y_i$ are the components of their word vectors;
the comprehensive similarity is obtained as a weighted average of the three similarities:
$S = \lambda_1 S_{ED} + \lambda_2 S_{J} + \lambda_3 S_{\cos}$
in the formula, $S_{ED}$, $S_{J}$ and $S_{\cos}$ are the edit distance, Jaccard coefficient and cosine similarities, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the coefficients corresponding to the three similarities;
with sentences as the minimum detection unit, the similarity between the core field data of the technical specification and the core field data of the feasibility study estimate is obtained through the comprehensive similarity;
and an examination report of the core fields is output.
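The three measures and their weighted combination can be sketched in Python as below. Several points are assumptions made for illustration and are not stated in the text: the edit distance is turned into a similarity by dividing by the longer string length, Jaccard attributes are token-presence flags, term-count vectors stand in for word vectors, tokens are obtained by whitespace splitting, and the default weights are equal.

```python
import math


def edit_distance(a, b):
    """Minimum number of add/delete/modify operations turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            d1 = prev[j] + 1                # ED(A_{i-1}, B_j) + 1
            d2 = cur[j - 1] + 1             # ED(A_i, B_{j-1}) + 1
            d3 = prev[j - 1] + (ca != cb)   # ED(A_{i-1}, B_{j-1}) + cost
            cur.append(min(d1, d2, d3))
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    # Assumed normalisation: 1 - distance / length of the longer string
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard_similarity(tokens_a, tokens_b):
    # M11 / (M01 + M10 + M11) over token-presence attributes
    sa, sb = set(tokens_a), set(tokens_b)
    union = sa | sb
    return 1.0 if not union else len(sa & sb) / len(union)


def cosine_similarity(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return 0.0 if nx == 0 or ny == 0 else dot / (nx * ny)


def count_vectors(tokens_a, tokens_b):
    """Term-count vectors over the shared vocabulary (stand-in for word vectors)."""
    vocab = sorted(set(tokens_a) | set(tokens_b))
    return ([tokens_a.count(w) for w in vocab],
            [tokens_b.count(w) for w in vocab])


def combined_similarity(a, b, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of the three similarities for two sentences."""
    ta, tb = a.split(), b.split()
    xa, xb = count_vectors(ta, tb)
    scores = (edit_similarity(a, b),
              jaccard_similarity(ta, tb),
              cosine_similarity(xa, xb))
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

Dividing by the sum of the weights keeps the result in the range 0 to 1 even when the supplied coefficients do not sum to one.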
The invention also provides an intelligent examination system for purchase files based on natural language processing technology, which adopts the above intelligent examination method for purchase files and comprises:
the data acquisition device, used for acquiring the technical specification and the feasibility study estimate;
the template curing module, connected with the data acquisition device and used for solidifying the acquired technical specification and feasibility study estimate so that their contents can only be copied and recognized but not modified;
the core field data export module, connected with the template curing module and used for exporting the core field data of the work-item sections of the solidified technical specification and feasibility study estimate;
the data preprocessing module, connected with the core field data export module and used for preprocessing the exported core field data;
the similarity analysis module, connected with the data preprocessing module and used for analyzing, with a similarity algorithm, the core field data of the technical specification and the core field data of the feasibility study estimate processed by the data preprocessing module;
and the report output module, used for outputting the unmatched items as a report to obtain an examination report.
Further, it is preferable that the similarity of 90% or more is regarded as a match, and otherwise, it is regarded as a mismatch.
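The matching rule can be sketched in Python as follows. This is a sketch under assumptions: each specification item is paired with its best-scoring estimate item, which is one possible pairing strategy and is not stated in the text; the similarity function is passed in by the caller; and the names `examination_report` and `unmatched` are hypothetical.

```python
def examination_report(spec_items, estimate_items, similarity, threshold=0.90):
    """Pair each specification item with its best-scoring estimate item.

    An item counts as matched when the best similarity is at least
    `threshold` (90% by default, as in the text).
    """
    report = []
    for item in spec_items:
        best, score = None, 0.0
        for candidate in estimate_items:
            s = similarity(item, candidate)
            if best is None or s > score:
                best, score = candidate, s
        report.append({
            "item": item,
            "best_match": best,
            "similarity": score,
            "matched": score >= threshold,
        })
    return report


def unmatched(report):
    """Items that need manual checking."""
    return [row["item"] for row in report if not row["matched"]]
```

Any function that returns a score between 0 and 1 for two strings can be supplied as `similarity`; the list returned by `unmatched` corresponds to the items output in the report.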
Compared with the prior art, the invention has the beneficial effects that:
The invention solidifies the technical specification and the feasibility study estimate online. This addresses the document requirements that project owners face throughout the informatization project process, reduces the unnecessary communication cost caused by inconsistent document templates, reduces the loss of project construction efficiency caused by repeated rework in editing and review, and improves purchasing efficiency.
Through research on intelligent examination technology for the technical specification and the feasibility study estimate, a preliminary intelligent examination of the consistency of the two documents is realized; the examination results are compared to form an examination report, which recommends whether manual review is needed and indicates the content that requires review. Applying artificial intelligence to informatization project management greatly reduces the manual examination workload, improves the efficiency of examining technical specifications, addresses the low accuracy that manual examination may cause, reduces audit risk, and improves purchasing quality and project management quality.
Natural language processing technology is used to compare the key parts of the technical specification and the feasibility study estimate, realizing the application of natural language technology to these documents; web document solidification technology is applied to the technical specification and the feasibility study estimate, realizing its application to the examination of these documents in the power industry; and the data in the forms are exported to the corresponding Word and Excel files by using an ActiveXObject control.
Drawings
FIG. 1 is a flow chart of the BiLSTM + CRF word segmentation method;
FIG. 2 is a schematic structural diagram of an intelligent examination system for procurement files based on natural language processing technology;
In the figures: 101, data acquisition device; 102, template curing module; 103, core field data export module; 104, data preprocessing module; 105, similarity analysis module; 106, report output module.
Detailed Description
The present invention will be described in further detail with reference to examples.
It will be appreciated by those skilled in the art that the following examples are illustrative of the invention only and should not be taken as limiting the scope of the invention. The examples do not specify particular techniques or conditions, and are performed according to the techniques or conditions described in the literature in the art or according to the product specifications. The materials or equipment used are not indicated by manufacturers, and all are conventional products available by purchase.
Example 1
The intelligent inspection method of the purchase file based on the natural language processing technology is characterized by comprising the following steps:
step (1), solidifying the technical specification and the feasibility study estimate as online document templates by using web technology and a framework;
step (2), exporting the core field data of the work-item sections of the solidified technical specification and feasibility study estimate, and carrying out data preprocessing;
the core fields in the technical specification comprise preliminary project preparation, project development, and project rollout and implementation; the core fields in the feasibility study estimate comprise construction cost and equipment purchase cost;
step (3), analyzing the core field data of the technical specification and the core field data of the feasibility study estimate processed in step (2) with a similarity algorithm to obtain an examination report.
Example 2
The intelligent inspection method of the purchase file based on the natural language processing technology is characterized by comprising the following steps:
step (1), solidifying the technical specification and the feasibility study estimate as online document templates by using web technology and a framework;
step (2), exporting the core field data of the work-item sections of the solidified technical specification and feasibility study estimate, and carrying out data preprocessing;
the core fields in the technical specification comprise preliminary project preparation, project development, and project rollout and implementation; the core fields in the feasibility study estimate comprise construction cost and equipment purchase cost;
step (3), analyzing the core field data of the technical specification and the core field data of the feasibility study estimate processed in step (2) with a similarity algorithm to obtain an examination report.
The technical specification and the feasibility study estimate are standard document templates; the documents are solidified by using web technology and a control, so that their contents can only be copied and recognized but not modified, and serve as the standard for document comparison.
The specific method for solidifying the documents by using web technology and a control is as follows: for the project file templates of the technical specification and the feasibility study estimate, the corresponding form pages are written with an element component library; and the data in the forms are exported to the corresponding Word and Excel files by using an ActiveXObject control.
The construction cost field comprises project development, project implementation, integrated development, project testing and technical consultation; the equipment purchase cost field comprises hardware equipment purchase and system software purchase.
In step (2), the data preprocessing comprises text word segmentation, regular-expression matching, stop word processing, character string processing and data dimensionality reduction.
The text word segmentation adopts a BiLSTM + CRF word segmentation method.
After the text word segmentation is finished, the text character strings are cleaned by regular-expression matching, and special symbols and stop words are filtered out to obtain a dictionary library.
The data dimensionality reduction uses a principal component analysis algorithm as follows:
the original data $X = \{x_1, x_2, x_3, \ldots, x_n\}$ needs to be reduced to $k$ dimensions, where $x_1$ to $x_n$ represent the extracted word vector matrix;
1) de-centering: subtract from each feature value the mean of the respective feature;
2) calculate the covariance matrix $\frac{1}{n} X X^{\mathsf{T}}$;
3) calculate the eigenvalues and eigenvectors of the covariance matrix $\frac{1}{n} X X^{\mathsf{T}}$ by singular value decomposition;
4) sort the eigenvalues from largest to smallest, select the largest $k$, and use the corresponding $k$ eigenvectors as row vectors to form an eigenvector matrix $P$;
5) convert the data into the new space constructed by the $k$ eigenvectors, i.e. $Y = PX$, where $Y$ is the result of reducing the data from $n$ dimensions to $k$ dimensions.
In the step (3), the similarity algorithm adopts a comprehensive similarity algorithm, that is, three different similarity algorithms are respectively adopted to calculate the similarity of the core field data, and then the comprehensive similarity is obtained by utilizing a weighted average mode for each similarity, and the specific processing mode is as follows:
edit distance similarity of characters:
adding operation:
$d_1 = ED(A_{i-1}, B_j) + 1$
deleting operation:
$d_2 = ED(A_i, B_{j-1}) + 1$
modifying operation:
$d_3 = ED(A_{i-1}, B_{j-1}) + c_{ij}$, where $c_{ij} = 0$ if $A_i = B_j$ and $c_{ij} = 1$ otherwise
taking the minimum of the above three as the minimum edit distance gives the state transition equation:
$ED(A_i, B_j) = \min(d_1, d_2, d_3)$
in the above formulas, $d_1$, $d_2$ and $d_3$ are the edit distances obtained through the adding, deleting and modifying operations respectively; $A$ and $B$ are the two strings to be compared; $ED$ is the edit distance function; $ED(A_i, B_j)$ is the minimum edit distance; $L_A$ and $L_B$ are the lengths of $A$ and $B$ respectively; $A_i$ denotes the $i$-th character of $A$ (and, as an argument of $ED$, the prefix of $A$ ending at that character); $B_j$ denotes the $j$-th character of $B$ in the same way;
Jaccard coefficient similarity:
$J(A, B) = \dfrac{M_{11}}{M_{01} + M_{10} + M_{11}}$
in the above formula, $M_{00}$ is the number of attributes that are 0 in both A and B; $M_{01}$ is the number of attributes that are 0 in A and 1 in B; $M_{10}$ is the number of attributes that are 1 in A and 0 in B; $M_{11}$ is the number of attributes that are 1 in both A and B;
cosine similarity:
$\cos\alpha = \dfrac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2}\,\sqrt{\sum_{i} y_i^2}}$
where $\cos\alpha$ is the cosine similarity between the two strings, and $x_i$ and $y_i$ are the components of their word vectors;
the comprehensive similarity is obtained as a weighted average of the three similarities:
$S = \lambda_1 S_{ED} + \lambda_2 S_{J} + \lambda_3 S_{\cos}$
in the formula, $S_{ED}$, $S_{J}$ and $S_{\cos}$ are the edit distance, Jaccard coefficient and cosine similarities, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the coefficients corresponding to the three similarities;
with sentences as the minimum detection unit, the similarity between the core field data of the technical specification and the core field data of the feasibility study estimate is obtained through the comprehensive similarity;
and an examination report of the core fields is output.
As shown in FIG. 2, the intelligent examination system for purchase files based on natural language processing technology, which adopts the above intelligent examination method for purchase files, comprises:
the data acquisition device 101, used for acquiring the technical specification and the feasibility study estimate;
the template curing module 102, connected with the data acquisition device 101 and used for solidifying the acquired technical specification and feasibility study estimate so that their contents can only be copied and recognized but not modified;
the core field data export module 103, connected with the template curing module 102 and used for exporting the core field data of the work-item sections of the solidified technical specification and feasibility study estimate;
the data preprocessing module 104, connected with the core field data export module 103 and used for preprocessing the exported core field data;
the similarity analysis module 105, connected with the data preprocessing module 104 and used for analyzing, with a similarity algorithm, the core field data of the technical specification and the core field data of the feasibility study estimate processed by the data preprocessing module 104;
and the report output module 106, used for outputting the unmatched items as a report to obtain an examination report.
Example 3
The intelligent inspection method of the purchase file based on the natural language processing technology comprises the following steps:
(1) the technical specification and the feasibility study estimate are solidified as online document templates by using web technology and a framework;
(2) the core field data of the work-item sections of the solidified technical specification and feasibility study estimate are exported, and data preprocessing is carried out;
(3) the document to be examined is analyzed with a similarity algorithm to obtain a preliminary examination report.
In step (1), the technical specification and the feasibility study estimate are standard document templates. The documents are solidified by using web technology and a framework to serve as the standard for document comparison.
In the step (2), the data preprocessing mode comprises regular matching, text word segmentation, stop word processing, character string processing and data reduction.
The text word segmentation adopts a recurrent neural network word segmentation method (the flow is shown in figure 1);
after word segmentation is finished, cleaning text character strings by using a regular expression to filter special symbols and stop words to obtain a dictionary library;
the principle of the data dimensionality reduction is as follows:
the original data $X = \{x_1, x_2, x_3, \ldots, x_n\}$ needs to be reduced to $k$ dimensions, where $x_1$ to $x_n$ represent the extracted word vector matrix;
1) de-centering: subtract from each feature value the mean of the respective feature, i.e. de-center each of the vectors $x_1$ to $x_n$;
2) calculate the covariance matrix $\frac{1}{n} X X^{\mathsf{T}}$;
3) solve for the eigenvalues and eigenvectors of the covariance matrix $\frac{1}{n} X X^{\mathsf{T}}$ by the eigenvalue decomposition method;
4) sort the eigenvalues from largest to smallest, select the largest $k$, and use the corresponding $k$ eigenvectors as row vectors to form an eigenvector matrix $P$;
5) convert the data into the new space constructed by the $k$ eigenvectors, i.e. $Y = PX$, which is the result of reducing the data from $n$ dimensions to $k$ dimensions.
In the step (3), the similarity algorithm adopts a comprehensive similarity algorithm, that is, three different similarity algorithms are respectively adopted to calculate the similarity of the core field data, and then the comprehensive similarity is obtained by using a weighted average method for each similarity, and the specific processing method is as follows:
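edit distance similarity of characters:
adding operation:
$d_1 = ED(A_{i-1}, B_j) + 1$
deleting operation:
$d_2 = ED(A_i, B_{j-1}) + 1$
modifying operation:
$d_3 = ED(A_{i-1}, B_{j-1}) + c_{ij}$, where $c_{ij} = 0$ if $A_i = B_j$ and $c_{ij} = 1$ otherwise
taking the minimum of the above three as the minimum edit distance gives the state transition equation:
$ED(A_i, B_j) = \min(d_1, d_2, d_3)$
Jaccard coefficient similarity:
$J(A, B) = \dfrac{M_{11}}{M_{01} + M_{10} + M_{11}}$
cosine similarity:
$\cos\alpha = \dfrac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2}\,\sqrt{\sum_{i} y_i^2}}$
the comprehensive similarity is obtained as a weighted average of the three similarities:
$S = \lambda_1 S_{ED} + \lambda_2 S_{J} + \lambda_3 S_{\cos}$
in the formula, $S_{ED}$, $S_{J}$ and $S_{\cos}$ are the edit distance, Jaccard coefficient and cosine similarities, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the coefficients corresponding to the three similarities; preferably 0.2, 0.4.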
With sentences as the minimum detection unit, obtaining the similarity between the core field data of the document to be checked and the core field data of the solidified template through comprehensive similarity;
and outputting the examination report of the core field of the document to be examined.
Preferably, the technical specification and the feasibility study estimate in the specified templates are imported, the names of the function items (core field data) in the function item tables of the two documents are extracted, and similarity calculation is carried out; a similarity of 90% or more is determined to be a match, and the unmatched items are provided to the project manager as a report for manual checking.
Application Example
(1) Document solidification. The standard documents of the technical specification and the feasibility study estimate are first solidified.
(2) Document comparison. First, the core field data of the work-item sections of the solidified technical specification and feasibility study estimate are exported, and the core fields are preprocessed with natural language processing technology. Second, the similarity between the core fields and the core fields in the solidified document is calculated; if the similarity is more than 90%, the document is deemed qualified, otherwise unqualified.
(3) Document output. The data in the form are exported to the corresponding Word and Excel files through an ActiveXObject control, and the unmatched items are output in the form of a report.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (10)
1. An intelligent inspection method for purchase files based on natural language processing technology, characterized by comprising the following steps:
step (1), solidifying the online document templates of the technical specification and the feasibility study estimate using web technology and a framework;
step (2), exporting the core field data of the work item parts of the solidified technical specification and feasibility study estimate, and preprocessing the data;
the core fields in the technical specification comprise project early-stage preparation, project development, and project popularization and implementation; the core fields in the feasibility study estimate comprise construction cost and equipment purchase cost;
step (3), analyzing the core field data of the technical specification and the core field data of the feasibility study estimate processed in step (2) using a similarity algorithm to obtain an examination report.
2. The intelligent inspection method for purchase files based on natural language processing technology according to claim 1, characterized in that the technical specification and the feasibility study estimate are standard document templates; the documents are solidified using web technology and controls so that their contents can only be copied and read and cannot be modified, and they serve as the standard for document comparison.
3. The intelligent inspection method for purchase files based on natural language processing technology according to claim 2, characterized in that the documents are solidified using web technology and controls as follows: for the project file templates of the technical specification and the feasibility study estimate, corresponding form pages are written using an element component library; the data in the forms are exported into corresponding Word and Excel files using an ActiveXObject control.
4. The intelligent inspection method for purchase files based on natural language processing technology according to claim 1, characterized in that the construction cost field comprises project development, project implementation, integrated development, project testing, and technical consultation; the equipment purchase cost field comprises hardware equipment purchase and system software purchase.
5. The intelligent inspection method for purchase files based on natural language processing technology according to claim 1, characterized in that in step (2), the data preprocessing comprises text word segmentation, regular-expression matching, stop-word processing, character string processing, and data dimensionality reduction.
6. The intelligent inspection method for purchase files based on natural language processing technology according to claim 5, characterized in that the text word segmentation uses BiLSTM + CRF segmentation.
7. The intelligent inspection method for purchase files based on natural language processing technology according to claim 5, characterized in that after the text word segmentation is completed, the text strings are cleaned by regular-expression matching, and special symbols and stop words are filtered out to obtain a dictionary database.
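The cleaning step that follows segmentation might look like the sketch below. Segmentation itself (BiLSTM + CRF in the claims) is assumed to have been done upstream and is not implemented here. The stop-word list, the regular expression, and the function names `clean_tokens` and `build_dictionary` are all hypothetical choices made for illustration.

```python
import re

# Hypothetical stop-word list; a real system would load a domain-specific one.
STOP_WORDS = {"the", "of", "and", "a", "an", "for"}

# Assumption: anything that is neither a word character nor whitespace is a
# "special symbol". In Python 3, \w also matches CJK characters in str patterns.
SPECIAL_SYMBOLS = re.compile(r"[^\w\s]")


def clean_tokens(tokens):
    """Strip special symbols, lowercase, and drop empty tokens and stop words."""
    cleaned = []
    for tok in tokens:
        tok = SPECIAL_SYMBOLS.sub("", tok).strip().lower()
        if tok and tok not in STOP_WORDS:
            cleaned.append(tok)
    return cleaned


def build_dictionary(token_lists):
    """Map each distinct cleaned token to an integer id, in first-seen order."""
    vocab = {}
    for tokens in token_lists:
        for tok in clean_tokens(tokens):
            vocab.setdefault(tok, len(vocab))
    return vocab


# Already-segmented sample sentences (invented for illustration).
segmented = [
    ["Purchase", "of", "the", "hardware", "(", "servers", ")", ";"],
    ["System", "software", "purchase", "!"],
]
dictionary = build_dictionary(segmented)
print(dictionary)
```

The resulting mapping plays the role of the dictionary database mentioned in the claim; how the original system stores it is not stated in the source.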
8. The intelligent inspection method for purchase files based on natural language processing technology according to claim 5, characterized in that the data dimensionality reduction uses a principal component analysis algorithm, comprising the following steps:
the original data $X = \{x_1, x_2, x_3, \ldots, x_n\}$ need to be reduced to $k$ dimensions, where $x_1$ to $x_n$ represent the extracted word vector matrix;
1) de-centering: the mean of each feature is subtracted from every value of that feature;
2) calculating the covariance matrix $\frac{1}{n}XX^{T}$;
3) calculating the eigenvalues and eigenvectors of the covariance matrix $\frac{1}{n}XX^{T}$ by singular value decomposition;
4) sorting the eigenvalues from large to small and selecting the largest $k$, then taking the corresponding $k$ eigenvectors as row vectors to form the eigenvector matrix $P$;
5) transforming the data into the new space spanned by the $k$ eigenvectors, i.e. $Y = PX$, where $Y$ is the result of the reduction from $n$ dimensions to $k$ dimensions.
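The principal component analysis steps of claim 8 can be sketched with NumPy as below. The layout is an assumption: each column of `X` is one sample and each row is one feature, which is the convention under which `Y = PX` works with eigenvectors stored as rows of `P`. The function name `pca_reduce` and the sample matrix are hypothetical.

```python
import numpy as np


def pca_reduce(X, k):
    """Reduce X (features x samples, one sample per column) to k dimensions.

    Returns (Y, P): the projected data and the k x features projection matrix.
    """
    X = np.asarray(X, dtype=float)
    # De-centre: subtract each feature's mean from that feature's values.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Covariance matrix, dividing by the number of samples (columns).
    C = Xc @ Xc.T / Xc.shape[1]
    # For a symmetric positive semi-definite matrix, the SVD yields its
    # eigenvalues (as singular values) and eigenvectors (as columns of U).
    U, S, _ = np.linalg.svd(C)
    # np.linalg.svd returns singular values in descending order, so the
    # first k columns of U belong to the k largest eigenvalues.
    P = U[:, :k].T
    # Project the centred data into the space spanned by those eigenvectors.
    Y = P @ Xc
    return Y, P


# Invented sample data: 3 features, 6 samples.
X = np.array([
    [2.5, 0.5, 2.2, 1.9, 3.1, 2.3],
    [2.4, 0.7, 2.9, 2.2, 3.0, 2.7],
    [1.0, 1.1, 0.9, 1.0, 1.2, 0.8],
])
Y, P = pca_reduce(X, k=2)
print(Y.shape)
```

Dividing the covariance by the sample count or by one less than it changes only the eigenvalues' scale, not the eigenvectors, so the projection is the same either way.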
9. The intelligent inspection method for purchase files based on natural language processing technology according to claim 1, characterized in that in step (3) the similarity algorithm is a comprehensive similarity algorithm: three different similarity algorithms are each used to calculate the similarity of the core field data, and the comprehensive similarity is then obtained as a weighted average of the three, specifically as follows:
edit distance similarity of characters:
adding operation:
$d_1 = ED(A_{i-1}, B_j) + 1$
deleting operation:
$d_2 = ED(A_i, B_{j-1}) + 1$
modifying operation:
$d_3 = ED(A_{i-1}, B_{j-1}) + \delta(A_i, B_j)$, where $\delta(A_i, B_j) = 0$ if $A_i = B_j$ and $1$ otherwise
taking the minimum of the above three as the minimum edit distance gives the state transition equation:
$ED(A_i, B_j) = \min(d_1, d_2, d_3)$
in the above formulas, $d_1$, $d_2$, $d_3$ are the edit distances obtained through the adding, deleting, and modifying operations respectively; $A$ and $B$ are the two strings to be compared; $ED$ is the edit distance function; $ED(A_i, B_j)$ is the minimum edit distance between the first $i$ characters of $A$ and the first $j$ characters of $B$; $L_A$ and $L_B$ are the lengths of $A$ and $B$ respectively; $A_i$ is the $i$-th character of $A$; $B_j$ is the $j$-th character of $B$;
Jaccard coefficient similarity:
$J(A, B) = \dfrac{M_{11}}{M_{01} + M_{10} + M_{11}}$
in the above formula, $M_{00}$ is the number of attributes that are 0 in both A and B; $M_{01}$ is the number of attributes that are 0 in A and 1 in B; $M_{10}$ is the number of attributes that are 1 in A and 0 in B; $M_{11}$ is the number of attributes that are 1 in both A and B;
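cosine similarity:
$\cos\alpha = \dfrac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^{2}}\,\sqrt{\sum_{i} y_i^{2}}}$
where $\cos\alpha$ is the cosine similarity between the two strings, and $x_i$ and $y_i$ are the components of the word vectors of the two strings;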
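the comprehensive similarity is obtained as a weighted average of the three similarities:
$S = \lambda_1 S_1 + \lambda_2 S_2 + \lambda_3 S_3$
in the formula, $S_1$, $S_2$, $S_3$ are the edit distance, Jaccard coefficient, and cosine similarities respectively, and $\lambda_1$, $\lambda_2$, $\lambda_3$ are the weighting coefficients corresponding to the three similarities;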
taking sentences as the minimum detection unit, the similarity between the core field data of the technical specification and the core field data of the feasibility study estimate is obtained through the comprehensive similarity;
and outputting an examination report of the core fields.
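The comprehensive similarity of claim 9 can be illustrated with the following standard-library sketch. Several points are assumptions not stated in the source: the edit distance is turned into a similarity by dividing by the longer string's length, whitespace splitting stands in for word segmentation, bag-of-words counts stand in for word vectors, and the weights `(0.4, 0.3, 0.3)` are arbitrary placeholders for the three coefficients.

```python
import math


def edit_distance(a, b):
    """Minimum number of add, delete, and modify operations turning a into b."""
    # ed[i][j] holds the distance between the first i chars of a and first j of b.
    ed = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        ed[i][0] = i
    for j in range(len(b) + 1):
        ed[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d1 = ed[i - 1][j] + 1
            d2 = ed[i][j - 1] + 1
            d3 = ed[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1)
            ed[i][j] = min(d1, d2, d3)
    return ed[len(a)][len(b)]


def edit_similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard_similarity(tokens_a, tokens_b):
    """Shared tokens over all tokens, i.e. M11 / (M01 + M10 + M11)."""
    sa, sb = set(tokens_a), set(tokens_b)
    union = sa | sb
    return 1.0 if not union else len(sa & sb) / len(union)


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return 0.0 if norm == 0 else dot / norm


def comprehensive_similarity(a, b, weights=(0.4, 0.3, 0.3)):
    """Weighted average of the three similarities; weights are placeholders."""
    ta, tb = a.split(), b.split()
    vocab = sorted(set(ta) | set(tb))
    va = [ta.count(w) for w in vocab]  # bag-of-words stand-in for word vectors
    vb = [tb.count(w) for w in vocab]
    sims = (edit_similarity(a, b), jaccard_similarity(ta, tb), cosine_similarity(va, vb))
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)


print(edit_distance("kitten", "sitting"))
print(round(comprehensive_similarity("hardware equipment purchase",
                                     "hardware device purchase"), 3))
```

Dividing by the sum of the weights keeps the result in the range 0 to 1 even when the chosen coefficients do not add up to one.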
10. An intelligent inspection system for purchase files based on natural language processing technology, which adopts the intelligent inspection method for purchase files based on natural language processing technology, characterized by comprising:
the data acquisition device (101), used for acquiring the technical specification and the feasibility study estimate;
the template solidification module (102), connected with the data acquisition device (101) and used for solidifying the acquired technical specification and feasibility study estimate so that their contents can be copied and read but cannot be modified;
the core field data export module (103), connected with the template solidification module (102) and used for exporting the core field data of the work item parts of the solidified technical specification and feasibility study estimate;
the data preprocessing module (104), connected with the core field data export module (103) and used for preprocessing the exported core field data;
the similarity analysis module (105), connected with the data preprocessing module (104) and used for analyzing, with a similarity algorithm, the core field data of the technical specification and the core field data of the feasibility study estimate processed by the data preprocessing module (104);
and the report output module (106), used for outputting the unmatched items in the form of a report to obtain an examination report.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011299881.3A CN112417835B (en) | 2020-11-18 | 2020-11-18 | Intelligent purchasing file examination method and system based on natural language processing technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112417835A (en) | 2021-02-26 |
CN112417835B (en) | 2023-11-14 |
Family
ID=74773489
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011299881.3A Active CN112417835B (en) | 2020-11-18 | 2020-11-18 | Intelligent purchasing file examination method and system based on natural language processing technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112417835B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170039176A1 (en) * | 2015-08-03 | 2017-02-09 | BlackBoiler, LLC | Method and System for Suggesting Revisions to an Electronic Document |
CN107102998A (en) * | 2016-02-22 | 2017-08-29 | 阿里巴巴集团控股有限公司 | A kind of String distance computational methods and device |
CN108241605A (en) * | 2017-12-13 | 2018-07-03 | 广西电网有限责任公司电力科学研究院 | A kind of technical report standardization write method based on VC |
CN109190092A (en) * | 2018-08-15 | 2019-01-11 | 深圳平安综合金融服务有限公司上海分公司 | The consistency checking method of separate sources file |
CN110110982A (en) * | 2019-04-26 | 2019-08-09 | 特赞(上海)信息科技有限公司 | The checking method and device of intention material |
CN111104794A (en) * | 2019-12-25 | 2020-05-05 | 同方知网(北京)技术有限公司 | Text similarity matching method based on subject words |
CN111709235A (en) * | 2020-05-28 | 2020-09-25 | 上海发电设备成套设计研究院有限责任公司 | Text data statistical analysis system and method based on natural language processing |
CN111861366A (en) * | 2020-06-08 | 2020-10-30 | 远光软件股份有限公司 | Project-ground intelligent auditing system and computer |
Non-Patent Citations (3)
Title |
---|
D. A, AHMED I, SAFAA S: "A Comparative Study on using Principle Component Analysis with different Text Classifiers", INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS, vol. 180, no. 31, pages 1 - 7 * |
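Zhang Chi; Xu Chengsong: "Research on the Standardization Path of Technical Specifications for Engineering Services" (工程服务类技术规范书标准化路径研究), Science and Technology Innovation Herald (科技创新导报), no. 30, page 164 * |
Wang Yu: "A Review of Research on the Application of Natural Language Processing Technology in Construction Engineering" (自然语言处理技术在建筑工程中的应用研究综述), Journal of Graphics (图学学报), vol. 41, no. 04, pages 501 - 511 * |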
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113112246A (en) * | 2021-05-06 | 2021-07-13 | 成都文驰科技有限公司 | Citation standard validity detection method |
CN113378560A (en) * | 2021-07-02 | 2021-09-10 | 贵州电网有限责任公司 | Test report intelligent diagnosis analysis method based on natural language processing |
CN113378560B (en) * | 2021-07-02 | 2023-07-18 | 贵州电网有限责任公司 | Test report intelligent diagnosis analysis method based on natural language processing |
CN115239211A (en) * | 2022-09-22 | 2022-10-25 | 国家电投集团科学技术研究院有限公司 | Method, device and system for researching and examining photovoltaic power generation project and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112417835B (en) | 2023-11-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN112417835B (en) | Intelligent purchasing file examination method and system based on natural language processing technology | |
Liu et al. | Learning to spot and refactor inconsistent method names | |
CN107451153A (en) | The method and apparatus of export structure query statement | |
CN109597994A (en) | Short text problem semantic matching method and system | |
CN112100401B (en) | Knowledge graph construction method, device, equipment and storage medium for science and technology services | |
Kashmira et al. | Generating entity relationship diagram from requirement specification based on nlp | |
CN115547466B (en) | Medical institution registration and review system and method based on big data | |
Zhou et al. | Survey of knowledge graph approaches and applications | |
CN116245107A (en) | Electric power audit text entity identification method, device, equipment and storage medium | |
CN111651569A (en) | Knowledge base question-answering method and system in electric power field | |
CN113157918B (en) | Commodity name short text classification method and system based on attention mechanism | |
CN114239579A (en) | Electric power searchable document extraction method and device based on regular expression and CRF model | |
CN117540035A (en) | RPA knowledge graph construction method based on entity type information fusion | |
CN115688729A (en) | Power transmission and transformation project cost data integrated management system and method thereof | |
Ma et al. | A Legacy ERP System Integration Framework based on Ontology Learning. | |
Chen et al. | A Meta-Learning Framework for Predicting Power Digital Equipment Defect Texts via Hypergraph Modeling | |
CN111814457B (en) | Power grid engineering contract text generation method | |
CN112698833B (en) | Feature attachment code taste detection method based on local and global features | |
CN118093439B (en) | Microservice extraction method and system based on consistent graph clustering | |
CN116128058B (en) | Heterogeneous power generation equipment state judging method, heterogeneous power generation equipment state judging device, storage medium and heterogeneous power generation equipment | |
Liu | Price Prediction of TSLA, BYD and NIO Based on ARIMA Model | |
CN116383341A (en) | Power technology standard deviation clause identification method, system and readable storage medium | |
Melnyk et al. | TOWARDS THE DEVELOPMENT OF A CLASSIFICATION MODEL FOR TECHNICAL DOCUMENTS IN KNOWLEDGE DISCOVERY SYSTEMS. | |
CN114169452A (en) | Information loss prevention method and system for industrial big data feature extraction | |
Wang et al. | A Span Information Fusion-Based End-to-End Relation Extraction Model for Power Knowledge Graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||