CN112417835A - Intelligent inspection method and system for purchase file based on natural language processing technology - Google Patents


Info

Publication number
CN112417835A
Authority
CN
China
Prior art keywords
book
similarity
data
technical specification
core field
Prior art date
Legal status
Granted
Application number
CN202011299881.3A
Other languages
Chinese (zh)
Other versions
CN112417835B (en)
Inventor
汤力
姜劲
杜洁
李芹
王菁
Current Assignee
Information Center of Yunnan Power Grid Co Ltd
Original Assignee
Information Center of Yunnan Power Grid Co Ltd
Priority date
Filing date
Publication date
Application filed by Information Center of Yunnan Power Grid Co Ltd
Priority to CN202011299881.3A
Publication of CN112417835A
Application granted
Publication of CN112417835B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method and system for intelligently reviewing procurement files based on natural language processing technology, and belongs to the technical field of intelligent text review of project procurement data. First, web technology and a framework are used to solidify online templates for the technical specification and the feasibility study estimate. Next, the core field data in the work-item sections of the solidified technical specification and feasibility study estimate are exported and preprocessed. Finally, a similarity algorithm analyzes the preprocessed core field data of the technical specification against those of the feasibility study estimate to produce a review report. The invention reduces repetitive, tedious manual review work, avoids detail errors caused by heavy manual review workloads, and is easy to popularize and apply.

Description

Intelligent inspection method and system for purchase file based on natural language processing technology
Technical Field
The invention belongs to the technical field of intelligent text review of project procurement data, and particularly relates to an intelligent review method and system for procurement files based on natural language processing technology.
Background
With the digital transformation of the power grid, the information center serves as the main body of project construction, and the number of informatization projects grows year by year: the company was expected to issue 275 informatization projects to the center in 2020, with a total investment of nearly 300 million. The whole process of an information project involves many templates and requirements. The planning and construction department, as the functional management department for project construction and bid procurement, reviews project construction process templates and procurement files manually, which is inefficient and error-prone. With growing audit awareness and the advance of lean project management, project managers need to check the technical specification against the work items of the feasibility study estimate point by point, ensuring that the technical specification stays within the feasibility study scope and has no gaps, so as to avoid audit risks; at the same time, the procurement element list and the technical specification need key-point review to ensure that the procurement file is complete and reasonable. However, because the number of projects is increasing rapidly and bidding work is highly time-sensitive, project managers may have to review as many as 59 sub-packages within two days, and the conflict between the quality and the time of manual review is increasingly prominent. Once a review quality problem occurs, it affects project procurement and subsequent project construction. Therefore, overcoming these shortcomings of the prior art is a problem to be solved in the current field of intelligent text review of procurement data.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing an intelligent review method and an intelligent review system for procurement files based on natural language processing technology.
To achieve this aim, the invention adopts the following technical solution:
the intelligent review method for procurement files based on natural language processing technology comprises the following steps:
step (1), using web technology and a framework to solidify online templates for the technical specification and the feasibility study estimate;
step (2), exporting the core field data of the work-item sections of the solidified technical specification and feasibility study estimate, and preprocessing the data;
the core fields of the technical specification include early-stage project preparation, project development, and project rollout and implementation; the core fields of the feasibility study estimate include construction cost and equipment purchase cost;
step (3), using a similarity algorithm to analyze the core field data of the technical specification and of the feasibility study estimate preprocessed in step (2), to obtain a review report.
Further, preferably, the technical specification and the feasibility study estimate are both standard document templates; they are solidified with web technology and controls so that their content can only be copied and read, not modified, and the solidified documents serve as the standard for document comparison.
Further, preferably, the documents are solidified with web technology and controls as follows: for the project file templates of the technical specification and the feasibility study estimate, corresponding form pages are written with the Element component library, and an ActiveXObject control exports the form data to the corresponding Word and Excel files.
Further, preferably, the construction cost field includes project development, project implementation, integration development, project testing, and technical consulting; the equipment purchase cost includes hardware equipment purchase and system software purchase.
Further, preferably, in step (2), data preprocessing includes text word segmentation, regular-expression matching, stop-word processing, string processing, and data dimensionality reduction.
Further, preferably, text word segmentation uses the BiLSTM + CRF word segmentation method.
Further, preferably, after word segmentation, the text strings are cleaned by regular-expression matching, and special symbols and stop words are filtered out to obtain a dictionary database.
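The segmentation output and the cleaning step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes a BiLSTM + CRF tagger has already labelled each character with the common B/M/E/S scheme (begin, middle, end, single-character word), which the text does not specify, and the stop-word list and the regular expression for "special symbols" are hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would load a curated dictionary.
STOP_WORDS = {"的", "和", "及", "等"}

# Assumption: a "special symbol" is anything that is not a CJK ideograph,
# an ASCII letter, or a digit.
SPECIAL = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")


def bmes_to_words(chars, tags):
    """Turn per-character B/M/E/S tags (e.g. from a BiLSTM + CRF tagger) into words."""
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":            # single-character word
            if buf:
                words.append(buf)
                buf = ""
            words.append(ch)
        elif tag == "B":          # beginning of a new word
            if buf:
                words.append(buf)
            buf = ch
        elif tag == "M":          # middle of the current word
            buf += ch
        else:                     # "E": end of the current word
            words.append(buf + ch)
            buf = ""
    if buf:                       # flush a word left open at the end
        words.append(buf)
    return words


def clean_tokens(tokens, stop_words=STOP_WORDS):
    """Strip special symbols with a regex, then drop empty tokens and stop words."""
    cleaned = (SPECIAL.sub("", t) for t in tokens)
    return [t for t in cleaned if t and t not in stop_words]
```

For example, `bmes_to_words(list("系统开发和测试"), ["B", "E", "B", "E", "S", "B", "E"])` yields `["系统", "开发", "和", "测试"]`, and passing that to `clean_tokens` drops the stop word "和".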
Further, preferably, dimensionality reduction uses the principal component analysis (PCA) algorithm, as follows:
the original data $X = \{x_1, x_2, x_3, \ldots, x_n\}$ are to be reduced to $k$ dimensions, where $x_1$ to $x_n$ form the extracted word vector matrix;
1) de-center: subtract from each feature its own mean;
2) compute the covariance matrix
$\frac{1}{n} X X^{T}$;
3) use singular value decomposition to compute the eigenvalues and eigenvectors of the covariance matrix
$\frac{1}{n} X X^{T}$;
4) sort the eigenvalues in descending order, select the largest $k$, and take their $k$ corresponding eigenvectors as row vectors to form the eigenvector matrix $P$;
5) transform the data into the new space spanned by the $k$ eigenvectors, i.e., $Y = PX$; $Y$ is the data reduced from $n$ dimensions to $k$ dimensions.
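The five PCA steps can be traced in a short standard-library Python sketch. It is an illustration under assumptions, not the patent's code: the function name `pca` is hypothetical, rows of `X` are taken as features and columns as samples (so that $Y = PX$ works as written), and power iteration with deflation is substituted for the singular value decomposition step so that no third-party library is needed.

```python
import math


def pca(X, k, iters=1000):
    """Reduce X (rows = features, columns = samples) to k rows via Y = P X.

    Power iteration with deflation stands in for the SVD / eigendecomposition
    step so that the sketch needs only the standard library.
    """
    d, n = len(X), len(X[0])
    # 1) De-center: subtract each feature's mean from its values.
    means = [sum(row) / n for row in X]
    Xc = [[x - m for x in row] for row, m in zip(X, means)]
    # 2) Covariance matrix C = (1/n) Xc Xc^T, a d x d matrix.
    C = [[sum(Xc[i][t] * Xc[j][t] for t in range(n)) / n for j in range(d)]
         for i in range(d)]
    P, eigvals = [], []
    for _ in range(k):
        # 3) Power iteration converges to the eigenvector of the largest
        #    remaining eigenvalue (assumes the start vector is not orthogonal to it).
        v = [1.0 / (i + 1) for i in range(d)]
        for _step in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        # 4) Eigenvectors are collected in descending eigenvalue order as rows of P.
        eigvals.append(lam)
        P.append(v)
        # Deflate so that the next pass finds the next-largest eigenvalue.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # 5) Project the centered data onto the k-dimensional space: Y = P Xc.
    Y = [[sum(p[i] * Xc[i][t] for i in range(d)) for t in range(n)] for p in P]
    return Y, eigvals
```

With the toy data `pca([[1, 2, 3, 4], [2, 4, 6, 8]], k=1)`, the two features are perfectly correlated, so the covariance matrix has eigenvalues 6.25 and 0 and a single component keeps all the variance. For real word-vector matrices an SVD routine from a numerical library would be the more robust choice.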
Further, preferably, in step (3), the similarity algorithm is a composite similarity algorithm: three different similarity algorithms each compute the similarity of the core field data, and the composite similarity is obtained as a weighted average of the three similarities. The specific processing is as follows:
Character edit distance similarity:
adding operation:
$d_1 = ED(A_{i-1}, B_j) + 1$
deleting operation:
$d_2 = ED(A_i, B_{j-1}) + 1$
modifying operation:
$d_3 = ED(A_{i-1}, B_{j-1}) + \begin{cases} 0, & A_i = B_j \\ 1, & A_i \neq B_j \end{cases}$
taking the smallest of these three values as the minimum edit distance gives the state-transition equation:
$ED(A_i, B_j) = \begin{cases} \max(i, j), & \min(i, j) = 0 \\ \min(d_1, d_2, d_3), & \text{otherwise} \end{cases}$
in the above formulas, $d_1$, $d_2$ and $d_3$ are the edit distances obtained through the adding, deleting and modifying operations, respectively; $A$ and $B$ are the two strings to be compared; $ED$ is the edit distance function, and $ED(A_i, B_j)$ is the edit distance between the first $i$ characters of $A$ and the first $j$ characters of $B$;
$ED(A_{L_A}, B_{L_B})$
denotes the minimum edit distance between $A$ and $B$; $L_A$ and $L_B$ are the lengths of $A$ and $B$; $A_i$ denotes the $i$-th character of $A$, and $B_j$ denotes the $j$-th character of $B$;
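The state-transition equation above is the standard Levenshtein dynamic program, which can be sketched in Python as follows. The function names are illustrative, and because the text does not say how the distance becomes a similarity score, the normalization $1 - ED / \max(L_A, L_B)$ in `edit_similarity` is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum edit distance ED(A_{L_A}, B_{L_B}) via the state-transition equation."""
    la, lb = len(a), len(b)
    # ed[i][j] = edit distance between the first i chars of a and the first j chars of b.
    ed = [[0] * (lb + 1) for _ in range(la + 1)]
    for i in range(la + 1):
        ed[i][0] = i                      # boundary: min(i, j) = 0 gives max(i, j)
    for j in range(lb + 1):
        ed[0][j] = j
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            d1 = ed[i - 1][j] + 1         # d1 in the text ("adding")
            d2 = ed[i][j - 1] + 1         # d2 in the text ("deleting")
            d3 = ed[i - 1][j - 1] + int(a[i - 1] != b[j - 1])  # d3 ("modifying")
            ed[i][j] = min(d1, d2, d3)
    return ed[la][lb]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - ED / max(L_A, L_B); two empty strings score 1."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```

For instance, "系统开发" and "系统测试" differ by two modified characters, so `edit_distance` returns 2 and `edit_similarity` returns 0.5.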
Jaccard coefficient similarity:
$J(A, B) = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$
in the above formula,
$M_{00}$
is the number of attributes that are 0 in both A and B (it does not enter the coefficient);
$M_{01}$
is the number of attributes that are 0 in A and 1 in B;
$M_{10}$
is the number of attributes that are 1 in A and 0 in B;
$M_{11}$
is the number of attributes that are 1 in both A and B;
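A Python sketch of the Jaccard coefficient over binary attributes follows. The text does not define the attributes; this sketch assumes each word of the combined vocabulary is one attribute, equal to 1 when the word occurs in the segmented text and 0 otherwise, and the function name is illustrative.

```python
def jaccard_similarity(a_tokens, b_tokens):
    """Jaccard coefficient M11 / (M01 + M10 + M11) over binary word attributes.

    Assumption: every word in the combined vocabulary is an attribute that is 1
    if the word occurs in the text and 0 otherwise; M00 does not enter the formula.
    """
    a, b = set(a_tokens), set(b_tokens)
    vocab = a | b
    m11 = sum(1 for w in vocab if w in a and w in b)
    m10 = sum(1 for w in vocab if w in a and w not in b)
    m01 = sum(1 for w in vocab if w not in a and w in b)
    denom = m01 + m10 + m11
    # Two empty texts are treated as identical (an assumption for the edge case).
    return 1.0 if denom == 0 else m11 / denom
```

For the token lists `["项目", "开发", "测试"]` and `["项目", "开发", "实施"]`, two of the four vocabulary words are shared, giving 0.5.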
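Cosine similarity:
$\cos\alpha = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$
where $\cos\alpha$ is the cosine similarity between the two strings, and $x_i$ and $y_i$ are the components of their word vectors;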
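The cosine formula can be sketched in Python as follows. The text uses word vectors without specifying how they are built, so this sketch substitutes simple term-frequency vectors over the segmented tokens; the function name is illustrative.

```python
import math
from collections import Counter


def cosine_similarity(a_tokens, b_tokens):
    """cos(alpha) = sum(x_i * y_i) / (sqrt(sum x_i^2) * sqrt(sum y_i^2)).

    Assumption: term-frequency vectors stand in for the word vectors x and y.
    """
    x, y = Counter(a_tokens), Counter(b_tokens)
    dot = sum(x[w] * y[w] for w in x)     # missing words count as 0
    norm_x = math.sqrt(sum(c * c for c in x.values()))
    norm_y = math.sqrt(sum(c * c for c in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)
```

Identical token lists score 1 (up to rounding), while `["a", "b"]` and `["a", "c"]` share one of two unit-weight dimensions and score about 0.5.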
the composite similarity is obtained as a weighted average of the three similarities:
$S = \lambda_1 S_{ED} + \lambda_2 S_{J} + \lambda_3 \cos\alpha$
where $S_{ED}$ is the edit distance similarity, $S_J$ is the Jaccard coefficient similarity, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the coefficients of the three similarities;
taking the sentence as the minimum detection unit, the composite similarity gives the similarity between the core field data of the technical specification and those of the feasibility study estimate;
and a review report of the core fields is output.
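The weighting step, together with the 90% match threshold stated for the system below, can be sketched in Python as follows. The coefficients are left as a parameter because this passage does not fix the values of $\lambda_1$ to $\lambda_3$; the function names and the equal weights in the tests are hypothetical. Dividing by the weight sum keeps the result a true weighted average even if the coefficients do not add up to 1.

```python
def composite_similarity(sims, weights):
    """Weighted average of the edit-distance, Jaccard and cosine similarities."""
    if len(sims) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per similarity and a positive weight sum")
    return sum(s * w for s, w in zip(sims, weights)) / sum(weights)


def is_match(sims, weights, threshold=0.9):
    """A sentence pair matches when its composite similarity is at least 90%."""
    return composite_similarity(sims, weights) >= threshold
```

With hypothetical equal weights `[1, 1, 1]`, similarities `[0.95, 0.9, 1.0]` average 0.95 and count as a match, while `[0.5, 0.5, 1.0]` average about 0.67 and do not.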
The invention also provides an intelligent review system for procurement files based on natural language processing technology, which uses the above intelligent review method and comprises:
a data acquisition device, used to acquire the technical specification and the feasibility study estimate;
a template solidification module, connected to the data acquisition device and used to solidify the acquired technical specification and feasibility study estimate so that their content can only be copied and read, not modified;
a core field data export module, connected to the template solidification module and used to export the core field data of the work-item sections of the solidified technical specification and feasibility study estimate;
a data preprocessing module, connected to the core field data export module and used to preprocess the exported core field data;
a similarity analysis module, connected to the data preprocessing module and used to analyze, with a similarity algorithm, the core field data of the technical specification and of the feasibility study estimate preprocessed by the data preprocessing module;
and a report output module, used to output the unmatched items as a report to obtain a review report.
Further, preferably, a similarity of 90% or more is regarded as a match; otherwise, the items are regarded as unmatched.
Compared with the prior art, the invention has the following beneficial effects:
The invention solidifies the technical specification and the feasibility study estimate online to meet the document requirements of project managers throughout the whole process of an information project, reduce unnecessary communication costs caused by inconsistent document templates, avoid the loss of project construction efficiency caused by repeated rework in editing and review, and improve procurement efficiency.
Through research on intelligent review technology for the technical specification and the feasibility study estimate, the invention achieves, for the first time, intelligent review of the consistency between these two documents: the review results are compared to form a review report, which recommends whether manual re-review is needed and points out the content to be re-reviewed. Applying artificial intelligence to information project management greatly reduces the manual review workload, improves the efficiency of technical specification review, addresses the low accuracy that manual review may cause, reduces audit risk, and improves procurement quality and project management quality.
Natural language processing technology is used to compare the key parts of the technical specification and the feasibility study estimate, applying natural language technology to these documents; web technology is used to apply web document solidification to the technical specification and the feasibility study estimate, bringing this technique to their review in the power industry; and an ActiveXObject control exports the form data to the corresponding Word and Excel files.
Drawings
FIG. 1 is a flow chart of the BiLSTM + CRF word segmentation method;
FIG. 2 is a schematic structural diagram of the intelligent review system for procurement files based on natural language processing technology;
101, data acquisition device; 102, template solidification module; 103, core field data export module; 104, data preprocessing module; 105, similarity analysis module; 106, report output module.
Detailed Description
The present invention will be described in further detail with reference to examples.
Those skilled in the art will appreciate that the following examples only illustrate the invention and should not be taken as limiting its scope. Where an example does not specify a particular technique or condition, it is carried out according to the techniques or conditions described in the literature in the art or according to the product specifications. Materials or equipment whose manufacturer is not indicated are conventional, commercially available products.
Example 1
The intelligent review method for procurement files based on natural language processing technology comprises the following steps:
step (1), using web technology and a framework to solidify online templates for the technical specification and the feasibility study estimate;
step (2), exporting the core field data of the work-item sections of the solidified technical specification and feasibility study estimate, and preprocessing the data;
the core fields of the technical specification include early-stage project preparation, project development, and project rollout and implementation; the core fields of the feasibility study estimate include construction cost and equipment purchase cost;
step (3), using a similarity algorithm to analyze the core field data of the technical specification and of the feasibility study estimate preprocessed in step (2), to obtain a review report.
Example 2
The intelligent review method for procurement files based on natural language processing technology comprises the following steps:
step (1), using web technology and a framework to solidify online templates for the technical specification and the feasibility study estimate;
step (2), exporting the core field data of the work-item sections of the solidified technical specification and feasibility study estimate, and preprocessing the data;
the core fields of the technical specification include early-stage project preparation, project development, and project rollout and implementation; the core fields of the feasibility study estimate include construction cost and equipment purchase cost;
step (3), using a similarity algorithm to analyze the core field data of the technical specification and of the feasibility study estimate preprocessed in step (2), to obtain a review report.
The technical specification and the feasibility study estimate are standard document templates; they are solidified with web technology and controls so that their content can only be copied and read, not modified, and the solidified documents serve as the standard for document comparison.
The documents are solidified with web technology and controls as follows: for the project file templates of the technical specification and the feasibility study estimate, corresponding form pages are written with the Element component library, and an ActiveXObject control exports the form data to the corresponding Word and Excel files.
The construction cost field includes project development, project implementation, integration development, project testing, and technical consulting; the equipment purchase cost includes hardware equipment purchase and system software purchase.
In step (2), data preprocessing includes text word segmentation, regular-expression matching, stop-word processing, string processing, and data dimensionality reduction.
Text word segmentation uses the BiLSTM + CRF word segmentation method.
After word segmentation, the text strings are cleaned by regular-expression matching, and special symbols and stop words are filtered out to obtain a dictionary database.
Dimensionality reduction uses the principal component analysis (PCA) algorithm, as follows:
the original data $X = \{x_1, x_2, x_3, \ldots, x_n\}$ are to be reduced to $k$ dimensions, where $x_1$ to $x_n$ form the extracted word vector matrix;
1) de-center: subtract from each feature its own mean;
2) compute the covariance matrix
$\frac{1}{n} X X^{T}$;
3) use singular value decomposition to compute the eigenvalues and eigenvectors of the covariance matrix
$\frac{1}{n} X X^{T}$;
4) sort the eigenvalues in descending order, select the largest $k$, and take their $k$ corresponding eigenvectors as row vectors to form the eigenvector matrix $P$;
5) transform the data into the new space spanned by the $k$ eigenvectors, i.e., $Y = PX$; $Y$ is the data reduced from $n$ dimensions to $k$ dimensions.
In step (3), the similarity algorithm is a composite similarity algorithm: three different similarity algorithms each compute the similarity of the core field data, and the composite similarity is obtained as a weighted average of the three similarities. The specific processing is as follows:
Character edit distance similarity:
adding operation:
$d_1 = ED(A_{i-1}, B_j) + 1$
deleting operation:
$d_2 = ED(A_i, B_{j-1}) + 1$
modifying operation:
$d_3 = ED(A_{i-1}, B_{j-1}) + \begin{cases} 0, & A_i = B_j \\ 1, & A_i \neq B_j \end{cases}$
taking the smallest of these three values as the minimum edit distance gives the state-transition equation:
$ED(A_i, B_j) = \begin{cases} \max(i, j), & \min(i, j) = 0 \\ \min(d_1, d_2, d_3), & \text{otherwise} \end{cases}$
in the above formulas, $d_1$, $d_2$ and $d_3$ are the edit distances obtained through the adding, deleting and modifying operations, respectively; $A$ and $B$ are the two strings to be compared; $ED$ is the edit distance function, and $ED(A_i, B_j)$ is the edit distance between the first $i$ characters of $A$ and the first $j$ characters of $B$;
$ED(A_{L_A}, B_{L_B})$
denotes the minimum edit distance between $A$ and $B$; $L_A$ and $L_B$ are the lengths of $A$ and $B$; $A_i$ denotes the $i$-th character of $A$, and $B_j$ denotes the $j$-th character of $B$;
Jaccard coefficient similarity:
$J(A, B) = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$
in the above formula,
$M_{00}$
is the number of attributes that are 0 in both A and B (it does not enter the coefficient);
$M_{01}$
is the number of attributes that are 0 in A and 1 in B;
$M_{10}$
is the number of attributes that are 1 in A and 0 in B;
$M_{11}$
is the number of attributes that are 1 in both A and B;
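Cosine similarity:
$\cos\alpha = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$
where $\cos\alpha$ is the cosine similarity between the two strings, and $x_i$ and $y_i$ are the components of their word vectors;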
the composite similarity is obtained as a weighted average of the three similarities:
$S = \lambda_1 S_{ED} + \lambda_2 S_{J} + \lambda_3 \cos\alpha$
where $S_{ED}$ is the edit distance similarity, $S_J$ is the Jaccard coefficient similarity, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the coefficients of the three similarities;
taking the sentence as the minimum detection unit, the composite similarity gives the similarity between the core field data of the technical specification and those of the feasibility study estimate;
and a review report of the core fields is output.
As shown in FIG. 2, the intelligent review system for procurement files based on natural language processing technology, which uses the above intelligent review method, comprises:
the data acquisition device 101, used to acquire the technical specification and the feasibility study estimate;
the template solidification module 102, connected to the data acquisition device 101 and used to solidify the acquired technical specification and feasibility study estimate so that their content can only be copied and read, not modified;
the core field data export module 103, connected to the template solidification module 102 and used to export the core field data of the work-item sections of the solidified technical specification and feasibility study estimate;
the data preprocessing module 104, connected to the core field data export module 103 and used to preprocess the exported core field data;
the similarity analysis module 105, connected to the data preprocessing module 104 and used to analyze, with a similarity algorithm, the core field data of the technical specification and of the feasibility study estimate preprocessed by the data preprocessing module 104;
and the report output module 106, used to output the unmatched items as a report to obtain a review report.
Example 3
The intelligent review method for procurement files based on natural language processing technology comprises the following steps:
(1) using web technology and a framework to solidify online templates for the technical specification and the feasibility study estimate;
(2) exporting, from the solidified documents, the core field data of the work-item sections of the technical specification and the feasibility study estimate, and preprocessing the data;
(3) analyzing the document under review with a similarity algorithm to obtain a preliminary review report.
In step (1), the technical specification and the feasibility study estimate are standard document templates. The documents are solidified with web technology and a framework to serve as the standard for document comparison.
In step (2), data preprocessing includes regular-expression matching, text word segmentation, stop-word processing, string processing, and data dimensionality reduction.
Text word segmentation uses a recurrent neural network word segmentation method (the flow is shown in FIG. 1);
after word segmentation, the text strings are cleaned with regular expressions to filter out special symbols and stop words, yielding a dictionary library;
The dimensionality reduction works as follows:
the original data $X = \{x_1, x_2, x_3, \ldots, x_n\}$ are to be reduced to $k$ dimensions, where $x_1$ to $x_n$ form the extracted word vector matrix;
1) de-center: subtract from each feature its own mean, i.e., de-center each of the vectors $x_1$ to $x_n$;
2) compute the covariance matrix
$\frac{1}{n} X X^{T}$;
3) use eigenvalue decomposition to compute the eigenvalues and eigenvectors of the covariance matrix
$\frac{1}{n} X X^{T}$;
4) sort the eigenvalues in descending order, select the largest $k$, and take their $k$ corresponding eigenvectors as row vectors to form the eigenvector matrix $P$;
5) transform the data into the new space spanned by the $k$ eigenvectors, i.e., $Y = PX$, which is the result of reducing the data from $n$ dimensions to $k$ dimensions.
In step (3), the similarity algorithm is a composite similarity algorithm: three different similarity algorithms each compute the similarity of the core field data, and the composite similarity is obtained as a weighted average of the three similarities. The specific processing is as follows:
Character edit distance similarity:
adding operation:
$d_1 = ED(A_{i-1}, B_j) + 1$
deleting operation:
$d_2 = ED(A_i, B_{j-1}) + 1$
modifying operation:
$d_3 = ED(A_{i-1}, B_{j-1}) + \begin{cases} 0, & A_i = B_j \\ 1, & A_i \neq B_j \end{cases}$
taking the smallest of these three values as the minimum edit distance gives the state-transition equation:
$ED(A_i, B_j) = \begin{cases} \max(i, j), & \min(i, j) = 0 \\ \min(d_1, d_2, d_3), & \text{otherwise} \end{cases}$
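Jaccard coefficient similarity:
$J(A, B) = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$
Cosine similarity:
$\cos\alpha = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$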
the composite similarity is obtained as a weighted average of the three similarities:
$S = \lambda_1 S_{ED} + \lambda_2 S_{J} + \lambda_3 \cos\alpha$
where $S_{ED}$, $S_J$ and $\cos\alpha$ are the edit distance, Jaccard coefficient and cosine similarities, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the coefficients of the three similarities; preferably 0.2, 0.4.
Taking the sentence as the minimum detection unit, the composite similarity gives the similarity between the core field data of the document under review and those of the solidified template;
and a review report of the core fields of the document under review is output.
Preferably, the technical specification and the feasibility study estimate built on the specified templates are imported, the names of the function items (core field data) in the function item tables of the two documents are extracted, and their similarity is calculated; items with a similarity of 90% or more are considered matched, and unmatched items are provided to the project manager in a report for manual checking.
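The matching and reporting step just described can be sketched end to end in Python. This is a simplified sketch, not the patent's code: to keep it short and self-contained, the standard library's `difflib.SequenceMatcher.ratio()` stands in for the composite similarity, the function names are illustrative, and the item names used below are hypothetical. Only the 90% threshold comes from the text.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in for the composite similarity: difflib's ratio is a score in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def review(spec_items, estimate_items, threshold=0.9):
    """Report spec function items whose best match in the estimate is below threshold.

    Each report row is (spec item, closest estimate item or None, score).
    """
    report = []
    for item in spec_items:
        scored = [(similarity(item, e), e) for e in estimate_items]
        score, best = max(scored, default=(0.0, None))
        if score < threshold:          # unmatched: hand over for manual checking
            report.append((item, best, round(score, 2)))
    return report
```

With hypothetical function items `["系统软件购置", "项目实施", "技术咨询"]` in the technical specification and `["系统软件购置", "项目实施", "硬件设备购置"]` in the estimate, only "技术咨询" has no counterpart scoring 90% or more, so it is the single row in the report.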
Application example
(1) Document solidification. The technical specification and feasibility study estimate documents are solidified first.
(2) Document comparison. First, the core field data of the work-item sections of the solidified technical specification and feasibility study estimate are exported and preprocessed with natural language processing technology. Second, the similarity between these core fields and the core fields of the solidified document is calculated; if the similarity exceeds 90%, the document is considered qualified, otherwise unqualified.
(3) Document output. The form data are exported to the corresponding Word and Excel files through an ActiveXObject control, and the unmatched items are output as a report.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and the specification merely illustrate its principle, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the claimed scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. An intelligent review method for procurement files based on natural language processing technology, characterized by comprising the following steps:
step (1), using web technology and a framework to solidify online templates for the technical specification and the feasibility study estimate;
step (2), exporting the core field data of the work-item sections of the solidified technical specification and feasibility study estimate, and preprocessing the data;
the core fields of the technical specification include early-stage project preparation, project development, and project rollout and implementation; the core fields of the feasibility study estimate include construction cost and equipment purchase cost;
step (3), using a similarity algorithm to analyze the core field data of the technical specification and of the feasibility study estimate preprocessed in step (2), to obtain a review report.
2. The intelligent examination method for procurement files based on natural language processing technology according to claim 1, characterized in that the technical specification and the feasibility-study estimate are standard document templates, and the documents are solidified using web technology and controls so that their contents can only be copied and recognized, not modified, and serve as the standard for document comparison.
3. The intelligent examination method for procurement files based on natural language processing technology according to claim 2, characterized in that the documents are solidified with web technology and controls as follows: for the project file templates in the technical specification and the feasibility-study estimate, corresponding form pages are written with the Element component library, and the data in the forms are exported to the corresponding Word and Excel files using an ActiveXObject control.
4. The intelligent examination method for procurement files based on natural language processing technology according to claim 1, characterized in that the construction cost field comprises project development, project implementation, integration development, project testing, and technical consulting; and the equipment purchase cost comprises hardware equipment purchase and system software purchase.
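The core fields named in claims 1 and 4 form a small two-level hierarchy. One way to hold it for the export step is a nested mapping, sketched below; the English key names and the `field_paths` helper are illustrative assumptions, and the technical specification's fields have empty sub-item lists because the claims list none.

```python
# Hypothetical schema of the core fields in claims 1 and 4.
CORE_FIELDS: dict[str, dict[str, list[str]]] = {
    "technical_specification": {
        "preliminary project preparation": [],
        "project development": [],
        "project rollout and implementation": [],
    },
    "feasibility_study_estimate": {
        "construction cost": [
            "project development",
            "project implementation",
            "integration development",
            "project testing",
            "technical consulting",
        ],
        "equipment purchase cost": [
            "hardware equipment purchase",
            "system software purchase",
        ],
    },
}


def field_paths(document: str) -> list[str]:
    """List every field to export as 'field' or 'field/sub-item' paths."""
    paths = []
    for field, items in CORE_FIELDS[document].items():
        paths.append(field)
        paths.extend(f"{field}/{item}" for item in items)
    return paths
```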
5. The intelligent examination method for procurement files based on natural language processing technology according to claim 1, characterized in that in step (2), the data preprocessing comprises text word segmentation, regular-expression matching, stop-word processing, string processing, and data dimensionality reduction.
6. The intelligent examination method for procurement files based on natural language processing technology according to claim 5, characterized in that the text word segmentation uses BiLSTM + CRF segmentation.
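The patent names BiLSTM + CRF without further detail. In such segmenters the BiLSTM typically produces a per-character score for each tag, and the CRF layer decodes the best tag sequence with the Viterbi algorithm using tag-to-tag transition scores. The sketch below shows only that decoding step and the conversion from tags to words, assuming the common B/M/E/S scheme (begin, middle, end, single-character word); the scores in the example are made up for illustration, not output of a trained model.

```python
TAGS = ["B", "M", "E", "S"]  # begin / middle / end of a word, single-character word


def viterbi(emissions: list[list[float]], transitions: list[list[float]]) -> list[int]:
    """Return the highest-scoring tag index path.

    emissions[t][k]: score of tag k at character t (e.g. from a BiLSTM);
    transitions[p][k]: score of moving from tag p to tag k (the CRF layer).
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for cur in range(n_tags):
            prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][cur])
            pointers.append(prev)
            new_score.append(score[prev] + transitions[prev][cur] + emit[cur])
        score = new_score
        backpointers.append(pointers)
    best = max(range(n_tags), key=lambda t: score[t])
    path = [best]
    for pointers in reversed(backpointers):  # follow backpointers to the start
        best = pointers[best]
        path.append(best)
    return path[::-1]


def tags_to_words(chars: str, tags: list[str]) -> list[str]:
    """Cut the character sequence after every E (end) or S (single) tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that does not end with E or S
        words.append(current)
    return words
```

With emissions favouring B, E, B, E for "技术规范", `tags_to_words("技术规范", [TAGS[i] for i in viterbi(emissions, transitions)])` yields the two words "技术" and "规范".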
7. The intelligent examination method for procurement files based on natural language processing technology according to claim 5, characterized in that after the text word segmentation is completed, the text strings are cleaned by regular-expression matching, and special symbols and stop words are filtered out to obtain a dictionary database.
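The cleaning step of claim 7 might look like the sketch below. The character class (keep CJK ideographs, Latin letters, and digits; treat everything else as a special symbol), the tiny stop-word list, and the use of a token counter as the "dictionary database" are assumptions for illustration only.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would load a curated list.
STOP_WORDS = {"的", "和", "及", "the", "and", "of"}
# Anything that is not a CJK ideograph, Latin letter or digit counts as a special symbol.
SPECIAL = re.compile(r"[^\u4e00-\u9fa5A-Za-z0-9]+")


def clean_tokens(tokens: list[str]) -> list[str]:
    """Strip special symbols, lowercase, and drop empty tokens and stop words."""
    cleaned = []
    for token in tokens:
        token = SPECIAL.sub("", token).lower()
        if token and token not in STOP_WORDS:
            cleaned.append(token)
    return cleaned


def build_dictionary(segmented_docs: list[list[str]]) -> Counter:
    """Count cleaned tokens across all segmented documents."""
    vocabulary = Counter()
    for tokens in segmented_docs:
        vocabulary.update(clean_tokens(tokens))
    return vocabulary
```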
8. The intelligent examination method for procurement files based on natural language processing technology according to claim 5, characterized in that the dimensionality-reduced data are obtained with a principal component analysis algorithm, comprising the following steps:
the original data $X = \{x_1, x_2, x_3, \ldots, x_n\}$ are to be reduced to $k$ dimensions, where $x_1$ to $x_n$ represent the extracted word vector matrix;
1) de-centering: subtract from each feature value the mean of that feature;
2) calculate the covariance matrix
$C = \frac{1}{n} X X^{T}$
3) calculate, by singular value decomposition, the eigenvalues and eigenvectors of the covariance matrix $C = \frac{1}{n} X X^{T}$;
4) sort the eigenvalues in descending order, select the eigenvectors corresponding to the $k$ largest eigenvalues, and arrange them as row vectors to form the eigenvector matrix $P$;
5) transform the data into the new space spanned by the $k$ eigenvectors, i.e. $Y = PX$, where $Y$ is the result of reducing the data from $n$ dimensions to $k$ dimensions.
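The five PCA steps of claim 8 can be sketched with NumPy. Assumptions: features are stored as rows and samples as columns (so that $Y = PX$ holds as written), the covariance is normalized by the number of samples, and `pca_reduce` is a hypothetical name. Because the covariance matrix is symmetric and positive semidefinite, its SVD returns its eigenvalues as singular values and its eigenvectors as the columns of `U`.

```python
import numpy as np


def pca_reduce(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Reduce X (features as rows, samples as columns) to k dimensions.

    Returns (Y, P) with Y = P @ X_centered.
    """
    # 1) de-center: subtract each feature's mean
    Xc = X - X.mean(axis=1, keepdims=True)
    n_samples = X.shape[1]
    # 2) covariance matrix C = (1/n) Xc Xc^T
    C = Xc @ Xc.T / n_samples
    # 3) SVD of the symmetric PSD covariance gives its eigenvalues and eigenvectors
    U, S, _ = np.linalg.svd(C)
    # 4) singular values come back in descending order; take the top-k
    #    eigenvectors as the rows of P
    P = U[:, :k].T
    # 5) project the centered data into the k-dimensional space
    return P @ Xc, P
```

The variance of each row of `Y` equals the corresponding retained eigenvalue, which is a convenient sanity check.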
9. The intelligent examination method for procurement files based on natural language processing technology according to claim 1, characterized in that in step (3), the similarity algorithm is a comprehensive similarity algorithm: three different similarity algorithms are each used to calculate the similarity of the core field data, and the comprehensive similarity is then obtained as a weighted average of the three similarities, specifically as follows:
character edit distance similarity:
insertion operation:
$d_1 = ED(A_{i-1}, B_j) + 1$
deletion operation:
$d_2 = ED(A_i, B_{j-1}) + 1$
modification operation:
$d_3 = ED(A_{i-1}, B_{j-1}) + \begin{cases} 0, & A_i = B_j \\ 1, & A_i \neq B_j \end{cases}$
taking the minimum of the above three as the minimum edit distance gives the state transition equation:
$ED(A_i, B_j) = \begin{cases} \max(i, j), & \min(i, j) = 0 \\ \min(d_1, d_2, d_3), & \text{otherwise} \end{cases}$
in the above formulas, $d_1$, $d_2$, $d_3$ are the candidate edit distances for the insertion, deletion, and modification operations respectively; A and B are the two strings to be compared; ED is the edit distance function;
$ED(A_{L_A}, B_{L_B})$ represents the minimum edit distance, where $L_A$ and $L_B$ are the lengths of A and B respectively; $A_i$ is the i-th character of A; and $B_j$ is the j-th character of B;
Jaccard coefficient similarity:
$J(A, B) = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$
in the above formula,
$M_{00}$ is the number of attributes for which A and B are both 0;
$M_{01}$ is the number of attributes for which A is 0 and B is 1;
$M_{10}$ is the number of attributes for which A is 1 and B is 0;
$M_{11}$ is the number of attributes for which A and B are both 1;
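cosine similarity:
$\cos\alpha = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$
where $\cos\alpha$ is the cosine similarity between the two strings, and $x_i$ and $y_i$ are the components of the word vectors of the two strings;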
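the three similarities are combined by weighted averaging into the comprehensive similarity:
$S = \frac{\lambda_1 S_{ED} + \lambda_2 S_{J} + \lambda_3 \cos\alpha}{\lambda_1 + \lambda_2 + \lambda_3}$
where $S_{ED}$ and $S_{J}$ are the edit distance similarity and the Jaccard coefficient similarity, and $\lambda_1$, $\lambda_2$, $\lambda_3$ are the weighting coefficients of the three similarities;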
taking the sentence as the minimum detection unit, the similarity between the core field data of the technical specification and the core field data of the feasibility-study estimate is obtained from the comprehensive similarity;
and an examination report of the core fields is output.
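The three measures of claim 9 and their weighted average can be sketched in plain Python. Several details are assumptions not stated in the patent: the edit distance is turned into a similarity as $1 - ED/\max(L_A, L_B)$; the Jaccard attributes are the distinct tokens (single characters here, whereas a real system would use the segmented words of claims 6 and 7); term-frequency vectors stand in for the word vectors $x_i, y_i$; and the default weights are equal because the patent gives no values for $\lambda_1, \lambda_2, \lambda_3$.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Minimum edit distance via the state transition equation (dynamic programming)."""
    # ed[i][j] holds ED for the first i characters of a and the first j of b.
    ed = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        ed[i][0] = i  # min(i, j) = 0, so ED = max(i, j)
    for j in range(len(b) + 1):
        ed[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d1 = ed[i - 1][j] + 1  # the claim's d1
            d2 = ed[i][j - 1] + 1  # the claim's d2
            d3 = ed[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1)  # modification
            ed[i][j] = min(d1, d2, d3)
    return ed[len(a)][len(b)]


def edit_similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b))  # assumed normalization: 1 - ED / max(L_A, L_B)
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard_similarity(tokens_a: list[str], tokens_b: list[str]) -> float:
    set_a, set_b = set(tokens_a), set(tokens_b)
    m11 = len(set_a & set_b)  # attribute is 1 in both A and B
    m10 = len(set_a - set_b)  # 1 in A, 0 in B
    m01 = len(set_b - set_a)  # 0 in A, 1 in B
    denom = m01 + m10 + m11   # M00 never enters the coefficient
    return 1.0 if denom == 0 else m11 / denom


def cosine_similarity(tokens_a: list[str], tokens_b: list[str]) -> float:
    ca, cb = Counter(tokens_a), Counter(tokens_b)  # term-frequency vectors
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return 0.0 if norm == 0 else dot / norm


def comprehensive_similarity(a: str, b: str,
                             weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted average of edit-distance, Jaccard and cosine similarity."""
    if sum(weights) <= 0:
        raise ValueError("weights must have a positive sum")
    tokens_a, tokens_b = list(a), list(b)  # character tokens (assumption)
    scores = (edit_similarity(a, b),
              jaccard_similarity(tokens_a, tokens_b),
              cosine_similarity(tokens_a, tokens_b))
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

Each measure captures something different: edit distance is sensitive to character order, Jaccard only to which tokens occur, and cosine also to how often they occur, which is presumably why the patent averages them rather than relying on one.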
10. An intelligent examination system for procurement files based on natural language processing technology, which adopts the intelligent examination method for procurement files based on natural language processing technology, characterized by comprising:
a data acquisition device (101) for acquiring the technical specification and the feasibility-study estimate;
a template solidification module (102), connected to the data acquisition device (101), for solidifying the acquired technical specification and feasibility-study estimate so that their contents can be copied and recognized but not modified;
a core field data export module (103), connected to the template solidification module (102), for exporting the core field data of the work-item sections of the solidified technical specification and feasibility-study estimate;
a data preprocessing module (104), connected to the core field data export module (103), for preprocessing the exported core field data;
a similarity analysis module (105), connected to the data preprocessing module (104), for analyzing, with a similarity algorithm, the core field data of the technical specification and of the feasibility-study estimate processed by the data preprocessing module (104);
and a report output module (106) for outputting the unmatched items in the form of a report to obtain an examination report.
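The module chain of claim 10 can be sketched as plain functions. Everything here is hypothetical: the function names, the whitespace-stripping placeholder for module (104), and the `difflib` stand-in for the similarity of module (105); module (101), data acquisition, is represented simply by the dictionaries passed in. The sketch mainly illustrates module (102): `types.MappingProxyType` gives a read-only view, mirroring the requirement that solidified contents can be read and copied but not modified.

```python
from difflib import SequenceMatcher
from types import MappingProxyType
from typing import Mapping


def solidify(template: dict[str, str]) -> Mapping[str, str]:
    """(102) Read-only view: contents can be read and copied but not modified."""
    return MappingProxyType(dict(template))  # copy so the caller's dict cannot alter it


def export_core_fields(doc: Mapping[str, str], fields: list[str]) -> dict[str, str]:
    """(103) Pull only the core fields out of a document."""
    return {name: doc.get(name, "") for name in fields}


def preprocess(text: str) -> str:
    """(104) Placeholder preprocessing: drop whitespace and lowercase."""
    return "".join(text.split()).lower()


def examine(template: Mapping[str, str], submitted: dict[str, str],
            fields: list[str], threshold: float = 0.90) -> list[str]:
    """(105) + (106): score each core field and return the unmatched ones."""
    reference = export_core_fields(template, fields)
    candidate = export_core_fields(submitted, fields)
    unmatched = []
    for name in fields:
        # Stand-in similarity; claim 9 specifies a weighted combination instead.
        score = SequenceMatcher(None, preprocess(candidate[name]),
                                preprocess(reference[name])).ratio()
        if score <= threshold:
            unmatched.append(name)
    return unmatched
```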
CN202011299881.3A 2020-11-18 2020-11-18 Intelligent purchasing file examination method and system based on natural language processing technology Active CN112417835B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011299881.3A CN112417835B (en) 2020-11-18 2020-11-18 Intelligent purchasing file examination method and system based on natural language processing technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011299881.3A CN112417835B (en) 2020-11-18 2020-11-18 Intelligent purchasing file examination method and system based on natural language processing technology

Publications (2)

Publication Number Publication Date
CN112417835A 2021-02-26
CN112417835B 2023-11-14

Family

ID=74773489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011299881.3A Active CN112417835B (en) 2020-11-18 2020-11-18 Intelligent purchasing file examination method and system based on natural language processing technology

Country Status (1)

Country Link
CN (1) CN112417835B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170039176A1 (en) * 2015-08-03 2017-02-09 BlackBoiler, LLC Method and System for Suggesting Revisions to an Electronic Document
CN107102998A (en) * 2016-02-22 2017-08-29 阿里巴巴集团控股有限公司 A kind of String distance computational methods and device
CN108241605A (en) * 2017-12-13 2018-07-03 广西电网有限责任公司电力科学研究院 A kind of technical report standardization write method based on VC
CN109190092A (en) * 2018-08-15 2019-01-11 深圳平安综合金融服务有限公司上海分公司 The consistency checking method of separate sources file
CN110110982A (en) * 2019-04-26 2019-08-09 特赞(上海)信息科技有限公司 The checking method and device of intention material
CN111104794A (en) * 2019-12-25 2020-05-05 同方知网(北京)技术有限公司 Text similarity matching method based on subject words
CN111709235A (en) * 2020-05-28 2020-09-25 上海发电设备成套设计研究院有限责任公司 Text data statistical analysis system and method based on natural language processing
CN111861366A (en) * 2020-06-08 2020-10-30 远光软件股份有限公司 Project-ground intelligent auditing system and computer

Non-Patent Citations (3)

Title
D. A, AHMED I, SAFAA S: "A Comparative Study on using Principle Component Analysis with different Text Classifiers", INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS, vol. 180, no. 31, pages 1 - 7 *
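张驰; 徐承松: "Research on the Standardization Path of Technical Specifications for Engineering Services", 科技创新导报 (Science and Technology Innovation Herald), no. 30, page 164 *
王煜: "A Review of Research on the Application of Natural Language Processing Technology in Construction Engineering", 图学学报 (Journal of Graphics), vol. 41, no. 04, pages 501 - 511 *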

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN113112246A (en) * 2021-05-06 2021-07-13 成都文驰科技有限公司 Citation standard validity detection method
CN113378560A (en) * 2021-07-02 2021-09-10 贵州电网有限责任公司 Test report intelligent diagnosis analysis method based on natural language processing
CN113378560B (en) * 2021-07-02 2023-07-18 贵州电网有限责任公司 Test report intelligent diagnosis analysis method based on natural language processing
CN115239211A (en) * 2022-09-22 2022-10-25 国家电投集团科学技术研究院有限公司 Method, device and system for researching and examining photovoltaic power generation project and electronic equipment

Also Published As

Publication number Publication date
CN112417835B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN112417835B (en) Intelligent purchasing file examination method and system based on natural language processing technology
Liu et al. Learning to spot and refactor inconsistent method names
CN107451153A (en) The method and apparatus of export structure query statement
CN109597994A (en) Short text problem semantic matching method and system
CN112100401B (en) Knowledge graph construction method, device, equipment and storage medium for science and technology services
Kashmira et al. Generating entity relationship diagram from requirement specification based on nlp
CN115547466B (en) Medical institution registration and review system and method based on big data
Zhou et al. Survey of knowledge graph approaches and applications
CN116245107A (en) Electric power audit text entity identification method, device, equipment and storage medium
CN111651569A (en) Knowledge base question-answering method and system in electric power field
CN113157918B (en) Commodity name short text classification method and system based on attention mechanism
CN114239579A (en) Electric power searchable document extraction method and device based on regular expression and CRF model
CN117540035A (en) RPA knowledge graph construction method based on entity type information fusion
CN115688729A (en) Power transmission and transformation project cost data integrated management system and method thereof
Ma et al. A Legacy ERP System Integration Framework based on Ontology Learning.
Chen et al. A Meta-Learning Framework for Predicting Power Digital Equipment Defect Texts via Hypergraph Modeling
CN111814457B (en) Power grid engineering contract text generation method
CN112698833B (en) Feature attachment code taste detection method based on local and global features
CN118093439B (en) Microservice extraction method and system based on consistent graph clustering
CN116128058B (en) Heterogeneous power generation equipment state judging method, heterogeneous power generation equipment state judging device, storage medium and heterogeneous power generation equipment
Liu Price Prediction of TSLA, BYD and NIO Based on ARIMA Model
CN116383341A (en) Power technology standard deviation clause identification method, system and readable storage medium
Melnyk et al. TOWARDS THE DEVELOPMENT OF A CLASSIFICATION MODEL FOR TECHNICAL DOCUMENTS IN KNOWLEDGE DISCOVERY SYSTEMS.
CN114169452A (en) Information loss prevention method and system for industrial big data feature extraction
Wang et al. A Span Information Fusion-Based End-to-End Relation Extraction Model for Power Knowledge Graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant