CN112417835B - Intelligent purchasing file examination method and system based on natural language processing technology - Google Patents

Intelligent purchasing file examination method and system based on natural language processing technology

Info

Publication number
CN112417835B
Authority
CN
China
Prior art keywords
book
similarity
data
technical specification
core field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011299881.3A
Other languages
Chinese (zh)
Other versions
CN112417835A (en)
Inventor
汤力
姜劲
杜洁
李芹
王菁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Center of Yunnan Power Grid Co Ltd
Original Assignee
Information Center of Yunnan Power Grid Co Ltd
Application filed by Information Center of Yunnan Power Grid Co Ltd


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention relates to an intelligent examination method and system for purchasing files based on natural language processing technology, and belongs to the technical field of intelligent text examination of project purchasing materials. First, web technology and a framework are used to solidify the templates of the technical specification book and the research evaluation book online; next, the core field data of the work item part of the solidified technical specification book and research evaluation book are exported and preprocessed; finally, a similarity algorithm is used to compare the core field data of the technical specification book with the core field data of the research evaluation book to obtain an examination report. The invention reduces the repetitive and complex work in manual examination, avoids detail errors caused by high-load manual examination, and is easy to popularize and apply.

Description

Intelligent purchasing file examination method and system based on natural language processing technology
Technical Field
The invention belongs to the technical field of intelligent text examination of project purchasing materials, and particularly relates to an intelligent examination method and system of purchasing files based on a natural language processing technology.
Background
With the digital transformation of the power grid, the information center acts as the project construction entity and the number of informatization projects rises year by year; in 2020 the provincial company is expected to assign 275 informatization projects to the center, with a total investment of nearly 300 million. The whole process of an informatization project involves many templates and requirements. The project construction department is the functional management department for project construction and bidding and purchasing, and it currently examines templates and purchasing files during project construction manually, which is inefficient and error-prone. With stronger audit awareness and leaner project management, project managers need to examine the work items of the technical specification book and of the research evaluation book point by point, to ensure that the technical specification book stays within the scope of the research evaluation with no omissions and to avoid audit risks; at the same time, the key points of the purchasing element list and the technical specification must be audited to ensure the integrity and rationality of the purchasing files. However, because the number of projects has surged and bidding work is time-critical, project managers have had to examine as many as 59 sub-packages within two days, and the conflict between manual examination quality and available time is increasingly prominent. Once an examination quality problem occurs, project purchasing and the subsequent project construction are affected. Therefore, how to overcome the defects of the prior art is a problem to be solved in the technical field of intelligent text examination of purchasing materials.
Disclosure of Invention
The invention aims to solve the defects of the prior art and provides an intelligent examination method and system for purchasing files based on natural language processing technology.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
An intelligent purchasing file examination method based on natural language processing technology comprises the following steps:
step (1), solidifying the templates of the technical specification book and the research evaluation book online by adopting web technology and a framework;
step (2), exporting the core field data of the work item part of the solidified technical specification book and research evaluation book, and carrying out data preprocessing;
the core fields in the technical specification book comprise project preliminary preparation, project development, and project popularization and implementation; the core fields in the research evaluation book comprise construction fees and equipment purchase fees;
step (3), analyzing the core field data of the technical specification book and the core field data of the research evaluation book processed in step (2) by adopting a similarity algorithm to obtain an examination report.
Further, preferably, the technical specification book and the research evaluation book are standard document templates; web technology and controls are adopted to solidify them so that their content can only be copied and identified and cannot be modified, and they serve as the standard for document comparison.
Further, preferably, the specific method for solidifying the documents by adopting web technology and controls is as follows: for the project file templates of the technical specification book and the research evaluation book, corresponding form pages are written using the element component library; the ActiveXObject control is used to export the data in the forms into corresponding Word and Excel files.
Further, preferably, the construction fee field includes project development, project implementation, integrated development, project testing and technical consultation; the equipment purchase fee field includes hardware equipment purchase and system software purchase.
Further, preferably, in step (2), the data preprocessing includes text word segmentation, regular expression matching, stop word processing, character string processing and data reduction.
Further, it is preferable that the text segmentation adopts the BiLSTM+CRF segmentation method.
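A BiLSTM+CRF segmenter is usually trained as a character-level sequence tagger. The sketch below covers only the final decoding step, under the assumption that the tagger emits one B/M/E/S tag (begin, middle, end, single) per character; the tag scheme and the function name `tags_to_words` are illustrative assumptions and are not taken from the patent.

```python
def tags_to_words(chars, tags):
    """Group characters into words from per-character B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":            # single-character word
            if current:           # flush any unfinished word first
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":          # start a new word
            if current:
                words.append(current)
            current = ch
        elif tag == "M":          # continue the current word
            current += ch
        else:                     # "E": close the current word
            words.append(current + ch)
            current = ""
    if current:                   # tolerate a sequence that ends mid-word
        words.append(current)
    return words


# Hypothetical tagger output for a four-character field name.
print(tags_to_words("项目开发", ["B", "E", "B", "E"]))
```

The neural tagger itself is out of scope here; any model that yields one tag per character could feed this step.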
Further, preferably, after the text word segmentation is completed, the text strings are cleaned by regular expression matching, and special symbols and stop words are filtered out to obtain a dictionary library.
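The cleaning step can be sketched as follows. This is a minimal illustration, assuming tokens arrive already segmented; the stop-word list, the character classes treated as "special symbols", and the names `clean_tokens` and `build_dictionary` are all hypothetical choices rather than details from the patent.

```python
import re

# Hypothetical stop-word list; a real system would load a much fuller one.
STOP_WORDS = {"的", "和", "及", "the", "and", "of"}

# Assumption: anything other than CJK characters, ASCII letters and digits
# counts as a special symbol and is removed.
SPECIAL = re.compile(r"[^0-9A-Za-z\u4e00-\u9fff]+")


def clean_tokens(tokens):
    """Strip special symbols from each token and drop stop words."""
    cleaned = []
    for tok in tokens:
        tok = SPECIAL.sub("", tok).lower()
        if tok and tok not in STOP_WORDS:
            cleaned.append(tok)
    return cleaned


def build_dictionary(token_lists):
    """Map each distinct cleaned token to an integer id, in first-seen order."""
    vocab = {}
    for tokens in token_lists:
        for tok in clean_tokens(tokens):
            vocab.setdefault(tok, len(vocab))
    return vocab


print(clean_tokens(["项目", "的", "开发！", "（", "Test-2"]))
```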
Further, preferably, the data reduction uses a principal component analysis algorithm, specifically as follows:
the raw data $X = \{x_1, x_2, x_3, \ldots, x_n\}$ needs to be reduced to k dimensions, where $x_1$ to $x_n$ represent the extracted word vector matrix;
1) Decentralize: subtract from each feature value the average of the respective feature;
2) Calculate the covariance matrix;
3) Use the singular value decomposition method to obtain the eigenvalues and eigenvectors of the covariance matrix;
4) Sort the eigenvalues by size and select the k largest; then take the k corresponding eigenvectors as row vectors to form the eigenvector matrix P;
5) Convert the data into the new space constructed from the k eigenvectors, i.e. $Y = PX$, where Y is the result of reducing X from n dimensions to k dimensions.
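The five steps above can be sketched with NumPy. This is an illustration under stated assumptions: features are stored as rows and samples as columns so that $Y = PX$ has the shape the text implies, the covariance is normalized by the number of samples, and `numpy.linalg.eigh` (a symmetric eigendecomposition) stands in for the decomposition step. The function name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(X, k):
    """Reduce X (features x samples) to k dimensions; returns (Y, P)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)   # 1) decentralize each feature (row)
    cov = Xc @ Xc.T / Xc.shape[1]            # 2) covariance matrix (assumed 1/m scaling)
    eigvals, eigvecs = np.linalg.eigh(cov)   # 3) eigenvalues and eigenvectors
    top = np.argsort(eigvals)[::-1][:k]      # 4) indices of the k largest eigenvalues
    P = eigvecs[:, top].T                    #    eigenvectors as row vectors
    return P @ Xc, P                         # 5) Y = P X on the centered data


# Two perfectly correlated features collapse onto one component.
Y, P = pca_reduce([[1, 2, 3, 4], [2, 4, 6, 8]], k=1)
print(Y.shape, P.shape)
```

The sign of each eigenvector is arbitrary, so projected values may come out negated between runs or libraries without affecting the similarity computed later.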
Further, preferably, in step (3), the similarity algorithm is a comprehensive similarity algorithm: three different similarity algorithms are used to calculate the similarity of the core field data, and the comprehensive similarity is then obtained as a weighted average of the three similarities. The specific method is as follows:
edit distance similarity of characters:
the adding operation:
$d_1 = ED(A_{i-1}, B_j) + 1$
the deletion operation:
$d_2 = ED(A_i, B_{j-1}) + 1$
the modification operation:
$d_3 = ED(A_{i-1}, B_{j-1}) + c$, where $c = 0$ if $A_i = B_j$ and $c = 1$ otherwise
taking the smallest of the three as the minimum edit distance gives the state transition equation:
$ED(A_i, B_j) = \min(d_1, d_2, d_3)$
in the above, $d_1$, $d_2$ and $d_3$ represent the edit distances of the adding, deleting and modifying operations respectively; A and B represent the two character strings to be compared; ED is the edit distance function; $ED(A_i, B_j)$ represents the minimum edit distance; $L_A$ and $L_B$ are the lengths of A and B respectively; $A_i$ represents the i-th character in A; $B_j$ represents the j-th character in B;
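The recurrence can be computed with a standard dynamic-programming table. The sketch below keeps only two rows of the table at a time. Turning the distance into a similarity by dividing by the longer string length is an assumption made here for illustration; the text names $L_A$ and $L_B$ but the exact normalization is not recoverable from it.

```python
def edit_distance(a, b):
    """Minimum number of add, delete and modify operations turning a into b."""
    prev = list(range(len(b) + 1))            # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                            # distance to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            d1 = prev[j] + 1                  # ED(A_{i-1}, B_j) + 1
            d2 = curr[j - 1] + 1              # ED(A_i, B_{j-1}) + 1
            d3 = prev[j - 1] + (ca != cb)     # ED(A_{i-1}, B_{j-1}) + 0 or 1
            curr.append(min(d1, d2, d3))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalization: 1 - ED / max(L_A, L_B); 1.0 for two empty strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"))  # 3
```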
Jaccard coefficient similarity:
$J(A, B) = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$
in the above, $M_{00}$ represents the number of attributes for which A and B are both 0; $M_{01}$ represents the number of attributes for which A is 0 and B is 1; $M_{10}$ represents the number of attributes for which A is 1 and B is 0; $M_{11}$ represents the number of attributes for which A and B are both 1;
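For text, the binary attributes are commonly taken to be "token present or absent", which is the assumption made in this sketch; the function name and the choice to return 1.0 for two empty inputs are likewise illustrative.

```python
def jaccard_similarity(tokens_a, tokens_b):
    """Jaccard coefficient over the sets of tokens in A and B."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    m11 = len(set_a & set_b)      # tokens present in both A and B
    m10 = len(set_a - set_b)      # present in A only
    m01 = len(set_b - set_a)      # present in B only
    total = m01 + m10 + m11
    return 1.0 if total == 0 else m11 / total


print(jaccard_similarity(["项目", "开发"], ["项目", "测试"]))
```

Tokens absent from both strings (the $M_{00}$ count) do not enter the coefficient, which is why the vocabulary size does not matter here.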
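cosine similarity:
$\cos\alpha = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$
where $\cos\alpha$ is the cosine distance between the two strings, and $x_i$ and $y_i$ are the word vectors of the two strings;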
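A plain-Python sketch of the cosine computation follows. It assumes the two strings have already been turned into numeric vectors of equal length (for example the reduced word vectors); returning 0.0 when either vector is all zeros is a convention chosen here, not something the text specifies.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0                 # assumed convention for a zero vector
    return dot / (norm_x * norm_y)


print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```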
the three similarities are combined by weighted average to obtain the comprehensive similarity:
$S = \lambda_1 S_{ED} + \lambda_2 S_{J} + \lambda_3 \cos\alpha$
wherein $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the coefficients corresponding to the three similarities, $S_{ED}$ is the edit distance similarity and $S_{J}$ is the Jaccard coefficient similarity;
taking the sentence as the minimum detection unit, the similarity between the core field data of the technical specification book and the core field data of the research evaluation book is obtained through the comprehensive similarity;
outputting the examination report of the core fields.
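The weighting step can be sketched as below. The function takes the three similarity scores as inputs so that it stays independent of how each one is computed; dividing by the sum of the weights is an assumption that makes the result a true weighted average even when the weights do not sum to 1. The scores and weights in the usage lines are hypothetical values for illustration only.

```python
def comprehensive_similarity(sims, weights):
    """Weighted average of similarity scores, each expected in [0, 1]."""
    if len(sims) != len(weights):
        raise ValueError("one weight per similarity is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, sims)) / total


# Hypothetical scores for one sentence pair: edit distance, Jaccard, cosine.
scores = (1.0, 0.5, 0.25)
# Hypothetical weights; they are not taken from the patent.
print(comprehensive_similarity(scores, (1, 1, 2)))  # 0.5
```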
The invention also provides an intelligent purchasing file examination system based on natural language processing technology, which adopts the above intelligent purchasing file examination method and comprises:
a data acquisition device, used for acquiring the technical specification book and the research evaluation book;
a template solidifying module, connected with the data acquisition device and used for solidifying the acquired technical specification book and research evaluation book so that their content can only be copied and identified and cannot be modified;
a core field data export module, connected with the template solidifying module and used for exporting the core field data of the work item part of the solidified technical specification book and research evaluation book;
a data preprocessing module, connected with the core field data export module and used for preprocessing the exported core field data;
a similarity analysis module, connected with the data preprocessing module and used for analyzing, by a similarity algorithm, the core field data of the technical specification book and the core field data of the research evaluation book processed by the data preprocessing module;
and a report output module, used for outputting the unmatched items in the form of a report to obtain an examination report.
Further, preferably, a similarity of 90% or more is regarded as a match; otherwise the items are regarded as unmatched.
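The matching rule and the report of unmatched items can be sketched as follows. The 90% threshold comes from the text; everything else is an assumption for illustration: the function name `review_items`, the report tuple layout, and the use of `difflib.SequenceMatcher` as a stand-in for the comprehensive similarity so that the example runs on its own.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.90  # from the text: a similarity of 90% or more counts as a match


def review_items(spec_items, estimate_items, similarity):
    """Return (item, best_candidate, score) for each spec item with no match."""
    unmatched = []
    for item in spec_items:
        best, best_score = None, 0.0
        for cand in estimate_items:
            score = similarity(item, cand)
            if score > best_score:
                best, best_score = cand, score
        if best_score < THRESHOLD:
            unmatched.append((item, best, best_score))
    return unmatched


def stand_in_similarity(a, b):
    """Stand-in scorer for the demo only; not the patent's similarity."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical work item names from the two documents.
report = review_items(
    ["project development", "system test"],
    ["project development", "hardware purchase"],
    stand_in_similarity,
)
for item, best, score in report:
    print(f"unmatched: {item!r} (closest: {best!r}, score {score:.2f})")
```

Reporting the closest candidate alongside each unmatched item gives the reviewer a starting point for the manual check.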
Compared with the prior art, the invention has the beneficial effects that:
By solidifying the technical specification book and the research evaluation book online, the invention meets the document needs of project leaders throughout the whole process of informatization projects, reduces unnecessary communication costs caused by inconsistent document templates, reduces the loss of project construction efficiency caused by repeated rework in editing and examination, and improves purchasing efficiency.
Through research on intelligent examination technology for the technical specification book and the research evaluation book, a first-pass intelligent examination of the consistency of the two documents is realized; the examination results are compared to form an examination report, which indicates whether manual examination is needed in the next step and prompts the content that needs to be reviewed. Applying artificial intelligence to informatization project management greatly reduces the workload of manual examination, improves the efficiency of examining technical specifications, avoids the low accuracy of manual examination, reduces audit risk, and improves purchasing quality and project management quality.
The key parts of the technical specification book and the research evaluation book are compared by natural language processing technology, realizing the application of natural language technology to these documents; web technology is applied to the technical specification book and the research evaluation book, realizing the application of web document solidification technology to the examination of these documents in the power industry; the ActiveXObject control is used to export the data in the forms into corresponding Word and Excel files.
Drawings
FIG. 1 is a flow chart of BiLSTM+CRF segmentation;
FIG. 2 is a schematic diagram of a system for intelligently inspecting purchasing files based on natural language processing technology;
In the figures: 101, data acquisition device; 102, template solidifying module; 103, core field data export module; 104, data preprocessing module; 105, similarity analysis module; 106, report output module.
Detailed Description
The present invention will be described in further detail with reference to examples.
It will be appreciated by those skilled in the art that the following examples are illustrative of the present invention and should not be construed as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they are carried out according to the techniques or conditions described in the literature in this field or according to the product specifications. Materials or equipment for which no manufacturer is indicated are conventional products available from commercial sources.
Example 1
An intelligent purchasing file examination method based on natural language processing technology comprises the following steps:
step (1), solidifying the templates of the technical specification book and the research evaluation book online by adopting web technology and a framework;
step (2), exporting the core field data of the work item part of the solidified technical specification book and research evaluation book, and carrying out data preprocessing;
the core fields in the technical specification book comprise project preliminary preparation, project development, and project popularization and implementation; the core fields in the research evaluation book comprise construction fees and equipment purchase fees;
step (3), analyzing the core field data of the technical specification book and the core field data of the research evaluation book processed in step (2) by adopting a similarity algorithm to obtain an examination report.
Example 2
An intelligent purchasing file examination method based on natural language processing technology comprises the following steps:
step (1), solidifying the templates of the technical specification book and the research evaluation book online by adopting web technology and a framework;
step (2), exporting the core field data of the work item part of the solidified technical specification book and research evaluation book, and carrying out data preprocessing;
the core fields in the technical specification book comprise project preliminary preparation, project development, and project popularization and implementation; the core fields in the research evaluation book comprise construction fees and equipment purchase fees;
step (3), analyzing the core field data of the technical specification book and the core field data of the research evaluation book processed in step (2) by adopting a similarity algorithm to obtain an examination report.
The technical specification book and the research evaluation book are standard document templates; web technology and controls are adopted to solidify them so that their content can only be copied and identified and cannot be modified, and they serve as the standard for document comparison.
The specific method for solidifying the documents by adopting web technology and controls is as follows: for the project file templates of the technical specification book and the research evaluation book, corresponding form pages are written using the element component library; the ActiveXObject control is used to export the data in the forms into corresponding Word and Excel files.
The construction fee field comprises project development, project implementation, integrated development, project testing and technical consultation; the equipment purchase fee field comprises hardware equipment purchase and system software purchase.
In step (2), the data preprocessing comprises text word segmentation, regular expression matching, stop word processing, character string processing and data reduction.
The text word segmentation adopts the BiLSTM+CRF word segmentation method.
After the text word segmentation is completed, the text strings are cleaned by regular expression matching, and special symbols and stop words are filtered out to obtain a dictionary library.
The data reduction uses a principal component analysis algorithm, specifically as follows:
the raw data $X = \{x_1, x_2, x_3, \ldots, x_n\}$ needs to be reduced to k dimensions, where $x_1$ to $x_n$ represent the extracted word vector matrix;
1) Decentralize: subtract from each feature value the average of the respective feature;
2) Calculate the covariance matrix;
3) Use the singular value decomposition method to obtain the eigenvalues and eigenvectors of the covariance matrix;
4) Sort the eigenvalues by size and select the k largest; then take the k corresponding eigenvectors as row vectors to form the eigenvector matrix P;
5) Convert the data into the new space constructed from the k eigenvectors, i.e. $Y = PX$, where Y is the result of reducing X from n dimensions to k dimensions.
In step (3), the similarity algorithm is a comprehensive similarity algorithm: three different similarity algorithms are used to calculate the similarity of the core field data, and the comprehensive similarity is then obtained as a weighted average of the three similarities. The specific processing is as follows:
edit distance similarity of characters:
the adding operation:
$d_1 = ED(A_{i-1}, B_j) + 1$
the deletion operation:
$d_2 = ED(A_i, B_{j-1}) + 1$
the modification operation:
$d_3 = ED(A_{i-1}, B_{j-1}) + c$, where $c = 0$ if $A_i = B_j$ and $c = 1$ otherwise
taking the smallest of the three as the minimum edit distance gives the state transition equation:
$ED(A_i, B_j) = \min(d_1, d_2, d_3)$
in the above, $d_1$, $d_2$ and $d_3$ represent the edit distances of the adding, deleting and modifying operations respectively; A and B represent the two character strings to be compared; ED is the edit distance function; $ED(A_i, B_j)$ represents the minimum edit distance; $L_A$ and $L_B$ are the lengths of A and B respectively; $A_i$ represents the i-th character in A; $B_j$ represents the j-th character in B;
Jaccard coefficient similarity:
$J(A, B) = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$
in the above, $M_{00}$ represents the number of attributes for which A and B are both 0; $M_{01}$ represents the number of attributes for which A is 0 and B is 1; $M_{10}$ represents the number of attributes for which A is 1 and B is 0; $M_{11}$ represents the number of attributes for which A and B are both 1;
cosine similarity:
$\cos\alpha = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$
where $\cos\alpha$ is the cosine distance between the two strings, and $x_i$ and $y_i$ are the word vectors of the two strings;
the three similarities are combined by weighted average to obtain the comprehensive similarity:
$S = \lambda_1 S_{ED} + \lambda_2 S_{J} + \lambda_3 \cos\alpha$
wherein $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the coefficients corresponding to the three similarities, $S_{ED}$ is the edit distance similarity and $S_{J}$ is the Jaccard coefficient similarity;
taking the sentence as the minimum detection unit, the similarity between the core field data of the technical specification book and the core field data of the research evaluation book is obtained through the comprehensive similarity;
outputting the examination report of the core fields.
As shown in FIG. 2, an intelligent purchasing file examination system based on natural language processing technology adopts the above intelligent purchasing file examination method and comprises:
a data acquisition device 101, used for acquiring the technical specification book and the research evaluation book;
a template solidifying module 102, connected with the data acquisition device 101 and used for solidifying the acquired technical specification book and research evaluation book so that their content can only be copied and identified and cannot be modified;
a core field data export module 103, connected with the template solidifying module 102 and used for exporting the core field data of the work item part of the solidified technical specification book and research evaluation book;
a data preprocessing module 104, connected with the core field data export module 103 and used for preprocessing the exported core field data;
a similarity analysis module 105, connected with the data preprocessing module 104 and used for analyzing, by a similarity algorithm, the core field data of the technical specification book and the core field data of the research evaluation book processed by the data preprocessing module 104;
and a report output module 106, used for outputting the unmatched items in the form of a report to obtain an examination report.
Example 3
An intelligent purchasing file examination method based on natural language processing technology comprises the following steps:
(1) Web technology and a framework are adopted to solidify the templates of the technical specification book and the research evaluation book online;
(2) The core field data of the work item part of the solidified technical specification book and research evaluation book are exported, and data preprocessing is carried out;
(3) The document to be examined is analyzed by a similarity algorithm to obtain a preliminary examination report.
In step (1), the technical specification book and the research evaluation book are standard document templates. Web technology and a framework are adopted to solidify the documents, which then serve as the standard for document comparison.
In step (2), the data preprocessing comprises regular expression matching, text word segmentation, stop word processing, character string processing and data reduction.
The text word segmentation adopts a recurrent neural network word segmentation method (the flow is shown in FIG. 1);
after word segmentation is completed, the text strings are cleaned using regular expressions, and special symbols and stop words are filtered out to obtain a dictionary library;
the principle of data reduction is as follows:
the raw data $X = \{x_1, x_2, x_3, \ldots, x_n\}$ needs to be reduced to k dimensions, where $x_1$ to $x_n$ represent the extracted word vector matrix;
1) Decentralize: subtract from each feature value the average of the respective feature, i.e. decentralize each of the vectors $x_1$ to $x_n$;
2) Calculate the covariance matrix;
3) Use the eigenvalue decomposition method to obtain the eigenvalues and eigenvectors of the covariance matrix;
4) Sort the eigenvalues by size and select the k largest; then take the k corresponding eigenvectors as row vectors to form the eigenvector matrix P;
5) Convert the data into the new space constructed from the k eigenvectors, i.e. $Y = PX$, where Y is the result of reducing X from n dimensions to k dimensions.
In step (3), the similarity algorithm is a comprehensive similarity algorithm: three different similarity algorithms are used to calculate the similarity of the core field data, and the comprehensive similarity is then obtained as a weighted average of the three similarities. The specific processing is as follows:
edit distance similarity of characters:
the adding operation:
$d_1 = ED(A_{i-1}, B_j) + 1$
the deletion operation:
$d_2 = ED(A_i, B_{j-1}) + 1$
the modification operation:
$d_3 = ED(A_{i-1}, B_{j-1}) + c$, where $c = 0$ if $A_i = B_j$ and $c = 1$ otherwise
taking the smallest of the three as the minimum edit distance gives the state transition equation:
$ED(A_i, B_j) = \min(d_1, d_2, d_3)$
Jaccard coefficient similarity:
$J(A, B) = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$
cosine similarity:
$\cos\alpha = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$
the three similarities are combined by weighted average to obtain the comprehensive similarity:
$S = \lambda_1 S_{ED} + \lambda_2 S_{J} + \lambda_3 \cos\alpha$
wherein $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the coefficients corresponding to the three similarities, $S_{ED}$ is the edit distance similarity and $S_{J}$ is the Jaccard coefficient similarity; preferably 0.2, 0.4.
Taking the sentence as the minimum detection unit, the similarity between the core field data of the document to be examined and the core field data of the solidified template is obtained through the comprehensive similarity;
an examination report of the core fields of the document to be examined is output.
Preferably, a technical specification book and a research evaluation book in the specified templates are imported, the function item names (core field data) in the function item tables of the two documents are extracted, and their similarity is calculated; a similarity of 90% or more is determined to be a match, and unmatched items are provided to the project manager in the form of a report for manual checking.
Application instance
(1) Document solidification. The standard documents of the technical specification book and the research evaluation book are solidified first.
(2) Document comparison. a) The core field data of the work item part of the solidified technical specification book and research evaluation book are exported, and the core fields are preprocessed using natural language processing technology. b) The similarity between the core field and the core field in the solidified document is calculated; if the similarity is greater than 90%, the document is considered qualified, otherwise unqualified.
(3) Document output. The data in the form are exported into corresponding Word and Excel files through the ActiveXObject control, and the unmatched items are output in the form of a report.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (8)

1. An intelligent purchasing file examination method based on natural language processing technology, characterized by comprising the following steps:
step (1), solidifying the templates of the technical specification book and the research evaluation book online by adopting web technology and a framework;
step (2), exporting the core field data of the work item part of the solidified technical specification book and research evaluation book, and carrying out data preprocessing;
the core fields in the technical specification book comprise project preliminary preparation, project development, and project popularization and implementation; the core fields in the research evaluation book comprise construction fees and equipment purchase fees;
step (3), analyzing the core field data of the technical specification book and the core field data of the research evaluation book processed in step (2) by adopting a similarity algorithm to obtain an examination report;
the technical specification book and the research evaluation book are standard document templates; web technology and controls are adopted to solidify them so that their content can only be copied and identified and cannot be modified, and they serve as the standard for document comparison;
the specific method for solidifying the documents by adopting web technology and controls is as follows: for the project file templates of the technical specification book and the research evaluation book, corresponding form pages are written using the element component library; the ActiveXObject control is used to export the data in the forms into corresponding Word and Excel files.
2. The intelligent purchasing file examination method based on natural language processing technology according to claim 1, wherein the construction fee field comprises project development, project implementation, integrated development, project testing, and technical consultation; and the equipment purchase fee field comprises hardware equipment purchase and system software purchase.
3. The intelligent purchasing file examination method based on natural language processing technology according to claim 1, wherein in step (2), the data preprocessing comprises text word segmentation, regular-expression matching, stop word processing, character string processing, and data normalization.
4. The intelligent purchasing file examination method based on natural language processing technology according to claim 3, wherein the text word segmentation adopts a BiLSTM+CRF word segmentation method.
5. The intelligent purchasing file examination method based on natural language processing technology according to claim 3, wherein after the text word segmentation is completed, the text character strings are cleaned by regular-expression matching, and special symbols and stop words are filtered out to obtain a dictionary library.
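The cleaning step of claim 5 can be illustrated with a minimal Python sketch. It assumes the text has already been segmented into tokens (the BiLSTM+CRF segmenter is not reproduced here), and the stop-word list, the regular expression, and the function names `clean_tokens` and `build_dictionary` are hypothetical choices made for illustration only.

```python
import re

# Hypothetical stop-word list; a real system would load a curated list.
STOP_WORDS = {"the", "of", "and", "a", "to", "in"}

# Matches runs of characters that are not word characters
# (letters, digits, underscore, CJK), i.e. punctuation and symbols.
_SPECIAL = re.compile(r"[^\w]+", re.UNICODE)


def clean_tokens(tokens):
    """Strip special symbols from each token, then drop stop words and empties."""
    cleaned = []
    for tok in tokens:
        tok = _SPECIAL.sub("", tok).lower()
        if tok and tok not in STOP_WORDS:
            cleaned.append(tok)
    return cleaned


def build_dictionary(token_lists):
    """Map each distinct cleaned token to an integer id, in first-seen order."""
    vocab = {}
    for tokens in token_lists:
        for tok in clean_tokens(tokens):
            vocab.setdefault(tok, len(vocab))
    return vocab


# Already-segmented example sentences (assumed segmenter output).
segmented = [
    ["Purchase", "of", "server", "hardware", "(", "2", "units", ")", "!"],
    ["the", "server", "software", "##"],
]
vocab = build_dictionary(segmented)
print(vocab)
```

The resulting mapping plays the role of the dictionary library; whether the actual system stores ids, counts, or embeddings is not stated in the claim.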
6. The intelligent purchasing file examination method based on natural language processing technology according to claim 3, wherein the data normalization uses a principal component analysis algorithm, specifically comprising the following steps:
the raw data $X = \{x_1, x_2, x_3, \ldots, x_n\}$ needs to be reduced to $k$ dimensions, where $x_1$ to $x_n$ represent the extracted word vector matrix;
1) De-centering, wherein the mean of each feature is subtracted from the values of that feature;
2) Calculating the covariance matrix $\frac{1}{n} X X^{T}$;
3) Solving for the eigenvalues and eigenvectors of the covariance matrix $\frac{1}{n} X X^{T}$ by the singular value decomposition method;
4) Sorting the eigenvalues from large to small, selecting the largest $k$ eigenvalues, and then taking the $k$ eigenvectors corresponding to them as row vectors to form an eigenvector matrix $P$;
5) Converting the data into the new space spanned by the $k$ eigenvectors, i.e., $Y = PX$, where $Y$ is the result of reducing $X$ from $n$ dimensions to $k$ dimensions.
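The five steps of claim 6 can be sketched with NumPy as follows. This is a minimal illustration, not the patented implementation: it assumes the data matrix holds one feature per row and one sample per column (so that $Y = PX$ applies directly), normalises the covariance by the number of samples, and uses the hypothetical function name `pca_reduce`.

```python
import numpy as np


def pca_reduce(X, k):
    """Reduce X (features x samples, one sample per column) to k rows."""
    X = np.asarray(X, dtype=float)
    # 1) De-center: subtract each feature's (row's) mean.
    Xc = X - X.mean(axis=1, keepdims=True)
    # 2) Covariance matrix of the features, normalised by the sample count
    #    (assumed layout: columns are samples).
    n_samples = Xc.shape[1]
    C = Xc @ Xc.T / n_samples
    # 3) SVD of the symmetric positive semi-definite covariance matrix:
    #    columns of U are eigenvectors, S holds the eigenvalues,
    #    returned in descending order.
    U, S, _ = np.linalg.svd(C)
    # 4) The eigenvectors of the k largest eigenvalues become the rows of P.
    P = U[:, :k].T
    # 5) Project the centered data into the k-dimensional space: Y = P X.
    Y = P @ Xc
    return Y, P, S


# Two perfectly correlated features, four samples.
X = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.0]]
Y, P, S = pca_reduce(X, k=1)
print(Y.shape)
```

Because the two example features are perfectly correlated, a single component retains all of the variance; the sign of the projected values depends on the sign convention of the decomposition and is not meaningful.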
7. The intelligent purchasing file examination method based on natural language processing technology according to claim 1, wherein in step (3), the similarity algorithm is a comprehensive similarity algorithm, namely, three different similarity algorithms are used to calculate the similarity of the core field data, and the comprehensive similarity is then obtained as a weighted average of the three similarities, specifically as follows:
edit distance similarity of characters:
addition operation:
$d_1 = ED(A_{i-1}, B_j) + 1$
deletion operation:
$d_2 = ED(A_i, B_{j-1}) + 1$
modification operation:
$d_3 = ED(A_{i-1}, B_{j-1}) + c$, where $c = 0$ if $A_i = B_j$ and $c = 1$ otherwise
taking the smallest of the three as the minimum edit distance gives the state transition equation:
$ED(A_i, B_j) = \min(d_1, d_2, d_3)$
in the above, $d_1$, $d_2$, $d_3$ respectively represent the edit distances of the addition, deletion, and modification operations; $A$ and $B$ represent the two character strings to be compared; $ED$ is the edit distance function; $ED(A_i, B_j)$ represents the minimum edit distance; $L_A$ and $L_B$ are the lengths of $A$ and $B$ respectively; $A_i$ represents the $i$-th character in $A$; $B_j$ represents the $j$-th character in $B$;
Jaccard coefficient similarity:
$J(A, B) = \dfrac{M_{11}}{M_{01} + M_{10} + M_{11}}$
in the above, $M_{00}$ represents the number of attributes for which A and B are both 0; $M_{01}$ represents the number of attributes for which A is 0 and B is 1; $M_{10}$ represents the number of attributes for which A is 1 and B is 0; $M_{11}$ represents the number of attributes for which A and B are both 1;
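cosine similarity:
$\cos\alpha = \dfrac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^{2}} \, \sqrt{\sum_{i} y_i^{2}}}$
where $\cos\alpha$ is the cosine similarity between the two character strings, and $x_i$ and $y_i$ are the word vectors of the two character strings;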
the three similarities are combined by weighted averaging to obtain the comprehensive similarity, wherein $\lambda_1$, $\lambda_2$, $\lambda_3$ are the weighting coefficients corresponding to the three similarities;
taking the sentence as the minimum detection unit, obtaining the similarity between the core field data of the technical specification book and the core field data of the research evaluation book through the comprehensive similarity;
and outputting the examination report for the core fields.
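The three measures of claim 7 and their weighted combination can be sketched in plain Python as below. Several points are assumptions made for illustration and are not taken from the claim: the edit distance is turned into a similarity as $1 - ED / \max(L_A, L_B)$, whitespace splitting stands in for the word segmenter, bag-of-words count vectors stand in for the word vectors, and equal weights are used by default. All function names are hypothetical.

```python
import math


def edit_distance(a, b):
    """Minimum edit distance via ED(A_i, B_j) = min(d1, d2, d3)."""
    prev = list(range(len(b) + 1))            # ED(empty, B_j) = j
    for i, ca in enumerate(a, start=1):
        cur = [i]                             # ED(A_i, empty) = i
        for j, cb in enumerate(b, start=1):
            d1 = prev[j] + 1                  # ED(A_{i-1}, B_j) + 1
            d2 = cur[j - 1] + 1               # ED(A_i, B_{j-1}) + 1
            d3 = prev[j - 1] + (ca != cb)     # ED(A_{i-1}, B_{j-1}) + cost
            cur.append(min(d1, d2, d3))
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalisation: 1 - ED / max(L_A, L_B)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard_similarity(tokens_a, tokens_b):
    """M11 / (M01 + M10 + M11) over binary token-presence attributes."""
    sa, sb = set(tokens_a), set(tokens_b)
    union = len(sa | sb)
    return 1.0 if union == 0 else len(sa & sb) / union


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    nx = math.sqrt(sum(xi * xi for xi in x))
    ny = math.sqrt(sum(yi * yi for yi in y))
    return 0.0 if nx == 0 or ny == 0 else dot / (nx * ny)


def comprehensive_similarity(a, b, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three similarities (weights must sum to > 0)."""
    ta, tb = a.split(), b.split()             # stand-in for word segmentation
    vocab = sorted(set(ta) | set(tb))
    va = [ta.count(w) for w in vocab]         # stand-in for word vectors
    vb = [tb.count(w) for w in vocab]
    sims = (edit_similarity(a, b),
            jaccard_similarity(ta, tb),
            cosine_similarity(va, vb))
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)


print(comprehensive_similarity("hardware equipment purchase",
                               "system software purchase"))
```

Dividing by the sum of the weights keeps the result in the range 0 to 1 regardless of how the three coefficients are scaled; the claim does not state whether the coefficients are already normalised.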
8. An intelligent purchasing file examination system based on natural language processing technology, adopting the intelligent purchasing file examination method based on natural language processing technology as set forth in any one of claims 1 to 7, characterized by comprising:
a data acquisition device (101), used for acquiring the technical specification book and the research evaluation book;
a template solidifying module (102), connected with the data acquisition device (101) and used for solidifying the acquired technical specification book and research evaluation book so that their content can only be copied and recognized and cannot be modified;
a core field data export module (103), connected with the template solidifying module (102) and used for exporting the core field data of the work item parts of the solidified technical specification book and research evaluation book;
a data preprocessing module (104), connected with the core field data export module (103) and used for preprocessing the exported core field data;
a similarity analysis module (105), connected with the data preprocessing module (104) and used for analyzing, by adopting a similarity algorithm, the core field data of the technical specification book and the core field data of the research evaluation book processed by the data preprocessing module (104);
and a report output module (106), used for outputting the mismatched items in the form of a report to obtain an examination report.
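How the similarity analysis module (105) and the report output module (106) might cooperate at sentence level can be sketched as follows. This is an illustrative assumption rather than the claimed design: the sentence splitter, the best-match pairing strategy, the `threshold` value, and the function names are hypothetical, and `difflib.SequenceMatcher` from the Python standard library is used only as a placeholder for the comprehensive similarity.

```python
import re
from difflib import SequenceMatcher

# Split on Chinese/English terminators; a period only counts when it is
# followed by whitespace or the end of the text, so "3.5" stays intact.
_SENTENCE_END = re.compile(r"[。！？!?；;]+|\.(?=\s|$)")


def split_sentences(text):
    """Return the non-empty, stripped sentences of a text."""
    return [p.strip() for p in _SENTENCE_END.split(text) if p.strip()]


def stand_in_similarity(a, b):
    """Placeholder for the comprehensive similarity of claim 7."""
    return SequenceMatcher(None, a, b).ratio()


def review(spec_text, eval_text, similarity=stand_in_similarity, threshold=0.6):
    """List spec sentences whose best match in the evaluation book is weak."""
    eval_sentences = split_sentences(eval_text)
    report = []
    for sentence in split_sentences(spec_text):
        best, score = None, 0.0
        for candidate in eval_sentences:
            s = similarity(sentence, candidate)
            if s > score:
                best, score = candidate, s
        if score < threshold:                 # hypothetical mismatch rule
            report.append({"sentence": sentence,
                           "best_match": best,
                           "score": round(score, 3)})
    return report


spec = "Purchase two database servers. Develop the billing module."
evaluation = "Purchase two database servers. Procure office furniture."
for item in review(spec, evaluation):
    print(item)
```

Passing the similarity as a parameter keeps the report logic independent of the particular measure, so the weighted combination of the three similarities could be substituted without changing `review`.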
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011299881.3A CN112417835B (en) 2020-11-18 2020-11-18 Intelligent purchasing file examination method and system based on natural language processing technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011299881.3A CN112417835B (en) 2020-11-18 2020-11-18 Intelligent purchasing file examination method and system based on natural language processing technology

Publications (2)

Publication Number Publication Date
CN112417835A (en) 2021-02-26
CN112417835B (en) 2023-11-14

Family

ID=74773489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011299881.3A Active CN112417835B (en) 2020-11-18 2020-11-18 Intelligent purchasing file examination method and system based on natural language processing technology

Country Status (1)

Country Link
CN (1) CN112417835B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112246A (en) * 2021-05-06 2021-07-13 成都文驰科技有限公司 Citation standard validity detection method
CN113378560B (en) * 2021-07-02 2023-07-18 贵州电网有限责任公司 Test report intelligent diagnosis analysis method based on natural language processing
CN115239211A (en) * 2022-09-22 2022-10-25 国家电投集团科学技术研究院有限公司 Method, device and system for researching and examining photovoltaic power generation project and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107102998A (en) * 2016-02-22 2017-08-29 阿里巴巴集团控股有限公司 A kind of String distance computational methods and device
CN108241605A (en) * 2017-12-13 2018-07-03 广西电网有限责任公司电力科学研究院 A kind of technical report standardization write method based on VC
CN109190092A (en) * 2018-08-15 2019-01-11 深圳平安综合金融服务有限公司上海分公司 The consistency checking method of separate sources file
CN110110982A (en) * 2019-04-26 2019-08-09 特赞(上海)信息科技有限公司 The checking method and device of intention material
CN111104794A (en) * 2019-12-25 2020-05-05 同方知网(北京)技术有限公司 Text similarity matching method based on subject words
CN111709235A (en) * 2020-05-28 2020-09-25 上海发电设备成套设计研究院有限责任公司 Text data statistical analysis system and method based on natural language processing
CN111861366A (en) * 2020-06-08 2020-10-30 远光软件股份有限公司 Project-ground intelligent auditing system and computer

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10216715B2 (en) * 2015-08-03 2019-02-26 Blackboiler Llc Method and system for suggesting revisions to an electronic document

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
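A Comparative Study on using Principle Component Analysis with different Text Classifiers; D. A, Ahmed I, Safaa S; International Journal of Computer Applications; Vol. 180 (No. 31); 1-7 *
Research on the Standardization Path of Technical Specification Books for Engineering Services; Zhang Chi; Xu Chengsong; Science and Technology Innovation Herald (No. 30); 164+166 *
A Review of Research on the Application of Natural Language Processing Technology in Construction Engineering; Wang Yu; Journal of Graphics; Vol. 41 (No. 04); 501-511 *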

Also Published As

Publication number Publication date
CN112417835A (en) 2021-02-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant