CN113449504A - Intelligent marking method and system - Google Patents
- Publication number
- CN113449504A (application CN202110712073.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- bidding document
- module
- texts
- scoring
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Algebra (AREA)
- Probability & Statistics with Applications (AREA)
- Character Discrimination (AREA)
Abstract
The invention provides an intelligent bidding document scoring method and system, relating to the technical field of intelligent bidding document scoring. The intelligent bidding document scoring system comprises: a triggering module, which receives a trigger instruction from a user, generates a bidding document analysis instruction on receipt, and sends that instruction to the transmission module; a storage module, which stores the uploaded bidding document file for subsequent processing; a rich text acquisition module, which extracts text and its related information from the bidding document file, the related information comprising at least the text, its style, its position, and the tables and pictures corresponding to the text; and a transmission module, which receives the bidding document analysis instruction, formats it, and passes the formatted instruction to the rich text acquisition module to carry out text data acquisition. By extracting the objective-score calculation formula from the scoring standard, the invention can greatly improve the usability of the system and reduce the experts' review calculation workload; it is highly practical and represents clear progress.
Description
Technical Field
The invention relates to the technical field of intelligent bidding document scoring, in particular to an intelligent bidding document scoring method and system.
Background
With the spread of electronic bidding, business volume keeps growing and so does the number of bidding documents submitted electronically. A bidding document is typically about 100 to 200 MB, and an extra-large one may exceed 1 GB. A bid evaluation system is typically required to parse them in batches.
After parsing, the bidding document files must be scored. At present the scoring information is usually extracted manually, which is tedious, inefficient and error-prone, and is ill-suited to practical use, so an improvement is needed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an intelligent bidding document scoring method and system.
To achieve this purpose, the invention adopts the following technical solution: an intelligent bidding document scoring method and system, wherein the intelligent bidding document scoring system comprises:
a triggering module, which receives a trigger instruction from a user, generates a bidding document analysis instruction on receipt, and sends that instruction to the transmission module;
a storage module, which stores the uploaded bidding document file for subsequent processing;
a rich text acquisition module, which extracts text and its related information from the bidding document file, the related information comprising at least the text, its style, its position, and the tables and pictures corresponding to the text;
a transmission module, which receives the bidding document analysis instruction, formats it, and passes the formatted instruction to the rich text acquisition module to carry out text data acquisition;
a scoring-standard word segmentation library establishing module, which builds a word library of scoring-standard segments by a word segmentation method so that the text can be segmented;
a semantic rule model establishing module, which analyzes the segmented text, establishes the semantic rules to be extracted, and outputs the semantic rules;
a matching module, which matches the bidding document text against the semantic rules and applies formula calculation to the matched results to obtain a bidding document score;
and a text cleaning module, which replaces special characters, cleans general text based on the analysis configuration, and, while cleaning, corrects characters that OCR recognition commonly confuses.
To make the invention more intelligent, the word segmentation method is specifically: performing word segmentation with a deep learning model comprising Bi-LSTM + CRF and outputting a segmented word library.
To improve accuracy, the Bi-LSTM + CRF model is specifically built as follows: word vectors are trained first, using Word2vec to train a 50-dimensional vector for each word in the corpus; the vectors are then fed into a Bi-LSTM that models the semantic information of the whole sentence; finally a CRF layer completes the word segmentation.
To improve scoring accuracy, after the text cleaning module corrects the confusable characters it deletes redundant spaces, yielding clean text.
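The cleaning behaviour described above (replacing special characters according to a configuration, then deleting redundant spaces) can be sketched as follows. This is a minimal illustration only: the function name `clean_text`, the `DEFAULT_REPLACEMENTS` table and the use of Unicode NFKC normalization are assumptions made for the example and are not specified by the patent.

```python
import re
import unicodedata
from typing import Dict, Optional

# Hypothetical "analysis configuration": special characters to replace.
DEFAULT_REPLACEMENTS: Dict[str, str] = {
    "\u00a0": " ",  # non-breaking space
    "\u3000": " ",  # ideographic (full-width) space
    "\ufeff": "",   # byte-order mark left over from file decoding
    "\u200b": "",   # zero-width space
}


def clean_text(raw: str, replacements: Optional[Dict[str, str]] = None) -> str:
    """Replace special characters, normalize width, drop redundant spaces."""
    table = DEFAULT_REPLACEMENTS if replacements is None else replacements
    text = raw
    for old, new in table.items():
        text = text.replace(old, new)
    # NFKC folds full-width letters, digits and punctuation to ASCII forms
    # (an assumed normalization choice, not taken from the patent).
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of spaces or tabs, then trim spaces around line breaks.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r" *\n *", "\n", text)
    return text.strip()
```

Passing a different `replacements` dictionary stands in for changing the analysis configuration; correction of OCR-confused characters would run before the whitespace step.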
An intelligent bidding document scoring method comprises the following steps:
S1: triggering, in which intelligent analysis of the bidding document starts after the user issues a trigger instruction, and execution proceeds to the next step;
S2: rich text acquisition, in which the text and its related information are extracted from the bidding document and passed to the next step;
S3: text cleaning, in which the text cleaning module deletes special characters, cleans general text, corrects characters confused by OCR recognition, and deletes redundant spaces to obtain clean text, which is passed to the next step;
S4: word segmentation, in which a word library of scoring-standard segments is built from the clean text by the word segmentation method, and the text is segmented accordingly to obtain segmented text;
S5: semantic rule establishment, in which the segmented text is analyzed, the semantic rules to be extracted are established, and the semantic rules are output;
S6: result matching, in which the segmented text is matched against the semantic rules, and formula calculation is applied to the matched results to obtain the bidding document score.
To present the related information clearly, in step S2 the related information comprises at least the text, its style, its position, and the tables and pictures corresponding to the text.
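One way to picture the related information carried out of step S2 is a small record type. The sketch below is illustrative only: the class name `RichTextBlock`, its field names, the bounding-box convention and the style labels are all assumptions, since the patent lists only the kinds of information (text, style, position, tables, pictures) and not a data layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RichTextBlock:
    """One extracted text block plus the related information named in S2."""
    text: str
    style: str                                # assumed label, e.g. "heading-1"
    page: int                                 # assumed: 1-based page number
    bbox: Tuple[float, float, float, float]   # assumed: (x0, y0, x1, y1)
    tables: List[List[List[str]]] = field(default_factory=list)  # rows of cells
    pictures: List[str] = field(default_factory=list)            # image refs


def headings(blocks: List[RichTextBlock]) -> List[str]:
    """Return the text of every block whose style marks it as a heading."""
    return [b.text for b in blocks if b.style.startswith("heading")]
```

Keeping style and position next to the text lets later steps, for example, restrict rule matching to the section under a "scoring standard" heading.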
To make it easier for staff to find errors, in step S6 the bidding document score comprises at least a calculation formula and a score.
To improve accuracy, in step S3 the correction of characters confused by OCR recognition is specifically: correcting them according to context and common errors.
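Correction "according to context and common errors" could take many forms; the sketch below shows one simple reading. It is an assumption-laden illustration: the confusion table `LETTER_TO_DIGIT`, the `COMMON_ERRORS` dictionary and the rule "only fix letters inside a token that already contains a digit" are invented for the example and are not taken from the patent.

```python
import re

# Letters that OCR commonly confuses with digits (illustrative, not exhaustive).
LETTER_TO_DIGIT = str.maketrans({"O": "0", "o": "0", "l": "1",
                                 "I": "1", "S": "5", "B": "8"})

# Hypothetical dictionary of frequently seen whole-word OCR errors.
COMMON_ERRORS = {"sc0re": "score", "b1d": "bid"}

# A candidate token is made only of digits, confusable letters and . , %
# and is not glued to other letters or digits on either side.
_CANDIDATE = re.compile(r"(?<![A-Za-z0-9])[0-9OolISB.,%]+(?![A-Za-z0-9])")


def _fix_token(match: "re.Match[str]") -> str:
    token = match.group(0)
    # Context check: only treat the token as a number if it has a real digit.
    if any(ch.isdigit() for ch in token):
        return token.translate(LETTER_TO_DIGIT)
    return token


def correct_ocr(text: str) -> str:
    """Apply common-error replacements, then context-based digit fixes."""
    for wrong, right in COMMON_ERRORS.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return _CANDIDATE.sub(_fix_token, text)
```

The digit check is what keeps tokens such as "ISO" untouched while "3O" becomes "30"; a production system would likely need a richer context model than this.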
Compared with the prior art, the invention has the following advantages. Word segmentation and bidding document scoring are computed automatically through the Bi-LSTM + CRF deep learning model, which is simple and fast. Paper bidding documents can also be entered through the rich text acquisition module and the text cleaning module, which broadens the applicability of the invention. The text cleaning module effectively reduces errors in character conversion and improves conversion quality. Extracting the objective-score calculation formula from the scoring standard can greatly improve the usability of the system and reduce the experts' review calculation workload. The invention is therefore highly practical and represents clear progress.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is an architecture diagram of the intelligent bidding document scoring method and system provided by the invention;
FIG. 2 is a step diagram of the intelligent bidding document scoring method and system provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a first embodiment, referring to FIGS. 1-2, an intelligent bidding document scoring system comprises:
a triggering module, which receives a trigger instruction from a user, generates a bidding document analysis instruction on receipt, and sends that instruction to the transmission module;
a storage module, which stores the uploaded bidding document file for subsequent processing;
a rich text acquisition module, which extracts text and its related information from the bidding document file, the related information comprising at least the text, its style, its position, and the tables and pictures corresponding to the text;
a transmission module, which receives the bidding document analysis instruction, formats it, and passes the formatted instruction to the rich text acquisition module to carry out text data acquisition;
a scoring-standard word segmentation library establishing module, which builds a word library of scoring-standard segments by a word segmentation method so that the text can be segmented;
a semantic rule model establishing module, which analyzes the segmented text, establishes the semantic rules to be extracted, and outputs the semantic rules;
a matching module, which matches the bidding document text against the semantic rules and applies formula calculation to the matched results to obtain a bidding document score;
and a text cleaning module, which replaces special characters, cleans general text based on the analysis configuration, and, while cleaning, corrects characters that OCR recognition commonly confuses.
In this embodiment, the word segmentation method is specifically: word segmentation is performed with a deep learning model comprising Bi-LSTM + CRF, and a segmented word library is output. A unidirectional recurrent neural network in practice uses only the preceding context and ignores the following context, whereas prediction in real scenarios may need information from the whole input sequence. For this reason the mainstream choice in industry today is the bidirectional recurrent neural network, here a Bi-LSTM. A bidirectional recurrent neural network combines one recurrent network that moves from the start of the sequence to its end with another that moves from the end of the sequence to its start. As an extension of the recurrent neural network, the LSTM can naturally be combined with a reversed sequence to form a bidirectional long short-term memory network. The CRF models the target sequence conditioned on the observation sequence and mainly addresses the sequence labeling problem. The conditional random field model has the advantages of a discriminative model and, like a generative model, takes the transition probabilities between context labels into account, performing global parameter optimization and decoding over the whole sequence.
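The role of the CRF layer, decoding the whole label sequence with transition scores rather than picking each label independently, can be illustrated with a small Viterbi decoder. The sketch below is not the patent's model: the B/M/E/S (begin, middle, end, single) tagging scheme is a common convention assumed here, the transition scores simply forbid illegal tag pairs instead of being learned, and the `EMISSIONS` values are made-up stand-ins for what a trained Bi-LSTM would output per character.

```python
from typing import Dict, List

TAGS = ["B", "M", "E", "S"]  # assumed tagging scheme: Begin, Middle, End, Single
NEG = -1e9                   # score used to forbid a transition

# Legal tag-to-tag moves under BMES; a trained CRF would learn real scores.
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
TRANS = {(a, b): (0.0 if (a, b) in ALLOWED else NEG) for a in TAGS for b in TAGS}
START = {"B": 0.0, "S": 0.0, "M": NEG, "E": NEG}  # a sentence starts a word
END = {"E": 0.0, "S": 0.0, "B": NEG, "M": NEG}    # and ends by closing one


def viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring legal tag sequence for the given scores."""
    score = {t: START[t] + emissions[0][t] for t in TAGS}
    back: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: score[p] + TRANS[(p, cur)])
            new_score[cur] = score[prev] + TRANS[(prev, cur)] + emit[cur]
            pointers[cur] = prev
        score = new_score
        back.append(pointers)
    last = max(TAGS, key=lambda t: score[t] + END[t])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Cut the character sequence after every E or S tag."""
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        buf += ch
        if tag in ("E", "S"):
            words.append(buf)
            buf = ""
    if buf:
        words.append(buf)
    return words


# Made-up per-character scores for a four-character input.
EMISSIONS = [
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 0.5},
    {"B": 1.0, "M": 0.0, "E": 0.9, "S": 0.2},
    {"B": 2.0, "M": 0.1, "E": 0.0, "S": 0.3},
    {"B": 0.0, "M": 0.0, "E": 2.0, "S": 0.5},
]
```

With these scores, choosing the best tag per character independently would give B for both of the first two characters, which is not a legal BMES sequence; decoding over the whole sequence avoids that, which is the benefit the paragraph above attributes to the CRF.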
In this embodiment, the Bi-LSTM + CRF model is specifically built as follows: word vectors are trained first, using Word2vec to train a 50-dimensional vector for each word in the corpus; the vectors are then fed into a Bi-LSTM that models the semantic information of the whole sentence; finally a CRF layer completes the word segmentation. Using Bi-LSTM + CRF can greatly increase word segmentation accuracy and is highly practical.
In this embodiment, after the text cleaning module corrects the confusable characters, it deletes redundant spaces to obtain clean text. The text cleaning module thus effectively reduces errors in character conversion and improves conversion quality.
An intelligent bidding document scoring method comprises the following steps:
S1: triggering, in which intelligent analysis of the bidding document starts after the user issues a trigger instruction, and execution proceeds to the next step;
S2: rich text acquisition, in which the text and its related information are extracted from the bidding document and passed to the next step;
S3: text cleaning, in which the text cleaning module deletes special characters, cleans general text, corrects characters confused by OCR recognition, and deletes redundant spaces to obtain clean text, which is passed to the next step;
S4: word segmentation, in which a word library of scoring-standard segments is built from the clean text by the word segmentation method, and the text is segmented accordingly to obtain segmented text;
S5: semantic rule establishment, in which the segmented text is analyzed, the semantic rules to be extracted are established, and the semantic rules are output;
S6: result matching, in which the segmented text is matched against the semantic rules, and formula calculation is applied to the matched results to obtain the bidding document score.
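The matching part of steps S5 and S6 can be pictured as applying named patterns to the text and collecting the values they capture. The sketch below is a deliberate simplification under stated assumptions: the patent does not say how semantic rules are represented, so regular expressions stand in for them here; the rule names, the English wording of the patterns and the sample sentence are all hypothetical; and the rules run on plain text rather than on segmented text.

```python
import re
from typing import Dict

# Hypothetical semantic rules: each names one parameter of a scoring standard.
RULES = {
    "full_score": re.compile(
        r"full score (?:of|is) (\d+(?:\.\d+)?) points"),
    "deduct_per_percent_above": re.compile(
        r"(\d+(?:\.\d+)?) points? (?:is|are) deducted for every 1% above"),
    "deduct_per_percent_below": re.compile(
        r"(\d+(?:\.\d+)?) points? (?:is|are) deducted for every 1% below"),
}


def match_rules(text: str) -> Dict[str, float]:
    """Return the numeric value captured by every rule that matches."""
    found: Dict[str, float] = {}
    for name, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            found[name] = float(match.group(1))
    return found


SAMPLE = ("The price item has a full score of 30 points. "
          "1 point is deducted for every 1% above the benchmark price, and "
          "0.5 points are deducted for every 1% below the benchmark price.")
```

Calling `match_rules(SAMPLE)` yields the parameters that a later formula calculation would consume; rules that do not match are simply absent from the result, which a caller can treat as "needs manual review".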
In this embodiment, in step S2 the related information comprises at least the text, its style, its position, and the tables and pictures corresponding to the text; together these present the related information clearly.
In this embodiment, in step S6 the bidding document score comprises at least a calculation formula and a score. The calculation formula shows clearly and directly how the score was computed, which makes it easy for staff to check and debug.
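Returning the formula together with the score might look like the sketch below. The deviation-based price formula used here is only an example of an objective scoring rule and is an assumption: the patent does not state any particular formula, and the names `ScoreResult` and `price_score` and all parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoreResult:
    formula: str  # human-readable trace of the calculation
    score: float  # final score after the formula is applied


def price_score(bid_price: float, benchmark_price: float, full_score: float,
                deduct_above: float, deduct_below: float) -> ScoreResult:
    """Example rule: deduct points per percent of deviation from a benchmark."""
    deviation_pct = (bid_price - benchmark_price) * 100 / benchmark_price
    rate = deduct_above if deviation_pct > 0 else deduct_below
    raw = full_score - abs(deviation_pct) * rate
    score = max(0.0, round(raw, 2))  # assumed floor: scores never go below 0
    formula = f"{full_score} - |{deviation_pct:.2f}%| x {rate} = {score}"
    return ScoreResult(formula=formula, score=score)
```

Because the formula string carries the substituted values, a reviewer can verify the arithmetic by eye without rerunning the system.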
In this embodiment, in step S3 the correction of characters confused by OCR recognition is specifically: the confused characters are corrected according to context and common errors, so that wrongly recognized characters produced during OCR do not affect the scoring process.
It can be seen from the above embodiments that the invention computes word segmentation and bidding document scoring automatically through the Bi-LSTM + CRF deep learning model, which is simple and fast. Paper bidding documents can be entered through the rich text acquisition module and the text cleaning module, which broadens the applicability of the invention, and the text cleaning module effectively reduces errors in character conversion and improves conversion quality.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (8)
1. An intelligent bidding document scoring system, comprising:
a triggering module, which receives a trigger instruction from a user, generates a bidding document analysis instruction on receipt, and sends that instruction to the transmission module;
a storage module, which stores the uploaded bidding document file for subsequent processing;
a rich text acquisition module, which extracts text and its related information from the bidding document file, the related information comprising at least the text, its style, its position, and the tables and pictures corresponding to the text;
a transmission module, which receives the bidding document analysis instruction, formats it, and passes the formatted instruction to the rich text acquisition module to carry out text data acquisition;
a scoring-standard word segmentation library establishing module, which builds a word library of scoring-standard segments by a word segmentation method so that the text can be segmented;
a semantic rule model establishing module, which analyzes the segmented text, establishes the semantic rules to be extracted, and outputs the semantic rules;
a matching module, which matches the bidding document text against the semantic rules and applies formula calculation to the matched results to obtain a bidding document score;
and a text cleaning module, which replaces special characters, cleans general text based on the analysis configuration, and, while cleaning, corrects characters that OCR recognition commonly confuses.
2. The intelligent bidding document scoring system according to claim 1, wherein the word segmentation method is specifically: performing word segmentation with a deep learning model comprising Bi-LSTM + CRF and outputting a segmented word library.
3. The intelligent bidding document scoring system according to claim 2, wherein the Bi-LSTM + CRF model is specifically built as follows: word vectors are trained first, using Word2vec to train a 50-dimensional vector for each word in the corpus; the vectors are then fed into a Bi-LSTM that models the semantic information of the whole sentence; finally a CRF layer completes the word segmentation.
4. The intelligent bidding document scoring system according to claim 1, wherein after the text cleaning module corrects the confusable characters, it deletes redundant spaces to obtain clean text.
5. The intelligent bidding document scoring method according to claim 1, comprising the following steps:
S1: triggering, in which intelligent analysis of the bidding document starts after the user issues a trigger instruction, and execution proceeds to the next step;
S2: rich text acquisition, in which the text and its related information are extracted from the bidding document and passed to the next step;
S3: text cleaning, in which the text cleaning module deletes special characters, cleans general text, corrects characters confused by OCR recognition, and deletes redundant spaces to obtain clean text, which is passed to the next step;
S4: word segmentation, in which a word library of scoring-standard segments is built from the clean text by the word segmentation method, and the text is segmented accordingly to obtain segmented text;
S5: semantic rule establishment, in which the segmented text is analyzed, the semantic rules to be extracted are established, and the semantic rules are output;
S6: result matching, in which the segmented text is matched against the semantic rules, and formula calculation is applied to the matched results to obtain the bidding document score.
6. The intelligent bidding document scoring method according to claim 5, wherein in step S2 the related information comprises at least the text, its style, its position, and the tables and pictures corresponding to the text.
7. The intelligent bidding document scoring method according to claim 5, wherein in step S6 the bidding document score comprises at least a calculation formula and a score.
8. The intelligent bidding document scoring method according to claim 5, wherein in step S3 the correction of characters confused by OCR recognition is specifically: correcting them according to context and common errors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110712073.3A CN113449504A (en) | 2021-06-25 | 2021-06-25 | Intelligent marking method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110712073.3A CN113449504A (en) | 2021-06-25 | 2021-06-25 | Intelligent marking method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113449504A true CN113449504A (en) | 2021-09-28 |
Family
ID=77812858
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110712073.3A Pending CN113449504A (en) | 2021-06-25 | 2021-06-25 | Intelligent marking method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113449504A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114580362A (en) * | 2022-05-09 | 2022-06-03 | 四川野马科技有限公司 | System and method for generating return mark file |
CN114580362B (en) * | 2022-05-09 | 2022-11-01 | 四川野马科技有限公司 | System and method for generating return mark file |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110363194B (en) | NLP-based intelligent examination paper reading method, device, equipment and storage medium | |
CN110717031B (en) | Intelligent conference summary generation method and system | |
CN108287858B (en) | Semantic extraction method and device for natural language | |
CN104050160B (en) | Interpreter's method and apparatus that a kind of machine is blended with human translation | |
CN109145260B (en) | Automatic text information extraction method | |
US11113323B2 (en) | Answer selection using a compare-aggregate model with language model and condensed similarity information from latent clustering | |
CN109446885B (en) | Text-based component identification method, system, device and storage medium | |
CN111353306B (en) | Entity relationship and dependency Tree-LSTM-based combined event extraction method | |
US20220414463A1 (en) | Automated troubleshooter | |
CN111581367A (en) | Method and system for inputting questions | |
CN112016320A (en) | English punctuation adding method, system and equipment based on data enhancement | |
CN113076720B (en) | Long text segmentation method and device, storage medium and electronic device | |
CN113408287B (en) | Entity identification method and device, electronic equipment and storage medium | |
CN115759119B (en) | Financial text emotion analysis method, system, medium and equipment | |
CN111401012A (en) | Text error correction method, electronic device and computer readable storage medium | |
CN109446522B (en) | Automatic test question classification system and method | |
CN110781291A (en) | Text abstract extraction method, device, server and readable storage medium | |
CN112151019A (en) | Text processing method and device and computing equipment | |
CN113449504A (en) | Intelligent marking method and system | |
CN113343701A (en) | Extraction method and device for text named entities of power equipment fault defects | |
CN112818693A (en) | Automatic extraction method and system for electronic component model words | |
CN113435213B (en) | Method and device for returning answers to user questions and knowledge base | |
CN111090720B (en) | Hot word adding method and device | |
CN114078470A (en) | Model processing method and device, and voice recognition method and device | |
Malkadi et al. | Improving code extraction from coding screencasts using a code-aware encoder-decoder model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||