CN113449504A - Intelligent marking method and system - Google Patents

Info

Publication number
CN113449504A
Authority
CN
China
Prior art keywords
text
bidding document
module
texts
scoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110712073.3A
Other languages
Chinese (zh)
Inventor
陈伟
白彩云
李鑫
许真真
吴晓乐
李志慧
李帅康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Xinyuan Information Technology Co ltd
Original Assignee
Zhengzhou Xinyuan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Xinyuan Information Technology Co ltd
Priority to CN202110712073.3A
Publication of CN113449504A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides an intelligent bidding document scoring method and system, relating to the technical field of intelligent bidding document scoring. The intelligent bidding document scoring system comprises: a triggering module, mainly used for receiving a trigger instruction from a user, generating a bidding document analysis instruction upon receiving it, and sending the bidding document analysis instruction to a transfer module; a storage module, mainly used for storing the uploaded bidding document file for subsequent processing; a rich text acquisition module, which extracts text and its related information from the bidding document file, wherein the related information at least comprises the text, its style, its position, and the tables and pictures corresponding to the text; and the transfer module, used for receiving the bidding document analysis instruction, formatting it, and passing the formatted instruction to the rich text acquisition module to perform text data acquisition. By extracting the objective score calculation formulas from the scoring standard, the invention can greatly improve the usability of the system and reduce the experts' review calculation workload, and it offers high practicability and notable progress.

Description

Intelligent marking method and system
Technical Field
The invention relates to the technical field of intelligent bidding document scoring, in particular to an intelligent bidding document scoring method and system.
Background
With the spread of electronic bidding, business volume keeps growing and the number of bidding documents submitted electronically keeps increasing. A single bidding document is typically about 100 to 200 MB in size, and very large bidding documents may exceed a gigabyte. A bidding document system is therefore usually needed to parse them in batches.
At present, bidding document files must be parsed before they can be scored, and the scoring information is usually extracted manually. This is tedious, inefficient, and error-prone, which hinders practical use, so improvement is needed.
Disclosure of Invention
The invention aims to solve the defects in the prior art and provides an intelligent marking method and system for a bidding document.
In order to achieve the purpose, the invention adopts the following technical scheme: an intelligent bidding document scoring method and system are provided, wherein the intelligent bidding document scoring system comprises:
a triggering module, mainly used for receiving a trigger instruction from a user, generating a bidding document analysis instruction upon receiving it, and sending the bidding document analysis instruction to the transfer module;
a storage module, mainly used for storing the uploaded bidding document file for subsequent processing;
a rich text acquisition module, which extracts text and its related information from the bidding document file, wherein the related information at least comprises the text, its style, its position, and the tables and pictures corresponding to the text;
a transfer module, used for receiving the bidding document analysis instruction, formatting it, and passing the formatted instruction to the rich text acquisition module to perform text data acquisition;
a scoring standard word segmentation library establishing module, used for building a lexicon of scoring standard terms by a word segmentation method so that the text can be segmented;
a semantic rule model establishing module, used for analyzing the segmented text, establishing the semantic rules to be extracted, and outputting the semantic rules;
a matching module, used for matching the bidding document text against the semantic rules and applying the formula calculation to the matched results to obtain a bidding document score;
and a text cleaning module, mainly used for replacing special characters, performing general text cleaning, and, while cleaning the text based on the analysis configuration, correcting characters that are easily confused after OCR recognition.
To make the invention more intelligent, the word segmentation method is specifically as follows: a deep learning model comprising Bi-LSTM + CRF performs the word segmentation and outputs a word segmentation lexicon.
To improve accuracy, the Bi-LSTM + CRF is specifically as follows: word vectors are trained first, with Word2vec used to train a 50-dimensional vector for each word of the corpus; the vectors are then fed into a Bi-LSTM, which models the semantic information of the whole sentence; finally, a CRF layer completes the word segmentation.
To improve scoring accuracy, after the text cleaning module corrects the easily confused characters, it deletes redundant spaces to obtain clean text.
An intelligent scoring method for a bidding document comprises the following steps:
S1: triggering: after the user issues a trigger instruction, intelligent analysis of the bidding document starts, and the method proceeds to the next step;
S2: rich text acquisition: the text and its related information are extracted from the bidding document and passed to the next step;
S3: text cleaning: the text cleaning module deletes special characters, performs general text cleaning, corrects characters that are easily confused after OCR recognition, and deletes redundant spaces to obtain clean text, which is passed to the next step;
S4: word segmentation: a lexicon of scoring standard terms is built from the clean text by the word segmentation method, and the text is segmented with it to obtain segmented text;
S5: semantic rule establishment: the segmented text is analyzed, the semantic rules to be extracted are established, and the semantic rules are output;
S6: result matching: the segmented text is matched against the semantic rules, and the formula calculation is applied to the matched results to obtain the bidding document score.
To present the related information clearly, in step S2 the related information at least comprises the text, its style, its position, and the tables and pictures corresponding to the text.
To make it easier for staff to check for errors, in step S6 the bidding document score at least comprises the calculation formula and the score.
To improve accuracy, in step S3 the characters that are easily confused after OCR recognition are corrected according to their context and common error patterns.
Compared with the prior art, the invention has the following advantages. Word segmentation and the automatic calculation of the bidding document score are carried out by means of the Bi-LSTM + CRF deep learning model, which is simple and fast. Paper bidding documents can also be entered through the rich text acquisition module and the text cleaning module, which broadens the applicability of the invention, and the text cleaning module effectively reduces errors in character conversion and improves conversion quality. Extracting the objective score calculation formulas from the scoring standard greatly improves the usability of the system and reduces the experts' review calculation workload, so the invention is highly practical and represents notable progress.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is an architecture diagram of the intelligent bidding document scoring method and system provided by the invention;
FIG. 2 is a flowchart of the steps of the intelligent bidding document scoring method and system provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a first embodiment, referring to FIGS. 1-2, an intelligent bidding document scoring system comprises:
a triggering module, mainly used for receiving a trigger instruction from a user, generating a bidding document analysis instruction upon receiving it, and sending the bidding document analysis instruction to the transfer module;
a storage module, mainly used for storing the uploaded bidding document file for subsequent processing;
a rich text acquisition module, which extracts text and its related information from the bidding document file, wherein the related information at least comprises the text, its style, its position, and the tables and pictures corresponding to the text;
a transfer module, used for receiving the bidding document analysis instruction, formatting it, and passing the formatted instruction to the rich text acquisition module to perform text data acquisition;
a scoring standard word segmentation library establishing module, used for building a lexicon of scoring standard terms by a word segmentation method so that the text can be segmented;
a semantic rule model establishing module, used for analyzing the segmented text, establishing the semantic rules to be extracted, and outputting the semantic rules;
a matching module, used for matching the bidding document text against the semantic rules and applying the formula calculation to the matched results to obtain a bidding document score;
and a text cleaning module, mainly used for replacing special characters, performing general text cleaning, and, while cleaning the text based on the analysis configuration, correcting characters that are easily confused after OCR recognition.
In this embodiment, the word segmentation method is specifically as follows: a deep learning model comprising Bi-LSTM + CRF performs the word segmentation and outputs a word segmentation lexicon. A unidirectional recurrent neural network uses only the preceding context and ignores the following context, whereas in practice a prediction may need information from the entire input sequence. For this reason, the bidirectional recurrent neural network, namely Bi-LSTM, is now the mainstream choice in industry. A bidirectional recurrent neural network combines one recurrent network that moves forward from the start of the sequence with another that moves backward from the end of the sequence to its start. As an extension of the recurrent neural network, the LSTM can naturally be combined with a reversed sequence to form a bidirectional long short-term memory network. The CRF (conditional random field) models the target sequence conditioned on the observation sequence and is mainly used for sequence labeling. The conditional random field model has the advantages of a discriminative model and, like a generative model, takes the transition probabilities between context tags into account, performing global parameter optimization and decoding over the whole sequence.
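To illustrate how a CRF layer scores tag transitions and decodes the whole sequence at once, the following is a minimal sketch of Viterbi decoding in plain Python. The B/M/E/S tag scheme (begin, middle, end, single), the hard-coded transition scores, and the name `viterbi_decode` are assumptions for illustration and are not specified by the source; in a full model the emission scores would come from the Bi-LSTM and the transition scores would be learned.

```python
from typing import Dict, List

TAGS = ["B", "M", "E", "S"]  # assumed tag scheme: Begin, Middle, End, Single
NEG_INF = float("-inf")

# Assumed transition scores: 0.0 for a legal move, -inf for an illegal one.
# A trained CRF would learn real-valued scores instead.
LEGAL = {"B": {"M", "E"}, "M": {"M", "E"}, "E": {"B", "S"}, "S": {"B", "S"}}
TRANSITIONS: Dict[str, Dict[str, float]] = {
    prev: {cur: (0.0 if cur in LEGAL[prev] else NEG_INF) for cur in TAGS}
    for prev in TAGS
}
# A sentence can only start with the beginning of a word or a single-character word.
START = {"B": 0.0, "S": 0.0, "M": NEG_INF, "E": NEG_INF}


def viterbi_decode(emissions: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring legal tag path for per-position emission scores."""
    if not emissions:
        return []
    # best[t][tag] is the best score of any path that ends in `tag` at position t.
    best = [{tag: START[tag] + emissions[0][tag] for tag in TAGS}]
    back: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + TRANSITIONS[p][cur])
            scores[cur] = best[-1][prev] + TRANSITIONS[prev][cur] + emit[cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # A sentence must end on E or S, so only those are valid final tags.
    last = max(("E", "S"), key=lambda t: best[-1][t])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

Picking the best tag independently at each position could produce an impossible sequence such as B followed by B; decoding over the whole path rules that out, which is the practical benefit of modeling tag transitions.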
In this embodiment, the Bi-LSTM + CRF is specifically as follows: word vectors are trained first, with Word2vec used to train a 50-dimensional vector for each word of the corpus; the vectors are then fed into a Bi-LSTM, which models the semantic information of the whole sentence; finally, a CRF layer completes the word segmentation. Using Bi-LSTM + CRF can substantially increase the accuracy of word segmentation, which makes the approach highly practical.
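The last stage of such a model, turning per-position tags into words and collecting them into a lexicon, can be sketched as follows. This is only an illustration under assumptions: the B/M/E/S tag scheme and the helper names `tags_to_words` and `build_lexicon` are hypothetical, and the source does not describe how its lexicon is stored.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Group characters into words using B/M/E/S tags (an assumed tag scheme)."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # E and S both close the current word
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


def build_lexicon(tagged_sentences: Iterable[Tuple[str, List[str]]]) -> Counter:
    """Count every segmented word across sentences to form a simple lexicon."""
    lexicon: Counter = Counter()
    for chars, tags in tagged_sentences:
        lexicon.update(tags_to_words(chars, tags))
    return lexicon
```

For example, `tags_to_words("投标报价得分", list("BMMEBE"))` groups the six characters into the two words 投标报价 and 得分.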
In this embodiment, after the text cleaning module corrects the easily confused characters, it deletes redundant spaces to obtain clean text. The text cleaning module thereby effectively reduces errors in character conversion and improves conversion quality.
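A text cleaning pass of this kind might look like the sketch below. It is a hedged illustration, not the patented implementation: the rule table `OCR_FIXES`, the choice of Unicode NFKC normalization, and the function name `clean_text` are all assumptions. Each correction rule fires only when the neighboring characters make the intended character clear, which is one simple way to use context.

```python
import re
import unicodedata

# Hypothetical table of common OCR confusions as (pattern, replacement) pairs.
# Lookbehind and lookahead restrict each fix to a digit context.
OCR_FIXES = [
    (re.compile(r"(?<=\d)[Oo](?=\d)"), "0"),  # letter O between digits -> zero
    (re.compile(r"(?<=\d)[lI](?=\d)"), "1"),  # letter l or I between digits -> one
]


def clean_text(raw: str) -> str:
    """Normalize special characters, fix confusable characters, drop extra spaces."""
    # Fold full-width digits, punctuation, and spaces into their ASCII forms.
    text = unicodedata.normalize("NFKC", raw)
    # Drop control and zero-width characters but keep line breaks and tabs.
    text = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    for pattern, replacement in OCR_FIXES:
        text = pattern.sub(replacement, text)
    text = re.sub(r"[ \t]+", " ", text)   # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)  # trim spaces around line breaks
    return text.strip()
```

A production system would more likely drive the confusion table from configuration and from error statistics gathered on real OCR output.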
An intelligent scoring method for a bidding document comprises the following steps:
S1: triggering: after the user issues a trigger instruction, intelligent analysis of the bidding document starts, and the method proceeds to the next step;
S2: rich text acquisition: the text and its related information are extracted from the bidding document and passed to the next step;
S3: text cleaning: the text cleaning module deletes special characters, performs general text cleaning, corrects characters that are easily confused after OCR recognition, and deletes redundant spaces to obtain clean text, which is passed to the next step;
S4: word segmentation: a lexicon of scoring standard terms is built from the clean text by the word segmentation method, and the text is segmented with it to obtain segmented text;
S5: semantic rule establishment: the segmented text is analyzed, the semantic rules to be extracted are established, and the semantic rules are output;
S6: result matching: the segmented text is matched against the semantic rules, and the formula calculation is applied to the matched results to obtain the bidding document score.
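The six steps above can be read as a linear pipeline in which each stage hands its output to the next. The sketch below shows only that control flow; the stage signatures, the `ScoreResult` type, and the function name `run_scoring_pipeline` are assumptions made for illustration, and each stage is passed in as a callable because the source does not fix its implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoreResult:
    formula: str  # the calculation that produced the score
    score: float


def run_scoring_pipeline(
    document: bytes,
    acquire: Callable[[bytes], str],                             # S2
    clean: Callable[[str], str],                                 # S3
    segment: Callable[[str], List[str]],                         # S4
    build_rules: Callable[[List[str]], list],                    # S5
    match_and_score: Callable[[List[str], list], ScoreResult],   # S6
) -> ScoreResult:
    """S1: invoked when the user triggers analysis; runs S2 to S6 in order."""
    raw_text = acquire(document)          # S2: rich text acquisition
    cleaned = clean(raw_text)             # S3: text cleaning
    tokens = segment(cleaned)             # S4: word segmentation
    rules = build_rules(tokens)           # S5: semantic rule establishment
    return match_and_score(tokens, rules)  # S6: matching and formula calculation
```

Passing the stages in as parameters keeps the sketch independent of any particular OCR engine or segmentation model; trivial stand-ins such as `lambda b: b.decode("utf-8")` for acquisition are enough to exercise the flow.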
In this embodiment, in step S2, the related information at least comprises the text, its style, its position, and the tables and pictures corresponding to the text; together these present the related information clearly.
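One possible way to carry this related information through the later steps is a small record type, sketched below. The field names, the bounding-box convention, and the `heading_candidates` heuristic are hypothetical; the source lists only the kinds of information, not how they are represented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RichTextBlock:
    """One extracted text run plus its related information."""
    text: str
    style: Dict[str, object]                 # e.g. {"font_size": 16, "bold": True}
    page: int                                # position: page index
    bbox: Tuple[float, float, float, float]  # position: (x0, y0, x1, y1) on the page
    tables: List[List[List[str]]] = field(default_factory=list)  # tables as rows of cells
    pictures: List[str] = field(default_factory=list)            # e.g. image file names


def heading_candidates(blocks: List[RichTextBlock], min_size: float = 14.0) -> List[str]:
    """Use style to pick likely section headings (a hypothetical heuristic)."""
    return [
        b.text
        for b in blocks
        if b.style.get("bold") and float(b.style.get("font_size", 0)) >= min_size
    ]
```

Keeping style and position alongside the text lets later stages tell a section heading such as a scoring-standard title apart from body text.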
In this embodiment, in step S6, the bidding document score at least comprises the calculation formula and the score. The calculation formula makes the scoring calculation clear and easy to follow, so that staff can conveniently check it and trace errors.
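Returning the formula together with the score can be sketched as follows. The scoring rule itself (points deducted per percent of deviation from a benchmark price), the default values, and the names `BidScore` and `price_score` are invented for illustration; the source does not give any concrete scoring formula.

```python
from dataclasses import dataclass


@dataclass
class BidScore:
    formula: str  # human-readable calculation, kept so staff can check it
    score: float


def price_score(
    bid: float,
    benchmark: float,
    full_marks: float = 30.0,
    deduction_per_percent: float = 0.5,
) -> BidScore:
    """Hypothetical objective rule: deduct points per percent of price deviation."""
    if benchmark <= 0:
        raise ValueError("benchmark must be positive")
    deviation_pct = abs(bid - benchmark) * 100 / benchmark
    score = max(0.0, full_marks - deviation_pct * deduction_per_percent)
    formula = (
        f"max(0, {full_marks} - |{bid} - {benchmark}| * 100 / {benchmark}"
        f" * {deduction_per_percent})"
    )
    return BidScore(formula=formula, score=round(score, 2))
```

With these assumed defaults, `price_score(95.0, 100.0)` yields a score of 27.5 and a formula string that shows every operand, so a reviewer can recompute the result by hand.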
In this embodiment, in step S3, the characters that are easily confused after OCR recognition are corrected according to their context and common error patterns, which prevents misrecognized characters produced during OCR from affecting the scoring process.
As the above embodiments show, the invention carries out word segmentation and the automatic calculation of the bidding document score by means of the Bi-LSTM + CRF deep learning model, which is simple and fast. Paper bidding documents can be entered through the rich text acquisition module and the text cleaning module, which broadens the applicability of the invention, and the text cleaning module effectively reduces errors in text conversion and improves conversion quality.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. An intelligent bidding document scoring system, comprising:
a triggering module, mainly used for receiving a trigger instruction from a user, generating a bidding document analysis instruction upon receiving it, and sending the bidding document analysis instruction to the transfer module;
a storage module, mainly used for storing the uploaded bidding document file for subsequent processing;
a rich text acquisition module, which extracts text and its related information from the bidding document file, wherein the related information at least comprises the text, its style, its position, and the tables and pictures corresponding to the text;
a transfer module, used for receiving the bidding document analysis instruction, formatting it, and passing the formatted instruction to the rich text acquisition module to perform text data acquisition;
a scoring standard word segmentation library establishing module, used for building a lexicon of scoring standard terms by a word segmentation method so that the text can be segmented;
a semantic rule model establishing module, used for analyzing the segmented text, establishing the semantic rules to be extracted, and outputting the semantic rules;
a matching module, used for matching the bidding document text against the semantic rules and applying the formula calculation to the matched results to obtain a bidding document score;
and a text cleaning module, mainly used for replacing special characters, performing general text cleaning, and, while cleaning the text based on the analysis configuration, correcting characters that are easily confused after OCR recognition.
2. The intelligent bidding document scoring system according to claim 1, wherein the word segmentation method is specifically as follows: a deep learning model comprising Bi-LSTM + CRF performs the word segmentation and outputs a word segmentation lexicon.
3. The intelligent bidding document scoring system according to claim 2, wherein the Bi-LSTM + CRF is specifically as follows: word vectors are trained first, with Word2vec used to train a 50-dimensional vector for each word of the corpus; the vectors are then fed into a Bi-LSTM, which models the semantic information of the whole sentence; finally, a CRF layer completes the word segmentation.
4. The intelligent bidding document scoring system according to claim 1, wherein, after the text cleaning module corrects the easily confused characters, it deletes redundant spaces to obtain clean text.
5. The intelligent bidding document scoring method according to claim 1, comprising the following steps:
S1: triggering: after the user issues a trigger instruction, intelligent analysis of the bidding document starts, and the method proceeds to the next step;
S2: rich text acquisition: the text and its related information are extracted from the bidding document and passed to the next step;
S3: text cleaning: the text cleaning module deletes special characters, performs general text cleaning, corrects characters that are easily confused after OCR recognition, and deletes redundant spaces to obtain clean text, which is passed to the next step;
S4: word segmentation: a lexicon of scoring standard terms is built from the clean text by the word segmentation method, and the text is segmented with it to obtain segmented text;
S5: semantic rule establishment: the segmented text is analyzed, the semantic rules to be extracted are established, and the semantic rules are output;
S6: result matching: the segmented text is matched against the semantic rules, and the formula calculation is applied to the matched results to obtain the bidding document score.
6. The intelligent bidding document scoring method according to claim 5, wherein, in step S2, the related information at least comprises the text, its style, its position, and the tables and pictures corresponding to the text.
7. The intelligent bidding document scoring method according to claim 5, wherein, in step S6, the bidding document score at least comprises the calculation formula and the score.
8. The intelligent bidding document scoring method according to claim 5, wherein, in step S3, the characters that are easily confused after OCR recognition are corrected according to their context and common error patterns.
CN202110712073.3A 2021-06-25 2021-06-25 Intelligent marking method and system Pending CN113449504A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110712073.3A CN113449504A (en) 2021-06-25 2021-06-25 Intelligent marking method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110712073.3A CN113449504A (en) 2021-06-25 2021-06-25 Intelligent marking method and system

Publications (1)

Publication Number Publication Date
CN113449504A (en) 2021-09-28

Family

ID=77812858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110712073.3A Pending CN113449504A (en) 2021-06-25 2021-06-25 Intelligent marking method and system

Country Status (1)

Country Link
CN (1) CN113449504A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580362A (en) * 2022-05-09 2022-06-03 四川野马科技有限公司 System and method for generating return mark file
CN114580362B (en) * 2022-05-09 2022-11-01 四川野马科技有限公司 System and method for generating return mark file

Similar Documents

Publication Publication Date Title
CN110363194B (en) NLP-based intelligent examination paper reading method, device, equipment and storage medium
CN110717031B (en) Intelligent conference summary generation method and system
CN108287858B (en) Semantic extraction method and device for natural language
CN104050160B (en) Interpreter's method and apparatus that a kind of machine is blended with human translation
CN109145260B (en) Automatic text information extraction method
US11113323B2 (en) Answer selection using a compare-aggregate model with language model and condensed similarity information from latent clustering
CN109446885B (en) Text-based component identification method, system, device and storage medium
CN111353306B (en) Entity relationship and dependency Tree-LSTM-based combined event extraction method
US20220414463A1 (en) Automated troubleshooter
CN111581367A (en) Method and system for inputting questions
CN112016320A (en) English punctuation adding method, system and equipment based on data enhancement
CN113076720B (en) Long text segmentation method and device, storage medium and electronic device
CN113408287B (en) Entity identification method and device, electronic equipment and storage medium
CN115759119B (en) Financial text emotion analysis method, system, medium and equipment
CN111401012A (en) Text error correction method, electronic device and computer readable storage medium
CN109446522B (en) Automatic test question classification system and method
CN110781291A (en) Text abstract extraction method, device, server and readable storage medium
CN112151019A (en) Text processing method and device and computing equipment
CN113449504A (en) Intelligent marking method and system
CN113343701A (en) Extraction method and device for text named entities of power equipment fault defects
CN112818693A (en) Automatic extraction method and system for electronic component model words
CN113435213B (en) Method and device for returning answers to user questions and knowledge base
CN111090720B (en) Hot word adding method and device
CN114078470A (en) Model processing method and device, and voice recognition method and device
Malkadi et al. Improving code extraction from coding screencasts using a code-aware encoder-decoder model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination