CN114595661B - Method, apparatus, and medium for reviewing bid document - Google Patents

Method, apparatus, and medium for reviewing bid document

Info

Publication number
CN114595661B
CN114595661B (application number CN202210491352.6A)
Authority
CN
China
Prior art keywords
bid
text
data
speech
file
Prior art date
Legal status
Active
Application number
CN202210491352.6A
Other languages
Chinese (zh)
Other versions
CN114595661A (en)
Inventor
邱冬
张强
敬军
朱晓卿
郑晓彬
邹许红
洪云强
张超
滕厚雪
黄智华
江展威
林意强
郑翀
孙倩
Current Assignee
GUANGDONG DONGGUAN QUALITY SUPERVISION TESTING CENTER
Shenzhen Changjiang Furniture Co ltd
Shenzhen Pingan Integrated Financial Services Co ltd
Original Assignee
GUANGDONG DONGGUAN QUALITY SUPERVISION TESTING CENTER
Shenzhen Changjiang Furniture Co ltd
Shenzhen Pingan Integrated Financial Services Co ltd
Priority date
Filing date
Publication date
Application filed by GUANGDONG DONGGUAN QUALITY SUPERVISION TESTING CENTER, Shenzhen Changjiang Furniture Co ltd, and Shenzhen Pingan Integrated Financial Services Co ltd
Priority to CN202210491352.6A
Publication of CN114595661A
Application granted
Publication of CN114595661B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/103 Formatting, i.e. changing of presentation of documents
    • G06F 40/20 Natural language analysis
    • G06F 40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 Learning methods
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/06 Buying, selling or leasing transactions
    • G06Q 30/08 Auctions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present disclosure relate to methods, devices, and media for reviewing bid documents, including: performing noise content filtering on file content extracted from a bid document, thereby obtaining a first text of the bid document; acquiring identification characteristics of a target section associated with the bid and standard characteristics of text information associated with the bid; identifying the first text based on the identification characteristics, thereby determining a second text of the bid document; extracting text information from the second text based on the standard characteristics, thereby obtaining a bid material name and a bid material value in the second text; performing data alignment on the acquired bid material name and bid material value to acquire bid data of the bid document, so as to generate a bid text based on the acquired bid data; and calculating a similarity between the bid data and standard data, thereby performing a review of the generated bid text.

Description

Method, apparatus, and medium for reviewing bid document
Technical Field
Embodiments of the present disclosure relate generally to the field of document processing, and more particularly, to a method, computing device, and computer-readable storage medium for reviewing bid documents.
Background
Currently, bidding and procurement are widely used as a special procurement mode in various industries and fields in China. For example, in the early bidding work of a municipal construction project, a large number of bid documents need to be written in order to participate in the bidding process.
At present, bid texts are typically written manually by staff from a fixed template. Although the template is roughly fixed, the specific details are complicated and error-prone, so each bid text needs to be reviewed item by item by specially assigned personnel, which is labor-intensive and inefficient, and errors in the bid text are easily missed.
In summary, the conventional schemes for reviewing bid documents have the following disadvantages: the traditional manner of manually reviewing the file content of a bid document is labor-intensive and inefficient and easily misses errors in the bid text, while the manner of template-based extraction may miss writing errors or format errors contained in the bid document.
Disclosure of Invention
In view of the above problems, the present disclosure provides a method, apparatus, and medium for reviewing a bid document. Based on this scheme, the file content in the bid document can be efficiently reviewed, the content can be corrected, and the most critical data information in the bid document can finally be extracted based on the corrected content, thereby improving the processing efficiency of the bid document.
According to a first aspect of the present disclosure, there is provided a method for reviewing a bid document, comprising: performing noise content filtering on file content extracted from a bid document, thereby obtaining a first text of the bid document, the file content being extracted based on the identified file format of the bid document; acquiring identification characteristics of a target section associated with the bid and standard characteristics of text information associated with the bid; identifying the first text based on the identification characteristics, thereby determining a second text of the bid document; extracting text information from the second text based on the standard characteristics, thereby obtaining a bid material name and a bid material value in the second text; performing data alignment on the acquired bid material name and bid material value to acquire bid data of the bid document, so as to generate a bid text based on the acquired bid data; and calculating the similarity between the bid data and standard data, thereby performing a review of the generated bid text. Performing a review of the generated bid text can help the user grasp the structure, content, and possible errors of the bid document as quickly as possible.
According to a second aspect of the present disclosure, there is provided a computing device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect of the disclosure.
In a third aspect of the present disclosure, a non-transitory computer readable storage medium is provided having stored thereon computer instructions for causing a computer to perform the method of the first aspect of the present disclosure.
In some embodiments, performing noise content filtering on the file content extracted from the bid document comprises: filtering noise content data in the file content based on the identified file format of the bid document; determining whether a line feed meeting a predetermined condition exists in the identified file format of the bid document; in response to determining that a line feed meeting the predetermined condition exists in the identified file format of the bid document, filtering out the line feeds in the file content that correspond to the line feed meeting the predetermined condition; and performing language consistency verification on the file content retained after the filtering to confirm the continuity of the file content.
In some embodiments, performing noise content filtering on the file content extracted from the bid document further comprises: performing content word segmentation on the file content, thereby obtaining a word segmentation result for the file content; acquiring the character granularity and word granularity of the word segmentation result based on the acquired word segmentation result; determining an erroneous word segmentation set based on the acquired character granularity and word granularity; correcting the erroneous word segmentation set by using a correction model, thereby obtaining a candidate corrected word segmentation set; calculating a perplexity for each candidate corrected word in the candidate corrected word segmentation set, thereby obtaining the candidate corrected word with the minimum perplexity; and correcting the erroneous word segmentation set based on the obtained candidate corrected word with the minimum perplexity.
In some embodiments, wherein identifying the first text based on the identifying feature to determine the second text of the bid document comprises: performing word segmentation on the first text, thereby obtaining a word segmentation set comprising a plurality of words of the first text; calculating the part of speech of each participle in the acquired participle set; obtaining a target part of speech associated with the bid and a target part of speech threshold; calculating the proportion of the target part of speech in each section in the first text based on the calculated part of speech of each participle; determining whether the proportion of the target part-of-speech of each section in the first text is greater than the target part-of-speech threshold; and in response to determining that the proportion of the target part-of-speech of the current section in the first text is greater than the target part-of-speech threshold, causing the second text of the bid document to include the current section.
In some embodiments, calculating the part of speech of each participle in the acquired participle set comprises: sequentially inputting the acquired participle set comprising a plurality of participles into an embedding layer, thereby obtaining vectorized participles; extracting features of the vectorized participles based on a neural network model trained via multiple samples to generate predicted part-of-speech scores and predicted transition scores between parts of speech; acquiring a part-of-speech state emission matrix and a part-of-speech state transition matrix for each participle set based on the predicted part-of-speech scores and the predicted transition scores; and acquiring the part of speech with the maximum probability for each participle based on the part-of-speech state emission matrix and the part-of-speech state transition matrix.
In some embodiments, performing extraction on the text information in the second text based on the standard features comprises: extracting the bid material name in the second text based on the standard features; determining whether the extracted bid material name has a subordinate name; in response to determining that the extracted bid material name has a subordinate name, performing decomposition extraction on the extracted bid material name, thereby obtaining a subordinate bid material name; and extracting a bid material value based on the bid material name extracted from the second text and the subordinate bid material name acquired through the decomposition extraction.
In some embodiments, calculating the similarity between the bid data and the standard data to perform a review of the generated bid text comprises: acquiring standard data corresponding to one or more items of the acquired bid data based on a standard feature database including the standard features; calculating a similarity between the one or more items of bid data and the acquired corresponding standard data; determining whether the calculated similarity is greater than or equal to a predetermined deviation threshold; and in response to determining that the calculated similarity is greater than or equal to the predetermined deviation threshold, indexing the bid data in the acquired bid data whose similarity exceeds the predetermined deviation threshold.
In some embodiments, calculating the similarity between the bid data and the standard data to perform a review of the generated bid text further comprises: for standard data of a text type, calculating a reference Hamming distance between the bid data and the text-type standard data; determining whether the calculated reference Hamming distance falls within a predetermined distance threshold range; in response to determining that the calculated reference Hamming distance falls within the predetermined distance threshold range, determining that the bid data fully responds to the standard data; and in response to determining that the calculated reference Hamming distance does not fall within the predetermined distance threshold range, determining that the bid data deviates from or is inconsistent with the standard data.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements.
FIG. 1 illustrates a schematic diagram of a system 100 for implementing a method for reviewing bid documents, according to an embodiment of the present disclosure.
FIG. 2 illustrates a flow diagram of a method 200 for reviewing bid documents, in accordance with an embodiment of the present disclosure.
FIG. 3 illustrates a flow diagram of a method 300 for reviewing bid documents, in accordance with embodiments of the present disclosure.
FIG. 4 illustrates a schematic block diagram for reviewing bid documents, according to embodiments of the present disclosure.
Fig. 5 shows a block diagram of an electronic device according to an embodiment of the disclosure.
FIG. 6 illustrates a pseudo-code diagram for reviewing bid documents, according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The term "include" and variations thereof as used herein is meant to be inclusive in an open-ended manner, i.e., "including but not limited to". Unless specifically stated otherwise, the term "or" means "and/or". The term "based on" means "based at least in part on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment". The term "another embodiment" means "at least one additional embodiment". The terms "first," "second," and the like may refer to different or the same object. Other explicit and implicit definitions are also possible below.
As described above, current bid documents are usually written manually by staff according to a fixed template. Although the template is roughly fixed, the specific details are complicated and error-prone, so each bid text needs to be reviewed item by item by specially assigned personnel, which is labor-intensive and inefficient, and errors possibly existing in the bid document are easily missed.
In summary, the conventional schemes for reviewing bid documents have the following disadvantages: the traditional manner of manually reviewing the file content of a bid document is labor-intensive and inefficient and easily misses errors in the bid text, while the manner of template-based extraction may miss erroneous bid text that may exist in the bid document.
To address, at least in part, one or more of the above problems, as well as other potential problems, example embodiments of the present disclosure propose a solution for reviewing bid documents. Since the generation of bid documents follows some common and universal generation rules, 90% of the content of a bid document can be automatically processed and reviewed by means of digitization and technology. The scheme of the present disclosure can efficiently review the file content in a bid document, correct the content, and finally extract the most critical data information in the bid document based on the corrected content, thereby improving the processing efficiency of reviewing bid documents.
As described above, the present disclosure provides a method and apparatus for processing and reviewing bid documents. The method and apparatus can be applied to any type of bidding document meeting industry, national, or international standards in the field of bidding engineering, or to any similar document that includes requirements on the main technology, quality, construction period, and the like of a project, such as a bid text. Thus, the methods and apparatus provided by the present disclosure may also be employed with bid texts or other project documents, and are not limited in this respect by document type.
FIG. 1 illustrates a schematic diagram of a system 100 for implementing a method for reviewing bid documents, according to an embodiment of the present disclosure. As shown in FIG. 1, the system 100 includes a computing device 110, a bid management device 130, and a network 140. The computing device 110 and the bid management device 130 may exchange data via the network 140 (e.g., the internet).
The bid management device 130 may, for example, store data of one or more bid files. The bid management device 130 may also transmit the stored bid data to the computing device 110. The bid management device is, for example and without limitation, an electronic computer, a network server, a storage computer, or the like. The bid management device 130, for example, collects bid document information, such as bidding documents and business bid documents, from a plurality of sources, and may classify and manage the collected bid documents.
The computing device 110 is used, for example, to obtain bid documents from the bid management device 130, and to receive and push bid data to be pushed. By pushing the bid data to be pushed, targeted content delivery for the bid can be achieved. The computing device 110 may receive the data to be pushed from a user or from the bid management device 130. By applying the method provided by the present disclosure, the computing device 110 may perform pushing on the data to be pushed.
Computing device 110 may have one or more processing units, including special purpose processing units such as GPUs, FPGAs, ASICs, and the like, as well as general purpose processing units such as CPUs. Additionally, one or more virtual machines may also be running on each computing device 110. In some embodiments, the computing device 110 and the bid management device 130 may be integrated or separate from each other. In some embodiments, computing device 110 includes, for example, a filtering module 112, an obtaining module 114, a recognition module 116, an extraction module 118, a processing module 120, and a review module 122.
A filtering module 112, the filtering module 112 configured to perform noise content filtering on the file content extracted from the bid document, thereby obtaining a first text of the bid document, the file content being extracted based on the file format of the identified bid document;
an obtaining module 114, wherein the obtaining module 114 is configured to obtain the identification characteristics of the target section associated with the bid and the standard characteristics of the text information associated with the bid;
an identification module 116, the identification module 116 configured to identify the first text based on the identifying characteristic, thereby determining a second text of the bid document;
an extraction module 118, wherein the extraction module 118 is configured to perform extraction on the text information in the second text based on the standard features, so as to obtain the bidding material name and the bidding material value in the second text;
a processing module 120, the processing module 120 configured to perform data alignment of the acquired bid material name and bid material value to acquire bid data of the bid file, so as to generate a bid text based on the acquired bid data; and
a review module 122, the review module 122 configured to calculate a similarity between the bid data and the standard data, thereby performing a review on the generated bid text.
FIG. 2 illustrates a flow diagram of a method 200 for reviewing bid documents, in accordance with an embodiment of the present disclosure. The method 200 may be performed by the computing device 110 as shown in FIG. 1, or may be performed at the electronic device 500 shown in FIG. 5. It should be understood that method 200 may also include additional blocks not shown and/or may omit blocks shown, as the scope of the disclosure is not limited in this respect.
At step 202, the computing device 110 may perform noise content filtering on the extracted file content from the bid file to obtain a first text of the bid file, the file content being extracted based on the file format of the identified bid file.
In one embodiment, the computing device 110 may identify the file format of the bid file, for example, identifying whether the bid file is a DOC, DOCX, or PDF file. Based on the identified file format, the computing device 110 extracts the file content of the bid file. For example, the computing device 110 may utilize PDFBox and POI to parse bid files in PDF, DOC, and DOCX formats. The first text refers to the file text obtained after removing invalid information such as titles and headers and correcting and filtering the file content. The pseudo code for parsing the PDF, DOC, and DOCX formats is shown in FIG. 6, which illustrates a pseudo-code diagram for reviewing bid documents in accordance with an embodiment of the present disclosure. Based on the same principle, code branches for judging whether the file belongs to other formats can be added, and details are not repeated herein.
As shown in the pseudo code of FIG. 6, the computing device may determine whether the file belongs to one of the PDF, DOC, and DOCX formats and, according to the identified format, apply a corresponding tool (for example, WordExtractor or PDFParser) to perform parsing.
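Purely as an illustration of the format dispatch sketched in FIG. 6, the following Java fragment shows how such parsing might look when built on Apache POI (WordExtractor/XWPFWordExtractor) and Apache PDFBox 2.x (PDFTextStripper); the class name BidFileTextExtractor and the method extractRawText are hypothetical, and this is a minimal sketch rather than the implementation claimed by the patent.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public final class BidFileTextExtractor {

    // Dispatch on the identified file format and return the raw file content.
    public static String extractRawText(File bidFile) throws IOException {
        String name = bidFile.getName().toLowerCase();
        if (name.endsWith(".pdf")) {
            // PDF branch: Apache PDFBox
            try (PDDocument document = PDDocument.load(bidFile)) {
                return new PDFTextStripper().getText(document);
            }
        } else if (name.endsWith(".docx")) {
            // DOCX branch: Apache POI (XWPF)
            try (FileInputStream in = new FileInputStream(bidFile);
                 XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(in))) {
                return extractor.getText();
            }
        } else if (name.endsWith(".doc")) {
            // DOC branch: Apache POI (HWPF)
            try (FileInputStream in = new FileInputStream(bidFile);
                 WordExtractor extractor = new WordExtractor(in)) {
                return extractor.getText();
            }
        }
        // Further format branches can be added on the same principle.
        throw new IOException("Unsupported bid file format: " + name);
    }
}
```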
In one embodiment, the computing device 110 may also filter noise content data in the file content based on the file format of the identified bid file. For example, for a file with a file format of PDF, there may be abnormal linefeeds present in the file. The computing device 110 can determine whether a line feed meeting predetermined conditions exists for the file format of the identified bid file. In response to determining that a line feed meeting a predetermined condition exists in the file format of the identified bid file, the computing device 110 can filter out line feeds corresponding to line feeds meeting the predetermined condition in the file content.
In one embodiment, the computing device 110 may also perform noise content filtering on the file content of the bid file by using word segmentation.
Specifically, a bid document inevitably contains some incorrect Chinese expressions caused by personnel or input methods, for example, a character in "office furniture" or "teapot" being replaced by a similar-sounding but wrong character. For this case, the computing device 110 may perform content word segmentation on the file content to obtain a word segmentation result for the file content. The word segmentation can be achieved through the built-in word segmenter of a Chinese knowledge management toolkit (TRS CKM) commonly used in the field. Because a sentence containing wrongly written characters is generally segmented incorrectly, errors can be detected from both the character granularity and the word granularity. The computing device 110 obtains the character granularity and word granularity of the word segmentation result based on the obtained word segmentation result. Based on the obtained character granularity and word granularity, the computing device 110 may determine a set of mis-segmented words: the suspected error results at the two granularities are integrated to form a candidate set of suspected erroneous segmentation positions. The computing device 110 may correct the set of erroneous segmentations using a correction model to obtain a set of candidate corrected segmentations. The correction model may be, for example, an LSTM language model. The computing device 110 may traverse all of the candidate suspected erroneous segmentation positions, replace the words at the erroneous positions using dictionaries of phonetically and graphically similar characters, then calculate the sentence perplexity (PPL) through the LSTM language model, and compare and sort all candidate results to obtain the optimal corrected words.
The LSTM language model is a model that, given the first k words of a sentence, predicts what the (k+1)-th word is, i.e., gives a probability distribution p(x_{k+1} \mid x_1, x_2, \ldots, x_k) over the possible (k+1)-th words. PPL (perplexity) is an index used in the field of Natural Language Processing (NLP) to measure the quality of a language model. It is computed from the probability of each word in a sentence and normalized by the sentence length, and can be expressed as formula (1):

PPL(S) = P(w_1 w_2 \cdots w_N)^{-\frac{1}{N}} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1 w_2 \cdots w_{i-1})}}    (1)

As shown in formula (1), S represents the current sentence; N represents the sentence length; P(w_i) represents the probability of the i-th word; and P(w_i \mid w_1 w_2 \cdots w_{i-1}) represents the probability of the i-th word computed based on the first i-1 words. The smaller the PPL, the larger each P(w_i), i.e., the higher the probability of each word in the sentence, indicating that the sentence fits the language model better.
The computing device 110 calculates a perplexity for each candidate corrected word in the candidate corrected word set, thereby obtaining the candidate corrected word with the minimum perplexity, and finally corrects the erroneous word segmentation set based on the obtained candidate corrected word with the minimum perplexity.
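A minimal sketch of this minimum-perplexity selection step is given below; the LanguageModel interface is a hypothetical stand-in for the LSTM language model that scores sentences according to formula (1), and only the comparison logic is shown.

```java
import java.util.List;

public final class CorrectionSelector {

    /** Hypothetical wrapper around the LSTM language model; returns PPL per formula (1). */
    public interface LanguageModel {
        double perplexity(String sentence);
    }

    // For one suspected error position, pick the candidate sentence with the lowest perplexity.
    public static String selectBestCorrection(LanguageModel lm, List<String> candidateSentences) {
        String best = null;
        double bestPpl = Double.MAX_VALUE;
        for (String candidate : candidateSentences) {
            double ppl = lm.perplexity(candidate);
            if (ppl < bestPpl) {   // lower perplexity = sentence fits the language model better
                bestPpl = ppl;
                best = candidate;
            }
        }
        return best;
    }
}
```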
Finally, the computing device 110 may perform language consistency verification on the file content retained after filtering to confirm the continuity of the file content. The consistency verification can be carried out through a language consistency verification tool commonly used in the field, so as to verify whether abnormal sentences or missing semantics exist in the file. The file content filtered as above is processed as the first text in the subsequent steps; that is, the following steps are entered when the language consistency is complete.
At step 204, the computing device 110 can obtain identifying characteristics of the target section associated with the bid and standard characteristics of the textual information associated with the bid.
In one embodiment, identifying a feature refers to a feature used to identify a bidding target section, such as a bidding title, bidding key indicator, and the like. These features may be concentrated in one section. By identifying these features, key chapters can be identified. The standard feature refers to a standard feature for acquiring a bid name and bid data in the bid section. The standard characteristic may be, for example, a material name and a material value. By acquiring the above identification features and standard features, the key sections of the document contents of the bid document can be identified, and bid materials and material values can be found in the key sections.
At step 206, computing device 110 may identify the first text based on the identifying characteristic to thereby determine a second text of the bid document.
In one embodiment, the computing device 110 may identify the sections and titles in the text using the keyword identification features obtained in step 204 to determine the second text of the bid document, i.e., the key sections of the bid.
In one embodiment, determining the second text of the bid document, i.e., determining the key text, may include performing word segmentation on the first text to obtain a word segmentation set of the first text including a plurality of participles. The word segmentation adopts a technique based on the combination of rules and statistics to segment Chinese character sequences into meaningful words, and uses forward maximum matching together with a secondary scan to find most intersection-type segmentation ambiguities while ensuring segmentation efficiency.
The computing device 110 may calculate the part of speech of each participle in the acquired participle set. An instance-based segmentation ambiguity processing technique is adopted to handle ambiguity accurately and ensure that the system has good extensibility. The dictionaries can be developed and manually maintained, and new entries can be added; in this embodiment, the text chapter content is segmented using professional dictionaries, such as a preset product dictionary and a material dictionary, combined with a general dictionary. Part-of-speech tagging is based on recognizing the parts of speech of the segmented words with an LSTM + CRF model, and the sections in which product, standard, and material names account for a proportion greater than a specified threshold are counted and used as key sections.
The computing device 110 can obtain a target part of speech associated with the bid and a target part-of-speech threshold. For example, the computing device 110 may obtain a target part of speech associated with the bid, e.g., material or quantity, and a threshold proportion of the target part of speech in key sections previously obtained from historical bid documents. For example, by analyzing historical bid documents, it may be determined that in a key section the proportion of material parts of speech is typically greater than 10% and the proportion of numeric values is typically greater than 15%. The key sections, i.e., the second text, can thus be identified with a target part-of-speech threshold of 10% for material and 15% for numeric values.
The computing device 110 may, based on the calculated part of speech of each participle, calculate the proportion of the target part of speech in each section of the first text, i.e., the computing device 110 calculates the proportion of one or more target parts of speech among all parts of speech in each section of the first text. It then determines whether the proportion of the target part of speech in each section of the first text is greater than the target part-of-speech threshold. In response to determining that the proportion of the target part of speech in the current section of the first text is greater than the target part-of-speech threshold, the second text of the bid document is caused to include the current section. The second text is the set of key sections of the bid, which will be further analyzed in the following steps.
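The fragment below is a minimal sketch of this section-selection step, assuming the participles have already been part-of-speech tagged and that the thresholds (e.g., 10% for material, 15% for numeric values) come from the analysis of historical bid documents described above; the class, record, and tag names are illustrative assumptions, not part of the patent.

```java
import java.util.List;
import java.util.Map;

public final class KeySectionDetector {

    /** A tagged token: the word plus its predicted part of speech (e.g., "material", "numeric"). */
    public record TaggedWord(String word, String partOfSpeech) {}

    // Decide whether one section belongs to the second text, given per-part-of-speech thresholds
    // such as {"material": 0.10, "numeric": 0.15} taken from historical bid documents.
    public static boolean isKeySection(List<TaggedWord> sectionTokens,
                                       Map<String, Double> targetPosThresholds) {
        if (sectionTokens.isEmpty()) {
            return false;
        }
        for (Map.Entry<String, Double> entry : targetPosThresholds.entrySet()) {
            long hits = sectionTokens.stream()
                    .filter(t -> t.partOfSpeech().equals(entry.getKey()))
                    .count();
            double ratio = (double) hits / sectionTokens.size();
            if (ratio > entry.getValue()) {
                return true;   // the section exceeds the threshold for at least one target part of speech
            }
        }
        return false;
    }
}
```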
In one embodiment, calculating the part of speech of each participle in the acquired participle set may include the following. The computing device 110 sequentially inputs the acquired participle set comprising a plurality of participles into an embedding layer, thereby obtaining vectorized participles. The computing device 110 may first define the participle set obtained after the above word segmentation as an input array, i.e., the input

X = (x_1, x_2, \ldots, x_n)
The embedding layer applies Word2Vec word embeddings, using domain word embeddings obtained by training on bidding-domain text; after processing by the embedding layer, the array is vectorized and used as the input to the next step.
The computing device 110 may extract features of the vectorized participles based on a neural network model trained via multiple samples to generate predicted part-of-speech scores and predicted transition scores between parts of speech. Specifically, the vectorized array may be input to a Bi-LSTM layer. The Bi-LSTM layer adopts a bidirectional long short-term memory neural network, the forward activation function adopts Tanh, and the backward activation function adopts SoftMax, finally yielding prediction scores for the different parts of speech, i.e., the predicted part-of-speech scores and the predicted transition scores.
The computing device 110 may then obtain a part-of-speech state emission matrix and a part-of-speech state transition matrix for each participle set based on the predicted part-of-speech scores and the predicted transition scores, i.e., input the data obtained from the Bi-LSTM layer to a CRF layer. The loss function in the CRF layer includes two types of scores: one is the Emission Score (state score), which comes from the output of the Bi-LSTM layer, i.e., the predicted part-of-speech score of each part of speech at each position; the second is the Transition Score, i.e., the predicted transition score between parts of speech. Based on the above two scores, two matrices can be obtained: a matrix P representing the state emission matrix and a matrix A representing the state transition matrix.
Finally, the computing device 110 may obtain the part of speech with the maximum probability of each word segmentation based on the part of speech state emission matrix and the part of speech state transition matrix. The computing device may compute the output Y for each participle based on the matrix P and the matrix a, thereby obtaining the part of speech for which each participle has the greatest probability. The output Y can be expressed as formula (2).
Y = (y_1, y_2, \ldots, y_n) = \arg\max_{Y'} \, p(Y' \mid X)    (2)
Specifically, the score s and the probability p of an output sequence y in the output Y can be calculated according to formula (3) and formula (4):

s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}    (3)

p(y \mid X) = \frac{e^{s(X, y)}}{\sum_{\tilde{y}} e^{s(X, \tilde{y})}}    (4)

In formula (3), s represents the part-of-speech score calculated for the output y corresponding to the input X, where A is the state transition matrix and P is the state emission matrix. In formula (4), p represents the probability value calculated for each correct output y using the softmax function.
At step 208, computing device 110 may perform a decimation on the textual information in the second text based on the standard features to obtain a bid material name and a bid material value in the second text.
In one embodiment, the computing device 110 may disassemble the key section content into component information based on the standard features obtained in step 204 (a finished-product knowledge base and a finished-product label knowledge base); for example, cabinet components include table tops, substrates, hardware, sub-cabinets, etc., and table components include table tops, substrates, table legs, hardware, etc. The disassembled component information is then further disassembled to the material level based on a material library and a material label library, where the material level is divided into a material name and a material condition, and the material name is converted into a standard material name based on a rule mapping template. Various attributes of the components are then extracted based on pattern matching, such as component substrates, face materials, standards, and the like.
In one embodiment, the computing device 110 performing extraction of text information in the second text based on the standard features comprises: extracting the bid material names in the second text based on the standard features acquired in step 204, for example bid material names such as cabinet, face material, substrate, and fiberboard. The computing device 110 may then determine whether an extracted bid material name has a subordinate name. In response to determining that the extracted bid material name has a subordinate name, decomposition extraction is performed on the extracted bid material name, thereby obtaining the subordinate bid material names. For example, since it can be judged that subordinate bid material names exist for the cabinet type, the cabinet type can be further divided down to the basic bid material names, such as table top, base material, hardware, and sub-cabinet. Finally, the computing device 110 may extract the bid material values based on the bid material names extracted from the second text and the subordinate bid material names obtained via the decomposition extraction, such as the height of the table top or the size of the hardware.
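A minimal sketch of this decomposition-extraction step is shown below; the hierarchy map is a hypothetical stand-in for the material library and material label library, and the material names in the comment are only the examples given above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public final class MaterialNameDecomposer {

    // Recursively expand a bid material name into basic (leaf) material names,
    // using a hierarchy map that stands in for the material / material-label libraries,
    // e.g., "cabinet" -> ["table top", "substrate", "hardware", "sub-cabinet"].
    public static List<String> decompose(String materialName, Map<String, List<String>> hierarchy) {
        List<String> leaves = new ArrayList<>();
        List<String> subordinates = hierarchy.get(materialName);
        if (subordinates == null || subordinates.isEmpty()) {
            leaves.add(materialName);          // no subordinate name: already a basic material name
            return leaves;
        }
        for (String subordinate : subordinates) {
            leaves.addAll(decompose(subordinate, hierarchy));
        }
        return leaves;
    }
}
```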
At step 210, the computing device 110 may perform data alignment of the acquired bid material name and bid material value to acquire bid data for the bid file to generate bid text based on the acquired bid data.
In one embodiment, the computing device 110 may perform a corresponding format conversion of the extracted component attributes based on a standard library, for example converting values expressed in centimeters or meters to the standard unit of millimeters (mm).
Specifically, the computing device 110 aligns the material names disassembled in step 208 to standard material names based on the name mapping template data and performs a data alignment process on the extracted bid material numerical parameters. For example, values expressed in centimeters (cm), meters (m), inches, or other size units are uniformly converted to the standard unit of millimeters (mm), and per-mille values are converted to percentages (%).
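The following sketch illustrates the kind of unit alignment described here; the conversion factors and method names are assumptions for illustration, not the patent's standard library.

```java
import java.math.BigDecimal;
import java.util.Map;

public final class UnitAligner {

    // Conversion factors to the standard unit of millimeters (assumed set; extend as needed).
    private static final Map<String, BigDecimal> TO_MM = Map.of(
            "mm", BigDecimal.ONE,
            "cm", new BigDecimal("10"),
            "m",  new BigDecimal("1000"),
            "inch", new BigDecimal("25.4"));

    // Align a length value expressed in cm/m/inch/etc. to millimeters.
    public static BigDecimal toMillimeters(BigDecimal value, String unit) {
        BigDecimal factor = TO_MM.get(unit.toLowerCase());
        if (factor == null) {
            throw new IllegalArgumentException("Unknown length unit: " + unit);
        }
        return value.multiply(factor);
    }

    // Align a per-mille value to a percentage.
    public static BigDecimal permilleToPercent(BigDecimal permille) {
        return permille.divide(new BigDecimal("10"));
    }
}
```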
In one embodiment, the computing device may structure the aligned data to generate bid text in a desired format based on the obtained bid data.
At step 212, the computing device 110 can calculate a similarity between the bid data and the standard data to perform a review of the generated bid text.
In one embodiment, the computing device 110 may obtain standard data such as bidding requirements. The standard data may come from an industry standard, a national standard, or a custom bidding requirement. Taking national standards as an example, the standard data can come from national standards of the People's Republic of China in the furniture field, such as GB/T 3324-2017, GB/T 26848-2011, and GB/T 38611-2020/ISO 21015:2007. The standard data may include standard data specified for different types of products and different product grades. Based on the obtained standard data, the computing device may calculate a similarity between the bid data and the standard data. The similarity may be used to determine whether the bid data satisfies the standard data. If satisfied, the bid data may be labeled as not deviating or fully responsive; if not, the bid data may be labeled as to be checked or described as inconsistent. A method of calculating the similarity between the bid data and the standard data to perform a review of the generated bid text will be described in detail below.
By using this technical scheme, the file content in the bid document can be efficiently extracted, the content can be corrected, and the most critical data information in the bid document can finally be extracted based on the corrected content, thereby improving the processing efficiency of the bid document.
FIG. 3 illustrates a flow diagram of a method 300 for reviewing bid documents, in accordance with an embodiment of the present disclosure. The method 300 may be performed by the computing device 110 as shown in FIG. 1, or may be performed at the electronic device 500 shown in FIG. 5. It should be understood that method 300 may also include additional blocks not shown and/or may omit blocks shown, as the scope of the present disclosure is not limited in this respect.
At step 302, computing device 110 may obtain criteria data corresponding to one or more of the obtained bid data based on a criteria characteristics database including the criteria characteristics.
In one embodiment, the computing device 110 may obtain the national standard data for the bid material of the bid document obtained in method 200 from a national standard database. The data may include standard data corresponding to one or more of the acquired bid data. The standard can be a national standard, an international standard, an industry standard, or the like.
At step 304, computing device 110 may calculate a similarity between the one or more bid data and the obtained corresponding criteria data.
In one embodiment, the computing device 110 may calculate a similarity between one or more items of the acquired bid data and the acquired corresponding standard data. Generally, the standards may include three types, namely numeric standards, range standards, and enumerated standards. A similarity calculation method may be set for each type of standard; the reference Hamming distance is used only as an example for explanation. The bid document processed by the method 200 can be converted into structured data, and for numeric, range, and enumerated standards, the national standard of the corresponding material can be found based on the association relationship among the material library, the label library, and the standard library, and the similarity to the national standard can be calculated, so that it can be known whether the material exhibits a positive deviation or a negative deviation.
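As an illustration of how the three types of standards might each be checked, the sketch below compares a bid value against a numeric, range, or enumerated standard; the class, method, and enum names, and the assumption that a higher numeric bid value counts as a positive deviation, are illustrative only.

```java
import java.util.Set;

public final class StandardComparator {

    /** Deviation direction of a bid value relative to a standard. */
    public enum Deviation { POSITIVE, NONE, NEGATIVE }

    // Numeric standard: compare the bid value against the required value.
    public static Deviation compareNumeric(double bidValue, double requiredValue) {
        if (bidValue > requiredValue) return Deviation.POSITIVE;
        if (bidValue < requiredValue) return Deviation.NEGATIVE;
        return Deviation.NONE;
    }

    // Range standard: the bid value should fall inside [min, max].
    public static boolean withinRange(double bidValue, double min, double max) {
        return bidValue >= min && bidValue <= max;
    }

    // Enumerated standard: the bid value should be one of the allowed options.
    public static boolean matchesEnumeration(String bidValue, Set<String> allowedValues) {
        return allowedValues.contains(bidValue);
    }
}
```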
At step 306, the computing device 110 may determine whether the calculated similarity is greater than or equal to a predetermined deviation threshold.
At step 308, computing device 110 may index data in the obtained bid data having a similarity above a predetermined deviation threshold in response to the calculated similarity of one or more of the obtained bid data being above the predetermined deviation threshold.
In one embodiment, the computing device 110 may highlight a different expression of the bid content in response to the calculated similarity of one or more of the obtained bid data being above a predetermined deviation threshold or below a specified threshold, thereby improving manual review efficiency.
In one embodiment, for standard data of a text type, the computing device 110 may calculate a SimHash-based reference Hamming distance between the bid data and the text-type standard data. The computing device 110 can determine whether the calculated reference Hamming distance falls within a predetermined distance threshold range. In response to determining that the calculated reference Hamming distance falls within the predetermined distance threshold range, it is determined that the bid data fully responds to the standard data; in response to determining that the calculated reference Hamming distance does not fall within the predetermined distance threshold range, it is determined that the bid data deviates from or is inconsistent with the standard data.
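A minimal sketch of a SimHash fingerprint and the reference Hamming distance between two fingerprints is given below; the tokenization, the absence of term weighting, and the FNV-1a token hash are simplifying assumptions, not details taken from the patent.

```java
import java.util.List;

public final class SimHashReviewer {

    // Build a 64-bit SimHash fingerprint from pre-tokenized text (term weights omitted for brevity).
    public static long simHash(List<String> tokens) {
        int[] accumulator = new int[64];
        for (String token : tokens) {
            long h = fnv1a64(token);
            for (int bit = 0; bit < 64; bit++) {
                accumulator[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (accumulator[bit] > 0) {
                fingerprint |= 1L << bit;
            }
        }
        return fingerprint;
    }

    // Reference Hamming distance between two fingerprints: the number of differing bits.
    public static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    // Simple 64-bit FNV-1a hash used for the per-token hashes.
    private static long fnv1a64(String s) {
        long hash = 0xcbf29ce484222325L;
        for (int i = 0; i < s.length(); i++) {
            hash ^= s.charAt(i);
            hash *= 0x100000001b3L;
        }
        return hash;
    }
}
```

Bid data whose distance to the corresponding text-type standard falls within the configured distance threshold range would then be treated as fully responsive.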
FIG. 4 illustrates a schematic block diagram for reviewing bid documents, according to an embodiment of the present disclosure. As shown in FIG. 4, the leftmost part is the acquired bidding requirement, which may be a national standard or an industry standard. The middle part is the bid document data obtained by the method; the textual content is translated into the formatted, aligned bid document data shown in FIG. 4. The rightmost section shows the results of the review of the document. As shown, bid data that is the same or falls within a range is determined to be fully responsive/not deviating, while indicators that do not fall within a threshold range are indexed and displayed as to be checked/described as inconsistent.
According to the technical scheme, the bid data content in the bid file can be efficiently reviewed, and the bid reviewing efficiency is improved.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. For example, the computing device 110 as shown in fig. 1 may be implemented by the electronic device 500. As shown, electronic device 500 includes a Central Processing Unit (CPU) 501 that may perform various appropriate actions and processes according to computer program instructions stored in a Read Only Memory (ROM) 502 or loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the random access memory 503, various programs and data necessary for the operation of the electronic apparatus 500 can also be stored. The central processing unit 501, the read only memory 502, and the random access memory 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A plurality of components in the electronic device 500 are connected to the input/output interface 505, including: an input unit 506 such as a keyboard, a mouse, a microphone, and the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The various processes and processes described above, such as the methods 200, 300, may be performed by the central processing unit 501. For example, in some embodiments, the methods 200, 300 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the read only memory 502 and/or the communication unit 509. When the computer program is loaded into the random access memory 503 and executed by the central processing unit 501, one or more of the actions of the methods 200, 300 described above may be performed.
The present disclosure relates to methods, apparatuses, systems, electronic devices, computer-readable storage media and/or computer program products. The computer program product may include computer-readable program instructions for performing various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be interpreted as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or an electrical signal transmitted through an electrical wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge computing devices. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++, or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, the electronic circuitry that can execute the computer-readable program instructions implements aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA).
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (9)

1. A method for reviewing bid documents, comprising:
performing noise content filtering on file content extracted from a bid document, thereby obtaining a first text of the bid document, the file content being extracted based on an identified file format of the bid document;
acquiring identification characteristics of a target section associated with the bid and standard characteristics of text information associated with the bid;
identifying the first text based on the identifying characteristics, thereby determining a second text of the bid document;
extracting the text information in the second text based on the standard characteristics, so as to obtain a bid material name and a bid material value in the second text;
performing data alignment on the acquired bid material name and bid material value to acquire bid data of the bid document, and generating a bid text based on the acquired bid data; and
calculating a similarity between the bid data and the standard data with respect to a numeric standard, a range standard, and an enumeration standard, thereby performing a review on the generated bid text,
wherein determining the second text of the bid document comprises: performing word segmentation on the first text by adopting a word segmentation technique that combines rules and statistics, so as to obtain a word segment set comprising a plurality of word segments of the first text; calculating the part of speech of each word segment in the obtained word segment set by adopting an instance-based segmentation-ambiguity processing technique; obtaining a target part of speech and a target part-of-speech threshold associated with the bidding, wherein the target part of speech comprises material and quantity, and the target part-of-speech threshold is determined by analyzing the proportion of the target part of speech in key sections of historical bid documents; calculating the proportion of the target part of speech in each section of the first text based on the calculated part of speech of each word segment; determining whether the proportion of the target part of speech in each section of the first text is greater than the target part-of-speech threshold; and in response to determining that the proportion of the target part of speech in a current section of the first text is greater than the target part-of-speech threshold, causing the second text of the bid document to include the current section.
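For illustration only, the following Python sketch mirrors the section-selection logic recited in the wherein clause of claim 1: it computes the proportion of target parts of speech in each section and retains the sections whose proportion exceeds the threshold. The tag names, the toy section structure, and the threshold value are hypothetical placeholders, not values prescribed by the patent.

```python
# Minimal sketch of the target part-of-speech ratio test described in claim 1.
# The tag names ("MATERIAL", "QUANTITY"), the section layout, and the threshold
# are illustrative assumptions.

TARGET_POS = {"MATERIAL", "QUANTITY"}    # target parts of speech (material, quantity)
TARGET_POS_THRESHOLD = 0.15              # assumed ratio learned from historical bid documents


def select_key_sections(sections):
    """sections: list of (section_id, [(word_segment, pos_tag), ...]) tuples."""
    second_text = []
    for section_id, tagged_words in sections:
        if not tagged_words:
            continue
        hits = sum(1 for _, pos in tagged_words if pos in TARGET_POS)
        ratio = hits / len(tagged_words)
        if ratio > TARGET_POS_THRESHOLD:  # keep sections dominated by material/quantity terms
            second_text.append(section_id)
    return second_text


# Toy usage:
sections = [
    ("cover_letter", [("we", "PRON"), ("hereby", "ADV"), ("bid", "VERB")]),
    ("bill_of_materials", [("steel", "MATERIAL"), ("plate", "MATERIAL"), ("30", "QUANTITY"), ("tons", "QUANTITY")]),
]
print(select_key_sections(sections))      # -> ['bill_of_materials']
```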
2. The method of claim 1, wherein performing noise content filtering on the file content extracted from the bid document comprises:
filtering noise content data in the file content based on the identified file format of the bid document;
determining whether a line feed meeting a predetermined condition exists in the identified file format of the bid document;
in response to determining that a line feed meeting the predetermined condition exists in the identified file format of the bid document, filtering, from the file content, the line feeds that meet the predetermined condition; and
performing language consistency verification on the file content retained after the filtering to confirm the continuity of the file content.
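As a non-limiting illustration of the line-feed filtering in claim 2, the sketch below assumes one possible "predetermined condition", namely a line break that is not preceded by sentence-ending punctuation (a break introduced by page layout rather than by the author); the claim itself does not fix this condition.

```python
# Sketch of the line-feed filtering in claim 2, under the assumption that a line
# feed "meets the predetermined condition" when the preceding line does not end
# with sentence-ending punctuation.
SENTENCE_END = ("。", "！", "？", ".", "!", "?", ";", "；", ":", "：")


def filter_layout_linefeeds(file_content: str) -> str:
    merged = []
    for line in file_content.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                      # drop empty lines left by extraction
        if merged and not merged[-1].endswith(SENTENCE_END):
            merged[-1] += stripped        # line feed met the condition: join the fragments
        else:
            merged.append(stripped)
    return "\n".join(merged)
```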
3. The method of claim 1 or 2, wherein performing noise content filtering on the file content extracted from the bid document further comprises:
performing word segmentation on the file content, thereby obtaining a word segmentation result of the file content;
acquiring a character granularity and a word granularity of the word segmentation result based on the acquired word segmentation result;
determining a set of erroneous word segments based on the acquired character granularity and word granularity;
correcting the set of erroneous word segments by using a correction model, so as to obtain a set of candidate corrected word segments;
calculating a perplexity for each candidate corrected word segment in the set of candidate corrected word segments, so as to obtain the candidate corrected word segment with the minimum perplexity; and
correcting the set of erroneous word segments based on the obtained candidate corrected word segment with the minimum perplexity.
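The minimum-perplexity selection in claim 3 can be illustrated with a toy bigram language model standing in for the (unspecified) correction model; the smoothing scheme and the sample corpus below are assumptions made only for the example.

```python
import math
from collections import Counter


# Toy bigram language model used to score candidate corrections by perplexity,
# as in the selection step of claim 3.
class BigramLM:
    def __init__(self, corpus_sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sent in corpus_sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def perplexity(self, sent):
        tokens = ["<s>"] + sent + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            # add-one smoothing so unseen bigrams do not yield zero probability
            p = (self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + self.vocab_size)
            log_prob += math.log(p)
        return math.exp(-log_prob / (len(tokens) - 1))


def best_correction(candidates, lm):
    """Return the candidate word-segment sequence with the minimum perplexity."""
    return min(candidates, key=lm.perplexity)


# Toy usage: pick the correction the language model finds least surprising.
corpus = [["cold", "rolled", "steel", "plate"], ["steel", "plate", "thickness", "1.2", "mm"]]
lm = BigramLM(corpus)
print(best_correction([["steel", "plate"], ["steal", "plate"]], lm))  # prefers ["steel", "plate"]
```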
4. The method of claim 1, wherein calculating the part of speech of each word segment in the obtained word segment set comprises:
inputting the obtained word segment set comprising a plurality of word segments into an embedding layer in sequence, thereby obtaining vectorized word segments;
extracting features of the vectorized word segments based on a neural network model trained via multiple samples, to generate a predicted part-of-speech score and a transition score of the predicted part of speech;
acquiring a part-of-speech state emission matrix and a part-of-speech state transition matrix for the word segment set based on the predicted part-of-speech score and the transition score of the predicted part of speech; and
acquiring the part of speech with the maximum probability for each word segment based on the part-of-speech state emission matrix and the part-of-speech state transition matrix.
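Claim 4 recites recovering the most probable part of speech from an emission matrix and a transition matrix, as is typically done in neural sequence taggers of the BiLSTM-CRF family. The NumPy sketch below shows only that final Viterbi decoding step over placeholder matrices; the embedding layer and the trained network that would produce real scores are omitted and stand in as random values.

```python
import numpy as np


def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray) -> list[int]:
    """Return the highest-scoring tag sequence.

    emissions: (seq_len, n_tags) part-of-speech state emission scores.
    transitions: (n_tags, n_tags) part-of-speech state transition scores.
    """
    seq_len, n_tags = emissions.shape
    score = emissions[0].copy()               # best score ending in each tag at step 0
    backpointers = []
    for t in range(1, seq_len):
        # total[i, j] = score of ending in tag i at t-1, moving to tag j, emitting step t
        total = score[:, None] + transitions + emissions[t][None, :]
        backpointers.append(total.argmax(axis=0))
        score = total.max(axis=0)
    best_tag = int(score.argmax())
    best_path = [best_tag]
    for bp in reversed(backpointers):         # trace the best path backwards
        best_tag = int(bp[best_tag])
        best_path.append(best_tag)
    return best_path[::-1]


# Toy usage: 4 word segments, 3 candidate parts of speech, random placeholder scores.
rng = np.random.default_rng(0)
emission_matrix = rng.normal(size=(4, 3))
transition_matrix = rng.normal(size=(3, 3))
print(viterbi_decode(emission_matrix, transition_matrix))
```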
5. The method of claim 1, wherein performing extraction of the text information in the second text based on the standard characteristics comprises:
extracting the bid material name in the second text based on the standard characteristics;
determining whether the extracted bid material name has a subordinate name;
in response to determining that the extracted bid material name has a subordinate name, performing decomposition extraction on the extracted bid material name, thereby obtaining a subordinate bid material name; and
extracting a bid material value based on the bid material name extracted from the second text and the subordinate bid material name acquired via the decomposition extraction.
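Claim 5 does not prescribe how subordinate names appear inside a bid material name. Purely as an assumed convention for illustration, the sketch below treats a parenthesised, comma-separated list after the parent name as its subordinate names and splits them out.

```python
import re


def decompose_material_name(name: str):
    """Split an assumed 'parent (sub1, sub2, ...)' material name into parent and subordinate names."""
    match = re.match(r"^(?P<parent>[^（(]+)[（(](?P<subs>[^）)]+)[）)]$", name.strip())
    if not match:
        return name.strip(), []                       # no subordinate names found
    parent = match.group("parent").strip()
    subs = [s.strip() for s in re.split(r"[、,，]", match.group("subs")) if s.strip()]
    return parent, subs


print(decompose_material_name("office desk (tabletop, steel frame, drawer unit)"))
# -> ('office desk', ['tabletop', 'steel frame', 'drawer unit'])
```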
6. The method of claim 1, wherein calculating a similarity between the bid data and the standard data to perform a review on the generated bid text comprises:
acquiring standard data corresponding to one or more items of bid data in the acquired bid data based on a standard characteristic database including the standard characteristics;
calculating a similarity between the one or more items of bid data and the acquired corresponding standard data;
determining whether the calculated similarity is greater than or equal to a predetermined deviation threshold; and
in response to determining that the calculated similarity is greater than or equal to the predetermined deviation threshold, indexing, in the acquired bid data, the bid data whose similarity is greater than or equal to the predetermined deviation threshold.
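As one possible reading of claim 6, the sketch below scores numeric bid data against its corresponding standard data with a simple relative-deviation similarity and indexes the items whose similarity reaches the threshold. The similarity measure, the threshold value, and the plain dictionary used in place of the standard characteristic database are illustrative assumptions.

```python
# Sketch of the threshold check in claim 6 for numeric bid data.
PREDETERMINED_DEVIATION_THRESHOLD = 0.95    # assumed value


def numeric_similarity(bid_value: float, standard_value: float) -> float:
    """Similarity = 1 minus the relative deviation from the standard value, floored at 0."""
    if standard_value == 0:
        return 1.0 if bid_value == 0 else 0.0
    return max(0.0, 1.0 - abs(bid_value - standard_value) / abs(standard_value))


def index_matching_bid_data(bid_data: dict, standard_data: dict) -> dict:
    """Return the bid items whose similarity to the standard data reaches the threshold."""
    indexed = {}
    for name, bid_value in bid_data.items():
        if name not in standard_data:
            continue
        sim = numeric_similarity(bid_value, standard_data[name])
        if sim >= PREDETERMINED_DEVIATION_THRESHOLD:
            indexed[name] = {"bid": bid_value, "standard": standard_data[name], "similarity": sim}
    return indexed


print(index_matching_bid_data({"desk height (cm)": 75.0, "load (kg)": 80.0},
                              {"desk height (cm)": 75.0, "load (kg)": 100.0}))
```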
7. The method of claim 6, wherein calculating a similarity between the bid data and the standard data to perform a review on the generated bid text further comprises:
for standard data of a text type, calculating a reference Hamming distance between the bid data and the standard data of the text type;
determining whether the calculated reference Hamming distance falls within a predetermined distance threshold range;
in response to determining that the calculated reference Hamming distance falls within the predetermined distance threshold range, determining that the bid data is fully responsive to the standard data; and
in response to determining that the calculated reference Hamming distance does not fall within the predetermined distance threshold range, determining that the bid data deviates from or is otherwise inconsistent with the standard data.
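Claim 7 only requires a Hamming distance between text-type bid data and standard data; one common way to obtain such a distance for free text is to compare fixed-length SimHash fingerprints. The sketch below takes that route as an assumption, with a made-up distance threshold standing in for the predetermined range.

```python
import hashlib

# Sketch of the text-type check in claim 7: compute a Hamming distance between
# 64-bit SimHash fingerprints of the bid text and the standard text (an assumed
# realisation of the claimed "reference Hamming distance").
PREDETERMINED_DISTANCE_THRESHOLD = 3        # assumed upper bound for "fully responsive"


def simhash(text: str, bits: int = 64) -> int:
    weights = [0] * bits
    for token in text.split():
        digest = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        for i in range(bits):
            weights[i] += 1 if (digest >> i) & 1 else -1
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def review_text_item(bid_text: str, standard_text: str) -> str:
    distance = hamming_distance(simhash(bid_text), simhash(standard_text))
    if distance <= PREDETERMINED_DISTANCE_THRESHOLD:
        return "fully responsive"
    return "deviates from the standard data"


print(review_text_item("cold rolled steel plate 1.2 mm",
                       "cold rolled steel plate 1.2 mm thickness"))
```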
8. A computing device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
9. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202210491352.6A 2022-05-07 2022-05-07 Method, apparatus, and medium for reviewing bid document Active CN114595661B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210491352.6A CN114595661B (en) 2022-05-07 2022-05-07 Method, apparatus, and medium for reviewing bid document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210491352.6A CN114595661B (en) 2022-05-07 2022-05-07 Method, apparatus, and medium for reviewing bid document

Publications (2)

Publication Number Publication Date
CN114595661A (en) 2022-06-07
CN114595661B (en) 2022-12-23

Family

ID=81812876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210491352.6A Active CN114595661B (en) 2022-05-07 2022-05-07 Method, apparatus, and medium for reviewing bid document

Country Status (1)

Country Link
CN (1) CN114595661B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172220B (en) * 2023-11-02 2024-02-02 北京国电通网络技术有限公司 Text similarity information generation method, device, equipment and computer readable medium
CN117931992B (en) * 2024-01-18 2024-09-13 国义招标股份有限公司 Carbon emission data processing method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160445B (en) * 2019-12-25 2023-06-16 中国建设银行股份有限公司 Bid file similarity calculation method and device
CN111241230A (en) * 2019-12-31 2020-06-05 中国南方电网有限责任公司 Method and system for identifying string mark risk based on text mining
CN112580299A (en) * 2020-12-30 2021-03-30 讯飞智元信息科技有限公司 Intelligent bid evaluation method, bid evaluation device and computer storage medium
CN113505584A (en) * 2021-09-10 2021-10-15 深圳平安综合金融服务有限公司 Bidding evaluation method, computer readable storage medium and computer equipment

Also Published As

Publication number Publication date
CN114595661A (en) 2022-06-07

Similar Documents

Publication Publication Date Title
CN114595661B (en) Method, apparatus, and medium for reviewing bid document
CN108460014A (en) Recognition methods, device, computer equipment and the storage medium of business entity
CN112163424B (en) Data labeling method, device, equipment and medium
CN105095204B (en) The acquisition methods and device of synonym
CN110427618B (en) Countermeasure sample generation method, medium, device and computing equipment
US9619464B2 (en) Networked language translation system and method
CN107145584B (en) Resume parsing method based on n-gram model
CN111079412A (en) Text error correction method and device
CN103324609A (en) Text proofreading apparatus and text proofreading method
JP2020126493A (en) Paginal translation processing method and paginal translation processing program
CN112445775B (en) Fault analysis method, device, equipment and storage medium of photoetching machine
CN110413307B (en) Code function association method and device and electronic equipment
CN111143505A (en) Document processing method, device, medium and electronic equipment
CN108052509A (en) A kind of Text similarity computing method, apparatus and server
Khapra et al. When Transliteration Met Crowdsourcing: An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control.
CN114692628A (en) Sample generation method, model training method, text extraction method and text extraction device
JP2020106880A (en) Information processing apparatus, model generation method and program
Ziering et al. Towards unsupervised and language-independent compound splitting using inflectional morphological transformations
CN116796726A (en) Resume analysis method, resume analysis device, terminal equipment and medium
CN108073678B (en) Document analysis processing method, system and device applied to big data analysis
JP6867963B2 (en) Summary Evaluation device, method, program, and storage medium
CN110110013B (en) Entity competition relation data mining method based on space-time attributes
CN116522872A (en) Similarity calculation-based metadata field Chinese name completion method, storage medium and system
CN113935387A (en) Text similarity determination method and device and computer readable storage medium
CN111062208A (en) File auditing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant