CN116882416B

CN116882416B - Information identification method and system for bidding documents

Info

Publication number: CN116882416B
Application number: CN202311152902.2A
Authority: CN
Inventors: 陈裕燕; 王聪; 胡静; 尹彬蔚; 傅鹏; 陈光远
Original assignee: Jiangxi Wonderful Horizon Purchasing Consulting Co ltd
Current assignee: Jiangxi Wonderful Horizon Purchasing Consulting Co ltd
Priority date: 2023-09-08
Filing date: 2023-09-08
Publication date: 2023-11-21
Anticipated expiration: 2043-09-08
Also published as: CN116882416A

Abstract

The invention is applicable to the technical field of bidding and provides an information identification method and system for a bidding document. The method comprises: dividing the text information of the bidding document into a plurality of sentences, and performing word segmentation and part-of-speech tagging on the sentences; constructing a directed weighted dynamic graph according to the semantic relations and logical relations among the sentences; identifying a plurality of key nodes from the directed weighted dynamic graph as candidate information nodes according to predefined rules; fusing and encoding the neighborhood information and the self features of each candidate information node to obtain a low-dimensional dense vector serving as a characterization vector; for the characterization vector of each candidate information node, using a classifier or a regressor to output an information value or a confidence according to the key information type corresponding to the candidate information node; and screening and sorting the candidate information nodes according to the output information value or confidence to obtain a key information list. The invention solves the problem of low information identification efficiency for existing bidding documents.

Description

Information identification method and system for bidding documents

Technical Field

The invention belongs to the technical field of bidding, and particularly relates to an information identification method and system of a bidding document.

Background

A bidding document is a document issued by the tenderer to invite bidders to participate in bidding, and generally contains the basic information, detailed information, bid requirements and the like of a project. The bidding document is an important basis for bidding activities and an important reference for bidders in preparing bids and making decisions. Accurately and quickly identifying the key information in a bidding document is therefore of great significance for improving the efficiency and quality of bidding activities.

At present, information in bidding documents is mainly identified manually, which has the following problems: first, efficiency is low, and a great deal of manpower and time is consumed; second, errors occur easily, because manual identification is affected by subjective factors and professional skill, which can lead to mistakes or omissions in information extraction and classification; third, it is difficult to adapt to the diversity and complexity of the format and content of bidding documents, which vary across industries, regions and institutions and thus make manual identification very difficult.

Therefore, methods capable of automatically identifying and extracting key information in bidding documents have been proposed to improve the efficiency and quality of bidding activities. However, the existing information identification methods for bidding documents are mainly rule-based or machine-learning-based, and have the following problems:

rule-based methods require a large number of manually written rules and templates, involve a heavy workload, adapt poorly to bidding documents of different types and formats, and are prone to missed detections and false detections;

machine learning methods require a large amount of labeled data to train the model; the data is difficult to acquire, and the models generalize poorly and have difficulty handling novel and complex bidding documents;

all of these methods need to process the full text of the bidding document, have high computational complexity and high memory consumption, and have difficulty achieving fast and efficient information identification.

Disclosure of Invention

Based on the above, the invention aims to provide an information identification method of a bidding document, which aims to solve the problem of low information identification efficiency of the existing bidding document.

According to the embodiment of the invention, the information identification method of the bidding document comprises the following steps:

dividing text information in the bidding documents into a plurality of sentences, and carrying out word segmentation and part-of-speech tagging on each sentence;

constructing a directed weighted dynamic graph according to semantic relations and logical relations among sentences, wherein each sentence serves as a node, each node is provided with a unique number, each edge represents the relation between two nodes, and each edge has a weight representing the strength of the relation;

according to predefined rules, identifying a plurality of key nodes from the directed weighted dynamic graph as candidate information nodes, wherein each candidate information node corresponds to one key information type;

using a deep learning-based graph neural network model to fuse and encode neighborhood information and self features of each candidate information node to obtain a low-dimensional dense vector as a characterization vector;

for the characterization vector of each candidate information node, using a classifier or a regressor to output its final information value or confidence according to the key information type corresponding to the candidate information node;

and screening and sequencing the candidate information nodes according to the output information value or the confidence coefficient to obtain a final key information list.

Further, for each candidate information node, using a deep learning-based graph neural network model, fusing and encoding the neighborhood information and the self-characteristics of the candidate information node, and obtaining a low-dimensional dense vector as a characterization vector of the candidate information node comprises the following steps:

extracting self characteristics of each candidate information node, wherein the self characteristics are word vectors or sentence vectors of sentences corresponding to the candidate information nodes;

Extracting neighborhood information of each candidate information node, wherein the neighborhood information is a hidden state vector of a node adjacent to the candidate information node;

and fusing and encoding the self characteristics and neighborhood information of each candidate information node to obtain a low-dimensional dense vector serving as a characterization vector.

Still further, the extracting neighborhood information of each candidate information node is calculated using the following formula:

$$h_i^{(l+1)} = \sigma\left( W^{(l)} \sum_{j \in N(i)} \frac{w_{ij}}{\sqrt{d_i d_j}}\, h_j^{(l)} + b^{(l)} \right);$$

wherein $h_i^{(l+1)}$ denotes the hidden state vector of the $i$-th node at layer $l+1$, $\sigma$ denotes a nonlinear activation function, $W^{(l)}$ denotes the trainable parameter matrix of layer $l$, $N(i)$ denotes the set of neighbor nodes of the $i$-th node, $w_{ij}$ denotes the edge weight between the $i$-th node and the $j$-th node, $d_i$ denotes the degree of the $i$-th node, $d_j$ denotes the degree of the $j$-th node, $h_j^{(l)}$ denotes the hidden state vector of the $j$-th node at layer $l$, $b^{(l)}$ denotes the bias vector of layer $l$, and $h_i^{(l)}$ denotes the hidden state vector of the $i$-th node at layer $l$;

and when the self features and neighborhood information of each candidate information node are fused and encoded to obtain a low-dimensional dense vector as the characterization vector, the following formula is used for calculation:

$$z_i = f\left(h_i^{(L)}, x_i\right);$$

wherein $z_i$ denotes the characterization vector of the $i$-th node, $h_i^{(L)}$ denotes the hidden state vector of the $i$-th node at the last layer, $x_i$ denotes the self feature vector of the $i$-th node, and $f$ denotes a trainable function.

Further, in constructing a directed weighted dynamic graph, the edge weights between two nodes are calculated using the following formula:

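$$w_{ij} = \alpha \cdot \mathrm{sim}(s_i, s_j) + \beta \cdot \mathrm{rel}(s_i, s_j) + \gamma \cdot \mathrm{chg}(s_i, s_j);$$

wherein $w_{ij}$ denotes the edge weight between the $i$-th node and the $j$-th node, $s_i$ and $s_j$ denote the sentences corresponding to the $i$-th node and the $j$-th node, $\alpha$, $\beta$ and $\gamma$ are hyperparameters for adjusting the influence of the different factors on the edge weight, $\mathrm{sim}(s_i, s_j)$ denotes the semantic similarity between $s_i$ and $s_j$, $\mathrm{rel}(s_i, s_j)$ denotes the semantic association between $s_i$ and $s_j$, and $\mathrm{chg}(s_i, s_j)$ denotes the degree of semantic change between $s_i$ and $s_j$.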

Still further, the step of identifying a number of key nodes from the directed weighted dynamic graph as candidate information nodes according to predefined rules comprises:

predefined rules are used for judging whether sentence nodes contain a certain key information type according to the structure and the content of the bidding documents;

traversing all sentence nodes in the directed weighted dynamic graph, and judging whether each sentence node contains a certain key information type according to a predefined rule;

if yes, taking the sentence nodes as candidate information nodes and giving corresponding key information type labels.

Still further, the following rules are used when identifying a number of key nodes from the directed weighted dynamic graph according to predefined rules:

if the out-degree of the target node is zero and the corresponding sentence contains project name, project number and project location information, or the corresponding sentence is the title or the beginning of the bidding document, the target node is taken as a candidate information node and the corresponding key information type is project basic information, the out-degree of the target node being the number of edges from the target node to other nodes;

if the out-degree of the target node is greater than zero and the corresponding sentence contains project content, project budget and project period information, or the corresponding sentence is in the body of the bidding document, the target node is taken as a candidate information node and the corresponding key information type is project detailed information;

if the in-degree of the target node is greater than zero and the corresponding sentence contains bid qualification and bid deadline information, or the corresponding sentence is in the end part of the bidding document, the target node is taken as a candidate information node and the corresponding key information type is bid requirement information, the in-degree of the target node being the number of edges from other nodes to the target node.

Further, the step of using a classifier or a regressor on the characterization vector of each candidate information node and outputting its final information value or confidence according to the corresponding key information type includes:

predefining a number of key information types according to the structure and the content of the bidding document;

for each candidate information node, selecting the corresponding classifier or regressor according to its key information type, and taking the characterization vector of the node as input to obtain the output of the classifier or regressor;

and parsing the final information value or confidence of each candidate information node from that output according to the corresponding key information type.

Further, when a classifier or a regressor is used, different loss functions and evaluation indexes are selected according to different key information types;

if the key information type is project basic information, a multi-label classifier is used, the loss function is the binary cross-entropy loss, and the evaluation indexes are accuracy, recall and F1 value;

if the key information type is project detailed information, a multi-class classifier is used, the loss function is the multinomial (categorical) cross-entropy loss, and the evaluation indexes are accuracy, recall and F1 value;

if the key information type is bid requirement information, a regressor is used, the loss function is the mean squared error loss, and the evaluation indexes are mean squared error, root mean squared error and mean absolute error.
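The pairing of key information type and loss function described above can be illustrated with a minimal sketch. The function names, the `LOSS_BY_TYPE` table and the clipping constant `eps` are hypothetical and introduced only for illustration; they are not taken from the patent.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over independent labels (multi-label case)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) is never evaluated.
        total += t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
    return -total / len(y_true)

def categorical_cross_entropy(true_index, probs, eps=1e-12):
    """Cross-entropy for a single multi-class prediction."""
    return -math.log(max(probs[true_index], eps))

def mean_squared_error(y_true, y_pred):
    """Mean squared error for regression outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical lookup table: key information type -> loss function.
LOSS_BY_TYPE = {
    "project basic information": binary_cross_entropy,
    "project detailed information": categorical_cross_entropy,
    "bid requirement information": mean_squared_error,
}
```

A training loop would look up the loss by the candidate node's type label, for example `LOSS_BY_TYPE["bid requirement information"]([1.0, 2.0], [1.0, 4.0])`.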

Further, the step of screening and sorting the candidate information nodes according to the output information value or the confidence coefficient to obtain a final key information list includes:

judging whether each candidate information node meets a preset condition according to the information value or the confidence coefficient output by the candidate information node;

if yes, reserving candidate information nodes, otherwise deleting the candidate information nodes;

and for the reserved candidate information nodes, sorting the reserved candidate information nodes according to the information values or the confidence degrees output by the candidate information nodes to obtain an ordered key information list.
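The screening and sorting steps above can be sketched as follows. The preset condition is assumed here to be a simple score threshold, and the tuple layout `(node_id, info_type, score)` and the name `screen_and_sort` are hypothetical; the patent does not fix either.

```python
def screen_and_sort(candidates, threshold=0.5):
    """Keep candidates whose score meets the threshold, highest score first.

    candidates: list of (node_id, info_type, score) tuples.
    """
    kept = [c for c in candidates if c[2] >= threshold]
    return sorted(kept, key=lambda c: c[2], reverse=True)
```

For example, with a threshold of 0.5 a candidate scored 0.4 is deleted and the remaining candidates are returned in descending order of score.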

It is also an object of another embodiment of the present invention to provide an information identification system for a bidding document, the system comprising:

the sentence segmentation module is used for segmenting the text information in the bidding document into a plurality of sentences and carrying out word segmentation and part-of-speech tagging on each sentence;

the directed weighted dynamic graph construction module is used for constructing a directed weighted dynamic graph according to the semantic relations and logical relations between sentences, wherein each sentence is used as a node, each node is provided with a unique number, each edge represents the relation between two nodes, and each edge is provided with a weight representing the strength of the relation;

the candidate information node identification module is used for identifying a plurality of key nodes from the directed weighted dynamic graph as candidate information nodes according to predefined rules, wherein each candidate information node corresponds to one key information type;

the fusion coding module is used for fusing and encoding the neighborhood information and self features of each candidate information node by using a deep learning-based graph neural network model to obtain a low-dimensional dense vector as a characterization vector;

the information output module is used for applying a classifier or a regressor to the characterization vector of each candidate information node and outputting its final information value or confidence according to the key information type corresponding to the candidate information node;

and the screening and sorting module is used for screening and sorting the candidate information nodes according to the output information value or the confidence coefficient to obtain a final key information list.

According to the information identification method for bidding documents provided by the embodiment of the invention, massive, heterogeneous and unstructured bidding documents are processed automatically, which improves the efficiency and accuracy of information extraction. The text information in a bidding document is converted into structured data, which facilitates subsequent graph construction and graph embedding. The graph-structured data is processed by a graph neural network model that captures the local and global features of nodes and edges, which facilitates subsequent graph characterization and graph output. The numerical information is processed by a classifier or regressor that outputs the final information value or confidence, which facilitates subsequent display and application. The accuracy, completeness, readability and usability of information identification are thereby improved, providing convenient and efficient services for both tenderers and bidders. Because the text information is segmented and a directed weighted dynamic graph is constructed, the bidding document is converted into structured graph data and full-text processing of the bidding document is not needed, which reduces computational complexity and memory consumption. The graph neural network model captures the complex structure and semantic information in the bidding document, improving the quality and depth of information extraction and solving the problem of low information identification efficiency for existing bidding documents.

Drawings

FIG. 1 is a flowchart of a method for identifying information of a bidding document according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an information identification system for a bidding document according to an embodiment of the present invention.

The following detailed description will further illustrate the invention with reference to the above-described drawings.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

Example 1

Referring to FIG. 1, a flowchart of an information identification method for a bidding document according to a first embodiment of the present invention is shown. For convenience of explanation, only the parts related to the embodiment of the present invention are shown. The method includes:

step S10, dividing text information in a bidding document into a plurality of sentences, and carrying out word segmentation and part-of-speech tagging on each sentence;

In one embodiment of the invention, the text information in the bidding document is divided into a plurality of sentences, and each sentence is segmented into words and part-of-speech tagged, so that the text information is converted into structured data, facilitating subsequent graph construction and graph embedding. Word segmentation refers to splitting a sentence into words, and part-of-speech tagging refers to assigning each word a part-of-speech tag such as noun, verb or adjective. Word segmentation and part-of-speech tagging may be implemented using existing natural language processing tools or models, such as jieba, stanfordnlp, spacy, etc.

The specific implementation method is as follows: firstly, a natural language processing tool or model is used to divide the text information in the bidding document into a plurality of independent sentences according to the ending symbol (such as a period, a question mark, an exclamation mark, etc.) or other special symbols (such as a line feed symbol, a space, etc.).

Secondly, using a natural language processing tool or model to segment each sentence and label the parts of speech, namely segmenting each sentence into a plurality of words according to language rules or statistical models, and assigning a part of speech label to each word.

Through the above steps, the text information in the bidding document is divided into a plurality of sentences, and each sentence is segmented into words and part-of-speech tagged. The text information is thereby converted into structured data, which facilitates subsequent graph construction and graph embedding.
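Step S10 can be sketched with the standard library alone. The sentence-end marks chosen below, the function names `split_sentences` and `toy_tagger`, and the tiny lexicon are assumptions made for illustration; the toy forward-maximum-matching tagger merely stands in for a real tool such as jieba and is not the patent's method.

```python
import re

# Runs of text ended by a Chinese or Western end mark; line breaks also split.
_SENTENCE = re.compile(r"[^。！？!?\n]+[。！？!?]?")

def split_sentences(text):
    """Split raw text into non-empty sentences, keeping the end mark."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]

def toy_tagger(sentence, lexicon):
    """Forward maximum matching against a small word -> POS lexicon.

    Characters not covered by the lexicon are emitted one at a time
    with the placeholder tag 'x'.
    """
    max_len = max(map(len, lexicon), default=1)
    i, tagged = 0, []
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            word = sentence[i:i + size]
            if word in lexicon or size == 1:
                tagged.append((word, lexicon.get(word, "x")))
                i += size
                break
    return tagged

text = "项目名称：办公楼改造。项目预算：100万元。\n投标截止日期：2023年10月1日！"
sentences = split_sentences(text)
```

Each element of `sentences` can then be passed to a tagger to obtain `(word, tag)` pairs, which is the structured form used by the later graph construction steps.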

Step S20, constructing a directed weighted dynamic diagram according to semantic relations and logic relations among sentences;

In one embodiment of the invention, a directed weighted dynamic graph is constructed according to the semantic relations and logical relations among sentences, wherein each sentence serves as a node, each node has a unique number, each edge represents the relation between two nodes, and each edge has a weight representing the strength of that relation, so that the text information in the bidding document is converted into graph-structured data, facilitating subsequent graph embedding and graph characterization. The directed graph can represent directionality and causality between pieces of information, the weighted graph can represent their strength and importance, and the dynamic graph can represent the change and evolution of information over time.

The specific implementation method is as follows: first, each sentence node is assigned a unique number, incremented from 1. Secondly, whether an edge exists between two nodes, and the direction and weight of the edge, are determined according to the semantic relations and logical relations between sentences. Semantic relations refer to links in meaning between sentences, such as synonymy, antonymy, hypernymy and hyponymy, coordination, and causality. Logical relations refer to inferential relations between sentences, such as hypothesis, conclusion, condition, and inference. The direction of an edge refers to the direction in which information flows from one node to another, such as forward or backward, upward or downward, leftward or rightward. The weight of an edge refers to the strength of the relation between two nodes, such as its intensity, closeness, or relevance. Whether an edge exists between two nodes, and the direction and weight of the edge, are determined using the following formula:

$$w_{ij} = \alpha \cdot \mathrm{sim}(s_i, s_j) + \beta \cdot \mathrm{rel}(s_i, s_j) + \gamma \cdot \mathrm{chg}(s_i, s_j)$$

wherein $w_{ij}$ denotes the edge weight between the $i$-th node and the $j$-th node, $s_i$ and $s_j$ denote the sentences corresponding to the $i$-th node and the $j$-th node, and $\alpha$, $\beta$ and $\gamma$ are hyperparameters for adjusting the influence of the different factors on the edge weight. $\mathrm{sim}(s_i, s_j)$ denotes the semantic similarity between the word vectors or sentence vectors of $s_i$ and $s_j$, and can be calculated using cosine similarity. $\mathrm{rel}(s_i, s_j)$ denotes the semantic association between $s_i$ and $s_j$, and can be calculated using methods such as dependency parsing or semantic role labeling. $\mathrm{chg}(s_i, s_j)$ denotes the degree of semantic change between $s_i$ and $s_j$, and can be calculated using methods such as text sentiment analysis or text topic analysis.

If $w_{ij}$ is greater than a preset threshold, an edge is considered to exist between the $i$-th node and the $j$-th node; otherwise no edge is considered to exist. The direction of the edge is determined by the positions of the $i$-th node and the $j$-th node in the text: if the $i$-th node precedes the $j$-th node, the edge points from the $i$-th node to the $j$-th node; otherwise it points from the $j$-th node to the $i$-th node.

Through the above steps, a directed weighted dynamic graph can be constructed according to the semantic relations and logical relations between sentences, wherein each sentence is used as a node, each node is provided with a unique number, each edge represents the relation between two nodes, and each edge is provided with a weight representing the strength of the relation. The text information in the bidding document is thereby converted into graph-structured data, which facilitates subsequent graph embedding and graph characterization. Using a directed weighted dynamic graph as the graph structure and graph type makes full use of the directionality, strength and change characteristics of the information in the bidding document, and improves the accuracy and robustness of information identification.
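The edge construction of step S20 can be sketched as below, under stated assumptions: the edge weight is taken to be a weighted sum of the three factors, only cosine similarity is actually computed, and the association and change-degree scores are supplied by caller-provided functions `rel` and `chg`. The hyperparameter values, the threshold and all function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_edges(vectors, rel, chg, alpha=0.6, beta=0.3, gamma=0.1, threshold=0.5):
    """Return {(src, dst): weight} for sentence vectors given in text order.

    Nodes are numbered from 1. An edge is created only when the weight
    exceeds the threshold, and it points from the earlier sentence to
    the later one.
    """
    edges = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            weight = (alpha * cosine(vectors[i], vectors[j])
                      + beta * rel(i, j)
                      + gamma * chg(i, j))
            if weight > threshold:
                edges[(i + 1, j + 1)] = weight
    return edges

# Toy example: sentences 1 and 2 are identical in vector space, 3 is orthogonal.
edges = build_edges([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
                    rel=lambda i, j: 0.0, chg=lambda i, j: 0.0)
```

Because every pair is visited with the earlier sentence first, the direction rule reduces to always storing the pair as `(earlier, later)`.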

Step S30, identifying a plurality of key nodes from the directed weighted dynamic graph as candidate information nodes according to a predefined rule, wherein each candidate information node corresponds to one key information type;

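In one embodiment of the present invention, a number of key nodes are identified from the directed weighted dynamic graph according to predefined rules, in order to screen out the most relevant and important information in the bidding document and assign it different classification labels. The key information types include project basic information, project detailed information, bid requirement information, and the like. This step can be realized by the following sub-steps:

predefining rules for judging, according to the structure and content of the bidding document, whether a sentence node contains a certain key information type;

traversing all sentence nodes in the directed weighted dynamic graph, and judging according to the predefined rules whether each sentence node contains a certain key information type;

if yes, taking the sentence node as a candidate information node and assigning it the corresponding key information type label.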

The specific implementation method is as follows: first, some rules are predefined according to the structure and content of the bidding document. For example, the following are some possible rules:

If the out-degree of the target node is zero and the corresponding sentence contains project name, project number and project location information, or the corresponding sentence is the title or the beginning of the bidding document, the target node is taken as a candidate information node and the corresponding key information type is project basic information. The out-degree of the target node is the number of edges from the target node to other nodes; an out-degree of zero means that the node emits no edge, that is, it is not connected to any other node;

if the out-degree of the target node is greater than zero and the corresponding sentence contains project content, project budget and project period information, or the corresponding sentence is in the body of the bidding document, the target node is taken as a candidate information node and the corresponding key information type is project detailed information. An out-degree greater than zero means that the node emits at least one edge, that is, it is connected to at least one other node;

if the in-degree of the target node is greater than zero and the corresponding sentence contains bid qualification and bid deadline information, or the corresponding sentence is in the end part of the bidding document, the target node is taken as a candidate information node and the corresponding key information type is bid requirement information. The in-degree of the target node is the number of edges from other nodes to the target node; an in-degree greater than zero means that the node is pointed to by at least one edge, that is, at least one other node is connected to it.

The rule may be adjusted and optimized according to different types and formats of bidding documents, which is not limited herein. Secondly, traversing all sentence nodes in the directed weighted dynamic graph, and judging whether each sentence node contains a certain key information type according to a predefined rule. If yes, the sentence node is used as a candidate information node, and a corresponding key information type label is assigned to the candidate information node.

Through the steps, a plurality of key nodes can be identified from the directed weighted dynamic graph according to the predefined rule and serve as candidate information nodes, wherein each candidate information node corresponds to one key information type. By setting the predefined rules, key nodes can be identified and then analyzed, and all nodes do not need to be analyzed, so that the data processing efficiency is effectively improved.
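The rule-based identification of step S30 can be sketched as follows. Several choices here are assumptions rather than statements of the patent: the keyword lists are hypothetical, a sentence is treated as containing the information if any one keyword appears, the document part (`"head"`, `"body"`, `"tail"`) is supplied by the caller, and when several rules match the first one in listed order wins.

```python
def degrees(edges, n):
    """Compute in-degree and out-degree for nodes 1..n from {(src, dst): w}."""
    out_deg = {k: 0 for k in range(1, n + 1)}
    in_deg = dict(out_deg)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg

# (label, degree test on (in_degree, out_degree), keywords, document part)
RULES = [
    ("project basic information", lambda i, o: o == 0,
     ("项目名称", "项目编号", "项目地点"), "head"),   # name, number, location
    ("project detailed information", lambda i, o: o > 0,
     ("项目内容", "项目预算", "项目工期"), "body"),   # content, budget, period
    ("bid requirement information", lambda i, o: i > 0,
     ("投标资格", "投标截止"), "tail"),               # qualification, deadline
]

def label_node(sentence, in_deg, out_deg, part=None):
    """Return the key information type for a sentence node, or None."""
    for label, degree_ok, keywords, rule_part in RULES:
        has_keyword = any(k in sentence for k in keywords)
        if (degree_ok(in_deg, out_deg) and has_keyword) or part == rule_part:
            return label
    return None
```

Nodes for which `label_node` returns `None` are not candidates and are skipped by the later embedding and classification steps.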

Step S40, for each candidate information node, using a deep learning-based graph neural network model to fuse and encode its neighborhood information and self features, and obtaining a low-dimensional dense vector as the characterization vector of the candidate information node;

In one embodiment of the invention, for each candidate information node, a deep learning-based graph neural network model is used to fuse and encode the neighborhood information and self features of the node, and a low-dimensional dense vector is obtained as the characterization of the node, so that the text information in the bidding document is converted into numerical information, facilitating subsequent classification or regression. A graph neural network is a deep learning model capable of processing graph-structured data, and can effectively capture the local and global features of nodes and edges in the graph. This step comprises extracting the self features of each candidate information node, extracting its neighborhood information, and fusing and encoding the two.

The specific implementation method is as follows: first, for each candidate information node, its own feature, i.e., a word vector or sentence vector of its corresponding sentence, is extracted. Word vectors refer to mapping each word to a vector in a high-dimensional space to represent its semantic information. Sentence vector refers to a vector that maps the entire sentence into a high-dimensional space to represent its semantic information. Word vectors or sentence vectors may be generated using existing natural language processing tools or models, such as word2vec, BERT, etc.

Next, for each candidate information node, its neighborhood information, i.e., the hidden state vector of its neighboring nodes, is extracted. The hidden state vector refers to the intermediate output of each node in the graph neural network model as a characteristic thereof to represent the structural information thereof. The hidden state vector may be calculated using the following formula:

$$h_i^{(l+1)} = \sigma\left( W^{(l)} \sum_{j \in N(i)} \frac{w_{ij}}{\sqrt{d_i d_j}}\, h_j^{(l)} + b^{(l)} \right);$$

wherein $h_i^{(l+1)}$ denotes the hidden state vector of the $i$-th node at layer $l+1$, $\sigma$ denotes a nonlinear activation function such as ReLU, sigmoid or tanh, $W^{(l)}$ denotes the trainable parameter matrix of layer $l$, $N(i)$ denotes the set of neighbor nodes of the $i$-th node, $w_{ij}$ denotes the edge weight between the $i$-th node and the $j$-th node, $d_i$ denotes the degree of the $i$-th node, $d_j$ denotes the degree of the $j$-th node, $h_j^{(l)}$ denotes the hidden state vector of the $j$-th node at layer $l$, $b^{(l)}$ denotes the bias vector of layer $l$, and $h_i^{(l)}$ denotes the hidden state vector of the $i$-th node at layer $l$.
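One layer of the hidden state update can be sketched in plain Python. The sketch assumes a neighbor-sum form in which each neighbor's hidden state is scaled by the edge weight divided by the square root of the product of the two node degrees, then multiplied by the layer's parameter matrix, shifted by the bias and passed through the activation. That normalization, the use of the neighbor count as the degree (floored at 1), and the names `gnn_layer` and `relu` are assumptions made for illustration.

```python
import math

def relu(vector):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in vector]

def gnn_layer(h, neighbors, weights, W, b, act=relu):
    """Apply one message-passing layer.

    h:         {node: hidden state vector at the current layer}
    neighbors: {node: list of neighbor nodes}
    weights:   {(i, j): edge weight between node i and node j}
    W, b:      parameter matrix (list of rows) and bias vector of the layer
    """
    # Degree = number of neighbors, floored at 1 to avoid division by zero.
    deg = {i: max(len(neighbors.get(i, [])), 1) for i in h}
    dim = len(next(iter(h.values())))
    new_h = {}
    for i in h:
        agg = [0.0] * dim
        for j in neighbors.get(i, []):
            coef = weights[(i, j)] / math.sqrt(deg[i] * deg[j])
            agg = [a + coef * x for a, x in zip(agg, h[j])]
        z = [sum(W[r][c] * agg[c] for c in range(dim)) + b[r]
             for r in range(len(W))]
        new_h[i] = act(z)
    return new_h

# Two mutually connected nodes, identity parameters, zero bias.
h1 = gnn_layer({1: [1.0, 0.0], 2: [0.0, 2.0]},
               {1: [2], 2: [1]},
               {(1, 2): 0.5, (2, 1): 0.5},
               W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0])
```

Stacking several such layers lets each node's hidden state absorb information from nodes more than one edge away; a production system would normally use a tensor library instead of nested lists.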

Finally, for each candidate information node, its self features and neighborhood information are fused and encoded to obtain a low-dimensional dense vector as the characterization vector of the candidate information node. The fusion and encoding can be performed using the following formula:

$$z_i = f\left(h_i^{(L)}, x_i\right)$$

wherein $z_i$ denotes the characterization vector of the $i$-th node, $h_i^{(L)}$ denotes the hidden state vector of the $i$-th node at the last layer, $x_i$ denotes the self feature vector of the $i$-th node, and $f$ denotes a trainable function.
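The trainable fusion function is left open by the description. One common choice, assumed here purely for illustration, is to concatenate the last-layer hidden state with the node's own feature vector and apply a linear map followed by tanh; the name `fuse` and the parameters `W` and `b` are hypothetical.

```python
import math

def fuse(h_last, x_self, W, b):
    """Concatenate the two vectors, apply a linear map, then tanh.

    W has one row per output dimension and len(h_last) + len(x_self) columns,
    so choosing fewer rows than columns yields a lower-dimensional vector.
    """
    concat = list(h_last) + list(x_self)
    return [math.tanh(sum(w * v for w, v in zip(row, concat)) + bias)
            for row, bias in zip(W, b)]

# Five input dimensions (2 + 3) are reduced to a 2-dimensional representation.
z = fuse([0.5, -0.5], [1.0, 0.0, 2.0],
         W=[[0.0] * 5, [1.0, 1.0, 0.0, 0.0, 0.0]], b=[0.0, 0.0])
```

In training, `W` and `b` would be learned jointly with the graph layers and the downstream classifier or regressor.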

Through the above steps, the neighborhood information and the self features of each candidate information node are fused and encoded using a deep-learning-based graph neural network model, yielding a low-dimensional dense vector as the characterization vector. Using a deep-learning-based graph neural network model as the graph embedding method of the information identification method makes full use of the strengths of deep learning in natural language processing and combines them with the ability of graph neural networks to handle complex, dynamic graph data, which improves the quality and efficiency of information embedding. Meanwhile, using a single dense low-dimensional vector as the graph characterization space exploits the advantages of dense vectors in retaining information and semantics, together with the advantages of low-dimensional vectors in reducing complexity and improving generalization, which improves the simplicity and interpretability of the information characterization. Because a bidding document consists mainly of text, a single vector is sufficient for information identification, and other information types need not be considered.

Step S50, for the characterization vector of each candidate information node, a classifier or a regressor is used, and the final information value or confidence of the characterization vector is output according to the key information type corresponding to the candidate information node;

In one embodiment of the present invention, a classifier or regressor is applied to the characterization vector of each candidate information node, and the final information value or confidence is output according to the corresponding key information type, so that the numerical information derived from the bidding document is converted into readable information for subsequent display and application. The classifier or regressor may be implemented using existing machine learning or deep learning models, such as a multi-layer perceptron, logistic regression, or a support vector machine. Different loss functions and evaluation indexes are selected for different key information types. This step can be realized by the following steps:

The specific implementation method is as follows: first, some key information types, such as project basic information, project detailed information, and bidding requirement information, are predefined according to the structure and content of the bidding document. Each key information type corresponds to a classifier or regressor, a loss function, and an evaluation index. For example, the following are some possible key information types and their corresponding models, loss functions, and evaluation indexes:

if the key information type is project basic information, a multi-label classifier, such as a multi-layer perceptron, logistic regression, or a support vector machine, is used; the loss function is the binary cross-entropy loss, and the evaluation indexes are accuracy, recall, F1 value, and the like;

if the key information type is project detailed information, a multi-class classifier, such as a multi-layer perceptron, logistic regression, or a support vector machine, is used; the loss function is the multinomial (categorical) cross-entropy loss, and the evaluation indexes are accuracy, recall, F1 value, and the like;

if the key information type is bidding requirement information, a regressor, such as a multi-layer perceptron, linear regression, or a support vector machine, is used; the loss function is the mean square error loss, and the evaluation indexes are mean square error, root mean square error, mean absolute error, and the like.

The models, loss functions, and evaluation indexes can be adjusted and optimized according to the type and format of the bidding document, and are not particularly limited herein.
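The three loss functions named for the example key information types can be sketched as follows. This is an illustrative pure-Python version under simple assumptions (probabilities clipped by a small `EPS`, losses averaged over the vector); the dictionary keys in `LOSS_BY_TYPE` are hypothetical identifiers for the three types, not names taken from the text.

```python
import math
from typing import List

EPS = 1e-12  # clipping constant to avoid log(0); an assumption of this sketch


def binary_cross_entropy(y_true: List[int], y_prob: List[float]) -> float:
    """Mean binary cross-entropy over independent labels (multi-label case)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        total += t * math.log(max(p, EPS)) + (1 - t) * math.log(max(1 - p, EPS))
    return -total / len(y_true)


def categorical_cross_entropy(true_index: int, probs: List[float]) -> float:
    """Cross-entropy for a single sample with exactly one correct class."""
    return -math.log(max(probs[true_index], EPS))


def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical mapping from key information type to its loss function.
LOSS_BY_TYPE = {
    "project_basic_information": binary_cross_entropy,
    "project_detailed_information": categorical_cross_entropy,
    "bidding_requirement_information": mean_squared_error,
}
```

Keeping the mapping in one table makes it straightforward to swap a loss when a new document format calls for a different model.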

Second, for each candidate information node, the corresponding classifier or regressor is selected according to its key information type, and the characterization vector is fed in as input to obtain an output. The output may be a class label or a numerical value, or a probability distribution or a confidence interval.

Finally, the final information value or confidence of each candidate information node is parsed from the output according to its key information type. The information value or confidence may be a specific text or number, or may be a range or interval.
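The selection and parsing steps can be sketched as a dispatch table keyed by key information type. Everything named here is hypothetical: `stub_classifier` and `stub_regressor` stand in for trained models and return fixed or trivially computed outputs, and the type identifiers are illustrative.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def parse_classification(probs: Dict[str, float]) -> Tuple[str, float]:
    """Turn a probability distribution into (label, confidence)."""
    label = max(probs, key=probs.get)
    return label, probs[label]


def parse_regression(value: float) -> float:
    """A regressor's numeric output is used directly as the information value."""
    return value


def stub_classifier(z: Vector) -> Dict[str, float]:
    # Placeholder for a trained classifier; returns a fixed distribution.
    return {"project_name": 0.9, "project_number": 0.1}


def stub_regressor(z: Vector) -> float:
    # Placeholder for a trained regressor.
    return 2.0 * z[0]


# key information type -> (model, output parser)
HANDLERS: Dict[str, Tuple[Callable, Callable]] = {
    "project_basic_information": (stub_classifier, parse_classification),
    "bidding_requirement_information": (stub_regressor, parse_regression),
}


def output_for_node(info_type: str, z: Vector):
    """Select the model for the node's type, run it, and parse the result."""
    model, parse = HANDLERS[info_type]
    return parse(model(z))


label_result = output_for_node("project_basic_information", [0.3, 0.7])
value_result = output_for_node("bidding_requirement_information", [1.5, 0.0])
```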

Through the steps, a classifier or a regressor is used for the characterization vector of each candidate information node, and a final information value or confidence coefficient is output according to the corresponding key information type.

Step S60, screening and sequencing the candidate information nodes according to the output information value or confidence coefficient to obtain a final key information list;

In one embodiment of the present invention, candidate information nodes are screened and sorted according to the output information values or confidences to obtain a final key information list, so that the most accurate and complete information is extracted from the bidding document and then displayed and applied in a defined order and format. Screening and sorting may be implemented using existing algorithms or rules, such as a threshold, a confidence level, or a priority. This step can be realized by the following steps:

if yes, reserving the candidate information node, otherwise deleting the candidate information node;

The specific implementation method is as follows: first, for each candidate information node, whether it meets a preset condition is judged according to its output information value or confidence, for example whether the value is greater or less than a preset threshold, falls within a preset range or interval, or conforms to a preset pattern or format. If so, the candidate information node is retained; otherwise, it is deleted. This step removes erroneous or invalid candidate information nodes and improves the quality and accuracy of the information. Second, the retained candidate information nodes are sorted according to their output information values or confidences to obtain an ordered key information list. The sorting may use existing algorithms or rules, such as ascending order, descending order, priority, or relevance. This step gives the key information a reasonable and clear display order and improves the readability and usability of the information.
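The screening and sorting described above can be sketched with a confidence threshold followed by a descending sort. The threshold condition and descending order are just one of the options the text lists; the `Candidate` record, its fields, and the sample values are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    node_id: int
    info_type: str
    value: str
    confidence: float


def screen_and_sort(candidates: List[Candidate], threshold: float) -> List[Candidate]:
    """Keep candidates whose confidence reaches the threshold, highest first."""
    kept = [c for c in candidates if c.confidence >= threshold]
    return sorted(kept, key=lambda c: c.confidence, reverse=True)


candidates = [
    Candidate(1, "project_detailed_information", "delivery within 30 days", 0.75),
    Candidate(2, "project_basic_information", "unclear fragment", 0.40),
    Candidate(3, "project_basic_information", "office renovation", 0.92),
]
key_info = screen_and_sort(candidates, threshold=0.5)
```

A range check or a format check (for example a regular expression on `value`) could replace the threshold test without changing the overall structure.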

Through the above steps, candidate information nodes are screened and sorted according to the output information value or confidence, and a final key information list is obtained. The method can therefore effectively extract key information from the bidding document and embed it into a low-dimensional dense vector for subsequent analysis and application. By making full use of the directionality, strength, and change characteristics of the information in the bidding document, the accuracy and robustness of information identification are improved, while the information characterization becomes simpler and more interpretable.

In this embodiment, massive, heterogeneous, and unstructured bidding documents are processed automatically, which improves the efficiency and accuracy of information extraction. Text information in the bidding documents is converted into structured data, which facilitates subsequent graph construction and graph embedding. The graph-structured data derived from the bidding documents is processed with a graph neural network model that captures local and global features of nodes and edges, which facilitates subsequent graph characterization and graph output. Numerical information is processed with a classifier or regressor that outputs a final information value or confidence, which facilitates subsequent display and application. Together, these measures improve the accuracy, completeness, readability, and usability of information identification for bidding documents and provide a convenient and efficient service for tendering parties and bidders. Because the text information is segmented and a directed weighted dynamic graph is constructed, the bidding document is converted into structured graph data and does not need to be processed as full text, which reduces computational complexity and memory consumption. The graph neural network model captures the complex structure and semantic information in the bidding documents, which improves the quality and depth of information extraction and solves the problem of low information identification efficiency for existing bidding documents.

Example two

Referring to fig. 2, which is a schematic structural diagram of an information recognition system for a bidding document according to a second embodiment of the present invention, for convenience of explanation, only a portion related to the embodiment of the present invention is shown, the system includes:

the sentence segmentation module 10 is used for segmenting text information in the bidding document into a plurality of sentences and carrying out word segmentation and part-of-speech tagging on each sentence;

the directed weighted dynamic graph construction module 20 is configured to construct a directed weighted dynamic graph according to semantic relationships and logical relationships between sentences, where each sentence is used as a node, each node has a unique number, each edge represents a relationship between two nodes, and each edge has a weight to represent the strength of the relationship;

a candidate information node identifying module 30, configured to identify, according to a predefined rule, a plurality of key nodes from the directed weighted dynamic graph as candidate information nodes, where each candidate information node corresponds to a key information type;

the fusion encoding module 40 is configured to fuse and encode, for each candidate information node, its neighborhood information and its own features by using a graph neural network model based on deep learning, to obtain a low-dimensional dense vector as its characterization vector;

The information output module 50 is configured to output, for the characterization vector of each candidate information node, a final information value or confidence according to the key information type corresponding to the candidate information node by using a classifier or a regressor;

and the screening and sorting module 60 is configured to screen and sort the candidate information nodes according to the output information value or the confidence level, so as to obtain a final key information list.

Further, in one embodiment of the present invention, the fusion encoding module 40 includes:

the self-feature extraction unit is used for extracting self-features of each candidate information node, wherein the self-features are word vectors or sentence vectors of sentences corresponding to the candidate information nodes;

the neighborhood information extraction unit is used for extracting neighborhood information of each candidate information node, wherein the neighborhood information is a hidden state vector of a node adjacent to the candidate information node;

and the fusion coding unit is used for fusing and coding the self characteristics and neighborhood information of each candidate information node to obtain a low-dimensional dense vector serving as a characterization vector.

Further, in one embodiment of the present invention, the neighborhood information extraction unit calculates using the following formula:

where $h_i^{(l+1)}$ denotes the hidden state vector of the $i$-th node at layer $l+1$; $\sigma$ denotes a nonlinear activation function; $W^{(l)}$ denotes the trainable parameter matrix of layer $l$; $N(i)$ denotes the set of neighbor nodes of the $i$-th node; $w_{ij}$ denotes the edge weight between the $i$-th node and the $j$-th node; $d_i$ denotes the degree of the $i$-th node; $d_j$ denotes the degree of the $j$-th node; $h_j^{(l)}$ denotes the hidden state vector of the $j$-th node at layer $l$; $b^{(l)}$ denotes the bias vector of layer $l$; and $h_i^{(l)}$ denotes the hidden state vector of the $i$-th node at layer $l$;

the fusion encoding unit calculates using the following formula:

Further, in one embodiment of the present invention, the directed weighted dynamic graph construction module 20 calculates the edge weights between two nodes using the following formula:

where $w_{ij}$ denotes the edge weight between the $i$-th node and the $j$-th node; $s_i$ and $s_j$ denote the sentences corresponding to the $i$-th node and the $j$-th node; $\alpha$, $\beta$, and $\gamma$ are hyperparameters that adjust the influence of the different factors on the edge weight; $\mathrm{sim}(s_i, s_j)$ denotes the semantic similarity between $s_i$ and $s_j$; $\mathrm{rel}(s_i, s_j)$ denotes the semantic association between $s_i$ and $s_j$; and $\mathrm{diff}(s_i, s_j)$ denotes the semantic variability between $s_i$ and $s_j$.
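As an illustration of how three hyperparameters can balance similarity, association, and variability, the sketch below combines the three scores as a weighted sum. The weighted-sum form is an assumption of this sketch, and so are the stand-in scoring functions (token-set Jaccard overlap, a shared-token indicator, and one minus the overlap); a real system would substitute its own semantic measures.

```python
from typing import Callable, Set

Score = Callable[[str, str], float]


def tokens(sentence: str) -> Set[str]:
    return set(sentence.lower().split())


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: overlap of the two token sets."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def shares_token(a: str, b: str) -> float:
    """Stand-in association: 1.0 if the sentences share any token."""
    return 1.0 if tokens(a) & tokens(b) else 0.0


def dissimilarity(a: str, b: str) -> float:
    """Stand-in variability: complement of the token overlap."""
    return 1.0 - jaccard(a, b)


def edge_weight(s_i: str, s_j: str,
                alpha: float, beta: float, gamma: float,
                sim: Score = jaccard,
                rel: Score = shares_token,
                diff: Score = dissimilarity) -> float:
    """Assumed weighted sum of the three scores."""
    return alpha * sim(s_i, s_j) + beta * rel(s_i, s_j) + gamma * diff(s_i, s_j)


w = edge_weight("bid deadline date", "bid opening date",
                alpha=0.5, beta=0.25, gamma=0.25)
```

Passing the scoring functions as parameters keeps the combination rule separate from the choice of semantic measure.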

Further, in one embodiment of the present invention, the candidate information node identification module 30 includes:

the rule definition unit is used for predefining rules according to the structure and the content of the bidding document and judging whether the sentence nodes contain a certain key information type or not;

the sentence node judging unit is used for traversing all sentence nodes in the directed weighted dynamic graph and judging whether each sentence node contains a certain key information type according to a predefined rule;

and the candidate information node identification unit is used for taking the sentence node as the candidate information node and giving a corresponding key information type label when the sentence node judgment unit judges that the sentence node contains a certain key information type.
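The cooperation of the three units can be sketched as a rule table plus a traversal over sentence nodes. The keyword patterns below are purely hypothetical placeholders chosen for illustration; they are not the rules of the invention, and the type identifiers are likewise illustrative.

```python
import re
from typing import Dict, Pattern

# Hypothetical rules: key information type -> pattern that signals it.
RULES: Dict[str, Pattern[str]] = {
    "project_basic_information": re.compile(r"project (name|number)"),
    "bidding_requirement_information": re.compile(r"qualification|deadline"),
}


def identify_candidates(sentences: Dict[int, str]) -> Dict[int, str]:
    """Traverse all sentence nodes and label those matching a rule.

    Returns node id -> key information type; the first matching rule wins,
    and nodes matching no rule are left out.
    """
    candidates: Dict[int, str] = {}
    for node_id, text in sentences.items():
        for info_type, pattern in RULES.items():
            if pattern.search(text.lower()):
                candidates[node_id] = info_type
                break
    return candidates


nodes = {
    1: "Project name: office building renovation",
    2: "The weather was fine.",
    3: "Bidders must hold a valid qualification certificate.",
}
labeled = identify_candidates(nodes)
```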

Further, in one embodiment of the present invention, the rule definition unit uses the following rule:

Further, in one embodiment of the present invention, the information output module 50 includes:

a key information type definition unit for predefining some key information types according to the structure and content of the bidding document;

the selection unit is used for selecting a corresponding classifier or regressor for each candidate information node according to the corresponding key information type, and obtaining the output of the candidate information node after taking the characterization vector as the input;

and the analysis unit is used for analyzing the final information value or the confidence coefficient of each candidate information node according to the corresponding key information type.

Further, in one embodiment of the present invention, when the selection unit is used, different loss functions and evaluation indexes are selected according to different key information types;

Further, in one embodiment of the present invention, the filtering and sorting module 60 includes:

the judging unit is used for judging whether the preset condition is met or not according to the information value or the confidence coefficient output by each candidate information node;

the reservation unit is used for reserving candidate information nodes when the judgment unit judges that the output information value or the confidence degree meets the preset condition;

the deleting unit is used for deleting the candidate information nodes when the judging unit judges that the output information value or the confidence coefficient does not meet the preset condition;

And the ordering unit is used for ordering the candidate information nodes reserved by the reserving unit according to the information values or the confidence degrees output by the candidate information nodes, so as to obtain an ordered key information list.

The information recognition system for the bidding documents provided by the embodiment of the present application has the same implementation principle and technical effects as those of the foregoing method embodiment, and for the sake of brief description, reference may be made to corresponding contents in the foregoing method embodiment where the system embodiment is not mentioned.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional units or modules according to needs, i.e. the internal structure of the storage device is divided into different functional units or modules, so as to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims

1. An information identification method of a bidding document, the method comprising:

screening and sequencing the candidate information nodes according to the output information value or confidence coefficient to obtain a final key information list;

the step of identifying a number of key nodes from the directed weighted dynamic graph as candidate information nodes according to predefined rules comprises:

if yes, taking the sentence nodes as candidate information nodes and giving corresponding key information type labels;

the following rules are used when a plurality of key nodes are identified from the directed weighted dynamic graph according to predefined rules:

2. The method for identifying information of a bidding document according to claim 1, wherein the step of fusing and encoding neighborhood information and self-features of each candidate information node by using a deep learning-based graph neural network model to obtain a low-dimensional dense vector as a characterization vector thereof comprises:

3. The method for identifying information of a bidding document according to claim 2, wherein the neighborhood information of each candidate information node is extracted by the following formula:

where $h_i^{(l+1)}$ denotes the hidden state vector of the $i$-th node at layer $l+1$; $\sigma$ denotes a nonlinear activation function; $W^{(l)}$ denotes the trainable parameter matrix of layer $l$; $N(i)$ denotes the set of neighbor nodes of the $i$-th node; $w_{ij}$ denotes the edge weight between the $i$-th node and the $j$-th node; $d_i$ denotes the degree of the $i$-th node; $d_j$ denotes the degree of the $j$-th node; $h_j^{(l)}$ denotes the hidden state vector of the $j$-th node at layer $l$; $b^{(l)}$ denotes the bias vector of layer $l$; and $h_i^{(l)}$ denotes the hidden state vector of the $i$-th node at layer $l$;

where $z_i$ denotes the characterization vector of the $i$-th node, $h_i^{(L)}$ denotes the hidden state vector of the $i$-th node at the last layer, $x_i$ denotes the self feature vector of the $i$-th node, and $f$ denotes a trainable function.

4. The method of claim 1, wherein the step of constructing a directed weighted dynamic graph uses the following formula to calculate edge weights between two nodes:

5. The method for identifying information of a bidding document according to claim 1, wherein the step of outputting the final information value or confidence level of the characterization vector of each candidate information node according to the corresponding key information type by using a classifier or regressor comprises:

6. The method for identifying information of a bidding document according to claim 5, wherein when a classifier or a regressor is used, different loss functions and evaluation indexes are selected according to different key information types;

7. The method for identifying information of a bidding document according to claim 1, wherein the step of screening and sorting candidate information nodes according to the output information value or confidence level to obtain a final key information list comprises:

8. An information identification system for a bidding document, the system comprising:

the screening and sorting module is used for screening and sorting the candidate information nodes according to the output information value or the confidence coefficient to obtain a final key information list;

the candidate information node identification module comprises:

the candidate information node identification unit is used for taking the sentence node as the candidate information node and giving a corresponding key information type label when the sentence node judgment unit judges that the sentence node contains a certain key information type;

The rule definition unit uses the following rule: