CN113128218A - Key field extraction method and device for bidding information - Google Patents

Publication number
CN113128218A
Authority
CN
China
Prior art keywords
bidding
text
probability
preset key
key field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110462661.6A
Other languages
Chinese (zh)
Inventor
李武钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China World Digital Technology Shenzhen Co ltd
Original Assignee
China World Digital Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China World Digital Technology Shenzhen Co ltd filed Critical China World Digital Technology Shenzhen Co ltd
Priority to CN202110462661.6A priority Critical patent/CN113128218A/en
Publication of CN113128218A publication Critical patent/CN113128218A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to data processing technology and provides a method, an apparatus, a computer device and a storage medium for extracting key fields of bidding information. The method comprises the following steps: forming a bidding set by acquiring published bidding information, wherein the bidding set comprises a plurality of bidding texts and each bidding text has a corresponding number; screening out, from the bidding set, predicted texts that potentially contain preset key fields, and extracting the numbers of the predicted texts; acquiring each predicted text according to its number, and extracting the corresponding bidding key fields from the predicted text using a regular expression matching algorithm; and storing the bidding key fields to a designated location for subsequent retrieval and analysis. Because the bidding texts that potentially contain the preset key fields are first screened out from the massive bidding information, and the bidding key fields are extracted only from the screened texts, the amount of data processed during extraction is greatly reduced, efficiency is improved, and resources are saved.

Description

Key field extraction method and device for bidding information
Technical Field
The present invention relates to the technical field of data processing, and in particular, to a method, an apparatus, a computer device, and a storage medium for extracting a key field related to bidding information.
Background
Bidding (tendering) is a transaction mode used under market-economy conditions for buying and selling goods in bulk, contracting engineering construction projects, and purchasing and providing service projects. With economic development and increasingly fierce market competition, the analysis of bidding information has become more important. Massive bidding information is usually searched and screened by key fields. At present, either the key fields are user-defined presets, so that the obtained data is incomplete and inaccurate and the analysis easily misses information, or the key fields are extracted directly from the massive bidding information, so that the amount of data to be processed is huge and efficiency is too low.
Disclosure of Invention
The main objective of the invention is to provide a method, an apparatus, a computer device and a storage medium for extracting key fields of bidding information, in order to solve the technical problem in the prior art that the amount of data to be processed when extracting key fields from bidding information is too large.
Based on the above object, the present invention provides a method for extracting a key field related to bidding information, comprising:
forming a bidding set by acquiring published bidding information, wherein the bidding set comprises a plurality of bidding texts, and each bidding text has a corresponding number;
screening out bidding texts potentially containing preset key fields from the bidding set, recording the bidding texts as prediction texts, and extracting the numbers of the prediction texts;
acquiring the predicted text according to the serial number, and processing the predicted text according to a regular expression matching algorithm to extract a corresponding bidding key field;
and storing the bidding key field to a specified position for subsequent retrieval and analysis.
Further, the step of screening out the bidding text potentially containing key fields from the bidding set includes:
respectively calculating a first probability that the bidding text contains preset key fields and a second probability that the bidding text does not contain the preset key fields according to the bidding text;
and determining, according to the first probability and the second probability, whether the bidding text contains the preset key fields, and screening out the bidding texts that contain the preset key fields.
Further, the step of respectively calculating a first probability that the bidding text contains a preset key field and a second probability that the bidding text does not contain the preset key field according to the bidding text includes:
the first probability is calculated using the following formula:
$P(D \mid s) = \prod_i P(w_i \mid s)$;
the second probability is calculated using the following formula:
$P(D \mid \bar{s}) = \prod_i P(w_i \mid \bar{s})$;
wherein $s$ represents the category containing a preset key field, $\bar{s}$ represents the category not containing a preset key field, $D$ represents a bidding text, and $w_i$ represents the i-th preset key field.
Further, the step of determining whether the bidding text contains a preset key field according to the first probability and the second probability includes:
calculating, according to the first probability and the second probability, the values of $P(s \mid D)$ and $P(\bar{s} \mid D)$, the latter being the posterior probability of the category not containing the preset key field;
when $P(s \mid D) > P(\bar{s} \mid D)$, judging that the bidding text contains the preset key field; otherwise, judging that the bidding text does not contain the preset key field.
Further, before the step of calculating a first probability that the bidding text contains a preset key field and a second probability that the bidding text does not contain the preset key field according to the bidding text, the method includes:
defining a plurality of preset key fields;
and labeling each bidding text according to each preset key field.
The present invention also provides a key field extracting apparatus for bid information, comprising:
an obtaining information unit, configured to form a bid inviting set by obtaining published bid inviting information, where the bid inviting set includes a plurality of bid inviting texts, and each bid inviting text has a corresponding number;
the text screening unit is used for screening out bidding texts which potentially contain preset key fields from the bidding set, recording the bidding texts as prediction texts and extracting the numbers of the prediction texts;
the field extraction unit is used for acquiring the predicted text according to the serial number and processing the predicted text according to a regular expression matching algorithm so as to extract a corresponding bidding key field;
and the storage field unit is used for storing the bidding key field to a specified position for subsequent retrieval and analysis.
Further, the filtering text unit includes:
the calculation probability subunit is used for respectively calculating a first probability that the bidding text contains a preset key field and a second probability that the bidding text does not contain the preset key field according to the bidding text;
and the text screening subunit is used for determining, according to the first probability and the second probability, whether the bidding text contains the preset key fields, and screening out the bidding texts that contain the preset key fields.
Further, in the calculation probability subunit:
the first probability is calculated using the following formula:
$P(D \mid s) = \prod_i P(w_i \mid s)$;
the second probability is calculated using the following formula:
$P(D \mid \bar{s}) = \prod_i P(w_i \mid \bar{s})$;
wherein $s$ represents the category containing a preset key field, $\bar{s}$ represents the category not containing a preset key field, $D$ represents a bidding text, and $w_i$ represents the i-th preset key field.
The invention also provides computer equipment which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the key field extraction method related to the bidding information when executing the computer program.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described key field extraction method with respect to bid information.
The beneficial effects of the invention are as follows: massive bidding data published on the network is acquired, and each text in the bidding data is checked for whether it potentially contains preset key fields. When a text is found to potentially contain preset key field information, the bidding text is matched by regular expressions to extract the corresponding bidding key fields. This not only facilitates subsequent retrieval and statistical analysis by key field, but also greatly reduces the amount of data processed when extracting the bidding key fields, improving efficiency and saving resources.
Drawings
FIG. 1 is a diagram illustrating the steps of a method for extracting key fields related to bidding information according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating a key field extraction apparatus for bid information according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating the structure of one embodiment of a storage medium of the present application;
FIG. 4 is a block diagram illustrating the structure of one embodiment of a computer device of the present application.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the method for extracting a key field related to bid information in this embodiment includes:
step S1: forming a bidding set by acquiring published bidding information, wherein the bidding set comprises a plurality of bidding texts, and each bidding text has a corresponding number;
step S2: screening out bidding texts potentially containing preset key fields from the bidding set, recording the bidding texts as prediction texts, and extracting the numbers of the prediction texts;
step S3: acquiring the predicted text according to the serial number, and processing the predicted text according to a regular expression matching algorithm to extract a corresponding bidding key field;
step S4: and storing the bidding key field to a specified position for subsequent retrieval and analysis.
As described in step S1, published bidding data is acquired through multiple channels, for example crawled by a web crawler or collected from a designated database, to obtain massive bidding information. The bidding information is aggregated to form a bidding set that contains the bidding texts, and each bidding text corresponds to a number or id, for example the project number of the bidding text, GGZC2020-G3-0001-GG, or the project id, GGZC2020-G3-0001-GG.
As described in step S2, the required bidding texts are screened out from the plurality of bidding texts in the bidding set, where a required bidding text is one predicted to contain the preset key fields. The preset key fields may be, for example, the province, city, project number, budget amount, purchasing agency address and contact information, project contact address, project contact information, project contact person, purchased items, purchaser name, purchaser address, purchaser contact telephone number, and bid deadline in the bidding text. Specifically, whether a bidding text contains the preset key fields can be predicted by a preset rule; if so, the bidding text is screened out of the bidding set and recorded as a predicted text, and its number is extracted, for example the project number GGZC2020-G3-0001-GG, so that the corresponding bidding text can later be retrieved by that number. For example, the bidding texts that potentially contain the preset key fields may be determined by a naive Bayes model; alternatively, matching rules may be preset, and a bidding text that satisfies the matching rules is determined to contain the preset key fields.
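The rule-based alternative mentioned for step S2 can be illustrated with a minimal Python sketch. The source does not specify the matching rule, so the cue words in `CUE_WORDS`, the `min_hits` threshold, the function name `screen_by_rule`, and the sample texts are all hypothetical.

```python
from typing import Dict, List

# Hypothetical cue words standing in for a preset matching rule; the source
# does not define the rule, so these are illustrative only.
CUE_WORDS = ("project number", "budget amount", "purchaser", "bid deadline")


def screen_by_rule(bidding_set: Dict[str, str], min_hits: int = 2) -> List[str]:
    """Return the numbers of texts that satisfy the (assumed) matching rule."""
    predicted_numbers = []
    for number, text in bidding_set.items():
        lowered = text.lower()
        # Count how many cue words appear at least once in the text.
        hits = sum(1 for cue in CUE_WORDS if cue in lowered)
        if hits >= min_hits:
            predicted_numbers.append(number)
    return predicted_numbers


# Bidding set: number -> bidding text (sample data invented for illustration).
bidding_set = {
    "GGZC2020-G3-0001-GG": (
        "Project number: GGZC2020-G3-0001-GG. "
        "Budget amount: 1,200,000 yuan. Purchaser: City Hospital."
    ),
    "NEWS-0001": "The weather today is sunny with a light breeze.",
}
predicted = screen_by_rule(bidding_set)
```

Only the numbers are returned, matching the description that the predicted texts are later fetched by number rather than copied during screening.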
As described in steps S3-S4, after all of the massive bidding texts in the bidding set have been traversed and screened, a plurality of predicted texts that potentially contain key fields are obtained. Each predicted text is then retrieved by its number and processed with a regular expression matching algorithm to obtain the corresponding bidding key fields, which are stored to a designated location so that retrieval and statistical analysis can subsequently be performed on them. The specific process of obtaining the bidding key fields from a predicted text by regular expressions is prior art and is not described here.
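Since the source treats the regular expression step as prior art and gives no patterns, the following Python sketch only illustrates what such an extraction might look like. The patterns in `FIELD_PATTERNS`, the field names, and the sample text are assumptions made for this example.

```python
import re
from typing import Dict

# Hypothetical patterns, one per preset key field; real bidding texts would
# need patterns tuned to their actual wording.
FIELD_PATTERNS = {
    "project_number": re.compile(r"Project number:\s*([A-Z0-9-]+)"),
    "budget_amount": re.compile(r"Budget amount:\s*([\d,]+(?:\.\d+)?)\s*yuan"),
    "bid_deadline": re.compile(r"Bid deadline:\s*(\d{4}-\d{2}-\d{2})"),
}


def extract_key_fields(predicted_text: str) -> Dict[str, str]:
    """Return the key fields whose pattern matches somewhere in the text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(predicted_text)
        if match:
            fields[name] = match.group(1)
    return fields


sample = (
    "Project number: GGZC2020-G3-0001-GG. "
    "Budget amount: 1,200,000 yuan. Bid deadline: 2020-06-30."
)
fields = extract_key_fields(sample)
```

Fields whose pattern does not match are simply omitted from the result, so a partially filled dictionary indicates which key fields were absent from the text.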
Therefore, the bid inviting texts potentially containing the preset key fields are screened out from the massive bid inviting information, and then the corresponding bid inviting key fields are extracted from the screened bid inviting texts, so that the processing data amount for extracting the bid inviting key fields is greatly reduced, the efficiency is improved, and the resources are saved.
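Steps S1 to S4 can be tied together in a short end-to-end Python sketch. It is not the patented implementation: the screening rule in `screen`, the single pattern `BUDGET`, the table name `key_fields`, and the use of an in-memory SQLite database as the "designated location" are all assumptions chosen to keep the example self-contained.

```python
import re
import sqlite3
from typing import Dict, List

BUDGET = re.compile(r"Budget amount:\s*([\d,]+)")  # hypothetical pattern


def screen(bidding_set: Dict[str, str]) -> List[str]:
    """S2 stand-in: keep the numbers of texts that mention a budget."""
    return [n for n, text in bidding_set.items() if "Budget amount" in text]


def extract(text: str) -> Dict[str, str]:
    """S3 stand-in: pull one key field out of a predicted text."""
    match = BUDGET.search(text)
    return {"budget_amount": match.group(1)} if match else {}


def run_pipeline(bidding_set: Dict[str, str], conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS key_fields (number TEXT, field TEXT, value TEXT)"
    )
    for number in screen(bidding_set):           # S2: numbers of predicted texts
        text = bidding_set[number]               # S3: fetch the text by number
        for field, value in extract(text).items():
            conn.execute(                        # S4: store for later retrieval
                "INSERT INTO key_fields VALUES (?, ?, ?)", (number, field, value)
            )
    conn.commit()


# S1: bidding set keyed by number (sample data invented for illustration).
bidding_set = {
    "GGZC2020-G3-0001-GG": "Budget amount: 1,200,000 yuan. Purchaser: City Hospital.",
    "NEWS-0001": "The weather today is sunny with a light breeze.",
}
conn = sqlite3.connect(":memory:")
run_pipeline(bidding_set, conn)
rows = conn.execute("SELECT number, field, value FROM key_fields").fetchall()
```

The regular expression runs only on the texts that pass screening, which is the source of the claimed reduction in processed data.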
In one embodiment, the bidding text may be processed based on Bayes' theorem; specifically, step S2 includes:
step S21: respectively calculating a first probability that the bidding text contains preset key fields and a second probability that the bidding text does not contain the preset key fields according to the bidding text;
step S22: determining, according to the first probability and the second probability, whether the bidding text contains the preset key fields, and screening out the bidding texts that contain the preset key fields.
In step S21, to calculate the first probability and the second probability, the bidding texts of the bidding set are divided in advance into two categories: the category containing preset key fields, denoted $s$, and the category not containing preset key fields, denoted $\bar{s}$. Each bidding text can be labeled with the preset key fields, which are assumed to be mutually independent, and $w = (w_1, w_2, w_3, \ldots, w_i)$ denotes the preset key fields in the bidding text. Specifically, the first probability is calculated using the following formula:
$P(D \mid s) = \prod_i P(w_i \mid s)$;
the second probability is calculated using the following formula:
$P(D \mid \bar{s}) = \prod_i P(w_i \mid \bar{s})$;
wherein $s$ represents the category containing a preset key field, $\bar{s}$ represents the category not containing a preset key field, $D$ represents a bidding text, and $w_i$ represents the i-th preset key field.
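The product form of the two probabilities follows from the independence assumption and can be sketched in Python as below. The conditional probability values are invented for illustration, and the sum of logarithms is a common numerical choice rather than something stated in the source.

```python
import math
from typing import Dict, Iterable


def log_likelihood(fields_in_text: Iterable[str], cond_prob: Dict[str, float]) -> float:
    """log P(D | class) = sum_i log P(w_i | class) under independence."""
    return sum(math.log(cond_prob[w]) for w in fields_in_text)


# Hypothetical conditional probabilities P(w_i | class) for two key fields.
p_w_given_s = {"budget amount": 0.8, "purchaser name": 0.5}
p_w_given_not_s = {"budget amount": 0.1, "purchaser name": 0.2}

# D: the preset key fields labeled in one bidding text.
D = ["budget amount", "purchaser name"]

first_probability = math.exp(log_likelihood(D, p_w_given_s))       # 0.8 * 0.5
second_probability = math.exp(log_likelihood(D, p_w_given_not_s))  # 0.1 * 0.2
```

Summing logarithms gives the same value as the direct product but avoids underflow when many fields are multiplied.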
As described in step S22, whether the bidding text contains the preset key field is determined according to the first probability and the second probability. Specifically, the values of $P(s \mid D)$ and $P(\bar{s} \mid D)$, the latter being the posterior probability of the category not containing the preset key field, are calculated according to the first probability and the second probability; when $P(s \mid D) > P(\bar{s} \mid D)$, the bidding text is judged to contain a preset key field; otherwise, it is judged not to contain the preset key field.
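The link between the judgment in step S22 and the first and second probabilities can be made explicit through Bayes' theorem. The derivation below is a sketch that assumes the judgment compares the two posterior probabilities; the symbol $\bar{s}$ for the category not containing preset key fields and the priors $P(s)$ and $P(\bar{s})$ are notation introduced here for illustration.

```latex
\begin{gather*}
P(s \mid D) = \frac{P(D \mid s)\,P(s)}{P(D)}, \qquad
P(\bar{s} \mid D) = \frac{P(D \mid \bar{s})\,P(\bar{s})}{P(D)}, \\
P(s \mid D) > P(\bar{s} \mid D)
\iff
P(s)\prod_i P(w_i \mid s) > P(\bar{s})\prod_i P(w_i \mid \bar{s}).
\end{gather*}
```

Because $P(D)$ is positive and common to both posteriors, it cancels, so only the class priors and the two likelihood products need to be evaluated.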
In another embodiment, a formula is derived from the first probability and the second probability (formula image BDA0003041225730000063, not reproduced in this text); when the condition given in formula image BDA0003041225730000064 is satisfied, the bidding text is judged to contain a preset key field; otherwise, it is judged not to contain the preset key field.
In one embodiment, before step S21, the method includes:
step S201: defining a plurality of preset key fields;
step S202: and labeling each bidding text according to each preset key field.
In this embodiment, a plurality of preset key fields are defined, for example, the above-mentioned province, city, item number, budget amount, purchase agency address contact, item contact address, item contact, purchase item, purchaser name, purchaser address, purchaser contact telephone, bid deadline, and the like, and then each bid text is labeled according to each preset key field, so as to determine whether the bid text has labels, which labels are included, and the corresponding label number and total label number in each bid text, and further calculate the first probability and the second probability according to the data.
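The source does not state how the label counts are turned into probabilities. One common choice, shown here purely as an assumption, is a multinomial estimate with add-one (Laplace) smoothing. In this Python sketch the class flag on each text, the field list `PRESET_FIELDS`, and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

PRESET_FIELDS = ["budget amount", "purchaser name", "bid deadline"]


def estimate(
    labelled: List[Tuple[List[str], bool]],
) -> Tuple[float, Dict[bool, Dict[str, float]]]:
    """Estimate P(s) and P(w_i | class) from labeled texts.

    Each item is (labels found in the text, whether the text belongs to s).
    Add-one smoothing is an assumption added here, not taken from the source.
    """
    counts = {True: Counter(), False: Counter()}  # per-class label counts
    totals = {True: 0, False: 0}                  # per-class total label count
    n_docs = {True: 0, False: 0}                  # per-class number of texts
    for labels, in_s in labelled:
        counts[in_s].update(labels)
        totals[in_s] += len(labels)
        n_docs[in_s] += 1
    prior_s = n_docs[True] / len(labelled)
    cond = {
        cls: {
            w: (counts[cls][w] + 1) / (totals[cls] + len(PRESET_FIELDS))
            for w in PRESET_FIELDS
        }
        for cls in (True, False)
    }
    return prior_s, cond


labelled = [
    (["budget amount", "purchaser name"], True),
    (["budget amount", "bid deadline"], True),
    (["budget amount"], False),
    ([], False),
]
prior_s, cond = estimate(labelled)
```

Smoothing keeps every estimated probability strictly positive, so a field never seen in one class does not force the whole product to zero.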
Referring to fig. 2, the present embodiment provides an apparatus for extracting a key field of bidding information, which corresponds to the method for extracting a key field of bidding information, and the apparatus includes:
an obtaining information unit 1 configured to form a bid inviting set by obtaining published bid inviting information, the bid inviting set including a plurality of bid inviting texts, each of the bid inviting texts having a corresponding number;
a text screening unit 2, configured to screen out a bid inviting text potentially including a preset key field from the bid inviting set, record the bid inviting text as a predicted text, and extract a number of the predicted text;
the field extraction unit 3 is used for acquiring the predicted text according to the serial number and processing the predicted text according to a regular expression matching algorithm so as to extract a corresponding bidding key field;
and the storage field unit 4 is used for storing the bidding key field to a specified position for subsequent retrieval and analysis.
As described for the obtaining information unit 1, published bidding data is acquired through multiple channels, for example crawled by a web crawler or collected from a designated database, to obtain massive bidding information. The bidding information is aggregated to form a bidding set that contains the bidding texts, and each bidding text corresponds to a number or id, for example the project number of the bidding text, GGZC2020-G3-0001-GG, or the project id, GGZC2020-G3-0001-GG.
As described for the text screening unit 2, the required bidding texts are screened out from the plurality of bidding texts in the bidding set, where a required bidding text is one predicted to contain the preset key fields. The preset key fields may be, for example, the province, city, project number, budget amount, purchasing agency address and contact information, project contact address, project contact information, project contact person, purchased items, purchaser name, purchaser address, purchaser contact telephone number, and bid deadline in the bidding text. Specifically, whether a bidding text contains the preset key fields can be predicted by a preset rule; if so, the bidding text is screened out of the bidding set and recorded as a predicted text, and its number is extracted, for example the project number GGZC2020-G3-0001-GG, so that the corresponding bidding text can later be retrieved by that number. For example, the bidding texts that potentially contain the preset key fields may be determined by a naive Bayes model; alternatively, matching rules may be preset, and a bidding text that satisfies the matching rules is determined to contain the preset key fields.
As described for the field extraction unit 3 and the storage field unit 4, after all of the massive bidding texts in the bidding set have been traversed and screened, a plurality of predicted texts that potentially contain key fields are obtained. Each predicted text is then retrieved by its number and processed with a regular expression matching algorithm to obtain the corresponding bidding key fields, which are stored to a designated location so that retrieval and statistical analysis can subsequently be performed on them. The specific process of obtaining the bidding key fields from a predicted text by regular expressions is prior art and is not described here.
Therefore, the bid inviting texts potentially containing the preset key fields are screened out from the massive bid inviting information, and then the corresponding bid inviting key fields are extracted from the screened bid inviting texts, so that the processing data amount for extracting the bid inviting key fields is greatly reduced, the efficiency is improved, and the resources are saved.
In one embodiment, the bidding text may be processed based on Bayes' theorem; specifically, the text screening unit 2 includes:
the calculation probability subunit is used for respectively calculating a first probability that the bidding text contains a preset key field and a second probability that the bidding text does not contain the preset key field according to the bidding text;
and the text screening subunit is used for determining, according to the first probability and the second probability, whether the bidding text contains the preset key fields, and screening out the bidding texts that contain the preset key fields.
In order to calculate the first probability and the second probability, the bidding texts of the bidding set are divided in advance into two categories: the category containing preset key fields, denoted $s$, and the category not containing preset key fields, denoted $\bar{s}$. Each bidding text can be labeled with the preset key fields, which are assumed to be mutually independent, and $w = (w_1, w_2, w_3, \ldots, w_i)$ denotes the preset key fields in the bidding text. Specifically, the first probability is calculated using the following formula:
$P(D \mid s) = \prod_i P(w_i \mid s)$;
the second probability is calculated using the following formula:
$P(D \mid \bar{s}) = \prod_i P(w_i \mid \bar{s})$;
wherein $s$ represents the category containing a preset key field, $\bar{s}$ represents the category not containing a preset key field, $D$ represents a bidding text, and $w_i$ represents the i-th preset key field.
As described for the text screening subunit, whether the bidding text contains the preset key field is determined according to the first probability and the second probability. Specifically, the values of $P(s \mid D)$ and $P(\bar{s} \mid D)$, the latter being the posterior probability of the category not containing the preset key field, are calculated according to the first probability and the second probability; when $P(s \mid D) > P(\bar{s} \mid D)$, the bidding text is judged to contain a preset key field; otherwise, it is judged not to contain the preset key field.
In another embodiment, a formula is derived from the first probability and the second probability (formula image BDA0003041225730000085, not reproduced in this text); when the condition given in formula image BDA0003041225730000086 is satisfied, the bidding text is judged to contain a preset key field; otherwise, it is judged not to contain the preset key field.
In an embodiment, the apparatus for extracting a key field of bid-related information further includes:
a field defining unit for defining a plurality of preset key fields;
and the marking text unit is used for marking each bidding text according to each preset key field.
In this embodiment, a plurality of preset key fields are defined, for example, the above-mentioned province, city, item number, budget amount, purchase agency address contact, item contact address, item contact, purchase item, purchaser name, purchaser address, purchaser contact telephone, bid deadline, and the like, and then each bid text is labeled according to each preset key field, so as to determine whether the bid text has labels, which labels are included, and the corresponding label number and total label number in each bid text, and further calculate the first probability and the second probability according to the data.
Referring to fig. 3, the present application also provides a computer-readable storage medium 10, in which a computer program 20 is stored in the storage medium 10, and when the computer program runs on a computer, the computer program causes the computer to execute the key field extraction method on bid information described in the above embodiment.
Referring to fig. 4, the present application further provides a computer device 40 containing instructions. The computer device includes a memory 30 and a processor 50; the memory 30 stores a computer program 20, and the processor 50 executes the computer program 20 to implement the key field extraction method regarding bid information described in the above embodiment.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method for extracting key fields of bidding information is characterized by comprising the following steps:
forming a bidding set by acquiring published bidding information, wherein the bidding set comprises a plurality of bidding texts, and each bidding text has a corresponding number;
screening out bidding texts potentially containing preset key fields from the bidding set, recording the bidding texts as prediction texts, and extracting the numbers of the prediction texts;
acquiring the predicted text according to the serial number, and processing the predicted text according to a regular expression matching algorithm to extract a corresponding bidding key field;
and storing the bidding key field to a specified position for subsequent retrieval and analysis.
2. The method of claim 1, wherein the step of screening out, from the bidding set, bidding texts that potentially contain preset key fields comprises:
calculating, for each bidding text, a first probability that the bidding text contains the preset key fields and a second probability that the bidding text does not contain the preset key fields;
and determining, according to the first probability and the second probability, whether the bidding text contains the preset key fields, and screening out the bidding texts that contain the preset key fields.
3. The method of claim 2, wherein the step of calculating a first probability that the bidding text contains the preset key fields and a second probability that the bidding text does not contain the preset key fields comprises:
the first probability is calculated using the following formula:
$P(D \mid S) = \prod_i P(w_i \mid S)$;
the second probability is calculated using the following formula:
$P(D \mid \bar{S}) = \prod_i P(w_i \mid \bar{S})$;
wherein $S$ represents the category containing a preset key field, $\bar{S}$ represents the category not containing a preset key field, $D$ represents a bidding text, and $w_i$ represents the i-th preset key field.
4. The method of claim 3, wherein the step of determining whether the bidding text contains a preset key field according to the first probability and the second probability comprises:
calculating the values of $P(S \mid D)$ and $P(\bar{S} \mid D)$ according to the first probability and the second probability;
when $P(S \mid D) > P(\bar{S} \mid D)$, judging that the bidding text contains the preset key field; otherwise, judging that the bidding text does not contain the preset key field.
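The screening decision of claims 2 to 4 resembles a two-class naive Bayes classifier, which the following Python sketch illustrates. Several points are assumptions rather than statements of the claims: the posterior is obtained from Bayes' rule with a class prior prior_s, the decision uses a strict "greater than" comparison, the products are evaluated as sums of logarithms to avoid numerical underflow, and the probability tables P_W_GIVEN_S and P_W_GIVEN_NOT_S contain made-up values.

```python
import math


def log_likelihood(words, cond_prob, floor=1e-6):
    """Return log P(D|c) = sum_i log P(w_i|c) under the independence assumption.

    Terms missing from the table receive a small floor probability (assumed).
    """
    return sum(math.log(cond_prob.get(w, floor)) for w in words)


def contains_key_field(words, p_w_given_s, p_w_given_not_s, prior_s=0.5):
    """Decide S versus not-S by comparing unnormalised log posteriors.

    P(S|D) is proportional to P(D|S) * P(S); the shared denominator P(D)
    cancels, so it is never computed.
    """
    log_s = math.log(prior_s) + log_likelihood(words, p_w_given_s)
    log_not_s = math.log(1.0 - prior_s) + log_likelihood(words, p_w_given_not_s)
    return log_s > log_not_s


# Made-up conditional probabilities P(w_i|S) and P(w_i|not S).
P_W_GIVEN_S = {"tender": 0.3, "budget": 0.2, "holiday": 0.01}
P_W_GIVEN_NOT_S = {"tender": 0.02, "budget": 0.03, "holiday": 0.2}

is_bid = contains_key_field(["tender", "budget"], P_W_GIVEN_S, P_W_GIVEN_NOT_S)
is_notice = contains_key_field(["holiday"], P_W_GIVEN_S, P_W_GIVEN_NOT_S)
```

With these values the first text is assigned to the category containing key fields (0.5 × 0.3 × 0.2 against 0.5 × 0.02 × 0.03) and the second is not (0.5 × 0.01 against 0.5 × 0.2).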
5. The method of claim 2, wherein the step of calculating a first probability that the bidding text contains the preset key fields and a second probability that the bidding text does not contain the preset key fields comprises:
defining a plurality of preset key fields;
and labeling each bidding text according to each preset key field.
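One plausible use of the labeled texts of claim 5 is to estimate the conditional probabilities of each preset key field in the two categories. The Python sketch below assumes a binary label per text (True for the category containing key fields), counts in how many texts of each category a key field term occurs, and applies add-one smoothing; the function name estimate_cond_probs, the smoothing choice and the sample data are hypothetical and are not taken from the application.

```python
def estimate_cond_probs(labelled_texts, key_fields):
    """Estimate P(w|S) and P(w|not S) from (text, label) pairs.

    label is True for texts in the category containing key fields.
    Add-one smoothing keeps every estimate strictly between 0 and 1.
    """
    counts = {True: {w: 0 for w in key_fields},
              False: {w: 0 for w in key_fields}}
    totals = {True: 0, False: 0}
    for text, label in labelled_texts:
        totals[label] += 1
        for w in key_fields:
            if w in text:
                counts[label][w] += 1
    return {
        label: {w: (counts[label][w] + 1) / (totals[label] + 2)
                for w in key_fields}
        for label in (True, False)
    }


# Made-up labeled sample: two bidding texts and one unrelated notice.
labelled = [
    ("Tender notice: budget 500,000", True),
    ("Tender for road works", True),
    ("Holiday schedule", False),
]
probs = estimate_cond_probs(labelled, key_fields=["Tender", "budget"])
```

Here probs[True] plays the role of the table of P(w_i|S) values and probs[False] the table of P(w_i|S̄) values used when comparing the two categories.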
6. A key field extraction apparatus for bidding information, comprising:
an information acquisition unit, configured to acquire published bidding information to form a bidding set, wherein the bidding set comprises a plurality of bidding texts and each bidding text has a corresponding number;
a text screening unit, configured to screen out, from the bidding set, bidding texts that potentially contain preset key fields, record them as predicted texts, and extract the numbers of the predicted texts;
a field extraction unit, configured to acquire each predicted text according to its number and process the predicted text with a regular expression matching algorithm to extract the corresponding bidding key fields;
and a field storage unit, configured to store the bidding key fields at a specified location for subsequent retrieval and analysis.
7. The key field extraction apparatus for bidding information according to claim 6, wherein the text screening unit comprises:
a probability calculation subunit, configured to calculate, for each bidding text, a first probability that the bidding text contains the preset key fields and a second probability that the bidding text does not contain the preset key fields;
and a text screening subunit, configured to determine, according to the first probability and the second probability, whether the bidding text contains the preset key fields, and to screen out the bidding texts that contain the preset key fields.
8. The key field extraction apparatus for bidding information according to claim 7, characterized in that, in the probability calculation subunit:
the first probability is calculated using the following formula:
$P(D \mid S) = \prod_i P(w_i \mid S)$;
the second probability is calculated using the following formula:
$P(D \mid \bar{S}) = \prod_i P(w_i \mid \bar{S})$;
wherein $S$ represents the category containing a preset key field, $\bar{S}$ represents the category not containing a preset key field, $D$ represents a bidding text, and $w_i$ represents the i-th preset key field.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method for extracting key fields of bidding information according to any one of claims 1 to 5.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for extracting key fields of bidding information according to any one of claims 1 to 5.
CN202110462661.6A 2021-04-27 2021-04-27 Key field extraction method and device for bidding information Pending CN113128218A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110462661.6A CN113128218A (en) 2021-04-27 2021-04-27 Key field extraction method and device for bidding information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110462661.6A CN113128218A (en) 2021-04-27 2021-04-27 Key field extraction method and device for bidding information

Publications (1)

Publication Number Publication Date
CN113128218A (en) 2021-07-16

Family

ID=76780450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110462661.6A Pending CN113128218A (en) 2021-04-27 2021-04-27 Key field extraction method and device for bidding information

Country Status (1)

Country Link
CN (1) CN113128218A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704667A (en) * 2021-08-31 2021-11-26 北京百炼智能科技有限公司 Automatic extraction processing method and device for bidding announcement
CN116384948A (en) * 2023-06-02 2023-07-04 北京拓普丰联信息科技股份有限公司 Method, device, equipment and medium for extracting location of mark information item

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050050099A1 (en) * 2003-08-22 2005-03-03 Ge Information Systems System and method for extracting customer-specific data from an information network
CN109766438A (en) * 2018-12-12 2019-05-17 平安科技(深圳)有限公司 Biographic information extracting method, device, computer equipment and storage medium
CN112530597A (en) * 2020-11-26 2021-03-19 山东健康医疗大数据有限公司 Data table classification method, device and medium based on Bert character model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050050099A1 (en) * 2003-08-22 2005-03-03 Ge Information Systems System and method for extracting customer-specific data from an information network
CN109766438A (en) * 2018-12-12 2019-05-17 平安科技(深圳)有限公司 Biographic information extracting method, device, computer equipment and storage medium
CN112530597A (en) * 2020-11-26 2021-03-19 山东健康医疗大数据有限公司 Data table classification method, device and medium based on Bert character model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CATHERINE E. GRAVES ET AL.: "Memristor TCAMs Accelerate Regular Expression Matching for Network Intrusion Detection", 《IEEE TRANSACTIONS ON NANOTECHNOLOGY》 *
胡卫华 等: "安全事件采集关键技术研究与实现", 《计算机应用与软件》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704667A (en) * 2021-08-31 2021-11-26 北京百炼智能科技有限公司 Automatic extraction processing method and device for bidding announcement
CN116384948A (en) * 2023-06-02 2023-07-04 北京拓普丰联信息科技股份有限公司 Method, device, equipment and medium for extracting location of mark information item
CN116384948B (en) * 2023-06-02 2023-08-25 北京拓普丰联信息科技股份有限公司 Method, device, equipment and medium for extracting location of mark information item

Similar Documents

Publication Publication Date Title
CN111523976B (en) Commodity recommendation method and device, electronic equipment and storage medium
CN107341716B (en) Malicious order identification method and device and electronic equipment
US8635197B2 (en) Systems and methods for efficient development of a rule-based system using crowd-sourcing
EP3686756A1 (en) Method and apparatus for grouping data records
CN107657048A (en) user identification method and device
CN106126630A (en) The collection of a kind of business object, searching method and device
CN112269805A (en) Data processing method, device, equipment and medium
CN113128218A (en) Key field extraction method and device for bidding information
CN117271385A (en) Garbage collection for data storage
CN107911448A (en) Content pushing method and device
CN113032668A (en) Product recommendation method, device and equipment based on user portrait and storage medium
CN114942971B (en) Extraction method and device of structured data
CN117235586B (en) Hotel customer portrait construction method, system, electronic equipment and storage medium
CN110851708B (en) Negative sample extraction method, device, computer equipment and storage medium
CN111581235B (en) Method and system for identifying common incidence relation
CN112631889A (en) Portrayal method, device and equipment for application system and readable storage medium
CN110796505B (en) Business object recommendation method and device
CN111127057A (en) Multi-dimensional user portrait restoration method
CN111930967B (en) Data query method and device based on knowledge graph and storage medium
CN112328899B (en) Information processing method, information processing apparatus, storage medium, and electronic device
CN114358879A (en) Real-time price monitoring method and system based on big data
CN114282119A (en) Scientific and technological information resource retrieval method and system based on heterogeneous information network
CN112015916A (en) Completion method and device of knowledge graph, server and computer storage medium
CN112256836A (en) Recording data processing method and device and server
CN116934418B (en) Abnormal order detection and early warning method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210716)