CN117252194B - Bid file detection method and system based on natural semantic model - Google Patents


Info

Publication number
CN117252194B
Authority
CN
China
Prior art keywords
contents
processing
similarity
group
content
Legal status
Active
Application number
CN202311531314.XA
Other languages
Chinese (zh)
Other versions
CN117252194A (en)
Inventor
陈琦
金建青
周俊
于树轩
陈佳玮
陈文�
程盼盼
Current Assignee
Shanghai Baitong Project Management Consulting Co ltd
Original Assignee
Shanghai Baitong Project Management Consulting Co ltd
Application filed by Shanghai Baitong Project Management Consulting Co ltd
Priority to CN202311531314.XA
Publication of CN117252194A
Application granted
Publication of CN117252194B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0611 Request for offers or quotes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a method and a system for detecting bidding documents based on a natural semantic model. The method comprises: searching the bidding documents and extracting content associated with the same technical word and/or the same group of technical words into a collection group; performing duplicate checking on the contents of any two collection groups belonging to the same technical word and/or the same group of technical words; comparing, in turn, each of the processing areas belonging to one collection group with the contents of the remaining collection groups and calculating their similarity; and classifying processing areas whose similarity meets the requirement as repeated content and giving the similarity according to the repeated content. The method and system use intelligent analysis to determine whether similar content exists in the bidding documents and provide that content to staff for further analysis, which solves the problems of large data volumes and slow processing in the prior art and gives the method and system good practicability.

Description

Bid file detection method and system based on natural semantic model
Technical Field
The application relates to the technical field of data processing, in particular to a bidding document detection method and system based on a natural semantic model.
Background
Bid rigging (literally "encircling bids") refers to collusion on bidding matters by improper means, between the tenderer and a bidder or among bidders, in order to squeeze out competitors or harm the tenderer's interests. Collusive bidding (literally "string bidding") refers to bidding units colluding with one another, or with the tendering unit, to win the bid.
To address bid rigging and collusive bidding, current solutions include electronic information identification and data analysis. Electronic information identification collects each bidder's IP address, MAC address and basic information (such as contact name, telephone, email and company address); if this information is duplicated, or the stakeholders of several suppliers are related, the bidding documents are flagged. Data analysis looks for clues in the collected bidding documents, for example documents with identical content whose last-saved author is the same company, bid prices that decrease or increase in a regular pattern, or bid price breakdowns of several bidders that are abnormally consistent.
Electronic information is readily obtained because, in electronic bidding, bidders must upload their bidding documents over a network, and during this process information about the electronic devices used, and even about the operators, is recorded; for example, the IP or MAC addresses of a bidder's historical bids are stored in the system. Data analysis, such as checking technical proposals, still relies mainly on manual review, which performs poorly when data volumes are large, time is limited or the matter is urgent.
The prior art therefore cannot meet current needs, and an improvement that addresses this situation by technical means is urgently needed.
Disclosure of Invention
The application provides a bidding document detection method and system based on a natural semantic model, and achieves its objectives through the following technical solutions:
in a first aspect, the present application provides a method for detecting a bidding document based on a natural semantic model, including:
searching a plurality of received electronic bidding documents by using technical words in a technical word bank, and extracting contents associated with the same technical word and/or the same group of technical words into a collection group, wherein the contents in each collection group belong to the same electronic bidding document;
performing duplicate checking on the contents of any two collection groups belonging to the same technical word and/or the same group of technical words, and dividing the contents of the collection groups into repeated content and non-repeated content;
dividing the non-repeated content into areas by using paragraph marks to obtain a plurality of processing areas;
comparing each of the processing areas belonging to one collection group, in turn, with the contents of the remaining collection groups for similarity, and calculating the similarity;
classifying processing areas whose similarity meets the requirement as repeated content; and
giving, according to the repeated content, the similarity of the two collection groups belonging to the same technical word and/or the same group of technical words.
In a possible implementation manner of the first aspect, comparing each of the processing areas belonging to one collection group, in turn, with the contents of the remaining collection groups includes:
dividing the contents of the remaining collection groups into areas using paragraph marks to obtain a plurality of comparison processing areas; and
comparing each of the processing areas belonging to one collection group, in turn, with all comparison processing areas for similarity.
In a possible implementation manner of the first aspect, the method further includes:
counting the number of coincident characters in a processing area and a comparison processing area; and
when the number of coincident characters is greater than or equal to a preset value, classifying the processing area as repeated content.
In a possible implementation manner of the first aspect, counting the number of coincident characters in a processing area and a comparison processing area further includes:
dividing the characters in the processing area and the comparison processing area into coincident text fields and non-coincident text fields; and
calculating, by semantic recognition, the similarity of each non-coincident text field lying between two coincident text fields, and reclassifying non-coincident text fields whose similarity meets the requirement as coincident text fields.
In a possible implementation manner of the first aspect, the method further includes:
identifying tables in the electronic bidding document and extracting their contents; and
taking the content of each cell of a table as one processing area.
In a possible implementation manner of the first aspect, the contents in the first row and/or the first column of the table are added to each processing area.
In a possible implementation manner of the first aspect, the contents of the first row and/or first column of the table are placed before or after the contents of the processing area to which they are added;
a blank area separates the contents of the first row and/or first column of the table from the contents of the processing area to which they are added.
In a second aspect, the present application provides a device for detecting a bidding document, including:
the collecting unit is used for searching the received electronic bidding documents by using the technical words in the technical word library, extracting the content associated with the same technical word and/or the same group of technical words into a collecting group, wherein the content in each collecting group belongs to the same electronic bidding document;
the first duplicate checking processing unit is used for performing duplicate checking on the contents of any two collection groups belonging to the same technical word and/or the same group of technical words, and dividing the contents of the collection groups into repeated content and non-repeated content;
the first area dividing unit is used for dividing the non-repeated content into areas using paragraph marks to obtain a plurality of processing areas;
the first similarity processing unit is used for comparing each of the processing areas belonging to one collection group, in turn, with the contents of the remaining collection groups for similarity and calculating the similarity;
the second duplicate checking processing unit is used for classifying processing areas whose similarity meets the requirement as repeated content; and
the result unit is used for giving, according to the repeated content, the similarity of the two collection groups belonging to the same technical word and/or the same group of technical words.
In a third aspect, the present application provides a system for detecting a bid document, the system comprising:
one or more memories for storing instructions; and
one or more processors configured to invoke and execute the instructions from the memory, to perform the method as described in the first aspect and any possible implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium comprising:
a program which, when executed by a processor, performs a method as described in the first aspect and any possible implementation of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising program instructions which, when executed by a computing device, perform a method as described in the first aspect and any possible implementation manner of the first aspect.
In a sixth aspect, the present application provides a chip system comprising a processor for implementing the functions involved in the above aspects, e.g. generating, receiving, transmitting, or processing data and/or information involved in the above methods. The chip system can be composed of chips, and can also comprise chips and other discrete devices. In one possible design, the system on a chip also includes memory to hold the necessary program instructions and data. The processor and the memory may be decoupled, provided on different devices, respectively, connected by wire or wirelessly, or the processor and the memory may be coupled on the same device.
The bidding document detection method and system based on the natural semantic model use intelligent analysis to determine whether similar content exists in the bidding documents and provide that content to staff for further analysis, which solves the problems of large data volumes and slow processing in the prior art and makes detection better targeted.
Drawings
Fig. 1 is a schematic block diagram of the steps of the method for detecting bidding documents based on a natural semantic model.
Fig. 2 is a schematic diagram of the concept of obtaining a collection group using technical words, provided in the present application.
Fig. 3 is a schematic diagram of the similarity comparison between a processing area and a comparison processing area.
Fig. 4 is a schematic diagram of the similarity comparison between non-coincident text fields, provided in the present application.
Fig. 5 is a schematic diagram of the similarity comparison of a table, provided in the present application.
Description of the embodiments
The technical solutions in the present application are described in further detail below with reference to the accompanying drawings.
The application discloses a bidding document detection method based on a natural semantic model, referring to fig. 1, the detection method comprises the following steps:
S101, searching a plurality of received electronic bidding documents using technical words in a technical word library, and extracting content associated with the same technical word and/or the same group of technical words into a collection group, wherein the content in each collection group belongs to a single electronic bidding document;
S102, performing duplicate checking on the contents of any two collection groups belonging to the same technical word and/or the same group of technical words, and dividing the contents of the collection groups into repeated content and non-repeated content;
S103, dividing the non-repeated content into areas using paragraph marks to obtain a plurality of processing areas;
S104, comparing each of the processing areas belonging to one collection group, in turn, with the contents of the remaining collection groups for similarity, and calculating the similarity;
S105, classifying processing areas whose similarity meets the requirement as repeated content; and
S106, giving, according to the repeated content, the similarity of the two collection groups belonging to the same technical word and/or the same group of technical words.
First, note that the bidding document detection method based on a natural semantic model disclosed in the application runs on a server that provides an electronic bidding service. In the bidding process, the tender document is first uploaded to the server, and accounts are granted permission to download or view it.
After viewing or downloading the tender document, the account holder decides whether to participate in the bidding. If it does, it logs in and uploads its bid document; the server records basic information such as the IP address and MAC address used during the upload and, if necessary, can combine this with the supplier's basic information to obtain profile information about the staff member who uploaded the bid document.
After the bid document is uploaded, the server first analyzes its basic information, such as the IP address, MAC address and supplier information (contact name, telephone, email, company address and so on). If any of this information is duplicated, for example two bidders submitted from the same MAC address, the bid document is flagged. A bid document is also flagged if the bid prices decrease or increase in a regular pattern.
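The screening described above can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: the `BidSubmission` record and its fields are hypothetical, and reading a "regular" increase or decrease as a constant price step (within a relative tolerance) is one possible interpretation, not the patent's definition.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BidSubmission:
    # Hypothetical record; field names are illustrative only.
    bidder: str
    ip: str
    mac: str
    contact_phone: str
    price: float


def flag_shared_identifiers(bids):
    """Return the bidders that share an IP, MAC address or contact phone with another bidder."""
    flagged = set()
    for field in ("ip", "mac", "contact_phone"):
        owners = defaultdict(list)
        for bid in bids:
            owners[getattr(bid, field)].append(bid.bidder)
        for bidders in owners.values():
            if len(bidders) > 1:
                flagged.update(bidders)
    return flagged


def prices_form_regular_step(bids, rel_tol=0.01):
    """True if the sorted bid prices rise by a constant step (within rel_tol).

    Sorting makes this cover both regularly increasing and regularly
    decreasing prices, whatever order the bids arrived in.
    """
    prices = sorted(bid.price for bid in bids)
    if len(prices) < 3:
        return False
    steps = [b - a for a, b in zip(prices, prices[1:])]
    return steps[0] > 0 and all(math.isclose(s, steps[0], rel_tol=rel_tol) for s in steps)


bids = [
    BidSubmission("A", "10.0.0.1", "aa:bb:cc:00:00:01", "111", 1000.0),
    BidSubmission("B", "10.0.0.2", "aa:bb:cc:00:00:01", "222", 1010.0),
    BidSubmission("C", "10.0.0.3", "aa:bb:cc:00:00:02", "333", 1020.0),
]
print(sorted(flag_shared_identifiers(bids)))  # ['A', 'B']
print(prices_form_regular_step(bids))  # True
```

Here bidders A and B are flagged for sharing a MAC address, and the three prices are flagged for stepping up by exactly 10.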
Flagged bid documents are sent to a manual audit department and, once the audit is completed, are cancelled or marked as invalid.
After the above processing, the remaining (i.e. unflagged) bid documents are analyzed. Specifically, in step S101, the received electronic bid documents are searched using the technical words in the technical word library, and the content associated with the same technical word and/or the same group of technical words is extracted into a collection group, where the content in each collection group belongs to a single electronic bid document, as shown in Fig. 2.
Technical words here are terms that arise in the bidding process, mainly relating to technical content (e.g. products, specifications, models, maintenance, service) and business content (information about the bidders, such as qualifications and proof of past performance). They come from words commonly used in the industry or from the tender documents.
For example, technical words for construction include open caisson foundation, foundation coefficient, foundation bearing capacity, composite foundation, rigid foundation and foundation treatment, while technical words for government office furniture procurement include inspection, installation and commissioning, and spare parts.
In this embodiment, a technical word library for the relevant industries is built into the bidding system, and relevant technical words are taken from the library to search the electronic bid documents within the same bidding project. For example, for bid documents of a construction project, "foundation coefficient" and "foundation bearing capacity" from the library form one group of technical words and "open caisson foundation" forms another, and the content belonging to each group is extracted into the corresponding collection group. In one possible implementation, the technical word library can also be expanded by entering technical words into the system.
Collecting related content by technical word makes the subsequent duplicate check more targeted, because content associated with technical words is more closely tied to the substance of the bid.
A typical duplicate-checking method computes the proportion of repeated content in the whole document. Bid documents, however, may legitimately reproduce technical and business terms from the tender document, so they contain a great deal of repetition, and a judgment based on that method reports high similarity even though the repetition is reasonable. The drawback is that the method cannot distinguish important from unimportant content: when important content has a high repetition rate, unimportant content has a low one, and important content makes up only a small share of the whole, the computed share of repeated content can be low even though bid rigging is actually present. The application therefore extracts content by technical word, pulling the relevant portions out of each bid document before determining similarity. This also handles bid documents written against the same tender requirements, because such standard or templated content can be excluded from the similarity determination when the technical words are chosen. At the same time, gathering related content by technical word avoids the problem that content which is substantially similar cannot be recognized as such because its order has been shuffled.
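The dilution effect described above can be made concrete with a weighted average: the whole-document repetition rate is the share-weighted sum of the repetition rates of its parts. The figures below are hypothetical and chosen only to illustrate the argument.

```python
def overall_repetition_rate(sections):
    """sections: (share_of_document, repetition_rate_within_section) pairs."""
    return sum(share * rate for share, rate in sections)


# Hypothetical document: key technical content is 10% of the text and 90% copied;
# the remaining 90% of the text is only 5% repeated.
sections = [(0.10, 0.90), (0.90, 0.05)]
print(f"whole document: {overall_repetition_rate(sections):.3f}")  # whole document: 0.135
print(f"key content only: {sections[0][1]:.3f}")  # key content only: 0.900
```

Judged on the whole document, such a bid looks only about 13.5% repeated, while the key content that extraction by technical word would isolate is 90% copied.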
In some possible implementations, the sentence in which a technical word appears is extracted, with sentence boundaries marked by a period (or another specific symbol), together with the preceding and the following sentence, for a total of three sentences.
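The grouping of step S101 together with the three-sentence window can be sketched as follows. This minimal sketch assumes plain substring matching of technical words and a regular-expression sentence splitter on Chinese and ASCII end punctuation; the function names and the nested-dictionary layout are hypothetical, and the natural semantic model is not modelled here.

```python
import re

# Sentence-ending punctuation (Chinese and ASCII); an assumed stand-in for the
# "period (or other specific symbol)" that delimits sentences.
_SENTENCE_END = r"(?<=[。！？.!?])"


def split_sentences(text):
    """Split text after each sentence-ending mark, dropping empty pieces."""
    return [s.strip() for s in re.split(_SENTENCE_END, text) if s.strip()]


def extract_window(text, terms):
    """Return each sentence containing any term, plus the sentence before and after it."""
    sentences = split_sentences(text)
    keep = set()
    for i, sentence in enumerate(sentences):
        if any(term in sentence for term in terms):
            keep.update(range(max(i - 1, 0), min(i + 2, len(sentences))))
    return [sentences[i] for i in sorted(keep)]


def build_collection_groups(documents, term_groups):
    """documents: {doc_id: text}; term_groups: {group_name: (term, ...)}.

    Returns {group_name: {doc_id: [sentences]}}; each inner list is one
    collection group and holds content from a single bid document only.
    """
    result = {}
    for name, terms in term_groups.items():
        per_doc = {}
        for doc_id, text in documents.items():
            window = extract_window(text, terms)
            if window:
                per_doc[doc_id] = window
        result[name] = per_doc
    return result


documents = {
    "bid_A": "Intro. The foundation coefficient is 5. Next line. Unrelated. Final.",
    "bid_B": "Nothing here. Still nothing.",
}
term_groups = {
    "foundation": ("foundation coefficient", "foundation bearing capacity"),
    "caisson": ("open caisson foundation",),
}
groups = build_collection_groups(documents, term_groups)
print(groups["foundation"]["bid_A"])  # ['Intro.', 'The foundation coefficient is 5.', 'Next line.']
```

A real splitter would need care with decimal points such as "1.5", which this simple pattern would treat as sentence ends.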
The natural semantic model can be explained as follows. Natural language contains characters and the relationships between characters, and words and the relationships between words, for example synonyms, antonyms and words sharing the same attribute. Processing can accordingly be divided into character processing, word processing and sentence processing. The natural semantic model builds on character processing and word processing and analyzes characters and words in combination with their context within the sentence, which improves the accuracy of obtaining collection groups using technical words.
In step S102, duplicate checking is performed on the contents of any two collection groups belonging to the same technical word and/or the same group of technical words, and the contents of the collection groups are divided into repeated content and non-repeated content. In this step, similarity analysis is performed on the remaining bid documents, and each bid document is compared one by one with every other bid document. For example, if the electronic bid documents in one bidding project yield five collection groups for the technical word group "foundation coefficient" and "foundation bearing capacity", each of the five collection groups is duplicate-checked against the remaining four to obtain its repeated and non-repeated content.
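A minimal sketch of the duplicate check in step S102, assuming that "repeated" means lying in a run of at least `min_len` characters shared verbatim with another collection group of the same word group. The patent does not name a matching algorithm; Python's standard `difflib.SequenceMatcher` is used here as a stand-in, and `split_repeated` and `min_len` are hypothetical.

```python
from difflib import SequenceMatcher


def split_repeated(groups, key, min_len=8):
    """Split groups[key] into (repeated, non_repeated) text.

    A character is repeated if it lies in a common block of at least min_len
    characters with any other collection group. Line feeds are always kept in
    the non-repeated text so that step S103 can still split it into paragraphs.
    """
    text = groups[key]
    repeated = [False] * len(text)
    for other_key, other in groups.items():
        if other_key == key:
            continue
        matcher = SequenceMatcher(None, text, other, autojunk=False)
        for a, _, size in matcher.get_matching_blocks():
            if size >= min_len:
                repeated[a:a + size] = [True] * size
    rep = "".join(ch for ch, r in zip(text, repeated) if r)
    non_rep = "".join(ch for ch, r in zip(text, repeated) if not r or ch == "\n")
    return rep, non_rep


groups = {
    "bid_A": "shared clause about foundations\nunique text alpha",
    "bid_B": "shared clause about foundations\nother words beta",
}
rep, non_rep = split_repeated(groups, "bid_A")
print(repr(rep))      # 'shared clause about foundations\n'
print(repr(non_rep))  # '\nunique text alpha'
```

Because `get_matching_blocks` returns a single order-preserving alignment, text that has been reordered can slip through as non-repeated, which is one reason the later steps re-examine the non-repeated content area by area.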
In step S103, the non-repeated content is further divided into a plurality of processing areas using paragraph marks. Dividing the non-repeated content into areas allows the similarity between the contents of any two collection groups belonging to the same technical word and/or the same group of technical words to be calculated further, because the non-repeated content obtained in step S102 may still contain substantially repeated content that was not detected.
In some possible implementations, a line feed, a blank line, first-line indentation, or special characters and codes may be used as paragraph marks. For example, with a line feed as the paragraph mark, the non-repeated content is divided into areas at each detected line feed to obtain the processing areas; with first-line indentation as the paragraph mark, the non-repeated content is divided into areas at each detected first-line indent.
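The paragraph marks listed above map naturally onto regular expressions. A minimal sketch covering three of the named marks; the pattern table and the rule that a first-line indent is two or more spaces, tabs or full-width spaces are hypothetical choices.

```python
import re

# Hypothetical paragraph-mark patterns for three of the marks named in the text.
PARAGRAPH_MARKS = {
    "line_feed": r"\n",
    "blank_line": r"\n\s*\n",
    # Split before a line that starts with 2+ spaces, tabs or full-width spaces.
    "first_line_indent": r"\n(?=[ \t\u3000]{2,})",
}


def split_processing_areas(text, mark="line_feed"):
    """Split non-repeated content into processing areas at the chosen paragraph mark."""
    areas = re.split(PARAGRAPH_MARKS[mark], text)
    return [area.strip() for area in areas if area.strip()]


print(split_processing_areas("a\nb\n\nc"))  # ['a', 'b', 'c']
print(split_processing_areas("a\nb\n\nc", "blank_line"))  # ['a\nb', 'c']
```

With first-line indentation as the mark, wrapped lines that are not indented stay inside the same processing area.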
In step S104, each of the processing areas belonging to one collection group is compared, in turn, with the contents of the remaining collection groups, and the similarity is calculated. This step is intended to find content that is substantially repeated. Specifically, each processing area in one collection group is compared for similarity with the contents of every remaining collection group.
In this way, the processing areas of one collection group are spread across all the remaining collection groups, which has the advantage of finding similar content in the remaining bid documents as thoroughly as possible. In practice, different processing areas of a collection group may each be similar to content in a different bid document. When bid documents are compared only in pairs, the similarity of each pair may fall short of the requirement; when the samples are spread across all the remaining bid documents, the similar content is much easier to find.
Then, in step S105, processing areas whose similarity meets the requirement are classified as repeated content. Finally, in step S106, the similarity of the two collection groups belonging to the same technical word and/or the same group of technical words is given according to the repeated content.
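Steps S104 to S106 can be sketched as follows, under assumptions the patent leaves open: the similarity of a processing area to a remaining collection group is taken to be the share of the area's characters covered by common blocks of at least `min_block` characters, a fixed threshold decides S105, and the final similarity of S106 is taken to be repeated characters divided by all characters of the collection group. All names and default values are hypothetical. The function also records which bid documents the repeated areas matched, since the text treats matches spread over several different bid documents as a sign of possible bid rigging.

```python
from difflib import SequenceMatcher


def coverage(area, content, min_block=4):
    """Share of `area` covered by blocks of >= min_block chars shared with `content`."""
    if not area:
        return 0.0
    matcher = SequenceMatcher(None, area, content, autojunk=False)
    covered = sum(m.size for m in matcher.get_matching_blocks() if m.size >= min_block)
    return covered / len(area)


def assess_group(total_chars, repeated_chars, areas, remaining, threshold=0.8):
    """S104-S106 sketch.

    total_chars / repeated_chars: size of the collection group and the number of
    characters already marked repeated in S102. areas: processing areas from S103.
    remaining: {doc_id: content} of the other collection groups.
    Returns (similarity, ids of the bid documents that repeated areas matched).
    """
    matched_docs = set()
    for area in areas:
        # S104: score this area against every remaining collection group.
        scores = {doc_id: coverage(area, content) for doc_id, content in remaining.items()}
        if not scores:
            continue
        best_doc = max(scores, key=scores.get)
        if scores[best_doc] >= threshold:  # S105: the area becomes repeated content
            repeated_chars += len(area)
            matched_docs.add(best_doc)
    similarity = repeated_chars / total_chars if total_chars else 0.0  # S106
    return similarity, matched_docs


remaining = {
    "bid_B": "the pile shall be tested by static load",
    "bid_C": "delivery within thirty days of signing",
}
areas = ["pile shall be tested by static load", "completely different sentence here"]
similarity, docs = assess_group(100, 20, areas, remaining)
print(round(similarity, 2), docs)  # 0.55 {'bid_B'}
```

The `min_block` filter matters: without it, scattered single-character matches against a long group would inflate the coverage of almost any area.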
Specifically, if a collection group in one bid document turns out to be similar to content in several different bid documents, the bid documents may involve bid rigging or collusive bidding. Behaviorally, avoiding similarity between two bid documents is technically feasible, but avoiding it across many bid documents is difficult because it involves far more complicated and tedious work: every bid document would have to be rewritten, and if every bid document has to be rewritten, the bid-rigging problem has, from a technical perspective, effectively disappeared.
From another perspective, if certain enterprises frequently appear in the same bidding scenarios, the more often they appear, the more easily they are found; such occurrences can be recorded in an information base and used to screen basic information before bidding.
In some possible implementations, comparing each of the processing areas belonging to one collection group, in turn, with the contents of the remaining collection groups includes the following steps:
S201, dividing the contents of the remaining collection groups into areas using paragraph marks to obtain a plurality of comparison processing areas; and
S202, comparing each of the processing areas belonging to one collection group, in turn, with all comparison processing areas for similarity.
Specifically, the contents of the remaining collection groups are divided into areas using paragraph marks to obtain a plurality of comparison processing areas, each containing a number of characters; each of the processing areas belonging to one collection group is then compared, in turn, with all the comparison processing areas for similarity, as shown in Fig. 3.
This comparison can be regarded as sentence similarity comparison, which has the advantage of higher processing efficiency and more accurate comparison results. In the approach provided by the application, the comparison is not tied to a sentence's position: each sentence is treated independently and compared with the content of the other collection groups.
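Steps S201 and S202 can be sketched as an all-pairs comparison that ignores where each sentence sits. This minimal sketch assumes line feeds as the paragraph mark and `difflib.SequenceMatcher.ratio()` as the sentence similarity score; the patent names neither, and `comparison_areas` and `compare_all` are hypothetical.

```python
from difflib import SequenceMatcher


def comparison_areas(remaining_groups):
    """S201: split every remaining collection group at line feeds (assumed paragraph mark)."""
    areas = []
    for doc_id, content in remaining_groups.items():
        for line in content.split("\n"):
            if line.strip():
                areas.append((doc_id, line.strip()))
    return areas


def compare_all(processing_areas, remaining_groups):
    """S202: score each processing area against every comparison area, position-independent.

    Returns, per processing area, the best (doc_id, comparison_area, ratio).
    """
    targets = comparison_areas(remaining_groups)
    results = []
    for area in processing_areas:
        best = max(
            ((doc_id, cand, SequenceMatcher(None, area, cand).ratio()) for doc_id, cand in targets),
            key=lambda t: t[2],
            default=(None, None, 0.0),
        )
        results.append(best)
    return results


remaining = {"bid_B": "intro line\nstatic load test on the pile\n", "bid_C": "unrelated"}
print(compare_all(["static load test on the pile"], remaining))
# [('bid_B', 'static load test on the pile', 1.0)]
```

The match is found on the second line of `bid_B` even though the sentence sits at a different position there, which is the position independence described above.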
In some possible implementations, the method further includes:
S301, counting the number of coincident characters in a processing area and a comparison processing area; and
S302, when the number of coincident characters is greater than or equal to a preset value, classifying the processing area as repeated content.
That is, in steps S301 and S302, when the number of coincident characters in a processing area and a comparison processing area is greater than or equal to the preset value, the whole processing area is classified as repeated content.
In some examples, counting the number of coincident characters in a processing area and a comparison processing area further includes:
S401, dividing the characters in the processing area and the comparison processing area into coincident text fields and non-coincident text fields; and
S402, calculating, by semantic recognition, the similarity of each non-coincident text field lying between two coincident text fields, and reclassifying non-coincident text fields whose similarity meets the requirement as coincident text fields.
Specifically, the text is divided into coincident and non-coincident text fields, the similarity of each non-coincident text field lying between two coincident text fields is calculated by semantic recognition, and non-coincident text fields whose similarity meets the requirement are reclassified as coincident text fields. Semantic recognition allows text in which words have been substituted to still be recognized as similar, as shown in Fig. 4.
In some examples, tables are processed as follows:
S501, identifying tables in the electronic bid document and extracting their contents; and
S502, taking the content of each cell of a table as one processing area.
Specifically, the content of each cell in the table is treated as a separate processing area and then processed as described above. This prevents duplicate checking from being evaded by moving text into tables.
Further, referring to Fig. 5, the contents of the first row and/or first column of the table are added to each processing area. The first row (top) and first column (leftmost) of a table often contain guiding information such as headings, so adding that information to each processing area makes the contents of the processing areas more targeted.
In some possible implementations, the content from the first row and/or first column of the table is placed before or after the cell content of the processing area to which it is added; alternatively, the processing area is processed twice, once with each placement.
In other possible implementations, a blank area is inserted between the content from the first row and/or first column of the table and the cell content of the processing area to which it is added. The blank area indicates that the heading content and the cell content are not contiguous, which allows a larger search range. The length of the blank area can be set according to an empirical value.
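The heading-augmentation variants described above can be sketched as follows. Everything here is an assumption made for illustration:

- the function names;
- joining the column heading and the row heading in that order;
- representing the blank area as a run of spaces;
- the default `gap` of 0, since the patent leaves the length to an empirical value;
- a rectangular table.

`both_placements` covers the variant in which a processing area is processed twice, once with the guiding text before the cell text and once after it.

```python
from typing import List, Tuple

Table = List[List[str]]  # rows of cell texts; assumed rectangular


def guide_text(table: Table, r: int, c: int,
               use_first_row: bool = True, use_first_column: bool = True) -> str:
    """Collect the guiding text for cell (r, c) from the first row and/or first column."""
    parts = []
    if use_first_row and r > 0:
        parts.append(table[0][c].strip())  # column heading
    if use_first_column and c > 0:
        parts.append(table[r][0].strip())  # row heading
    return "".join(p for p in parts if p)


def augmented_area(table: Table, r: int, c: int, before: bool = True, gap: int = 0) -> str:
    """Build the processing area for cell (r, c) with the guiding text added.

    before=True puts the guiding text ahead of the cell text, before=False after it.
    gap > 0 inserts that many spaces as the "blank area" (an assumed representation).
    """
    cell = table[r][c].strip()
    guide = guide_text(table, r, c)
    if not guide:
        return cell
    blank = " " * gap
    return guide + blank + cell if before else cell + blank + guide


def both_placements(table: Table, r: int, c: int, gap: int = 0) -> Tuple[str, str]:
    """Variant in which the area is processed twice, once per placement."""
    return augmented_area(table, r, c, True, gap), augmented_area(table, r, c, False, gap)
```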
The application also provides a device for detecting bidding documents, comprising:
a collecting unit, configured to search the received electronic bidding documents using the technical words in a technical word library and to extract the content associated with the same technical word and/or the same group of technical words into a collection group, wherein the content in each collection group belongs to the same electronic bidding document;
a first duplicate checking processing unit, configured to perform duplicate checking on the contents of any two collection groups belonging to the same technical word and/or the same group of technical words, dividing the contents of the collection groups into repeated content and non-repeated content;
a first area dividing unit, configured to divide the non-repeated content into areas using paragraph marks to obtain a plurality of processing areas;
a first similarity processing unit, configured to compare, in turn, the plurality of processing areas belonging to one collection group with the contents of the remaining collection groups and to calculate the similarity;
a second duplicate checking processing unit, configured to classify a processing area whose similarity meets the requirement as repeated content; and
a result unit, configured to give, according to the repeated content, the similarity of the two collection groups belonging to the same technical word and/or the same group of technical words.
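The front half of the device might look like the following Python sketch, covering the collecting unit, the first duplicate checking processing unit, and the first area dividing unit. It rests on several assumptions that the patent does not fix:

- paragraph marks are newline characters;
- the content "associated" with a technical word is every paragraph that contains it;
- the technical word library is a mapping from a group name to its words, a single word being a group of one;
- the first duplicate check treats any run of at least `min_run` identical characters as repeated.

All names are hypothetical.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def split_paragraphs(text: str) -> List[str]:
    """Split on paragraph marks (assumed here to be newline characters)."""
    return [p.strip() for p in re.split(r"\n+", text) if p.strip()]


def build_collection_groups(documents: Dict[str, str],
                            term_groups: Dict[str, List[str]]) -> Dict[Tuple[str, str], str]:
    """Collecting unit (sketch): per word group and per document, gather the
    paragraphs that mention any word of that group."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for doc_id, text in documents.items():
        for paragraph in split_paragraphs(text):
            for group_name, words in term_groups.items():
                if any(word in paragraph for word in words):
                    groups[(group_name, doc_id)].append(paragraph)
    # Keep paragraph marks inside each collection group so it can be re-split later.
    return {key: "\n".join(paras) for key, paras in groups.items()}


def comparable_pairs(groups: Dict[Tuple[str, str], str]) -> List[Tuple[str, str, str]]:
    """Pairs of collection groups from different documents that share a word group."""
    docs_by_group: Dict[str, List[str]] = defaultdict(list)
    for group_name, doc_id in groups:
        docs_by_group[group_name].append(doc_id)
    return [(g, a, b) for g, docs in docs_by_group.items()
            for a, b in combinations(sorted(docs), 2)]


def first_duplicate_check(content: str, other: str, min_run: int = 8) -> str:
    """First duplicate check (sketch): drop the characters of `content` that lie in a
    run of at least `min_run` identical characters shared with `other`, keep paragraph
    marks, and return the non-repeated remainder."""
    matcher = SequenceMatcher(None, content, other, autojunk=False)
    repeated = [False] * len(content)
    for block in matcher.get_matching_blocks():
        if block.size >= min_run:
            for i in range(block.a, block.a + block.size):
                repeated[i] = True
    return "".join(ch for ch, rep in zip(content, repeated) if not rep or ch == "\n")


def processing_areas(non_repeated: str) -> List[str]:
    """First area dividing unit (sketch): paragraph marks delimit processing areas."""
    return split_paragraphs(non_repeated)
```

Keeping paragraph marks even inside repeated runs is a deliberate choice in this sketch, so that the remainder can still be split into processing areas along the original paragraph boundaries.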
Further, the device further comprises:
a second area dividing unit, configured to divide the contents of the remaining collection groups into areas using paragraph marks to obtain a plurality of comparison processing areas; and
a second similarity processing unit, configured to compare, in turn, the plurality of processing areas belonging to one collection group with all of the comparison processing areas for similarity.
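The back half of the device can be sketched in the same hedged way. This covers the second area dividing unit, the similarity comparison, the second duplicate checking processing unit, and the result unit. `SequenceMatcher.ratio()` stands in for the unspecified similarity measure, and `threshold` stands in for the similarity requirement. The final score is the share of a collection group's characters judged repeated across both duplicate checks. That score is an assumed definition, not one given in the patent.

```python
from difflib import SequenceMatcher
from typing import List


def comparison_areas(other_content: str) -> List[str]:
    """Second area dividing unit (sketch): paragraph marks, assumed to be newlines,
    delimit the comparison processing areas of the other collection group."""
    return [p.strip() for p in other_content.split("\n") if p.strip()]


def best_similarity(area: str, candidates: List[str]) -> float:
    """Similarity of one processing area to its closest comparison area.
    difflib's ratio() is a stand-in for the unspecified similarity measure."""
    return max((SequenceMatcher(None, area, c, autojunk=False).ratio() for c in candidates),
               default=0.0)


def repeated_processing_areas(areas: List[str], other_content: str,
                              threshold: float = 0.8) -> List[str]:
    """Compare each processing area, in order, with all comparison areas and keep
    those meeting the threshold (the second duplicate check)."""
    candidates = comparison_areas(other_content)
    return [a for a in areas if best_similarity(a, candidates) >= threshold]


def group_similarity(total_chars: int, first_pass_repeated: int,
                     repeated_areas: List[str]) -> float:
    """Result unit (sketch): share of the group's characters judged repeated.
    This ratio is an assumed definition; the patent does not give a formula."""
    if total_chars == 0:
        return 0.0
    return (first_pass_repeated + sum(len(a) for a in repeated_areas)) / total_chars
```

Taking the best match over all comparison areas, rather than only the paragraph at the same position, means a reordered paragraph is still caught. The cost is one comparison per pair of areas.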
Further, the device further comprises:
a quantity counting unit, configured to count the number of coincident characters in a processing area and a comparison processing area; and
a first repartitioning unit, configured to classify the processing area as repeated content when the number of coincident characters is greater than or equal to a set value.
Further, the device further comprises:
a field dividing unit, configured to divide the characters in the processing area and the comparison processing area into coincident text fields and non-coincident text fields; and
a second subdivision unit, configured to calculate, by semantic recognition, the similarity of each non-coincident text field lying between two coincident text fields, and to reclassify non-coincident text fields whose similarity meets the requirement as coincident text fields.
Further, the device further comprises:
an identification unit, configured to identify a table in the electronic bidding document and extract its contents; and
a processing unit, configured to take the content of one cell in the table as one processing area.
Further, the contents of the first row and/or first column of the table are added to each processing area.
Further, the contents of the first row and/or first column of the table are placed before or after the contents of the processing area to which they are added;
and a blank area exists between the contents of the first row and/or first column of the table and the contents of the processing area to which they are added.
In one example, the units in any of the above devices may be one or more integrated circuits configured to implement the above methods, for example one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), one or more field-programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms.
As another example, when a unit in the device is implemented by a processing element scheduling a program, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of invoking the program. As a further example, the units may be integrated together and implemented as a system-on-a-chip (SOC).
Various objects, such as messages, information, devices, network elements, systems, actions, operations, processes, and concepts, may be named in this application. These specific names do not limit the objects concerned. The names may change according to scenario, context, or usage habits, and the technical meaning of the terms in this application should be determined mainly from the functions and technical effects they embody in the technical solution.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the device embodiments described above are merely illustrative, and the division of the units is merely a division by logical function. Other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It should also be understood that, in the various embodiments of the present application, "first", "second", and the like merely indicate that multiple objects are different. For example, a first time window and a second time window merely denote different time windows and have no effect on the time windows themselves. "First", "second", and the like should not impose any limitation on the embodiments of the present application.
It is also to be understood that, in the various embodiments of the application, unless otherwise stated or logically conflicting, terms and/or descriptions in different embodiments are consistent and may refer to one another. Technical features of different embodiments may be combined according to their inherent logical relationships to form new embodiments.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a computer-readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned computer-readable storage medium includes various media capable of storing program code, such as a USB flash drive (U-disk), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The application also provides a system for detecting bidding documents, comprising:
one or more memories for storing instructions; and
one or more processors configured to invoke and execute the instructions from the memory to perform the method as described above.
The present application also provides a computer program product comprising instructions that, when executed, cause a system for detecting bidding documents to perform the operations of the method described above.
The present application also provides a chip system comprising a processor for implementing the functions described above, for example generating, receiving, transmitting, or processing the data and/or information involved in the above method.
The chip system may consist of chips, or may include chips and other discrete devices.
The processor referred to in any of the foregoing may be a CPU, microprocessor, ASIC, or integrated circuit that performs one or more of the procedures for controlling the transmission of feedback information described above.
In one possible design, the chip system further includes a memory for holding the necessary program instructions and data. The processor and the memory may be decoupled, disposed on different devices, and connected in a wired or wireless manner to support the chip system in implementing the functions of the foregoing embodiments. Alternatively, the processor and the memory may be coupled on the same device.
Optionally, the computer instructions are stored in a memory.
Alternatively, the memory may be a storage unit in the chip, such as a register or a cache. It may also be a storage unit in the terminal located outside the chip, such as a ROM or another type of static storage device capable of storing static information and instructions, or a RAM.
It is to be understood that the memory in this application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
The nonvolatile memory may be a ROM, a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory.
The volatile memory may be a RAM, which serves as an external cache. Many types of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
The above embodiments are all preferred embodiments of the present application and are not intended to limit its scope. Therefore, all equivalent changes in the structure, shape, and principle of this application shall fall within its protection scope.

Claims (10)

1. A bidding document detection method based on a natural semantic model is characterized by comprising the following steps:
searching a plurality of received electronic bidding documents by using technical words in a technical word bank, and extracting contents associated with the same technical word and/or the same group of technical words into a collection group, wherein the contents in each collection group belong to the same electronic bidding document;
performing duplicate checking on the contents of any two collection groups belonging to the same technical word and/or the same group of technical words, and dividing the contents of the collection groups into repeated content and non-repeated content;
dividing the non-repeated content into areas using paragraph marks to obtain a plurality of processing areas;
comparing, in turn, the plurality of processing areas belonging to one collection group with the contents of the remaining collection groups belonging to the same technical word and/or the same group of technical words, and calculating the similarity;
classifying a processing area whose similarity meets the requirement as repeated content; and
giving, according to the repeated content, the similarity of the two collection groups belonging to the same technical word and/or the same group of technical words.
2. The method of claim 1, wherein comparing, in turn, the plurality of processing areas belonging to one collection group with the contents of the remaining collection groups belonging to the same technical word and/or the same group of technical words comprises:
dividing the contents of the remaining collection groups into areas using paragraph marks to obtain a plurality of comparison processing areas; and
comparing, in turn, the plurality of processing areas belonging to one collection group with all of the comparison processing areas for similarity.
3. The natural semantic model based bidding document detection method of claim 2, further comprising:
counting the number of coincident characters in a processing area and a comparison processing area; and
classifying the processing area as repeated content when the number of coincident characters is greater than or equal to a preset value.
4. The natural semantic model based bidding document detection method of claim 3, wherein counting the number of coincident characters in a processing area and a comparison processing area further comprises:
dividing the characters in the processing area and the comparison processing area into coincident text fields and non-coincident text fields; and
calculating, by semantic recognition, the similarity of each non-coincident text field lying between two coincident text fields, and reclassifying non-coincident text fields whose similarity meets the requirement as coincident text fields.
5. The natural semantic model based bidding document detection method according to any one of claims 1 to 4, further comprising:
identifying a table in the electronic bidding document and extracting its contents; and
taking the content of one cell in the table as one processing area.
6. The natural semantic model based bidding document detection method of claim 5, wherein the contents of the first row and/or first column of the table are added to each processing area.
7. The natural semantic model based bidding document detection method of claim 6, wherein the contents of the first row and/or first column of the table are placed before or after the contents of the processing area to which they are added;
and a blank area exists between the contents of the first row and/or first column of the table and the contents of the processing area to which they are added.
8. A device for detecting bidding documents, comprising:
a collecting unit, configured to search the received electronic bidding documents using the technical words in a technical word library and to extract the content associated with the same technical word and/or the same group of technical words into a collection group, wherein the content in each collection group belongs to the same electronic bidding document;
a first duplicate checking processing unit, configured to perform duplicate checking on the contents of any two collection groups belonging to the same technical word and/or the same group of technical words, dividing the contents of the collection groups into repeated content and non-repeated content;
a first area dividing unit, configured to divide the non-repeated content into areas using paragraph marks to obtain a plurality of processing areas;
a first similarity processing unit, configured to compare, in turn, the plurality of processing areas belonging to one collection group with the contents of the remaining collection groups belonging to the same technical word and/or the same group of technical words, and to calculate the similarity;
a second duplicate checking processing unit, configured to classify a processing area whose similarity meets the requirement as repeated content; and
a result unit, configured to give, according to the repeated content, the similarity of the two collection groups belonging to the same technical word and/or the same group of technical words.
9. A system for detecting bidding documents, the system comprising:
one or more memories for storing instructions; and
one or more processors to invoke and execute the instructions from the memory to perform the method of any of claims 1 to 7.
10. A computer-readable storage medium, comprising:
a program which, when executed by a processor, performs the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311531314.XA CN117252194B (en) 2023-11-17 2023-11-17 Bid file detection method and system based on natural semantic model

Publications (2)

Publication Number Publication Date
CN117252194A CN117252194A (en) 2023-12-19
CN117252194B true CN117252194B (en) 2024-02-23

Family

ID=89128005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311531314.XA Active CN117252194B (en) 2023-11-17 2023-11-17 Bid file detection method and system based on natural semantic model

Country Status (1)

Country Link
CN (1) CN117252194B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10453101B2 (en) * 2016-10-14 2019-10-22 SoundHound Inc. Ad bidding based on a buyer-defined function

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112258303A (en) * 2020-11-16 2021-01-22 北京筑龙信息技术有限责任公司 Surrounding string mark early warning analysis method and device, electronic equipment and storage medium
CN112800113A (en) * 2021-02-04 2021-05-14 天津德尔塔科技有限公司 Bidding auditing method and system based on data mining analysis technology
CN114492323A (en) * 2021-12-27 2022-05-13 博思数采科技发展有限公司 Method and device for detecting enclosing and bidding behavior based on electronic bidding document comparison
CN115249007A (en) * 2021-12-27 2022-10-28 博思数采科技发展有限公司 Method and device for detecting enclosing and bidding behavior based on electronic bidding document comparison
US11748577B1 (en) * 2022-08-22 2023-09-05 Rohirrim, Inc. Computer-generated content based on text classification, semantic relevance, and activation of deep learning large language models
CN115795000A (en) * 2023-02-07 2023-03-14 南方电网数字电网研究院有限公司 Joint similarity algorithm comparison-based enclosure identification method and device
CN116484231A (en) * 2023-03-14 2023-07-25 厦门市民数据服务股份有限公司 Abnormal group bidding, surrounding bidding behavior identification method, device, equipment and medium
CN116485190A (en) * 2023-06-26 2023-07-25 中招联合信息股份有限公司 Enterprise bidding information file risk prediction system based on multi-file comparison analysis

Also Published As

Publication number Publication date
CN117252194A (en) 2023-12-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant