CN117252194B

CN117252194B - Bid file detection method and system based on natural semantic model

Info

Publication number: CN117252194B
Application number: CN202311531314.XA
Authority: CN
Inventors: 陈琦; 金建青; 周俊; 于树轩; 陈佳玮; 陈文�; 程盼盼
Original assignee: Shanghai Baitong Project Management Consulting Co ltd
Current assignee: Shanghai Baitong Project Management Consulting Co ltd
Priority date: 2023-11-17
Filing date: 2023-11-17
Publication date: 2024-02-23
Anticipated expiration: 2043-11-17
Also published as: CN117252194A

Abstract

The application relates to a method and a system for detecting bidding documents based on a natural semantic model. The method comprises: searching the bidding documents and extracting the contents related to the same technical word and/or the same group of technical words into collection groups; selecting any two collection groups belonging to the same technical word and/or the same group of technical words and performing duplicate checking on their contents; comparing, in sequence, the processing areas belonging to one collection group with the contents of the remaining collection groups and calculating the similarity; and classifying the processing areas whose similarity meets the requirement as repeated content, and giving a similarity according to the repeated content. The method and system use intelligent analysis to determine whether similar contents exist in the bidding documents and hand the similar contents to staff for further analysis, which addresses the large data volume and low processing speed of the prior art and gives the approach good practicability.

Description

Bid file detection method and system based on natural semantic model

Technical Field

The application relates to the technical field of data processing, in particular to a bidding document detection method and system based on a natural semantic model.

Background

Bid rigging refers to collusion over a tendered project by improper means, between bidders or between a bidder and the tenderee, in order to squeeze out competitors or harm the tenderee's interests. Collusive bidding means that bidding units collude with one another, or with the tendering unit, to win the bid fraudulently.

To deal with bid rigging and collusive bidding, the solutions currently adopted are electronic information identification and data analysis. Electronic information identification acquires the IP address, MAC address and basic information (such as contact name, telephone, mailbox and company address) of each bidder; if this information is repeated, or the stakeholders of several suppliers are associated with one another, the bidding document is marked. Data analysis looks for clues in the collected bidding documents, for example bidding documents with identical content whose last holder is the same company, bid prices that decrease or increase in a regular pattern, or bid price compositions of several bidders that are abnormally consistent.

Electronic information can be handled through electronic bidding: bidders must upload their bidding files over the network, and in the process information about the electronic equipment used, and even about the operators, is recorded; for example, the IP addresses or MAC addresses of a bidder's historical bids are recorded in the system. Data analysis, such as checking the technical scheme, still relies mainly on manual checking, which performs poorly in practice when the data volume is large, time is short or the matter is urgent.

The prior art cannot meet current needs, so an improvement is urgently needed to prevent such situations by technical means.

Disclosure of Invention

The application provides a bidding document detection method and system based on a natural semantic model, and the above purposes of the application are achieved through the following technical scheme:

In a first aspect, the present application provides a method for detecting a bidding document based on a natural semantic model, including:

searching a plurality of received electronic bidding documents by using technical words in a technical word bank, and extracting the contents associated with the same technical word and/or the same group of technical words into a collection group, wherein the contents in each collection group belong to the same electronic bidding document;

performing duplicate checking on the contents of any two collection groups belonging to the same technical word and/or the same group of technical words, and dividing the contents of the collection groups into repeated contents and non-repeated contents;

dividing the non-repeated contents into areas by using paragraph marks to obtain a plurality of processing areas;

comparing, in sequence, the plurality of processing areas belonging to one collection group with the contents of the remaining collection groups and calculating the similarity;

classifying a processing area whose similarity meets the requirement as repeated content; and

giving, according to the repeated contents, the similarity of two collection groups belonging to the same technical word and/or the same group of technical words.

In a possible implementation manner of the first aspect, comparing, in sequence, the plurality of processing areas belonging to one collection group with the contents of the remaining collection groups includes:

dividing the contents of the remaining collection groups into areas by using paragraph marks to obtain a plurality of comparison processing areas; and

comparing, in sequence, the plurality of processing areas belonging to one collection group with all of the comparison processing areas for similarity.

In a possible implementation manner of the first aspect, the method further includes:

counting the number of coincident characters in a processing area and a comparison processing area; and

when the number of coincident characters is greater than or equal to a preset number, classifying the processing area as repeated content.

In a possible implementation manner of the first aspect, counting the number of coincident characters in one processing area and one comparison processing area further includes:

dividing the characters in the processing area and the comparison processing area into coincident text fields and non-coincident text fields; and

calculating, by using semantic recognition, the similarity of the non-coincident text fields located between two coincident text fields, and merging the non-coincident text fields whose similarity meets the requirement into the coincident text fields.

In a possible implementation manner of the first aspect, the method further includes:

identifying tables in the electronic bidding document and extracting the contents of the tables; and

taking the contents of one cell in the table as one processing area.

In a possible implementation manner of the first aspect, the contents in the first row and/or the first column of the table are added to each processing area.

In a possible implementation manner of the first aspect, the contents in the first row and/or the first column of the table are located before or after the contents of the processing area to which they are added;

a blank area exists between the contents in the first row and/or the first column of the table and the contents of the processing area to which they are added.

In a second aspect, the present application provides a device for detecting a bidding document, including:

a collecting unit, used for searching the received electronic bidding documents by using the technical words in the technical word bank and extracting the contents associated with the same technical word and/or the same group of technical words into a collection group, wherein the contents in each collection group belong to the same electronic bidding document;

a first duplicate checking processing unit, used for performing duplicate checking on the contents of any two collection groups belonging to the same technical word and/or the same group of technical words, and dividing the contents of the collection groups into repeated contents and non-repeated contents;

a first area dividing unit, used for dividing the non-repeated contents into areas by using paragraph marks to obtain a plurality of processing areas;

a first similarity processing unit, used for comparing, in sequence, the plurality of processing areas belonging to one collection group with the contents of the remaining collection groups and calculating the similarity;

a second duplicate checking processing unit, used for classifying a processing area whose similarity meets the requirement as repeated content; and

a result unit, used for giving, according to the repeated contents, the similarity of two collection groups belonging to the same technical word and/or the same group of technical words.

In a third aspect, the present application provides a system for detecting a bid document, the system comprising:

one or more memories for storing instructions; and

one or more processors configured to invoke and execute the instructions from the memory, to perform the method as described in the first aspect and any possible implementation of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium comprising:

a program which, when executed by a processor, performs a method as described in the first aspect and any possible implementation of the first aspect.

In a fifth aspect, the present application provides a computer program product comprising program instructions which, when executed by a computing device, perform a method as described in the first aspect and any possible implementation manner of the first aspect.

In a sixth aspect, the present application provides a chip system comprising a processor for implementing the functions involved in the above aspects, e.g. generating, receiving, transmitting, or processing data and/or information involved in the above methods. The chip system can be composed of chips, and can also comprise chips and other discrete devices. In one possible design, the system on a chip also includes memory to hold the necessary program instructions and data. The processor and the memory may be decoupled, provided on different devices, respectively, connected by wire or wirelessly, or the processor and the memory may be coupled on the same device.

The bidding document detection method and system based on a natural semantic model provided by the application use intelligent analysis to determine whether similar contents exist in the bidding documents and hand the similar contents to staff for further analysis. This addresses the large data volume and low processing speed of the prior art and makes the checking better targeted.

Drawings

Fig. 1 is a schematic block diagram of the steps of the bidding document detection method based on a natural semantic model provided in the present application.

Fig. 2 is a schematic diagram of using technical words to obtain a collection group provided in the present application.

Fig. 3 is a schematic diagram of similarity comparison between a processing area and a comparison processing area provided in the present application.

Fig. 4 is a schematic diagram of similarity comparison between non-coincident text fields provided in the present application.

Fig. 5 is a schematic diagram of similarity comparison of a table provided in the present application.

Description of the embodiments

The technical solutions in the present application are described in further detail below with reference to the accompanying drawings.

The application discloses a bidding document detection method based on a natural semantic model. Referring to fig. 1, the detection method comprises the following steps:

S101, searching a plurality of received electronic bidding documents by using technical words in a technical word bank, and extracting the contents associated with the same technical word and/or the same group of technical words into a collection group, wherein the contents in each collection group belong to the same electronic bidding document;

S102, performing duplicate checking on the contents of any two collection groups belonging to the same technical word and/or the same group of technical words, and dividing the contents of the collection groups into repeated contents and non-repeated contents;

S103, dividing the non-repeated contents into areas by using paragraph marks to obtain a plurality of processing areas;

S104, comparing, in sequence, the plurality of processing areas belonging to one collection group with the contents of the remaining collection groups and calculating the similarity;

S105, classifying a processing area whose similarity meets the requirement as repeated content; and

S106, giving, according to the repeated contents, the similarity of two collection groups belonging to the same technical word and/or the same group of technical words.

It should first be explained that the bidding document detection method based on a natural semantic model disclosed by the application is applied to a server that provides an electronic bidding service. In the bidding process, the tender document is first sent to the server, and accounts are given permission to download or view it.

After viewing or downloading the tender document, the account holder decides whether to take part in the bidding and, if so, logs in and uploads a bid document. The server can record basic information such as the IP address and MAC address used during the upload and, if necessary, can acquire portrait information of the staff member uploading the bid document together with basic information about the supplier.

After a bid document is uploaded, the server first analyzes its basic information, such as the IP address, the MAC address and the supplier's basic information (contact name, telephone, mailbox, company address and so on). If this information is repeated, for example if two bid submitters have the same MAC address, the bid document is marked. In addition, if the bid prices decrease or increase in a regular pattern, the bid documents are marked.
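The metadata screening described above can be sketched as follows. This is a minimal illustration, not the server's actual logic: the record layout (`id`, `mac`), the function names and the reading of "regular" prices as a near-constant step between sorted prices are all assumptions made for the example.

```python
from collections import defaultdict

def flag_shared_mac(bids):
    """Return the ids of bids whose upload MAC address is shared with another bid."""
    by_mac = defaultdict(list)
    for bid in bids:
        by_mac[bid["mac"].lower()].append(bid["id"])
    return {bid_id for ids in by_mac.values() if len(ids) > 1 for bid_id in ids}

def has_regular_price_steps(prices, tolerance=0.01):
    """True if the sorted prices differ by a (nearly) constant, non-zero step.

    Assumption: "regular" is interpreted as an arithmetic progression.
    """
    if len(prices) < 3:
        return False
    ordered = sorted(prices)
    steps = [b - a for a, b in zip(ordered, ordered[1:])]
    first = steps[0]
    return first > 0 and all(abs(s - first) <= tolerance * first for s in steps)

bids = [
    {"id": "A", "mac": "AA:BB:CC:00:11:22"},
    {"id": "B", "mac": "aa:bb:cc:00:11:22"},
    {"id": "C", "mac": "CC:DD:EE:00:11:22"},
]
flagged = flag_shared_mac(bids)
regular = has_regular_price_steps([100.0, 95.0, 90.0, 85.0])
```

Bids flagged this way would go to manual audit; only the unflagged ones continue to the content analysis.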

The marked bid documents are sent to a manual auditing department for review; once the review is finished, each marked document either has its mark cancelled or is marked as invalid.

After the above processing, analysis of the remaining (that is, unmarked) bidding documents begins. Specifically, in step S101, the received electronic bidding documents are searched using the technical words in the technical word bank, and the contents related to the same technical word and/or the same group of technical words are extracted into a collection group, where the contents of each collection group belong to the same electronic bidding document, as shown in fig. 2.

Technical words here are the terms involved in the bidding process. They mainly relate to technical content (for example products, specifications, models, maintenance and service) and business content (information about the bidder, such as qualifications and proof of performance). These terms come from words commonly used in the industry or from the tender documents.

For example, technical words for a building project include open caisson foundation, foundation coefficient, foundation bearing capacity, composite foundation, rigid foundation and foundation treatment, while technical words for government office furniture procurement include detection, installation and debugging, and spare parts.

In this embodiment, a technical word bank for the relevant industries is built into the bidding system, and technical words are taken from it to search the electronic bidding documents submitted for the same bidding project. For example, for the bidding documents of a building project, "foundation coefficient" and "foundation bearing capacity" in the word bank are used as one group of technical words and "open caisson foundation" as another group, and the contents associated with each group are extracted into the corresponding collection group. In addition, in one possible implementation, the word bank can be expanded by entering new technical words into the system.
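A minimal sketch of building collection groups from a word bank is given here. The word bank contents, the group names, the function names and the simple punctuation-based sentence splitter are illustrative assumptions; the application does not specify a data layout.

```python
import re

# Hypothetical word bank: group name -> technical words belonging to that group.
WORD_BANK = {
    "foundation parameters": ["foundation coefficient", "foundation bearing capacity"],
    "open caisson": ["open caisson foundation"],
}

def split_sentences(text):
    """Split after sentence-ending punctuation (Latin or CJK), keeping the mark."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]

def build_collection_groups(documents, word_bank=WORD_BANK):
    """Map (word group, document id) -> sentences mentioning any word of the group.

    Each collection group holds content from exactly one bid document.
    """
    groups = {}
    for doc_id, text in documents.items():
        for group_name, words in word_bank.items():
            hits = [s for s in split_sentences(text)
                    if any(w.lower() in s.lower() for w in words)]
            if hits:
                groups[(group_name, doc_id)] = hits
    return groups

docs = {
    "bid1": "The foundation coefficient is 30. Delivery takes two weeks.",
    "bid2": "We use an open caisson foundation. "
            "The foundation bearing capacity is 200 kPa.",
}
groups = build_collection_groups(docs)
```

Keying each group by both the word group and the document keeps the later pairwise comparison restricted to groups that share the same technical words.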

By collecting the related content through the technical words, the subsequent duplication checking process can be more targeted, and the content related to the technical words has a larger association degree with the bid.

A general duplicate-checking method calculates the proportion of repeated content in the whole document. A bid document, however, may copy technical and business terms from the tender document, so a great deal of legitimate repetition exists between bid documents; judged by the general method the similarity is high, yet the repetition is reasonable. The obvious disadvantage of that method is that it cannot distinguish important content from unimportant content. Conversely, when the important content is heavily repeated but makes up only a small share of the document, and the unimportant content is rarely repeated, the proportion of repeated content in the whole document may be calculated as low even though bid rigging or collusive bidding is actually present. The present application therefore extracts content by technical words: the parts of the bidding document associated with the technical words are extracted first, and the similarity determination is then performed on them. Extracting content by technical words also avoids interference from bid documents being compiled from the same tender requirements, because such standard or formatted content can be excluded from the similarity determination when the technical words are set. In addition, the technical words bring related content together for processing, which avoids the problem that content which is substantially similar cannot be judged similar because its order has been shuffled.
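A small worked example makes the dilution effect concrete. All figures are hypothetical and chosen only to show how the same repeated text yields very different ratios depending on what it is measured against.

```python
def repeat_ratio(repeated_chars, total_chars):
    """Share of characters classified as repeated."""
    return repeated_chars / total_chars

# Hypothetical bid: 20,000 characters in total, of which 1,000 are key technical
# content; 900 characters of that key content also appear in another bid.
whole_document_ratio = repeat_ratio(repeated_chars=900, total_chars=20_000)
key_content_ratio = repeat_ratio(repeated_chars=900, total_chars=1_000)
```

Measured over the whole document the ratio is 4.5 percent, which looks harmless, while measured over the content extracted by technical words it is 90 percent.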

In some possible implementations, the sentence in which a technical word appears is extracted, with a period (or another specified symbol) marking the sentence boundaries, together with the sentence before it and the sentence after it, three sentences in total.
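The three-sentence extraction can be sketched as below. The function name is hypothetical, and only the Latin and CJK period are treated as boundary symbols; a real implementation would need to handle decimals and abbreviations.

```python
import re

def sentence_windows(text, term):
    """Return each sentence containing `term` with its previous and next sentence."""
    sentences = [s for s in re.split(r"(?<=[.。])\s*", text) if s]
    windows = []
    for i, sentence in enumerate(sentences):
        if term in sentence:
            # Slicing clips automatically at the start and end of the document.
            windows.append(sentences[max(i - 1, 0): i + 2])
    return windows

text = ("Site survey is done. The foundation coefficient is 30. "
        "Piles are driven. Work ends.")
windows = sentence_windows(text, "foundation coefficient")
```

When the technical word sits in the first or last sentence, the window simply contains two sentences instead of three.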

The natural semantic model is explained as follows. In natural language there are characters and relationships between characters, and words and relationships between words, for example synonyms, antonyms and words with the same attribute. Processing can accordingly be divided into "character processing", "word processing" and "sentence processing". The natural semantic model builds on character processing and word processing and analyzes the characters and words in combination with their context in the sentence, so as to improve the accuracy of the collection groups obtained with the technical words.

In step S102, duplicate checking is performed on the contents of any two collection groups belonging to the same technical word and/or the same group of technical words, and the contents of the collection groups are divided into repeated contents and non-repeated contents. In this step the similarity analysis covers the remaining bidding documents, and each bidding document is compared with the others one by one. For example, if the electronic bidding documents of one bidding project yield five collection groups for the technical word group "foundation coefficient and foundation bearing capacity", each of the five collection groups is duplicate-checked against the other four to obtain the repeated contents and the non-repeated contents.
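The pairwise check over five collection groups can be sketched as follows. The application does not state which duplicate-checking algorithm is used, so exact matching of whitespace- and case-normalized sentences is assumed here purely for illustration; the function names are hypothetical.

```python
from itertools import combinations

def normalize(sentence):
    """Collapse whitespace and case so trivially different copies compare equal."""
    return " ".join(sentence.lower().split())

def duplicate_check(group_a, group_b):
    """Split group_a's sentences into (repeated, non_repeated) relative to group_b."""
    seen = {normalize(s) for s in group_b}
    repeated = [s for s in group_a if normalize(s) in seen]
    non_repeated = [s for s in group_a if normalize(s) not in seen]
    return repeated, non_repeated

def pairwise_checks(groups):
    """Run the check in both directions for every unordered pair of collection groups."""
    results = {}
    for a, b in combinations(sorted(groups), 2):
        results[(a, b)] = duplicate_check(groups[a], groups[b])
        results[(b, a)] = duplicate_check(groups[b], groups[a])
    return results

groups = {
    "bid1": ["Bearing capacity is 200 kPa.", "We use piles."],
    "bid2": ["bearing  capacity is 200 kPa.", "We use rafts."],
    "bid3": ["Soil is clay."],
    "bid4": ["Depth is 5 m."],
    "bid5": ["Slab is 300 mm."],
}
results = pairwise_checks(groups)
```

Five groups give ten unordered pairs, so each group is checked against the other four, as in the example in the text.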

In step S103, the non-repeated contents are further divided into a plurality of processing areas by using paragraph marks. The purpose of this division is to further calculate the similarity of the contents of any two collection groups belonging to the same technical word and/or the same group of technical words, because the non-repeated contents obtained in step S102 may still contain substantially repetitive content that has not yet been found.

In some possible implementations, a line feed, a blank line, a first-line indent, or special characters and codes may be used as paragraph marks. For example, with line feeds as paragraph marks, the non-repeated contents are divided into different areas at each detected line feed to obtain a plurality of processing areas; likewise, with first-line indents as paragraph marks, the non-repeated contents are divided into different areas at each detected first-line indent.
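Splitting at paragraph marks can be sketched with a single regular expression. The pattern below is an assumption covering line feeds, blank lines and first-line indentation (including the full-width space U+3000 common in Chinese text); other special characters or codes would need their own patterns.

```python
import re

# One or more line feeds, optionally followed by first-line indentation.
PARAGRAPH_MARK = re.compile(r"(?:\r?\n)+[ \t\u3000]*")

def split_processing_areas(non_repeated_text):
    """Split non-repeated content into processing areas at paragraph marks."""
    areas = PARAGRAPH_MARK.split(non_repeated_text)
    return [a.strip() for a in areas if a.strip()]

sample = "First paragraph.\n\n  Second paragraph.\r\n\u3000\u3000Third."
areas = split_processing_areas(sample)
```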

In step S104, the plurality of processing areas belonging to one collection group are compared, in sequence, with the contents of the remaining collection groups, and the similarity is calculated. This step looks for content that is substantially repetitive. Specifically, each processing area in one collection group is compared for similarity with the contents of the remaining collection groups.

In this way the processing areas of a collection group are spread across every remaining collection group, which makes it possible to find, as far as possible, whether similar content exists in any of the remaining bid documents. In practice, the processing areas of one collection group may each resemble content in a different bidding document. When only two bidding documents are compared with each other, their similarity may still appear acceptable, but when the samples are spread across all remaining bidding documents, the existence of similar content is found more easily.
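The idea of spreading one bid's processing areas across all remaining bids can be sketched as follows. The similarity measure is an assumption: Python's `difflib.SequenceMatcher.ratio()` stands in for whatever measure the system actually uses, and the 0.8 threshold and the function name are illustrative.

```python
from difflib import SequenceMatcher

def best_matches(processing_areas, remaining_groups, threshold=0.8):
    """For each processing area, in order, find its most similar area in any other bid."""
    matches = []
    for area in processing_areas:
        best = (0.0, None)  # (similarity, bid id)
        for bid_id, areas in remaining_groups.items():
            for other in areas:
                score = SequenceMatcher(None, area, other).ratio()
                if score > best[0]:
                    best = (score, bid_id)
        if best[0] >= threshold:
            matches.append((area, best[1]))
    return matches

areas = [
    "The pile length is 12 m.",
    "Concrete grade is C30.",
    "Our team has ten engineers.",
]
remaining = {
    "bidB": ["The pile length is 12 m", "Steel is Q235."],
    "bidC": ["Concrete grade is C30!", "Unrelated text here."],
}
matches = best_matches(areas, remaining)
```

In this example two different bids each share one area with the bid under test, a pattern that a single two-document comparison could easily rate as low overall similarity.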

Then, in step S105, the processing areas whose similarity meets the requirement are classified as repeated content. Finally, in step S106, the similarity of two collection groups belonging to the same technical word and/or the same group of technical words is given according to the repeated content.
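The application does not state how the final similarity is derived from the repeated content. One plausible reading, assumed here only for illustration, is the share of a collection group's characters that ended up classified as repeated; the function name is hypothetical.

```python
def group_similarity(repeated_areas, all_areas):
    """Share of a collection group's characters that were classified as repeated."""
    total = sum(len(a) for a in all_areas)
    if total == 0:
        return 0.0
    return sum(len(a) for a in repeated_areas) / total

all_areas = ["abcd", "efgh"]
similarity = group_similarity(["abcd"], all_areas)
```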

Specifically, if the collection group of one bid document turns out to be similar to contents in several different bid documents, bid rigging or collusive bidding can be considered possible. In terms of behavior, avoiding similarity between two bid documents is technically feasible, but avoiding similarity across many bid documents is difficult because it involves far more complicated and tedious work: every bid document would have to be rewritten. If every bid document is in fact rewritten, then from a technical perspective the problem of bid rigging and collusive bidding has disappeared.

From another angle, if certain enterprises frequently appear together in the same bidding scenario, then the more often this happens the more easily it is discovered. What is recorded in such scenarios can be entered into an information base and used for screening basic information before bidding.

In some possible implementations, comparing, in sequence, the plurality of processing areas belonging to one collection group with the contents of the remaining collection groups includes the following steps:

S201, dividing the contents of the remaining collection groups into areas by using paragraph marks to obtain a plurality of comparison processing areas; and

S202, comparing, in sequence, the plurality of processing areas belonging to one collection group with all of the comparison processing areas for similarity.

Specifically, the contents of the remaining collection groups are divided into areas by using paragraph marks to obtain a plurality of comparison processing areas, each containing a number of characters, and the processing areas belonging to one collection group are then compared, in sequence, with all of the comparison processing areas, as shown in fig. 3.

This comparison can be regarded as a similarity comparison between sentences, which has the advantage of higher processing efficiency and more accurate results. In the manner provided by the application, the comparison is not tied to a sentence's position: each sentence is taken on its own and compared with the contents of the other collection groups.

In some possible implementations, the method further includes:

S301, counting the number of coincident characters in one processing area and one comparison processing area; and

S302, when the number of coincident characters is greater than or equal to a preset number, classifying the processing area as repeated content.

That is, in steps S301 to S302, when the number of coincident characters in a processing area and a comparison processing area is greater than or equal to the preset number, the whole processing area is classified as repeated content.
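Steps S301 and S302 can be sketched as below. How coincident characters are counted is not specified in the application; this sketch assumes they are the characters in the common runs found by `difflib.SequenceMatcher`, and the preset number of 10 is an arbitrary illustrative default.

```python
from difflib import SequenceMatcher

def coincident_char_count(area, comparison_area):
    """Number of characters lying in common runs shared, in order, by the two areas."""
    matcher = SequenceMatcher(None, area, comparison_area, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())

def is_repeated(area, comparison_area, preset_count=10):
    """S302: the whole area counts as repeated once enough characters coincide."""
    return coincident_char_count(area, comparison_area) >= preset_count

count = coincident_char_count("abcdef", "abcxef")
```

A character-count threshold treats a long area and a short area alike, so in practice the preset number would probably need tuning against typical area lengths.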

In some examples, counting the number of coincident characters in one processing area and one comparison processing area further includes:

S401, dividing the characters in the processing area and the comparison processing area into coincident text fields and non-coincident text fields; and

S402, calculating, by using semantic recognition, the similarity of the non-coincident text fields located between two coincident text fields, and merging the non-coincident text fields whose similarity meets the requirement into the coincident text fields.

Specifically, the text is first divided into coincident text fields and non-coincident text fields; the similarity of each non-coincident text field lying between two coincident text fields is then calculated by semantic recognition; and finally the non-coincident text fields whose similarity meets the requirement are merged into the coincident text fields. Semantic recognition allows text in which wording has been replaced to still be identified as similar, as shown in fig. 4.
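Steps S401 and S402 can be sketched as follows. Two simplifications are assumed: the sketch works on whitespace-separated words rather than characters, and a tiny hand-written synonym table stands in for the semantic recognition model, which the application does not specify. All names are hypothetical.

```python
from difflib import SequenceMatcher

# Toy stand-in for semantic recognition: each set lists words treated as equivalent.
SYNONYM_SETS = [{"use", "adopt", "employ"}, {"big", "large"}]

def semantically_similar(words_a, words_b):
    """Hypothetical check: equal length and every aligned pair equal or synonymous."""
    if len(words_a) != len(words_b):
        return False
    return all(
        x == y or any(x in s and y in s for s in SYNONYM_SETS)
        for x, y in zip(words_a, words_b)
    )

def coincident_word_count(area, comparison_area):
    """Count coincident words, absorbing similar fields that lie between two
    coincident fields."""
    a, b = area.lower().split(), comparison_area.lower().split()
    ops = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    total = 0
    for idx, (tag, i1, i2, j1, j2) in enumerate(ops):
        if tag == "equal":
            total += i2 - i1  # coincident text field
        elif (
            tag == "replace"
            and 0 < idx < len(ops) - 1
            and ops[idx - 1][0] == "equal"
            and ops[idx + 1][0] == "equal"
            and semantically_similar(a[i1:i2], b[j1:j2])
        ):
            total += i2 - i1  # non-coincident field merged into the coincident fields
    return total

rewritten = coincident_word_count(
    "we use a large open caisson foundation",
    "we adopt a big open caisson foundation",
)
```

Without the semantic step the two sentences share five of seven words; with it, the two swapped words are absorbed and all seven count as coincident.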

In some examples, tables that appear in the documents are processed as follows:

S501, identifying a table in the electronic bidding document and extracting the contents of the table; and

S502, taking the contents of one cell in the table as one processing area.

Specifically, the contents of each cell in the table are treated as a single processing area and then handled in the manner described above. This prevents text from evading duplicate checking by being placed in a table.

Further, referring to fig. 5, the contents of the first row and/or the first column of the table are added to each processing area. The first row (top) and the first column (leftmost) of a table often contain guiding information, and adding this information to each processing area makes the contents of the processing area more targeted.

In some possible implementations, the contents of the first row and/or the first column of the table are placed before or after the contents of the processing area to which they are added; alternatively, both placements are used and the area is processed twice.

In other possible implementations, a blank area exists between the contents of the first row and/or the first column of the table and the contents of the processing area to which they are added. The blank area indicates that the header contents are not contiguous with the contents of the processing area, so that a larger search range can be obtained; the length of the blank area can be set from experience.
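The table handling can be sketched as below, assuming the table has already been extracted into a list of rows with headers in row 0 and column 0. The function name, the `gap` parameter (the blank area) and the choice to join row header and column header into one heading are illustrative assumptions.

```python
def table_processing_areas(table, gap=" ", header_first=True):
    """One processing area per body cell, joined with its row and column headers.

    `table` is a list of rows; row 0 and column 0 are assumed to hold headers.
    `gap` is the blank area placed between the header text and the cell text.
    """
    header_row = table[0]
    areas = []
    for row in table[1:]:
        for col, cell in enumerate(row[1:], start=1):
            heading = f"{row[0]}{gap}{header_row[col]}"
            parts = (heading, cell) if header_first else (cell, heading)
            areas.append(gap.join(parts))
    return areas

table = [
    ["Item", "Model", "Warranty"],
    ["Pump", "QW-50", "2 years"],
    ["Valve", "DN100", "1 year"],
]
before = table_processing_areas(table)
after = table_processing_areas(table, gap="  ", header_first=False)
```

Calling the function once with `header_first=True` and once with `header_first=False` corresponds to processing each area twice, with the header content before and after the cell content.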

The application also provides a device for detecting bidding documents, which comprises the collecting unit, the first duplicate checking processing unit, the first area dividing unit, the first similarity processing unit, the second duplicate checking processing unit and the result unit described in the second aspect.

Further, the device further comprises:

a second area dividing unit, used for dividing the contents of the remaining collection groups into areas by using paragraph marks to obtain a plurality of comparison processing areas; and

a second similarity processing unit, used for comparing, in sequence, the plurality of processing areas belonging to one collection group with all of the comparison processing areas for similarity.

Further, the device further comprises:

a quantity counting unit, used for counting the number of coincident characters in one processing area and one comparison processing area; and

a first repartitioning unit, used for classifying the processing area as repeated content when the number of coincident characters is greater than or equal to a preset number.

Further, the device further comprises:

a field dividing unit, used for dividing the characters in the processing area and the comparison processing area into coincident text fields and non-coincident text fields; and

a second repartitioning unit, used for calculating, by using semantic recognition, the similarity of the non-coincident text fields located between two coincident text fields, and merging the non-coincident text fields whose similarity meets the requirement into the coincident text fields.

Further, the device further comprises:

an identification unit, used for identifying a table in the electronic bidding document and extracting the contents of the table; and

a processing unit, used for taking the contents of one cell in the table as one processing area.

Further, the contents of the first row and/or the first column of the table are added to each processing area.

Further, the contents of the first row and/or the first column of the table are located before or after the contents of the processing area to which they are added.

In one example, a unit in any of the above devices may be one or more integrated circuits configured to implement the above methods, for example one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), one or more field-programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms.

For another example, when a unit in the device is implemented in the form of a processing element scheduling a program, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke the program. For another example, the units may be integrated together and implemented in the form of a system-on-a-chip (SOC).

Various objects such as various messages/information/devices/network elements/systems/devices/actions/operations/processes/concepts may be named in the present application, and it should be understood that these specific names do not constitute limitations on related objects, and that the named names may be changed according to the scenario, context, or usage habit, etc., and understanding of technical meaning of technical terms in the present application should be mainly determined from functions and technical effects that are embodied/performed in the technical solution.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It should also be understood that, in the various embodiments of the present application, first, second, etc. are merely intended to indicate that multiple objects are different. For example, the first time window and the second time window merely denote different time windows. The terms have no effect on the time windows themselves, and first, second, etc. should not impose any limitation on the embodiments of the present application.

It is also to be understood that in the various embodiments of the application, terms and/or descriptions of the various embodiments are consistent and may be referenced to one another in the absence of a particular explanation or logic conflict, and that the features of the various embodiments may be combined to form new embodiments in accordance with their inherent logic relationships.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a computer-readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned computer-readable storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

The application also provides a system for detecting the bidding document, which comprises:

one or more memories for storing instructions; and

one or more processors configured to invoke and execute the instructions from the memory to perform the method as described above.

The present application also provides a computer program product comprising instructions that, when executed, cause the system for detecting a bidding document to perform the operations of the above-described method.

The present application also provides a chip system comprising a processor for implementing the functions involved in the above, e.g. generating, receiving, transmitting, or processing data and/or information involved in the above method.

The chip system can be composed of chips, and can also comprise chips and other discrete devices.

The processor referred to in any of the foregoing may be a CPU, microprocessor, ASIC, or integrated circuit that performs one or more of the procedures for controlling the transmission of feedback information described above.

In one possible design, the chip system also includes a memory to hold the necessary program instructions and data. The processor and the memory may be decoupled, disposed on different devices, and connected by wired or wireless means, so as to support the chip system in implementing the various functions of the foregoing embodiments. Alternatively, the processor and the memory may be coupled on the same device.

Optionally, the computer instructions are stored in a memory.

Alternatively, the memory may be a storage unit in the chip, such as a register, a cache, etc., and the memory may also be a storage unit in the terminal located outside the chip, such as a ROM or other type of static storage device, a RAM, etc., that may store static information and instructions.

It is to be understood that the memory in this application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.

The nonvolatile memory may be a ROM, a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory.

The volatile memory may be RAM, which acts as an external cache. There are many different types of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DR RAM).

The embodiments described above are all preferred embodiments of the present application and are not intended to limit its scope of protection. Therefore, all equivalent changes made according to the structure, shape, and principle of this application shall fall within the scope of protection of this application.

Claims

1. A bidding document detection method based on a natural semantic model is characterized by comprising the following steps:

sequentially comparing a plurality of processing areas belonging to one collection group with the contents of the remaining collection groups belonging to the same technical word and/or the same group of technical words, and calculating the similarity;

2. The method of claim 1, wherein sequentially comparing the plurality of processing areas belonging to one collection group with the contents of the remaining collection groups belonging to the same technical word and/or the same group of technical words comprises:

dividing the content in other collection groups into areas by using paragraph marks to obtain a plurality of comparison processing areas; and

3. The natural semantic model based bid document detection method of claim 2, further comprising:

4. The method for detecting a bidding document based on a natural semantic model of claim 3, wherein counting the number of coincident words in a processing area and a comparison processing area further comprises:

5. The natural semantic model based bidding document detection method according to any one of claims 1 to 4, further comprising:

the contents in one cell in the table are taken as one processing area.

6. The method for detecting bidding documents based on a natural semantic model of claim 5, wherein the content of the first row and/or the first column of the table is added to each processing area.

7. The natural semantic model based bidding document detection method of claim 6, wherein the content of the first row and/or the first column of the table is placed before or after the content of the processing area to which it is added;

8. A device for detecting a bidding document, comprising:

a first similarity processing unit for sequentially comparing a plurality of processing areas belonging to one collection group with the contents of the remaining collection groups belonging to the same technical word and/or the same group of technical words, and calculating the similarity;

9. A system for detecting a bid document, the system comprising:

one or more memories for storing instructions; and

one or more processors to invoke and execute the instructions from the memory to perform the method of any of claims 1 to 7.

10. A computer-readable storage medium, the computer-readable storage medium comprising:

a program which, when executed by a processor, performs the method according to any one of claims 1 to 7.