CN115934880A - Construction of project cost document database and search method of project cost document - Google Patents
- Publication number: CN115934880A (application number CN202211360402.3A)
- Authority
- CN
- China
- Prior art keywords
- project cost
- document
- target
- cost
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a construction method for a project cost document database and a retrieval method for project cost documents. The construction method extracts a target key and a target value from each project cost document in a project cost document data set, constructs a first target key-value pair list, and stores the list in a database in association with the corresponding project cost document, thereby building the project cost document database. This converts the unstructured data of the project cost documents into structured data, so that documents can subsequently be retrieved according to the information represented by the first target key-value pair list and documents of interest can be screened out. It also facilitates statistical analysis of project cost documents, provides a large amount of data support, and solves the prior-art problem of wasted time and resources caused by manually searching project cost documents.
Description
Technical Field
The invention relates to the technical field of big data analysis, in particular to a construction method of a project cost document database and a search method of a project cost document.
Background
In the related art, past project cost documents are generally consulted for reference when the construction cost of a building project is evaluated. To find target documents of interest among a large number of past project cost documents, the traditional approach is to open and browse the documents manually. Searching this way takes a great deal of time, so looking up data is slow and consumes substantial human resources.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to overcome the long time consumption and high labor cost of manually looking up project cost documents in the prior art, by providing a construction method for a project cost document database and a retrieval method for project cost documents.
According to a first aspect, an embodiment of the present invention discloses a construction method for a project cost document database, including: acquiring a project cost document data set; extracting a target key and a target value of each project cost document according to a preset key and a value set, wherein the key in the preset key and the value set represents a professional cost concept vocabulary and a value represents a cost description vocabulary similar to the professional cost concept vocabulary; constructing a first target key-value pair list according to the target key and the target value; and the first target key value pair list and the corresponding project cost document are stored in a database in a correlation mode to obtain a project cost document database.
Optionally, the method further comprises: acquiring a plurality of professional cost concept vocabularies and cost description vocabularies similar to each professional cost concept vocabulary; and constructing a preset key value pair set represented in the form of a prefix tree according to the plurality of professional cost concept vocabularies and the cost description vocabularies corresponding to each professional cost concept vocabulary.
Optionally, before the first target key-value pair list and the corresponding project cost document are stored in the database in association to obtain a project cost document database, the method further includes: and performing data cleaning and standardization processing on the first target key value pair list.
Optionally, the project cost document dataset includes a project cost form document, and the method includes:
extracting a target form keyword of each project cost form document according to a preset form extraction rule;
matching the target table key words with the preset keys and the value set;
determining a second target key value pair list corresponding to the target table according to the matching result;
and storing the second target key value pair list and a target table in a corresponding project cost table document in a database in an associated mode.
Optionally, the method further comprises: extracting a plurality of the target table data; generating display diagram data corresponding to each target table according to the target table data; and associating the display diagram data with the corresponding target table in a database.
According to a second aspect, the embodiment of the invention discloses a method for retrieving a project cost document, which comprises the following steps: acquiring keyword data to be retrieved; matching the keyword data to be retrieved with a target key value pair list in a project cost document database, wherein the project cost document database is obtained by the project cost document database construction method according to the first aspect or any optional embodiment of the first aspect; and outputting the project cost document corresponding to the keyword data to be retrieved according to the matching result.
Optionally, the engineering cost document is table data, and the method further includes: when a display request for the table data is received, extracting display diagram data corresponding to the table data from a database; and displaying the display diagram data.
According to a third aspect, an embodiment of the present invention further discloses an apparatus for constructing a project cost document database, including: the first acquisition module is used for acquiring a project cost document data set; the extraction module is used for extracting a target key and a target value of each project cost document according to preset keys and value sets, wherein the keys in the preset keys and the value sets represent professional cost concept vocabularies, and the values represent cost description vocabularies similar to the professional cost concept vocabularies; a first construction module for constructing a first target key-value pair list according to the target key and the target value; and the association storage module is used for associating and storing the first target key value pair list and the corresponding project cost document into a database to obtain a project cost document database.
According to a fourth aspect, an embodiment of the present invention further discloses a project cost document retrieval apparatus, including: the second acquisition module is used for acquiring the keyword data to be retrieved; the retrieval module is used for matching the keyword data to be retrieved with a target key value pair list in a project cost document database, wherein the project cost document database is obtained by the project cost document database construction method as described in the first aspect or any optional embodiment of the first aspect; and the output module is used for outputting the project cost document corresponding to the keyword data to be retrieved according to the matching result.
According to a fifth aspect, an embodiment of the present invention further discloses an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the project cost document database construction method according to the first aspect or any optional embodiment of the first aspect, or the steps of the project cost document retrieval method according to the second aspect or any optional embodiment of the second aspect.
According to a sixth aspect, an embodiment of the present invention further discloses a computer readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the project cost document database construction method according to the first aspect or any optional embodiment of the first aspect, or the steps of the project cost document retrieval method according to the second aspect or any optional embodiment of the second aspect.
The technical scheme of the invention has the following advantages:
the invention provides a construction method/device of a project cost document database, which comprises the following steps: acquiring a project cost document data set; extracting a target key and a target value of each project cost document according to a preset key and a preset value set, wherein the key in the preset key and the preset value set represents a professional cost concept vocabulary and a value represents a cost description vocabulary similar to the professional cost concept vocabulary; constructing a first target key-value pair list according to the target key and the target value; and the first target key value pair list and the corresponding project cost document are stored in a database in a correlation mode to obtain a project cost document database. The method of the invention extracts the target key and the target value of each project cost document in the project cost document data set to construct a first target key value pair list, stores the first target key value pair list and the corresponding project cost document in a database in a correlation manner to construct a project cost document database, converts unstructured data corresponding to the project cost document into structured data, facilitates subsequent retrieval of the project cost document according to the information represented by the first target key value pair list, screens interested documents, facilitates statistical analysis of the project cost document, provides a large amount of data support, and solves the problem of time waste and resource waste caused by manual search of the project cost document in the prior art.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart showing a concrete example of a construction cost document database construction method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an example of a construction method of a construction cost document database according to an embodiment of the present invention;
FIG. 3 is a flowchart of a specific example of a construction cost document retrieval method according to an embodiment of the present invention;
FIG. 4 is a schematic block diagram of a specific example of a construction cost document database construction apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a specific example of a construction cost document retrieval apparatus according to an embodiment of the present invention;
FIG. 6 is a diagram of a specific example of an electronic device in an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; the two elements may be directly connected or indirectly connected through an intermediate medium, or may be communicated with each other inside the two elements, or may be wirelessly connected or wired connected. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The embodiment of the invention discloses a construction method for a project cost document database, which can be applied to any project cost retrieval platform. In this embodiment, the retrieval platform is a project recommendation intelligent parsing system. As shown in FIG. 1, the method comprises the following steps:
Step 101: acquire a project cost document data set.
For example, the project cost document data set may include a plurality of project cost documents, each containing basic project information and evaluation information from the project cost evaluation. In this embodiment, the project cost documents may be project cost documents in the building field, and their file formats may include, but are not limited to, doc, docx, and pdf.
Step 102: extract a target key and a target value from each project cost document according to a preset key and value set, where each key in the preset key and value set represents a professional cost concept vocabulary and each value represents a cost description vocabulary similar to that professional cost concept vocabulary.
For example, the preset key and value set may include, but is not limited to, "unit name", "consignment unit", and "owner unit". A key represents a professional cost concept vocabulary in the project cost field, and a value represents a cost description vocabulary similar to that professional cost concept vocabulary. In the embodiment of the application, each project cost document can be matched against the preset key and value set, and the corresponding key and value are read once a match succeeds. Key and value extraction (key-value pair extraction) may follow two rules:
(1) The "key + : + value" form, whose typical feature is a colon, such as "compilation time: June 2017".
(2) The "key + \n + value" form, whose typical feature is a line feed, such as "project name\nengineering technology vocational college student activity center (student dormitory, activity studios) project".
The keys and values that are read can then be filtered with a preset filtering rule to determine the keys and values of interest, which are the target keys and target values. For example, when the preset filtering rule filters by "compilation time", "compilation unit", and "project location", only the key-value pairs with those keys are kept from the keys and values that were read.
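The two extraction rules and the subsequent filtering can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `PRESET_KEYS` contents, the function names, and the sample text are all assumptions made for the example.

```python
import re

# Hypothetical preset key and value set: canonical key -> similar description terms.
PRESET_KEYS = {
    "compilation time": ["compilation time", "compile time"],
    "compilation unit": ["compilation unit", "prepared by"],
    "project name": ["project name", "project title"],
}

# Reverse lookup from any description term to its canonical key.
ALIAS_TO_KEY = {
    alias: key for key, aliases in PRESET_KEYS.items() for alias in aliases
}


def extract_pairs(text):
    """Return (canonical_key, value) pairs found by the two rules."""
    pairs = []
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        # Rule 1: "key: value" on one line (ASCII or full-width colon).
        m = re.match(r"^(.+?)\s*[:：]\s*(.+)$", line)
        if m and m.group(1).lower() in ALIAS_TO_KEY:
            pairs.append((ALIAS_TO_KEY[m.group(1).lower()], m.group(2)))
            continue
        # Rule 2: the key alone on a line, value on the next non-empty line.
        if line.lower() in ALIAS_TO_KEY:
            for nxt in lines[i + 1:]:
                if nxt:
                    pairs.append((ALIAS_TO_KEY[line.lower()], nxt))
                    break
    return pairs


def filter_targets(pairs, wanted_keys):
    """Keep only the pairs whose key is of interest (the target pairs)."""
    return [(k, v) for k, v in pairs if k in wanted_keys]


sample = (
    "Compile time: June 2017\n"
    "Project title\n"
    "Student Activity Center project\n"
    "Notes: none"
)
all_pairs = extract_pairs(sample)
targets = filter_targets(all_pairs, {"compilation time"})
```

Mapping every similar description term back to one canonical key means that documents using different wording for the same concept end up with the same target key.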
Step 103: construct a first target key-value pair list according to the target key and the target value.
Illustratively, the first target key-value pair list is constructed from the target keys and corresponding target values extracted from each project cost document, and comprises a plurality of key-value pairs, each consisting of a target key and its corresponding target value.
Step 104: store the first target key-value pair list in a database in association with the corresponding project cost document to obtain the project cost document database.
By way of example, in embodiments of the present application, the database may include, but is not limited to, MySQL (a relational database), a distributed file database, and an ES database (a distributed, highly scalable, real-time search and data analysis engine).
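One way to picture the associated storage is a document table plus a key-value table that references it. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for the relational database. The schema, the table and column names, and the file path are assumptions for illustration and are not taken from the patent.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the relational store.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document (
        id   INTEGER PRIMARY KEY,
        path TEXT NOT NULL
    );
    CREATE TABLE doc_kv (
        doc_id     INTEGER NOT NULL REFERENCES document(id),
        pair_key   TEXT NOT NULL,
        pair_value TEXT NOT NULL
    );
    CREATE INDEX idx_doc_kv ON doc_kv (pair_key, pair_value);
    """
)


def store_document(conn, path, kv_pairs):
    """Store a document and its target key-value pair list in association."""
    cur = conn.execute("INSERT INTO document (path) VALUES (?)", (path,))
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO doc_kv (doc_id, pair_key, pair_value) VALUES (?, ?, ?)",
        [(doc_id, k, v) for k, v in kv_pairs],
    )
    conn.commit()
    return doc_id


def find_documents(conn, key, value):
    """Return the paths of documents that carry the given key-value pair."""
    rows = conn.execute(
        "SELECT d.path FROM document d "
        "JOIN doc_kv p ON p.doc_id = d.id "
        "WHERE p.pair_key = ? AND p.pair_value = ? ORDER BY d.id",
        (key, value),
    )
    return [row[0] for row in rows]


doc_id = store_document(
    conn,
    "docs/estimate_a.docx",  # hypothetical path
    [("compilation time", "June 2017"), ("project location", "City B")],
)
```

Because each key-value row carries the document id, a lookup by key and value leads straight back to the stored document, which is the association that later retrieval relies on.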
The invention provides a construction cost document database construction method, which comprises the steps of extracting a target key and a target value from each construction cost document in a construction cost document data set, constructing a first target key value pair list, storing the first target key value pair list and the corresponding construction cost document in a database in a correlation manner, constructing a construction cost document database, converting unstructured data corresponding to the construction cost document into structured data, facilitating subsequent retrieval of the construction cost document according to information represented by the first target key value pair list, screening interested documents, providing a large amount of data support while facilitating statistical analysis of the construction cost document, and solving the problem of time and resource waste caused by manual searching of the construction cost document in the prior art.
As an optional embodiment of the invention, the method further comprises: acquiring a plurality of professional cost concept vocabularies and the cost description vocabularies similar to each of them. In an exemplary embodiment of the present application, each professional cost concept vocabulary and its similar cost description vocabularies may be determined by a text similarity calculation method. Alternatively, the professional cost concept vocabularies and their similar cost description vocabularies may be maintained manually to obtain a possible-description vocabulary library, which contains a large number of professional cost concept vocabularies together with the cost description vocabularies similar to each of them; the vocabularies are then obtained from this library.
A preset key-value pair set represented in the form of a prefix tree is then constructed from the plurality of professional cost concept vocabularies and the cost description vocabularies similar to each of them. Illustratively, a prefix tree is a multi-way tree structure for fast retrieval. It uses the common prefixes of character strings to reduce query time, its core idea is to trade space for time, and it is often used by search engines for text word-frequency statistics. The prefix tree is built from the professional cost concept vocabularies and their similar cost description vocabularies by a preset prefix tree construction method, which may, for example, build the tree from map collections. In the embodiment of the application, a prefix tree component can be built to hold the professional cost concept vocabularies and the cost description vocabularies similar to each of them.
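A map-based prefix tree of this kind can be sketched with nested dictionaries. The class name, the end-of-term marker, and the grouping of "consignment unit" and "owner unit" under the key "unit name" are assumptions for illustration only.

```python
class PrefixTree:
    """Prefix tree whose nodes are plain dicts (a map-based construction)."""

    # Hypothetical end-of-term marker. It is longer than one character,
    # so it can never collide with a single-character child key.
    _KEY = "\0key"

    def __init__(self):
        self.root = {}

    def insert(self, term, canonical_key):
        """Add a description term that maps to a canonical key."""
        node = self.root
        for ch in term:
            node = node.setdefault(ch, {})
        node[self._KEY] = canonical_key

    def lookup(self, term):
        """Return the canonical key for an exact term, or None."""
        node = self.root
        for ch in term:
            node = node.get(ch)
            if node is None:
                return None
        return node.get(self._KEY)

    def longest_prefix_match(self, text):
        """Return (matched_term, canonical_key) for the longest stored
        term that text starts with, or None if there is no such term."""
        node, best = self.root, None
        for i, ch in enumerate(text):
            node = node.get(ch)
            if node is None:
                break
            if self._KEY in node:
                best = (text[: i + 1], node[self._KEY])
        return best


# Assumed vocabulary: one canonical key with its similar description terms.
vocabulary = {"unit name": ["unit name", "consignment unit", "owner unit"]}

tree = PrefixTree()
for key, terms in vocabulary.items():
    for term in terms:
        tree.insert(term, key)
```

Terms that share a prefix share the same path of nodes, so matching a line of text costs time proportional to the length of the matched term rather than to the size of the vocabulary.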
As an optional implementation manner of the present invention, before the associating and storing the first target key-value pair list and the corresponding project cost document in the database to obtain the project cost document database, the method further includes: and performing data cleaning and standardization processing on the first target key value pair list.
For example, in the embodiment of the present application, data cleaning and standardization may be performed on the first target key-value pair list according to preset data cleaning and standardization rules. Suppose the key of a key-value pair is "project location" and its value is "this project is located on the campus of an engineering and technology university in District C, City B, Province A", and the configuration for this key in the text configuration rules requires the data type to be a character string normalized to the "province-city-district" format. The value corresponding to "project location" then becomes "Province A-City B-District C".
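A rule-driven cleaning and standardization pass could look like the sketch below. The rule structure (`type`, `pattern`, `template`), the English phrasing matched by the regular expression, and the choice to drop wrongly typed values are all assumptions for illustration.

```python
import re

# Hypothetical per-key rules: required type plus a pattern and output template.
RULES = {
    "project location": {
        "type": str,
        "pattern": re.compile(
            r"District (?P<district>\w+), City (?P<city>\w+), "
            r"Province (?P<province>\w+)"
        ),
        "template": "Province {province}-City {city}-District {district}",
    },
}


def clean_and_normalize(pairs):
    """Apply type checks and format normalization to a key-value pair list."""
    cleaned = []
    for key, value in pairs:
        rule = RULES.get(key)
        if rule is None:
            # No rule configured: only trim surrounding whitespace.
            cleaned.append((key, value.strip() if isinstance(value, str) else value))
            continue
        if not isinstance(value, rule["type"]):
            continue  # assumed policy: drop values of the wrong type
        m = rule["pattern"].search(value)
        if m:
            value = rule["template"].format(**m.groupdict())
        cleaned.append((key, value.strip()))
    return cleaned


raw = [
    ("project location", "This project is located in District C, City B, Province A"),
    ("project location", 42),
    ("compilation unit", "  ACME  "),
]
normalized = clean_and_normalize(raw)
```

Normalizing values to one format before storage lets equal locations compare as equal strings during retrieval and statistical analysis.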
As an alternative embodiment of the present invention, the project cost document dataset includes a project cost table document; the method comprises the following steps:
Extracting a target table keyword from each project cost table document according to a preset table extraction rule. For example, in the embodiment of the present application, investment estimates, main technical and economic indicators, and the like are important content in project cost evaluation, and most of this content appears in table form in a project cost document. Table information is extracted for two purposes: to present the table data visually, and to support subsequent statistical analysis such as project cost composition analysis. The preset table extraction rules can be configured dynamically through manual maintenance. An example rule: extract the "investment estimation table" from the project cost table document, where the table is located under a directory node whose title contains the keyword "investment estimation" and the table name must itself contain the keyword "investment estimation". The preset table extraction rule can specify which tables in the project cost table document are extracted and which columns of data are taken from them. For example, when the rule is configured with "specified title + allowed title levels", the list of tables within the scope of the specified title is obtained; if no specified title is configured, the list of tables in the entire document is obtained.
As another example, the "main technical and economic index table" in the project cost table document is extracted. The table is located under a directory node whose title contains the keyword "main technical and economic index", "main economic and technical index", or "main technical index", and the allowed title levels are secondary and tertiary titles. When the title level matches, the corresponding table is extracted; otherwise the table is skipped. If a "specified table name" is configured in the preset table extraction rule, tables that do not satisfy it are filtered out. If "required header fields" are configured, the table header is checked for all required fields, and tables that lack any of them are filtered out.
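The configurable checks described above (specified title, allowed title levels, specified table name, required header fields) can be sketched as a rule object applied to candidate tables. The `Table` and `TableRule` classes, their field names, and the sample tables are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    header: list
    heading: str        # title of the directory node that contains the table
    heading_level: int  # 1 = top-level title, 2 = secondary, 3 = tertiary


@dataclass
class TableRule:
    heading_keywords: list = field(default_factory=list)  # empty: whole document
    allowed_levels: set = field(default_factory=set)      # empty: any level
    name_keywords: list = field(default_factory=list)     # empty: any table name
    required_fields: list = field(default_factory=list)   # empty: no header check


def matches(table, rule):
    """Check one table against every configured part of the rule."""
    if rule.heading_keywords and not any(
        k in table.heading for k in rule.heading_keywords
    ):
        return False
    if rule.allowed_levels and table.heading_level not in rule.allowed_levels:
        return False
    if rule.name_keywords and not any(k in table.name for k in rule.name_keywords):
        return False
    # Every required header field must be present.
    return all(f in table.header for f in rule.required_fields)


def extract_tables(tables, rule):
    return [t for t in tables if matches(t, rule)]


rule = TableRule(
    heading_keywords=["main technical and economic index"],
    allowed_levels={2, 3},
    required_fields=["item", "value"],
)
candidates = [
    Table("Index table A", ["item", "unit", "value"],
          "3.2 main technical and economic index", 2),
    Table("Index table B", ["item", "unit", "value"],
          "main technical and economic index", 1),   # wrong title level
    Table("Index table C", ["item"],
          "3.2 main technical and economic index", 3),  # missing header field
]
selected = extract_tables(candidates, rule)
```

Leaving a part of the rule empty disables that check, which mirrors the behavior where an unconfigured specified title widens the search to the entire document.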
And matching the target table key words with the preset keys and the value set. For example, matching the target table keywords with the preset keys and the value set may determine the key value pairs corresponding to the target table.
And determining a second target key value pair list corresponding to the target table according to the matching result. Illustratively, a second list of target key-value pairs is generated from the key-value pairs matched to the target table.
And storing the second target key value pair list and a target table in a corresponding project cost table document in a database in an associated mode. The second list of target key-value pairs is illustratively stored in association with the target table in the database to facilitate subsequent retrieval of the target table according to the second list of target key-value pairs.
As an optional embodiment of the present invention, the method further comprises: extracting a plurality of the target table data; generating display diagram data corresponding to each target table according to the target table data; and associating the display diagram data with the corresponding target table in a database.
Illustratively, the display diagram data can include, but is not limited to, statistical analysis diagram data of the target table data, the corresponding display diagram is generated according to the target table data and is stored in the database in a correlation manner with the target table, so that the target table can be visually checked according to the display diagram, and the checking efficiency is improved.
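As a rough sketch, display diagram data can be precomputed from the rows of a target table and kept alongside it. The function name, the output dictionary layout, and the sample investment rows below are assumptions for illustration.

```python
def build_chart_data(table_rows, label_col, value_col, chart_type="pie"):
    """Turn table rows into chart-ready data (labels, values, shares)."""
    labels, values = [], []
    for row in table_rows:
        try:
            label = row[label_col]
            # Accept amounts written with thousands separators.
            amount = float(str(row[value_col]).replace(",", ""))
        except (KeyError, ValueError):
            continue  # skip rows without a label or a numeric amount
        labels.append(label)
        values.append(amount)
    total = sum(values)
    shares = [round(v / total, 4) if total else 0.0 for v in values]
    return {"type": chart_type, "labels": labels, "values": values, "shares": shares}


# Hypothetical rows from an investment estimation table.
rows = [
    {"item": "Construction", "amount": "6,000"},
    {"item": "Equipment", "amount": "3000"},
    {"item": "Other", "amount": "1000"},
    {"item": "Remarks", "amount": "n/a"},
]
chart = build_chart_data(rows, "item", "amount")
```

Storing the resulting dictionary in association with the target table means a later display request only has to read it back instead of recomputing it.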
In the embodiment of the present application, a functional block diagram of the project recommendation intelligent parsing system may be as shown in FIG. 2. The system provides retrieval, listing, and viewing functions. Doc, docx, and pdf files may be parsed according to preset text parsing rules and table parsing rules, and the databases may include MySQL (a relational database), a distributed file database, and an ES database (a distributed, highly scalable, real-time search and data analysis engine).
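The embodiment of the invention also discloses a method for retrieving a project cost document which, as shown in FIG. 3, comprises the following steps:
Step 201: acquire keyword data to be retrieved.
Step 202: match the keyword data to be retrieved with a target key-value pair list in a project cost document database, where the project cost document database is obtained by the project cost document database construction method of the foregoing embodiment.
Step 203: output the project cost document corresponding to the keyword data to be retrieved according to the matching result.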
The project cost document retrieval method provided by the invention can be used for retrieving in the project cost document database according to the keyword to be retrieved, can quickly screen out the interested document, is convenient for statistical analysis of the project cost document, provides a large amount of data support, and solves the problem of time and resource waste caused by manual search of the project cost document in the prior art.
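The retrieval steps can be illustrated with a small in-memory stand-in for the project cost document database. The record layout, the query format (a mapping from key to keyword), and the case-insensitive substring matching are assumptions, not details given in the patent.

```python
# Hypothetical in-memory stand-in for the project cost document database:
# each record associates a document path with its target key-value pair list.
DATABASE = [
    {
        "path": "docs/a.docx",
        "pairs": [("project location", "City B"), ("compilation time", "June 2017")],
    },
    {
        "path": "docs/b.pdf",
        "pairs": [("project location", "City D"), ("compilation time", "May 2019")],
    },
]


def search(database, query):
    """Return paths of documents whose key-value pairs match every
    (key, keyword) entry in the query.

    A keyword matches when it occurs, case-insensitively, inside the
    value stored under the same key.
    """
    results = []
    for record in database:
        pairs = record["pairs"]
        if all(
            any(k == q_key and keyword.lower() in v.lower() for k, v in pairs)
            for q_key, keyword in query.items()
        ):
            results.append(record["path"])
    return results


by_location = search(DATABASE, {"project location": "city b"})
by_year = search(DATABASE, {"compilation time": "20"})
```

Because matching runs over the structured key-value pairs rather than the document text, no document has to be opened during the search.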
As an optional embodiment of the present invention, the project cost document is table data, and the method further includes:
when a display request for the table data is received, extracting display graph data corresponding to the table data from a database; and displaying the display diagram data.
Illustratively, when a display request for the retrieved form data is received, the display graph data corresponding to the form data is displayed, so as to facilitate statistical analysis.
The embodiment of the invention also discloses a construction cost document database construction device, as shown in fig. 4, the device comprises:
a first obtaining module 301, configured to obtain a project cost document data set;
an extraction module 302, configured to extract a target key and a target value from each project cost document according to a preset key and value set, where the keys in the preset key and value set represent professional cost concept vocabularies and the values represent cost description vocabularies similar to the professional cost concept vocabularies;
a first construction module 303, configured to construct a first target key-value pair list according to the target key and the target value;
and the association storage module 304 is configured to associate and store the first target key-value pair list and the corresponding project cost document in a database to obtain a project cost document database.
The project cost document database construction device provided by the invention extracts a target key and a target value from each project cost document in the project cost document data set, constructs a first target key-value pair list, and stores the list in a database in association with the corresponding project cost document, thereby building the project cost document database. This converts the unstructured data of the project cost documents into structured data, so that documents can subsequently be retrieved according to the information represented by the first target key-value pair list and documents of interest can be screened out. It also facilitates statistical analysis of project cost documents, provides a large amount of data support, and solves the prior-art problem of wasted time and resources caused by manually searching project cost documents.
As an optional embodiment of the present invention, the apparatus further comprises:
the third acquisition module is used for acquiring a plurality of professional cost concept vocabularies and cost description vocabularies similar to each professional cost concept vocabulary;
and the second construction module is used for constructing a preset key value pair set represented in a prefix tree form according to the plurality of professional cost concept vocabularies and the cost description vocabularies corresponding to each professional cost concept vocabulary.
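Illustratively, a preset key-value pair set in prefix tree form might be built and queried as sketched below. The class and function names, the sample vocabulary and the longest-match scanning strategy are assumptions made for illustration; the embodiment only states that the set is represented as a prefix tree.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.key = None  # concept vocabulary if a description vocabulary ends here

def build_trie(preset):
    """Insert every concept and description vocabulary, tagging its end node."""
    root = TrieNode()
    for key, terms in preset.items():
        for term in [key, *terms]:
            node = root
            for ch in term:
                node = node.children.setdefault(ch, TrieNode())
            node.key = key
    return root

def find_matches(root, text):
    """Scan text left to right, keeping the longest vocabulary at each start."""
    matches = []
    i = 0
    while i < len(text):
        node, best = root, None
        for j in range(i, len(text)):
            node = node.children.get(text[j])
            if node is None:
                break
            if node.key is not None:
                best = (node.key, text[i:j + 1], j + 1)
        if best:
            matches.append(best[:2])
            i = best[2]  # resume after the matched vocabulary
        else:
            i += 1
    return matches

# Hypothetical vocabulary, for illustration only.
preset = {"unit price": ["price per unit"], "total cost": ["total price"]}
trie = build_trie(preset)
found = find_matches(trie, "total price and price per unit")
```

With a prefix tree, vocabularies that share a prefix (here "total cost" and "total price") share nodes, so one pass over the text checks all vocabularies at once instead of testing each one separately.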
As an optional embodiment of the present invention, the apparatus further comprises:
and the processing module is used for carrying out data cleaning and standardization processing on the first target key value pair list.
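Illustratively, the data cleaning and standardization step might look like the sketch below. The concrete rules (Unicode NFKC normalization, whitespace collapsing, lower-casing, dropping empty and duplicate pairs) are assumptions; the embodiment does not prescribe which cleaning rules are applied.

```python
import unicodedata

def normalize(text):
    """Fold full-width characters, collapse whitespace, and lower-case."""
    return " ".join(unicodedata.normalize("NFKC", text).split()).lower()

def clean_pairs(pairs):
    """Return the pair list with normalized, non-empty, de-duplicated entries."""
    seen = set()
    cleaned = []
    for key, value in pairs:
        key, value = normalize(key), normalize(value)
        if not key or not value:
            continue  # drop pairs with an empty key or value
        if (key, value) in seen:
            continue  # drop exact duplicates after normalization
        seen.add((key, value))
        cleaned.append((key, value))
    return cleaned

raw = [
    (" Unit  Price ", "Price per unit"),
    ("unit price", "price per unit"),
    ("total cost", "   "),
]
cleaned = clean_pairs(raw)
```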
As an optional embodiment of the present invention, the project cost document data set includes a project cost table document, and the device comprises:
the extraction submodule is used for extracting a target table keyword of each project cost table document according to a preset table extraction rule;
the matching submodule is used for matching the target table keywords with the preset key and value set;
the determining submodule is used for determining a second target key value pair list corresponding to the target table according to the matching result;
and the association storage submodule is used for associating and storing the second target key value pair list and a target table in the corresponding project cost table document in a database.
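Illustratively, the table-oriented submodules can be sketched as follows. The extraction rule used here (treating the first row of a table as its keywords), the sample vocabulary and the in-memory `table_store` dictionary standing in for the database are all assumptions made for illustration.

```python
# Hypothetical preset key and value set.
PRESET = {
    "unit price": {"unit price", "price per unit"},
    "quantity": {"quantity", "qty"},
}

def table_keywords(table):
    """Extraction submodule (assumed rule): the first row holds the keywords."""
    return [cell.strip().lower() for cell in table[0]]

def match_table(table, preset=PRESET):
    """Matching + determining submodules: build the second key-value pair list."""
    pairs = []
    for keyword in table_keywords(table):
        for key, terms in preset.items():
            if keyword == key or keyword in terms:
                pairs.append((key, keyword))
    return pairs

table_store = {}  # stand-in for the database: table id -> (pair list, table)

def store_table(table_id, table):
    """Association storage submodule: keep the pair list with its table."""
    table_store[table_id] = (match_table(table), table)

table = [["Item", "Qty", "Price per unit"], ["Concrete", "30", "420"]]
store_table("doc1-table1", table)
second_pairs = table_store["doc1-table1"][0]
```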
The embodiment of the invention also discloses a project cost document retrieval device, as shown in fig. 5, the device comprises:
a second obtaining module 501, configured to obtain keyword data to be retrieved;
a retrieval module 502, configured to match the keyword data to be retrieved with a target key value pair list in a project cost document database, where the project cost document database is obtained by the project cost document database construction method in the foregoing embodiment;
and an output module 503, configured to output, according to the matching result, the project cost document corresponding to the keyword data to be retrieved.
The project cost document retrieval device provided by the invention searches the project cost document database according to the keyword to be retrieved and can quickly screen out documents of interest. It facilitates statistical analysis of project cost documents while providing a large amount of data support, and solves the prior-art problem of time and resources wasted on manual lookup of project cost documents.
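Illustratively, retrieval against the stored key-value pair lists might be sketched as below. The SQLite schema, the sample rows and the exact-match rule (the keyword equals either a key or a value) are assumptions made for illustration.

```python
import sqlite3

# A toy project cost document database with hypothetical contents.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE kv_pairs (doc_id INTEGER, key TEXT, value TEXT);
    INSERT INTO documents VALUES (1, 'Concrete quotation');
    INSERT INTO documents VALUES (2, 'Bridge budget summary');
    INSERT INTO kv_pairs VALUES (1, 'unit price', 'price per unit');
    INSERT INTO kv_pairs VALUES (2, 'total cost', 'overall cost');
    """
)

def search(conn, keyword):
    """Retrieval + output modules: return documents whose pair list matches."""
    kw = keyword.strip().lower()
    return conn.execute(
        "SELECT DISTINCT d.id, d.body FROM documents d "
        "JOIN kv_pairs p ON p.doc_id = d.id "
        "WHERE p.key = ? OR p.value = ? ORDER BY d.id",
        (kw, kw),
    ).fetchall()

hits = search(conn, "Unit Price")
```

Because both the concept vocabulary and its similar description vocabulary are stored, a query for either wording reaches the same document.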
As an optional embodiment of the present invention, the project cost document is table data, and the apparatus further includes:
the receiving module is used for extracting display graph data corresponding to the table data from a database when a display request for the table data is received;
and the display module is used for displaying the display graph data.
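Illustratively, generating display graph data for a table in advance and returning it on a display request might be sketched as follows. The JSON chart layout, the column names and the `chart_store` dictionary standing in for the database are assumptions; the embodiment does not specify the form of the display graph data.

```python
import json

chart_store = {}  # stand-in for the database: table id -> JSON display graph data

def make_chart_data(table, label_col, value_col):
    """Turn two columns of a table into a simple bar-chart description."""
    header, *rows = table
    li, vi = header.index(label_col), header.index(value_col)
    return {
        "type": "bar",
        "labels": [row[li] for row in rows],
        "values": [float(row[vi]) for row in rows],
    }

def store_chart(table_id, table, label_col, value_col):
    """Associate the display graph data with its table ahead of any request."""
    chart_store[table_id] = json.dumps(make_chart_data(table, label_col, value_col))

def handle_display_request(table_id):
    """Receiving module: fetch the stored display graph data, if any."""
    raw = chart_store.get(table_id)
    return json.loads(raw) if raw is not None else None

table = [["Item", "Total"], ["Concrete", "12600"], ["Steel", "8400"]]
store_chart("doc1-table1", table, "Item", "Total")
chart = handle_display_request("doc1-table1")
```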
An embodiment of the present invention further provides an electronic device. As shown in fig. 6, the electronic device may include a processor 401 and a memory 402, which may be connected by a bus or in another manner; fig. 6 takes connection by a bus as an example.
The memory 402, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the project cost document database construction method or the project cost document retrieval method in the embodiments of the present invention. The processor 401 executes various functional applications and data processing by running the non-transitory software programs, instructions and modules stored in the memory 402, that is, it implements the project cost document database construction method or the project cost document retrieval method of the above method embodiments.
The memory 402 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 401, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 402 may optionally include memory located remotely from processor 401, which may be connected to processor 401 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 402 and, when executed by the processor 401, perform a project cost document database construction method as in the embodiment shown in FIG. 1, or perform a project cost document retrieval method as in the embodiment shown in FIG. 2.
The details of the electronic device may be understood with reference to the corresponding descriptions and effects in the embodiments shown in fig. 1 or fig. 2, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.
Claims (11)
1. A construction method of a project cost document database is characterized by comprising the following steps:
acquiring a project cost document data set;
extracting a target key and a target value of each project cost document according to a preset key and a value set, wherein the key in the preset key and the value set represents a professional cost concept vocabulary and a value represents a cost description vocabulary similar to the professional cost concept vocabulary;
constructing a first target key-value pair list according to the target key and the target value;
and storing the first target key value pair list in a database in association with the corresponding project cost document to obtain a project cost document database.
2. The method of claim 1, further comprising:
acquiring a plurality of professional cost concept vocabularies and cost description vocabularies similar to each professional cost concept vocabulary;
and constructing a preset key value pair set represented in the form of a prefix tree according to the plurality of professional cost concept vocabularies and the cost description vocabularies corresponding to each professional cost concept vocabulary.
3. The method of claim 1, wherein prior to storing the first list of target key-value pairs in association with the corresponding project cost document in a database to obtain a project cost document database, the method further comprises:
and performing data cleaning and standardization processing on the first target key value pair list.
4. The method of claim 1, wherein the project cost document dataset comprises a project cost table document; the method comprises the following steps:
extracting a target table keyword of each project cost table document according to a preset table extraction rule;
matching the target table key words with the preset keys and the value set;
determining a second target key value pair list corresponding to the target table according to the matching result;
and storing the second target key value pair list and a target table in a corresponding project cost table document in a database in an associated mode.
5. The method of claim 4, further comprising:
extracting a plurality of the target table data;
generating display graph data corresponding to each target table according to the target table data;
and associating the display graph data with the corresponding target table in a database.
6. A project cost document retrieval method is characterized by comprising the following steps:
acquiring keyword data to be retrieved;
matching the keyword data to be retrieved with a target key value pair list in a project cost document database, the project cost document database being obtained by the project cost document database construction method according to any one of claims 1 to 5;
and outputting the project cost document corresponding to the keyword data to be retrieved according to the matching result.
7. The method of claim 6, wherein the project cost document is tabular data, the method further comprising:
when a display request for the table data is received, extracting display graph data corresponding to the table data from a database;
and displaying the display graph data.
8. A project cost document database construction device, comprising:
the first acquisition module is used for acquiring a project cost document data set;
the extraction module is used for extracting a target key and a target value of each project cost document according to preset keys and value sets, wherein the keys in the preset keys and the value sets represent professional cost concept vocabularies, and the values represent cost description vocabularies similar to the professional cost concept vocabularies;
a first construction module for constructing a first target key-value pair list according to the target key and the target value;
and the association storage module is used for associating and storing the first target key value pair list and the corresponding project cost document into a database to obtain a project cost document database.
9. An apparatus for retrieving a project cost document, comprising:
the second acquisition module is used for acquiring the keyword data to be retrieved;
a retrieval module, configured to match the keyword data to be retrieved with a target key value pair list in a project cost document database, where the project cost document database is obtained by the project cost document database construction method according to any one of claims 1 to 5;
and the output module is used for outputting the project cost document corresponding to the keyword data to be retrieved according to the matching result.
10. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the project cost document database construction method according to any one of claims 1-5 or to perform the project cost document retrieval method according to claim 6 or 7.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the project cost document database construction method according to any one of claims 1 to 5, or carries out the project cost document retrieval method according to claim 6 or 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211360402.3A CN115934880A (en) | 2022-10-31 | 2022-10-31 | Construction of project cost document database and search method of project cost document |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115934880A (en) | 2023-04-07 |
Family
ID=86698389
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211360402.3A Pending CN115934880A (en) | 2022-10-31 | 2022-10-31 | Construction of project cost document database and search method of project cost document |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115934880A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117235010A (en) * | 2023-09-27 | 2023-12-15 | 浙江招天下招投标交易平台有限公司 | Bid document chart title classification management method and system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102156712A (en) * | 2011-03-08 | 2011-08-17 | 国网信息通信有限公司 | Power information retrieval method and power information retrieval system based on cloud storage |
CN106294595A (en) * | 2016-07-29 | 2017-01-04 | 海尔优家智能科技(北京)有限公司 | A kind of document storage, search method and device |
CN109871473A (en) * | 2019-02-01 | 2019-06-11 | 上海核工程研究设计院有限公司 | A kind of method of pair of project file and Database full-text search document |
US20190266158A1 (en) * | 2018-02-27 | 2019-08-29 | Innoplexus Ag | System and method for optimizing search query to retreive set of documents |
CN110399339A (en) * | 2019-06-18 | 2019-11-01 | 平安科技(深圳)有限公司 | File classifying method, device, equipment and the storage medium of knowledge base management system |
CN111177306A (en) * | 2020-01-02 | 2020-05-19 | 中国银行股份有限公司 | Data processing method and device |
CN112541338A (en) * | 2020-12-10 | 2021-03-23 | 平安科技(深圳)有限公司 | Similar text matching method and device, electronic equipment and computer storage medium |
CN113505217A (en) * | 2021-07-29 | 2021-10-15 | 永道科技有限公司 | Method and system for realizing rapid formation of project cost database based on big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20230407 |