CN114186964A - Active engineering file implementation collecting method based on information difference comparison

Info

Publication number: CN114186964A
Application number: CN202111469735.5A
Authority: CN
Inventors: 甘文胜
Original assignee: Wuhan Dazheng Technology Co ltd
Current assignee: Wuhan Dazheng Technology Co ltd
Priority date: 2021-12-03
Filing date: 2021-12-03
Publication date: 2022-03-15

Abstract

The invention relates to the technical field of engineering file management, and in particular to an active engineering file collection method based on information difference comparison. The method performs the following steps: step 1: acquiring engineering archive data, wherein the data volume of the acquired engineering archive data is greater than a set threshold, and acquiring mark data from the engineering archive data, the mark data being data other than the content of the engineering file that is used to characterize the engineering file; step 2: based on the acquired mark data, determining set boundaries by using a preset boundary model to obtain a plurality of set boundaries. The engineering archive data are collected uniformly, divided by means of set boundaries and then classified so that the redundant parts are identified; the redundant parts are processed separately, which improves the management efficiency of the engineering data, and the engineering data are organized by means of an index, which improves the utilization efficiency of the data.

Description

Active engineering file implementation collecting method based on information difference comparison

Technical Field

The invention belongs to the technical field of engineering file management, and particularly relates to an active engineering file collection method based on information difference comparison.

Background

The application of BIM in the construction field is a leap-forward revolution following CAD technology, and it brings corresponding impacts and challenges to archive management. It should therefore attract the attention of the archives industry, whose business is information management.

First, archivists should become familiar with BIM, understand the relevant specifications, track progress in the field, and analyze the problems that may arise. If a BIM visualization mode is adopted, traditional two-dimensional construction drawings may not be generated during project implementation, so archives will need relevant software to view drawings and construction information. Several questions follow: if no traditional two-dimensional construction drawings exist, whether paper as-built drawings must be produced separately just for filing; if a BIM as-built model is used directly as the as-built drawing, how electronic authentication and legal liability are to be handled; and, where no traditional two-dimensional as-built drawing is made, whether the BIM information, which is shared information and an important component of smart city construction, belongs to a traditional urban construction archive or to a smart city cloud data center. All of these questions deserve the attention of archive departments.

When project archive management is performed on the basis of BIM technology, the large number of project archives to be processed often produces an excessive data volume, which reduces management efficiency. There are generally two reasons for this. First, redundant data across many engineering files lowers storage efficiency, and lower storage efficiency in turn lowers the utilization of the engineering files once they are retrieved. Second, engineering files are often correlated with one another; if files are accessed directly rather than through their associations, the system must perform a full-disk search and retrieval every time a target file is requested, which reduces efficiency.

Patent document CN2008200508254 discloses a construction project archive management system. The utility model comprises an archive storage server for storing engineering archives, a filing rule server for storing archive filing-scope rules, an archive input device for entering engineering archives into the archive storage server according to the filing-scope rules, an acceptance rule server for storing archive acceptance rules, and an archive acceptance device for checking the engineering archives stored in the archive storage server according to the acceptance rules. Engineering archives are entered through the archive input device and stored in the archive storage server; after the project is completed, the archive acceptance device checks and accepts the stored engineering archives according to the acceptance rules. Although this scheme achieves digital management and storage of engineering archives and is more efficient than traditional management, its efficiency remains low because data redundancy and the associated storage of engineering archives are still not addressed.

Disclosure of Invention

The main purpose of the invention is to provide an active engineering file collection method based on information difference comparison, in which engineering archive data are collected uniformly, divided by means of set boundaries and then classified so that the redundant parts are identified; the redundant parts are then processed separately to improve the management efficiency of the engineering data, and the engineering data are organized by means of an index to improve the utilization efficiency of the data.

In order to achieve the purpose, the technical scheme of the invention is realized as follows:

An active engineering file collection method based on information difference comparison, the method comprising the following steps:

step 1: acquiring engineering archive data, wherein the data volume of the acquired engineering archive data is greater than a set threshold, and acquiring mark data from the engineering archive data; the mark data is data other than the content of the engineering file that is used to characterize the engineering file;

step 2: based on the acquired mark data, determining set boundaries by using a preset boundary model to obtain a plurality of set boundaries;

step 3: classifying each piece of engineering archive data into the set boundary corresponding to its mark data to obtain a plurality of sets;

step 4: performing secondary classification on the engineering archive data in each set based on content identification to obtain a plurality of engineering archive data classes in each set, wherein each engineering archive data class comprises at least one piece of engineering archive data;

step 5: creating a file index for each set; the file index comprises a plurality of pointers, and each pointer points to one engineering archive data class;

step 6: performing redundancy analysis on any two of the sets to obtain redundant parts, removing the parts corresponding to the redundant parts from each set while retaining only the pointers in the file index, and storing all the redundant parts separately.
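As a rough illustration of steps 5 and 6, the sketch below models a file index whose pointers reference archive data classes, and moves records shared by two sets into a separately stored redundant part while keeping a pointer to it in each index. The names `ArchiveSet`, `build_index` and `extract_redundant` are hypothetical, records are simplified to string identifiers, and exact matching is assumed as the redundancy test, which the text does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveSet:
    """One set from step 3, holding the archive data classes from step 4."""
    name: str
    classes: dict[str, set[str]]                      # class name -> record ids
    index: dict[str, list[str]] = field(default_factory=dict)  # class -> pointers


def build_index(s: ArchiveSet) -> None:
    """Step 5 (simplified): one pointer per archive data class."""
    s.index = {cls: [f"{s.name}/{cls}"] for cls in s.classes}


def extract_redundant(a: ArchiveSet, b: ArchiveSet, store: dict[str, set[str]]):
    """Step 6 (simplified): store shared records once, keep only pointers."""
    records_a = set().union(*a.classes.values())
    records_b = set().union(*b.classes.values())
    shared = records_a & records_b        # assumption: redundancy = exact match
    if not shared:
        return None
    key = f"redundant/{a.name}+{b.name}"
    store[key] = set(shared)              # redundant part stored separately
    for s in (a, b):
        for cls, records in s.classes.items():
            if records & shared:
                records -= shared         # remove the redundant part in place
                s.index.setdefault(cls, [f"{s.name}/{cls}"]).append(key)
    return key
```

After `extract_redundant` runs, each affected class keeps its own pointer plus a pointer to the shared store, so a lookup through the index can still reach every record.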

Further, the mark data includes: time, description, and related information; the time is the time at which the engineering archive data was created; the description is the engineering project description corresponding to the engineering archive data; the related information is other additional information about the engineering file.

Further, the threshold set in step 1 should satisfy the following formula: the set threshold > average data volume of engineering archive data × 100.
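The threshold condition can be checked with a few lines of code. The sketch below assumes the condition reads as the average data volume multiplied by 100, and the function name `threshold_is_valid` and the byte units are hypothetical.

```python
def threshold_is_valid(threshold, archive_sizes, factor=100):
    """Return True when threshold > average archive size * factor.

    `factor` defaults to 100, following the stated condition; reading the
    condition as a multiplication is an assumption.
    """
    if not archive_sizes:
        raise ValueError("at least one archive size is required")
    average = sum(archive_sizes) / len(archive_sizes)
    return threshold > average * factor
```

For example, with archive sizes of 2, 4 and 6 units the average is 4, so any threshold above 400 units satisfies the condition.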

Further, the engineering project description is represented by a group of binary data, namely the binary data corresponding to the text of the engineering project description.
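One possible way to obtain such binary data is to encode the description text and expand each byte into bits. The sketch below assumes UTF-8 encoding and most-significant-bit-first order; neither is stated in the text, and the function name `description_to_bits` is hypothetical.

```python
def description_to_bits(description: str, encoding: str = "utf-8") -> list[int]:
    """Convert a project description into a flat list of 0/1 values."""
    bits = []
    for byte in description.encode(encoding):
        # Emit the eight bits of each byte, most significant bit first.
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits
```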

Further, the boundary model set in step 2 is expressed by using the following formula:

wherein T is the mean creation time of the engineering archive data; t is the boundary time; n is the number of boundaries, which is a set value; s is the binary array corresponding to the engineering project description, and S is the boundary array; | | denotes the product of the elements of an array, calculated by multiplying all the elements of the array together; g is the boundary judgment value, and if it is within a set threshold range, the engineering archive data is judged to belong to the set boundary.
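The element-product operation described above maps directly onto a standard library call. The sketch below illustrates only that operation, not the boundary model itself; the function name `array_product` is hypothetical.

```python
import math


def array_product(values):
    """Multiply all elements of the array together."""
    return math.prod(values)
```

Note that for an array containing only 0 and 1 values the product is 1 only when every element is 1.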

Further, in step 4, the following steps are performed on the engineering archive data in each set by using a method for secondary classification based on content identification: extracting data characteristics of the engineering archive data; inputting the engineering archive data into a pre-established content classifier set, and acquiring a probability value output by each content classifier in the content classifier set according to data characteristics of the engineering archive data, wherein each content classifier comprises a category, the category corresponds to a type label, each category comprises a plurality of contents, each content in the plurality of contents corresponds to an entity label, and the type label is determined according to the number of the same entity label of the content in each category; selecting at least one content classifier from the content classifier set as a target content classifier according to the probability value output by each content classifier; determining the similarity between each content in the target content classifier and the engineering archive data, and selecting a plurality of contents from the target content classifier as target contents according to the similarity; and labeling the engineering archive data by using the type label corresponding to the category in the target content classifier and the entity label corresponding to the target content in the target content classifier.
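The secondary classification flow can be sketched as follows. This is a minimal illustration under several assumptions: each classifier's probability function is passed in as a callable because the text does not define it here, exactly one target classifier is chosen (the text allows "at least one"), and cosine similarity stands in for the unspecified similarity measure. All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Content:
    entity_label: str
    features: list[float]


@dataclass
class ContentClassifier:
    type_label: str                                # label of the category
    probability: Callable[[list[float]], float]    # stand-in scoring function
    contents: list[Content]


def cosine(u, v):
    """Cosine similarity; an assumed choice of similarity measure."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def label_archive(features, classifiers, top_k=2):
    """Pick the most probable classifier, then its top_k most similar contents."""
    target = max(classifiers, key=lambda c: c.probability(features))
    ranked = sorted(target.contents,
                    key=lambda c: cosine(features, c.features),
                    reverse=True)
    return target.type_label, [c.entity_label for c in ranked[:top_k]]
```

The returned pair corresponds to the type label of the target classifier's category and the entity labels of the selected target contents.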

Furthermore, each content classifier corresponds to a coefficient vector, and the obtaining of the probability value output by each content classifier in the content classifier set according to the data characteristics of the engineering archive data comprises: converting the data characteristics of the engineering archive data into characteristic vectors of the engineering archive data; calculating the probability value output by each content classifier according to the coefficient vector of each content classifier and the feature vector of the engineering archive data by using the following formula:

wherein A is the feature vector of the engineering archive data, and B is the coefficient vector of the content classifier; P is the probability value output by the content classifier; s is a probability adjustment coefficient with a value range of 3 to 6.

Further, the method in step 6 for performing redundancy analysis on any two of the sets to obtain the redundant part comprises: cross-comparing the two sets and taking the similar part found by the cross comparison as the redundant part; each set is cross-compared against every other set in turn, the set yielding the highest proportion of redundant content is taken as the cross-comparison set of that set, and the redundant part so obtained is taken as the redundant part of the set and its cross-comparison set.
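The traversal described above can be sketched as a pairwise comparison that keeps only the best partner for each set. The sketch assumes that the "similar part" is the exact intersection of record identifiers and that the redundancy ratio is the shared count divided by the size of the set being examined; both are assumptions, and the function name `best_cross_comparison` is hypothetical.

```python
def best_cross_comparison(sets):
    """For each set, find the other set with the highest redundancy ratio.

    `sets` maps a set name to a set of record identifiers. The result maps
    each name to (partner name, shared records).
    """
    result = {}
    for name, items in sets.items():
        best_partner, best_shared, best_ratio = None, set(), -1.0
        for other, other_items in sets.items():
            if other == name:
                continue                      # skip comparison with itself
            shared = items & other_items      # assumed: exact-match overlap
            ratio = len(shared) / len(items) if items else 0.0
            if ratio > best_ratio:            # ties keep the first partner seen
                best_partner, best_shared, best_ratio = other, shared, ratio
        if best_partner is not None:
            result[name] = (best_partner, best_shared)
    return result
```

Keeping a single partner per set means each set is split at most once, rather than being fragmented by every overlap it has with other sets.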

Further, after the engineering archive data is obtained, the method further comprises the step of carrying out data cleaning on the engineering archive data.

Further, after the engineering archive data is subjected to data cleaning, the engineering archive data is subjected to data standardization processing.

Compared with the prior art, the active engineering file collection method based on information difference comparison has the following beneficial effects: the engineering archive data are collected uniformly, divided by means of set boundaries and then classified so that the redundant parts are identified; the redundant parts are processed separately, which improves the management efficiency of the engineering data, and the engineering data are organized by means of an index, which improves the utilization efficiency of the data. This is mainly achieved as follows:

1. Construction of set boundaries: by constructing set boundaries, the invention performs a first classification of the engineering archive data, grouping archive data that share a certain degree of data similarity into one large class, so that the subsequent data redundancy processing of the engineering archive data is more efficient;

2. Classification of engineering archive data: the invention classifies the engineering archive data within each set a second time to achieve higher management efficiency. This is achieved in two ways: first, an index is established. Classified engineering archive data usually belong to the same project or are strongly associated, so managing them through index pointers means that a single retrieval of one piece of archive data also yields the related archive data, which improves efficiency; second, performing redundancy analysis on classified engineering archive data reduces the complexity of the analysis, because similar engineering archive data contain the most redundant data and no cross comparison against other engineering archive data is needed, which improves efficiency.

Drawings

Fig. 1 is a schematic flow chart of the active engineering file collection method based on information difference comparison according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the principle of finding the redundant part by means of set boundaries in the active engineering file collection method based on information difference comparison according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the principle of managing files by constructing a file index in the active engineering file collection method based on information difference comparison according to an embodiment of the present invention.

Detailed Description

The method of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments of the invention.

Example 1

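As shown in Fig. 1, the active engineering file collection method based on information difference comparison comprises the following steps:

step 1: acquiring engineering archive data, wherein the data volume of the acquired engineering archive data is greater than a set threshold, and acquiring mark data from the engineering archive data; the mark data is data other than the content of the engineering file that is used to characterize the engineering file;

step 2: based on the acquired mark data, determining set boundaries by using a preset boundary model to obtain a plurality of set boundaries;

step 3: classifying each piece of engineering archive data into the set boundary corresponding to its mark data to obtain a plurality of sets;

step 4: performing secondary classification on the engineering archive data in each set based on content identification to obtain a plurality of engineering archive data classes in each set, wherein each engineering archive data class comprises at least one piece of engineering archive data;

step 5: creating a file index for each set; the file index comprises a plurality of pointers, and each pointer points to one engineering archive data class;

step 6: performing redundancy analysis on any two of the sets to obtain redundant parts, removing the parts corresponding to the redundant parts from each set while retaining only the pointers in the file index, and storing all the redundant parts separately.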

Specifically, BIM (Building Information Modeling) technology is a data-based tool applied to engineering design, construction and management. By integrating building data and information into a model that is shared and transferred throughout the whole life cycle of a project, from planning through operation and maintenance, it enables engineers and technicians to understand and respond efficiently to all kinds of building information, provides a basis for collaborative work among design teams and all parties involved in construction, including construction and operation units, and plays an important role in improving productivity, saving cost and shortening the construction period.

Reference is made here to the definition of BIM given by the National BIM Standard (NBIMS), which consists of three parts:

(1) BIM is a digital representation of the physical and functional characteristics of a facility (construction project);

(2) BIM is a shared knowledge resource and a process of sharing information about the facility, providing a reliable basis for all decisions throughout the life cycle of the facility, from concept to demolition;

(3) at different stages of the facility, different stakeholders insert, extract, update and modify information in the BIM to support and reflect the collaborative work of their respective roles.

Example 2

On the basis of the above embodiment, the mark data includes: time, description, and related information; the time is the time at which the engineering archive data was created; the description is the engineering project description corresponding to the engineering archive data; the related information is other additional information about the engineering file.

Example 3

On the basis of the above embodiment, the threshold set in step 1 should satisfy the following formula: the set threshold > average data volume of engineering archive data × 100.

Specifically, the threshold is set to ensure that enough engineering archive data is acquired, so that the index and boundaries finally established apply well to engineering archive data acquired later. If only a small amount of data is acquired, the results of the subsequent processing apply poorly, and the whole process fails to improve management efficiency.

Example 4

On the basis of the above embodiment, the engineering project description is represented by a group of binary data, namely the binary data corresponding to the text of the engineering project description.

Example 5

On the basis of the above embodiment, the boundary model set in step 2 is expressed by using the following formula:

Specifically, a set boundary is defined such that the engineering archive data within the boundary are uniform. Uniform engineering archive data overlap to a large extent, that is, they contain redundant data. In this way, redundant data in the engineering archive data can be found more efficiently. In the prior art, data are generally compared item by item, which requires traversing entire data documents and is therefore inefficient.

Example 6

On the basis of the previous embodiment, in step 4, the following steps are performed on the project archive data in each set by using a method for secondary classification based on content identification: extracting data characteristics of the engineering archive data; inputting the engineering archive data into a pre-established content classifier set, and acquiring a probability value output by each content classifier in the content classifier set according to data characteristics of the engineering archive data, wherein each content classifier comprises a category, the category corresponds to a type label, each category comprises a plurality of contents, each content in the plurality of contents corresponds to an entity label, and the type label is determined according to the number of the same entity label of the content in each category; selecting at least one content classifier from the content classifier set as a target content classifier according to the probability value output by each content classifier; determining the similarity between each content in the target content classifier and the engineering archive data, and selecting a plurality of contents from the target content classifier as target contents according to the similarity; and labeling the engineering archive data by using the type label corresponding to the category in the target content classifier and the entity label corresponding to the target content in the target content classifier.

Specifically, coordination is a key activity in the construction industry and involves the construction contractor, the owner and the design unit alike. When a problem arises during project implementation, all parties concerned must be brought together to find the cause of the construction problem and its solution, and then to issue changes and take the corresponding remedial measures. In design, clashes between disciplines often occur because communication between the designers of different disciplines is inadequate. For example, HVAC and other disciplines each draw their pipe routing on their own construction drawings; during actual construction, structural members such as beams may turn out to obstruct the pipes exactly where they are to be routed, and such clashes can then only be coordinated and resolved after they have occurred. BIM coordination services help with these problems: the building information model can coordinate clashes between disciplines in the early stage of construction and generate and provide coordination data. BIM coordination is not limited to resolving clashes between disciplines; it can also address, for example, the coordination of elevator shaft layout with other design layouts and clearance requirements, of fire compartments with other design layouts, and of underground drainage layout with other design layouts.

Example 7

On the basis of the previous embodiment, each content classifier corresponds to one coefficient vector, and the obtaining of the probability value output by each content classifier in the content classifier set according to the data characteristics of the engineering archive data comprises: converting the data characteristics of the engineering archive data into characteristic vectors of the engineering archive data; calculating the probability value output by each content classifier according to the coefficient vector of each content classifier and the feature vector of the engineering archive data by using the following formula:

Specifically, the whole process of design, construction and operation is one of continuous optimization. Optimization and BIM are not inherently linked, but better optimization is possible on the basis of BIM. Optimization is constrained by three factors: information, complexity and time. Without accurate information no reasonable optimization result can be reached; the BIM model provides information on the building as it actually exists, including geometric, physical and rule information, as well as information on the building after changes. When complexity is high, the participants cannot master all the information unaided and must rely on scientific techniques and equipment. The complexity of most modern buildings exceeds the capacity of the participants themselves, and BIM together with its companion optimization tools makes it possible to optimize complex projects.

Example 8

On the basis of the above embodiment, the method in step 6 for performing redundancy analysis on any two of the sets to obtain the redundant part comprises: cross-comparing the two sets and taking the similar part found by the cross comparison as the redundant part; each set is cross-compared against every other set in turn, the set yielding the highest proportion of redundant content is taken as the cross-comparison set of that set, and the redundant part so obtained is taken as the redundant part of the set and its cross-comparison set.

Specifically, traversal cross-comparison will find overlapping parts between one piece of engineering archive data and several others. If every overlapping part were treated as a redundant part, the engineering archive data would be split into many fragments and lose its integrity; management efficiency would not improve and would in fact fall because the data would be too widely dispersed. For this reason only the overlap with the set having the highest redundancy ratio is treated as the redundant part.

Example 9

On the basis of the above embodiment, after the engineering archive data is acquired, the method further includes a step of performing data cleaning on the engineering archive data.

Example 10

On the basis of the above embodiment, after the engineering archive data is subjected to data cleaning, it is further subjected to data standardization processing.

Although several embodiments have been provided in the present invention, it should be understood that the disclosed system and method may be embodied in many other specific forms without departing from the spirit or scope of the present invention.

The present examples are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein. For example, various elements or components may be combined or integrated in another system, or certain features may be omitted or not implemented.

Furthermore, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may also be indirectly coupled or communicating through some interface, device, or intermediate component, whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.

Claims

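1. An active engineering file collection method based on information difference comparison, characterized by comprising the following steps:

step 1: acquiring engineering archive data, wherein the data volume of the acquired engineering archive data is greater than a set threshold, and acquiring mark data from the engineering archive data; the mark data is data other than the content of the engineering file that is used to characterize the engineering file;

step 2: based on the acquired mark data, determining set boundaries by using a preset boundary model to obtain a plurality of set boundaries;

step 3: classifying each piece of engineering archive data into the set boundary corresponding to its mark data to obtain a plurality of sets;

step 4: performing secondary classification on the engineering archive data in each set based on content identification to obtain a plurality of engineering archive data classes in each set, wherein each engineering archive data class comprises at least one piece of engineering archive data;

step 5: creating a file index for each set; the file index comprises a plurality of pointers, and each pointer points to one engineering archive data class;

step 6: performing redundancy analysis on any two of the sets to obtain redundant parts, removing the parts corresponding to the redundant parts from each set while retaining only the pointers in the file index, and storing all the redundant parts separately.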

2. The method of claim 1, wherein the mark data comprises: time, description, and related information; the time is the time at which the engineering archive data was created; the description is the engineering project description corresponding to the engineering archive data; the related information is other additional information about the engineering file.

3. The method of claim 2, wherein the threshold set in step 1 satisfies the following formula: the set threshold > average data volume of engineering archive data × 100.

4. The method of claim 3, wherein the engineering project description is characterized using a set of binary data corresponding to textual data of the engineering project description.

5. The method of claim 3, wherein the boundary model set in step 2 is represented using the following formula:

6. The method of claim 5, wherein the secondary classification of the engineering archive data in each set based on content identification in step 4 comprises the steps of: extracting data characteristics of the engineering archive data; inputting the engineering archive data into a pre-established content classifier set, and acquiring a probability value output by each content classifier in the content classifier set according to the data characteristics of the engineering archive data, wherein each content classifier comprises a category, the category corresponds to a type label, each category comprises a plurality of contents, each content in the plurality of contents corresponds to an entity label, and the type label is determined according to the number of the same entity label of the content in each category; selecting at least one content classifier from the content classifier set as a target content classifier according to the probability value output by each content classifier; determining the similarity between each content in the target content classifier and the engineering archive data, and selecting a plurality of contents from the target content classifier as target contents according to the similarity; and labeling the engineering archive data by using the type label corresponding to the category in the target content classifier and the entity label corresponding to the target content in the target content classifier.

7. The method of claim 6, wherein each content classifier corresponds to a coefficient vector, and wherein obtaining the probability value output by each content classifier in the content classifier set according to the data characteristics of the engineering archive data comprises: converting the data characteristics of the engineering archive data into a feature vector of the engineering archive data; calculating the probability value output by each content classifier according to the coefficient vector of each content classifier and the feature vector of the engineering archive data by using the following formula:

8. The method of claim 7, wherein performing redundancy analysis on any two of the sets in step 6 to obtain the redundant part comprises: cross-comparing the two sets and taking the similar part found by the cross comparison as the redundant part; each set is cross-compared against every other set in turn, the set yielding the highest proportion of redundant content is taken as the cross-comparison set of that set, and the redundant part so obtained is taken as the redundant part of the set and its cross-comparison set.

9. The method of claim 8, further comprising the step of performing a data cleansing on the engineering archive data after the engineering archive data is obtained.

10. The method of claim 9, wherein after the data cleaning of the engineering archive data, data standardization processing is further performed on the engineering archive data.