CN112906717A - Method and system for identifying engineering file content - Google Patents

Publication number
CN112906717A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110240632.5A
Other languages
Chinese (zh)
Other versions
CN112906717B (en)
Inventor
师玉鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Glodon Co Ltd
Original Assignee
Glodon Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/422 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation for representing the structure of the pattern or shape of an object therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention provides a method and a system for identifying engineering file content. The method includes: acquiring engineering information of an engineering file; matching an adapted target model according to the engineering information, where the target model is one of the pre-trained prediction models corresponding to different types of engineering files; and identifying the engineering file content by using the target model. By matching an adapted target model through the engineering information of the engineering file and then identifying the engineering file content with that model, the invention gives the user a reference for selecting a model, which speeds up recognition and improves recognition accuracy, so that the project the user needs to recognize obtains a satisfactory recognition result from the matched target model.

Description

Method and system for identifying engineering file content
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a system for identifying engineering file contents.
Background
Artificial intelligence technology has been introduced into the construction industry. Taking project budgeting as an example, the budget of a new project can be identified with a budget recognition model built in advance. While the model is being built, the original data is analyzed by artificial intelligence so that the required information can be identified accurately and quickly, avoiding wasted labor cost. How to accurately identify the information a user needs is therefore an important problem to be solved. The traditional machine learning method trains a recognition model on a large amount of engineering data and then uses that model to recognize the user's project. However, this method requires a large number of original projects, involves a huge workload, and is very time-consuming. Moreover, the graphic information of different projects is complex and variable, and the correlation between projects is low, so the project a user needs to recognize rarely obtains a satisfactory recognition result from the trained model, which in turn limits the practicality of the recognition model.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for identifying engineering file content, to solve the prior-art problem that training a recognition model requires a large amount of engineering data, which leads to a heavy workload, long training time, and low practicality of the recognition model.
According to a first aspect, an embodiment of the present invention provides a method for identifying engineering file content, including: acquiring engineering information of an engineering file; matching an adapted target model according to the engineering information, where the target model is one of the pre-trained prediction models corresponding to different types of engineering files; and identifying the engineering file content by using the target model.
Optionally, matching an adapted target model according to the engineering information includes: searching the preset models, using the first attribute of the engineering information, for a model consistent with that first attribute; and when such a model is found among the preset models, determining the found model as the target model.
Optionally, matching an adapted target model according to the engineering information includes: when no model consistent with the first attribute of the engineering information is found among the preset models, searching the preset models, using the second attribute of the engineering information, for models that meet a first preset search condition; when models meeting the first preset search condition are found, searching among the found models, using the third attribute of the engineering information, for models that meet a second preset search condition, and determining these as the recognition models; determining the engineering similarity between the engineering information and each piece of preset historical engineering information according to a preset similarity algorithm; and determining the target model among the recognition models according to the engineering similarity result.
Optionally, determining the target model among the recognition models according to the engineering similarity result includes: determining candidate models from the recognition models according to the ranking of the engineering similarity in descending order; and, in response to a candidate model selection operation, determining the selected candidate model as the target model.
Optionally, the method for identifying engineering file content provided in the embodiment of the present invention further includes: when no candidate model selection operation is received within a preset time, determining the first-ranked candidate model in the ranking as the target model.
Optionally, matching an adapted target model according to the engineering information includes: training prediction models for different types of engineering files to determine engineering training models; and matching an adapted target model among the engineering training models according to the engineering information.
Optionally, training the prediction models for different types of engineering files to determine the engineering training models includes: acquiring a current engineering file and historical engineering files; determining the project recognition rate of the drawing files in each historical engineering file according to a preset recognition algorithm; determining a training data set according to the project recognition rate of each historical engineering file and the project attributes of the current engineering file and of each historical engineering file; and performing model training on the training data set to determine the engineering training model corresponding to the current engineering file.
Optionally, determining the project recognition rate of the drawing files in the historical engineering files according to a preset recognition algorithm further includes: determining the project recognition rate of the drawing files in the historical engineering files based on the primitive reference information in the historical engineering files.
Optionally, determining the project recognition rate of the drawing files in a historical engineering file based on the primitive reference information in that file includes: determining the primitives contained in the drawing files of the historical engineering file according to a preset primitive recognition algorithm; determining primitive recognition information for each primitive according to a preset service rule; determining the recognition rate of each primitive from its primitive recognition information and the corresponding primitive reference information in the historical engineering file; and determining the project recognition rate from the recognition rates of the primitives.
Optionally, determining a training data set according to the project recognition rate of each historical engineering file and the project attributes of the current engineering file and of each historical engineering file includes: determining the project classification results of the current engineering file and of each historical engineering file according to the project attributes; and, according to the project classification results and a preset recognition rate threshold, determining as the training data set the historical engineering files that have the same classification result as the current engineering file and whose project recognition rate meets the preset threshold.
Optionally, identifying the engineering file content by using the target model includes: acquiring an engineering file; and inputting the drawing files in the engineering file into the target model to determine the engineering recognition result.
According to a second aspect, an embodiment of the present invention provides a system for identifying engineering file content, including: an acquisition module, configured to acquire the engineering information of an engineering file; a first processing module, configured to match an adapted target model according to the engineering information, where the target model is one of the pre-trained prediction models corresponding to different types of engineering files; and a second processing module, configured to identify the engineering file content by using the target model.
An embodiment of the present invention provides a non-transitory computer-readable storage medium, which stores computer instructions, and when the computer instructions are executed by a processor, the computer instructions implement the method for identifying engineering file content according to the first aspect of the present invention and any one of the optional manners of the first aspect of the present invention.
An embodiment of the present invention provides an electronic device, including a memory and a processor that are communicatively connected to each other, where the memory stores computer instructions and the processor executes the computer instructions to perform the method for identifying engineering file content according to the first aspect of the present invention and any one of its optional modes.
The technical scheme of the invention has the following advantages:
1. The embodiment of the invention provides a method for identifying engineering file content that matches an adapted target model through the engineering information of an engineering file and identifies the engineering file content with that target model. This gives the user a reference for selecting a model, speeds up recognition, and improves recognition accuracy, so that the project the user needs to recognize obtains a satisfactory recognition result from the matched target model.
2. The embodiment of the invention provides a method for identifying engineering file content that gives the user a basis for selecting training data by determining the project recognition rate and the project attributes of historical projects. The training data set is user-defined, which narrows and focuses the training set, so the model trains quickly and recognizes accurately. Model training is performed on this training data set to determine the engineering training model corresponding to the current engineering file, and that model can then be used to recognize all subsequent similar projects. The approach is simple and efficient: it speeds up training, raises the recognition rate, makes training more targeted, and improves the practicality of the engineering training model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a method for identifying project file content in an embodiment of the present invention;
FIG. 2 is a diagram illustrating a result of ranking engineering similarity according to an embodiment of the present invention;
FIG. 3 is another detailed flowchart of the method for identifying the content of the project file according to the embodiment of the present invention;
FIG. 4 is a flow chart illustrating the identification of project file content in an embodiment of the present invention;
FIG. 5 is a diagram illustrating an identification system for project file content in accordance with an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The traditional machine learning method trains a recognition model on a large amount of engineering data. Such data contains many errors and much useless information that interfere with training, and when thousands of projects are trained together, steps such as exporting data, cleaning data, and training the model are all required, so training is slow and still does not give good results on the project the user actually needs to predict. Drawings produced by the same design institute, however, are often similar, so they are a valuable reference when the user prepares a budget for a project from that design institute. To provide the user with a recognition model more accurately and quickly, the user can customize the training data, narrowing and focusing the training set, so that the model trains quickly and recognizes accurately.
The embodiment of the invention provides a method for identifying engineering file content, which specifically includes the following steps:
Step S1: Acquire the engineering information of the engineering file. The engineering information is the user's description of the engineering file and may include the engineering file name, the engineering type (for example, stadium construction or shopping mall construction), the building purpose, the project location, the building type, the building area, the service type, and so on. The electronic device may obtain the engineering information directly from outside or through a data acquisition device within the electronic device; the manner of acquisition is not limited, as long as the electronic device can obtain the engineering information.
Step S2: Match an adapted target model according to the engineering information, where the target model is one of the pre-trained prediction models corresponding to different types of engineering files.
Specifically, in an embodiment, the step S2 includes the following steps:
Step S01: Using the first attribute of the engineering information, search the preset models for a model consistent with that attribute.
In the embodiment of the present invention, the first attribute of the acquired engineering information is used for an exact search, that is, the preset models are searched for a model consistent with the first attribute. The first attribute may be the engineering file name of the engineering file. Note that this embodiment uses the engineering file name only as an example; in practical applications, the first attribute may be set according to actual requirements, as long as it supports an exact search, and this embodiment is not limited thereto.
Step S02: When a model consistent with the first attribute of the engineering information is found among the preset models, determine the found model as the target model. That is, when the exact search on the first attribute shows that a model corresponding to the engineering file already exists among the preset models, the found model is determined as the target model.
Step S03: When no model consistent with the first attribute of the engineering information is found among the preset models, use the second attribute of the engineering information to search the preset models for models that meet the first preset search condition.
When no model consistent with the first attribute is found among the preset models, no preset model is completely consistent with the current engineering file, so the second attribute of the engineering information is used to search the preset models for models that meet the first preset search condition. The second attribute may be, for example, the engineering type in the engineering information. In this embodiment, the second attribute may be used for a fuzzy search, which only needs to find models meeting the first preset search condition; the first preset search condition may be that the model corresponds to an engineering file similar to the current engineering file, where the range of the fuzzy search is 70%.
Step S04: When models meeting the first preset search condition are found among the preset models, use the third attribute of the engineering information to search the found models for those meeting the second preset search condition, and determine the recognition models.
In this embodiment, when the first preset search condition yields only one model, that model may be determined directly as the recognition model; when it yields several models, the third attribute of the engineering information is used to search among them for models that meet the second preset search condition, which are determined as the recognition models. The third attribute may be information such as the building purpose.
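Steps S01 to S04 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `PresetModel` record and its field names are hypothetical, and plain string equality stands in for the fuzzy search and the two preset search conditions, which the text does not specify in detail.

```cpp
#include <string>
#include <vector>

// Hypothetical record describing one preset model; field names are assumptions.
struct PresetModel {
    std::string fileName;     // first attribute: engineering file name
    std::string projectType;  // second attribute: engineering type
    std::string buildingUse;  // third attribute: building purpose
};

// Steps S01/S02: exact search on the first attribute.
// Returns a pointer to the matching model, or nullptr when none exists.
const PresetModel* findExact(const std::vector<PresetModel>& presets,
                             const std::string& fileName) {
    for (const auto& m : presets)
        if (m.fileName == fileName) return &m;
    return nullptr;
}

// Steps S03/S04: narrow by the second attribute, then by the third attribute
// only when more than one model survives the first filter.
// Equality is a stand-in for the fuzzy search described in the text.
std::vector<PresetModel> findRecognitionModels(const std::vector<PresetModel>& presets,
                                               const std::string& projectType,
                                               const std::string& buildingUse) {
    std::vector<PresetModel> byType;
    for (const auto& m : presets)
        if (m.projectType == projectType) byType.push_back(m);
    if (byType.size() <= 1) return byType;  // zero or one match: nothing more to filter

    std::vector<PresetModel> byUse;
    for (const auto& m : byType)
        if (m.buildingUse == buildingUse) byUse.push_back(m);
    return byUse;
}
```

A caller would try `findExact` first and fall back to `findRecognitionModels` only when it returns `nullptr`, which mirrors the order of the steps above.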
Step S05: Determine the engineering similarity between the engineering information and each piece of preset historical engineering information according to a preset similarity algorithm.
In the embodiment of the present invention, the preset similarity algorithm is an algorithm that is mature in the industry: the Levenshtein distance, also called the edit distance. It is the minimum number of editing operations required to convert one of two character strings into the other, where the permitted editing operations are replacing one character with another, inserting a character, and deleting a character. Note that the embodiment uses the edit distance for the similarity calculation; in practical applications, other similarity measures, such as the Manhattan distance or log-likelihood similarity, may be chosen according to actual requirements, and the invention is not limited thereto.
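For reference, the edit distance can be computed with the standard dynamic-programming recurrence, sketched below. The normalization in `similarity` (one minus the distance divided by the longer length) is an assumption; the text does not say how the distance is turned into a percentage. The sketch also compares bytes, so multi-byte text such as UTF-8 Chinese would need decoding into code points first.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimum number of single-character substitutions, insertions, and deletions
// needed to turn `a` into `b`, computed with two rolling rows of the DP table.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;  // "" -> b[0..j)
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // a[0..i) -> ""
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // delete a[i-1]
                                curr[j - 1] + 1,      // insert b[j-1]
                                prev[j - 1] + cost}); // substitute or keep
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Assumed normalization (not given in the text): 1 - distance / longer length.
double similarity(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 1.0;  // two empty strings are identical
    return 1.0 - static_cast<double>(levenshtein(a, b)) / static_cast<double>(longest);
}
```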
Step S06: Determine the target model among the recognition models according to the engineering similarity result.
Specifically, in an embodiment, the step S06 further includes the following steps:
Step S061: Determine the candidate models from the recognition models according to the ranking of engineering similarity in descending order.
In the embodiment of the present invention, as shown in FIG. 2, the engineering similarities are ranked in descending order, and the candidate models are determined from the recognition models according to that ranking. A candidate model may be defined as a model whose similarity exceeds a certain value; for example, models with a similarity greater than 60% are selected as candidate models.
Step S062: In response to a candidate model selection operation, determine the selected candidate model as the target model.
Step S063: When no candidate model selection operation is received within the preset time, determine the first-ranked candidate model in the ranking as the target model.
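Steps S061 to S063 amount to a threshold filter, a descending sort, and a default choice. The sketch below illustrates that flow under assumptions: `ScoredModel`, `rankCandidates`, and `chooseTarget` are hypothetical names, and the timeout is modeled simply as a `userChoice` of -1 rather than with a real timer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct ScoredModel {
    std::string name;   // hypothetical model identifier
    double similarity;  // engineering similarity in [0, 1]
};

// Step S061: keep models whose similarity exceeds the threshold (for example
// 0.6) and order them by descending similarity.
std::vector<ScoredModel> rankCandidates(std::vector<ScoredModel> models, double threshold) {
    models.erase(std::remove_if(models.begin(), models.end(),
                                [threshold](const ScoredModel& m) {
                                    return m.similarity <= threshold;
                                }),
                 models.end());
    std::stable_sort(models.begin(), models.end(),
                     [](const ScoredModel& a, const ScoredModel& b) {
                         return a.similarity > b.similarity;
                     });
    return models;
}

// Steps S062/S063: `userChoice` is the index the user picked, or -1 when no
// selection arrived within the preset time; in that case (or for an invalid
// index) the first-ranked candidate is used. Returns nullptr if there are no
// candidates at all.
const ScoredModel* chooseTarget(const std::vector<ScoredModel>& candidates, int userChoice) {
    if (candidates.empty()) return nullptr;
    if (userChoice >= 0 && static_cast<std::size_t>(userChoice) < candidates.size())
        return &candidates[static_cast<std::size_t>(userChoice)];
    return &candidates.front();
}
```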
Specifically, in an embodiment, as shown in fig. 3, the step S2 further includes the following steps:
Step S21: Train prediction models for different types of engineering files to determine the engineering training models.
In an embodiment, the step S21 further includes the following steps:
Step S211: Acquire the current engineering file and the historical engineering files.
In the embodiment of the invention, a large number of historical engineering files uploaded earlier by users, together with the current engineering file, are acquired and stored on a server to provide training data for subsequent training. The engineering files are CAD (computer-aided design) construction drawings of the corresponding projects; other types of engineering drawings may also be used, and the invention is not limited thereto.
Step S212: Determine the project recognition rate of the drawing files in the historical engineering files according to a preset recognition algorithm.
In an embodiment, step S212 further includes: determining the project recognition rate of the drawing files in the historical engineering files based on the primitive reference information in the historical engineering files. The overall project recognition rate is determined for the drawing files of every uploaded engineering file: the recognition rate of each specific service primitive in an engineering file, such as masonry walls, shear walls, columns, and plate ribs, is determined from the primitive reference information in the historical engineering file and the preset recognition algorithm, and the average of these primitive recognition rates represents the project recognition rate of the drawing file. The process is described in detail below using the primitive recognition rate of plate ribs as an example.
Determining the project recognition rate of the drawing files in a historical engineering file based on the primitive reference information in that file includes the following steps:
(1) Determine the primitives contained in the drawing files of the historical engineering file according to a preset primitive recognition algorithm. In the embodiment of the invention, building a project requires CAD drawings, which carry service information, so CAD drawing recognition is performed first. Taking plate ribs as an example, the CAD primitives that represent plate ribs are extracted according to the preset recognition algorithm. The preset recognition algorithm may recognize primitives by scanning pixels line by line: when an unlabeled primitive pixel is encountered, it is labeled and used as the seed of a new primitive, and the remaining pixels of that primitive are found by identifying and labeling all primitive pixels adjacent to the seed. All primitives contained in the drawing file are recognized in the same way. Note that this recognition algorithm is only an example; in practical applications, any existing algorithm that can recognize the primitives in an engineering file may be used, and the invention is not limited thereto.
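The line-by-line scan with seed growing described in step (1) is essentially connected-component labeling. A minimal sketch follows, assuming a rectangular binary raster in which nonzero cells are primitive pixels and assuming 4-connectivity for "adjacent"; the text fixes neither choice, and `labelPrimitives` is a hypothetical name.

```cpp
#include <utility>
#include <vector>

// Scans the image row by row. Each unlabeled nonzero pixel becomes the seed of
// a new primitive, and all pixels 4-connected to it receive the same label.
// Writes labels 1..n (0 = background) into `labels` and returns n.
// Assumes every row of `image` has the same length.
int labelPrimitives(const std::vector<std::vector<int>>& image,
                    std::vector<std::vector<int>>& labels) {
    const int rows = static_cast<int>(image.size());
    const int cols = rows > 0 ? static_cast<int>(image[0].size()) : 0;
    labels.assign(rows, std::vector<int>(cols, 0));

    const int dr[4] = {-1, 1, 0, 0};
    const int dc[4] = {0, 0, -1, 1};
    int count = 0;

    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (image[r][c] == 0 || labels[r][c] != 0) continue;
            labels[r][c] = ++count;  // new seed, new primitive
            std::vector<std::pair<int, int>> stack;
            stack.push_back(std::make_pair(r, c));
            while (!stack.empty()) {
                const std::pair<int, int> p = stack.back();
                stack.pop_back();
                for (int k = 0; k < 4; ++k) {
                    const int nr = p.first + dr[k];
                    const int nc = p.second + dc[k];
                    if (nr < 0 || nc < 0 || nr >= rows || nc >= cols) continue;
                    if (image[nr][nc] == 0 || labels[nr][nc] != 0) continue;
                    labels[nr][nc] = count;  // same primitive as the seed
                    stack.push_back(std::make_pair(nr, nc));
                }
            }
        }
    }
    return count;
}
```

An explicit stack is used instead of recursion so that large primitives do not exhaust the call stack.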
(2) Determine the primitive recognition information of each primitive according to a preset service rule.
In the embodiment of the present invention, after the primitives of the engineering file are recognized, the primitive recognition information of each primitive, for example the range, thickness, and height of a plate rib, is determined according to the preset service rule. The preset service rule is an existing analysis rule; any rule that can extract the primitive recognition information of each primitive may be used, and the invention is not limited thereto.
(3) Determine the recognition rate of each primitive from its primitive recognition information and the corresponding primitive reference information in the historical engineering file.
In the embodiment of the present invention, each historical engineering file contains the actual parameters of the plate rib primitives built by the user, that is, the primitive reference information, for example the range, thickness, and height. Using this as the reference, the primitive recognition information of each primitive in the current engineering file is compared with the primitive reference information to determine whether a recognized plate rib primitive is a real plate rib, and the recognition rate for that kind of primitive is then calculated from the final recognition results.
(4) Determine the project recognition rate from the recognition rates of the primitives.
In the embodiment of the invention, the corresponding layers of the CAD drawing are recognized, a service primitive set alpha is generated from the layer information, and alpha is compared with the service primitive set gamma in the historical project to produce the recognition rate (alpha/gamma) of the corresponding primitive. The above operation is carried out for plates, plate ribs, masonry walls, shear walls, and so on; the individual recognition rates are calculated and averaged, and the average is the overall project recognition rate.
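The averaging in step (4) can be illustrated as below. This sketch assumes that alpha/gamma means the number of recognized primitives confirmed against the reference divided by the number of reference primitives of that type; the text does not spell out how the two sets are compared, and `PrimitiveCounts` and the function names are hypothetical.

```cpp
#include <map>
#include <string>

struct PrimitiveCounts {
    int recognized;  // assumed meaning of alpha: recognized primitives confirmed by the reference
    int reference;   // gamma: primitives the user built in the historical project
};

// Recognition rate of one primitive type (alpha / gamma); 0 when there is no reference.
double primitiveRate(const PrimitiveCounts& c) {
    return c.reference > 0 ? static_cast<double>(c.recognized) / c.reference : 0.0;
}

// Project recognition rate: the plain average of the per-type rates, keyed by
// primitive type such as "plate", "plate rib", "masonry wall", "shear wall".
double projectRecognitionRate(const std::map<std::string, PrimitiveCounts>& byType) {
    if (byType.empty()) return 0.0;
    double sum = 0.0;
    for (const auto& entry : byType) sum += primitiveRate(entry.second);
    return sum / static_cast<double>(byType.size());
}
```

For instance, a plate rib rate of 3/4 and a shear wall rate of 1/2 average to a project recognition rate of 0.625.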
Step S213: Determine a training data set according to the project recognition rate of each historical engineering file and the project attributes of the current engineering file and of each historical engineering file.
In the embodiment of the invention, the user selects training data according to the project recognition rate and the current project (that is, the project to be worked on next) and uses that data to train the model. The training data set is determined from the project recognition rate of each historical engineering file and the project attributes of the current engineering file and of each historical engineering file, where the project attributes may be, for example, the specific drawing source or the basic information of the project. For example, in a practical application where the server stores a series of stadium projects of a certain group, if the current engineering file is also a stadium project of that group, then the historical engineering files of those stored stadium projects whose project recognition rate meets the requirement are determined as the training data set. Note that the embodiment illustrates only one kind of project of one group; the project attributes may also include other identifying attributes, and the invention is not limited thereto.
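The selection in step S213 reduces to a filter over the historical projects. The sketch below assumes the project classification has already been collapsed into a single label per project (for example "group X / stadium"); that simplification, the `HistoricalProject` record, and the function name are illustrative assumptions.

```cpp
#include <string>
#include <vector>

struct HistoricalProject {
    std::string name;        // hypothetical project identifier
    std::string category;    // classification result derived from the project attributes
    double recognitionRate;  // project recognition rate in [0, 1]
};

// Keeps the historical projects that share the current project's category and
// whose recognition rate reaches the preset threshold.
std::vector<HistoricalProject> selectTrainingSet(const std::vector<HistoricalProject>& history,
                                                 const std::string& currentCategory,
                                                 double minRate) {
    std::vector<HistoricalProject> selected;
    for (const auto& p : history)
        if (p.category == currentCategory && p.recognitionRate >= minRate)
            selected.push_back(p);
    return selected;
}
```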
Taking the identification of slab reinforcement (plate ribs) as an example, the features are calculated as follows:
int m_nGroupColor; // color of the group; every CAD line segment in the drawing has a color, which can be read directly
int m_nGroupObjectCount; // count for the group: lines with the same layer and the same color are connected into groups, and the number is counted
double m_dGroupAvgLengthTimes; // ratio of the average length of the group to the average length of the whole drawing; every line segment in the drawing has a length, the lengths are summed to give the total length and averaged; the lengths of the group are summed in the same way, and the ratio can then be calculated
double m_dGroupObjectCountEDOTimes; // ratio of the number of primitives in the group to the number of primitives in the drawing, i.e. the number of primitives in the connected group divided by the number of all line segments in the CAD drawing
int m_dGroupCircleCount; // number of complete circles formed, i.e. the number of circles in the connected group; determined by checking whether the arcs close into a circle
double m_dGroupAvgObjectEDOCount; // average number of primitives per group: total primitives of the drawing divided by the total number of connected groups
double m_dGroupArcCountNLineCountTimes; // ratio of the number of arcs to the number of lines in the group
bool m_bGroupIsLineContinuous; // whether the line type is continuous; the CAD drawing flags this, so it can be read directly
bool m_bGroupIsDashEmpty; // whether there is a special line; CAD flags this, so it is read directly
int m_nGroupLineType; // line type; identified in CAD and read directly
bool m_bGroupIsAllLineParallel; // whether all lines are parallel; computed by checking whether the angles of the lines in the group are consistent
int m_nGroupClosedAreaCount; // number of enclosed areas formed: count the enclosed areas in the group by checking whether each line can be combined with other lines into an enclosed area
int m_nGroupWLineCount; // number of primitives in the group, i.e. the total number of lines in the group.
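A few of the group features listed above can be sketched as follows. This is a minimal illustration, assuming that a group and the drawing are both given as plain lists of straight line segments; the `Segment` type, the function names and the tolerance are hypothetical, and the patent does not specify how the features are actually computed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical straight line segment from (x1, y1) to (x2, y2).
struct Segment { double x1, y1, x2, y2; };

double segmentLength(const Segment& s) {
    return std::hypot(s.x2 - s.x1, s.y2 - s.y1);
}

// Sum of the segment lengths divided by the number of segments.
double averageLength(const std::vector<Segment>& segs) {
    if (segs.empty()) return 0.0;
    double total = 0.0;
    for (const Segment& s : segs) total += segmentLength(s);
    return total / static_cast<double>(segs.size());
}

// Ratio of the group's average length to the drawing's average length.
double groupAvgLengthTimes(const std::vector<Segment>& group,
                           const std::vector<Segment>& drawing) {
    const double drawingAvg = averageLength(drawing);
    return drawingAvg > 0.0 ? averageLength(group) / drawingAvg : 0.0;
}

// Whether all lines of the group are parallel: the cross product of each
// direction vector with the first one must vanish (relative tolerance `tol`).
bool groupIsAllLineParallel(const std::vector<Segment>& group,
                            double tol = 1e-9) {
    if (group.size() < 2) return true;
    const double dx0 = group[0].x2 - group[0].x1;
    const double dy0 = group[0].y2 - group[0].y1;
    const double len0 = std::hypot(dx0, dy0);
    for (std::size_t i = 1; i < group.size(); ++i) {
        const double dx = group[i].x2 - group[i].x1;
        const double dy = group[i].y2 - group[i].y1;
        const double cross = dx0 * dy - dy0 * dx;
        if (std::fabs(cross) > tol * len0 * std::hypot(dx, dy)) return false;
    }
    return true;
}
```

The cross product is used instead of comparing angles directly so that lines drawn in opposite directions still count as parallel.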
Each group then undergoes further feature extraction on the refined connected clusters:
int m_nObjectLineCount; // number of lines in the group, i.e. the total number of lines in the group
int m_nObjectAvgPtCountOfEndPt; // average number of lines intersected at endpoints: for each line, the number of other lines that its endpoints intersect
int m_nObjectAvgPtCountNotEndPt; // average number of lines intersected away from endpoints: for each line, the number of intersections with other lines outside its endpoints
double m_dObjectArcCountNLineCount; // ratio of the number of arcs (not circles) to the number of straight lines in the group
double m_dObjectLonestLineNShortLineTimes; // ratio of the longest line to the short line that connects to it at the other endpoint
double m_dObjectLonestLineNShortLineStartAngle; // start angle between the longest line and the short line connected at the other endpoint
double m_dObjectLonestLineNShortLineEndAngle; // end angle between the longest line and the short line connected at the other endpoint
int m_nObjectClosedAreaCount; // number of enclosed areas formed
int m_nObjectArcEDOCount; // number of arcs
double m_dObjectArcEDOTimes; // ratio of the number of arcs to the overall number
double m_dObjectArcEndPtToLine; // ratio of arcs with both endpoints connected to line segments to arcs with only one endpoint connected to a line segment (one end connected to a line segment, the other end connected to no line)
double m_dObjectHeadTailLineParallel; // for arcs connected to line segments at both ends, the proportion whose two end lines are parallel to each other (there may be several such arcs, so whether the straight lines at the two ends of each arc are parallel is checked and the ratio is calculated)
double m_dObjectArcTangentVctParalleToLine; // proportion of arcs whose tangent direction is parallel to the line segments connected at their ends (there may be several such arcs, so the tangent of each arc is compared with the lines connected at its endpoints and the ratio is calculated)
double m_dObjectAvgLengthTimes; // ratio of the average length of the connected cluster to the average length of the whole drawing
double m_dObjectAvgGroupLengthTimes; // ratio of the average length of the connected cluster to the average length of the group, i.e. the ratio of its summed line length to the summed line length of the group
double m_dObjectIsAllParalleOrVertical; // proportion of lines (arcs excluded) that are vertical or horizontal; the angle of each line determines whether it is vertical or horizontal.
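The last feature above can be read as the share of straight lines in a connected cluster that are horizontal or vertical. A minimal sketch under that reading follows; the `Line` type, the function names and the 0.5 degree tolerance are assumptions for illustration, and arcs are assumed to have been filtered out beforehand.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical straight line from (x1, y1) to (x2, y2); arcs are excluded.
struct Line { double x1, y1, x2, y2; };

// True when the line direction lies within tolDeg degrees of the horizontal
// or vertical axis. A zero-length line has angle 0 and counts as aligned.
bool isAxisAligned(const Line& l, double tolDeg = 0.5) {
    const double kPi = std::acos(-1.0);
    const double deg = std::atan2(l.y2 - l.y1, l.x2 - l.x1) * 180.0 / kPi;
    const double r = std::fmod(std::fabs(deg), 90.0);  // offset from a multiple of 90
    return r <= tolDeg || r >= 90.0 - tolDeg;
}

// Share of lines that are horizontal or vertical; 0 for an empty cluster.
double axisAlignedRatio(const std::vector<Line>& lines) {
    if (lines.empty()) return 0.0;
    int aligned = 0;
    for (const Line& l : lines) {
        if (isAxisAligned(l)) ++aligned;
    }
    return static_cast<double>(aligned) / static_cast<double>(lines.size());
}
```

A small angular tolerance is used because coordinates in CAD drawings are rarely exact to the last digit.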
In practical application, taking a steel bar model as an example, the system exports labeled data in which the leading values of each record are the feature values and the final value is the label: 0 means the item is not a steel bar and 1 means it is a steel bar. A prediction model can be obtained by training on this data. When the prediction model is used, one record of feature information is input and the model outputs 0 or 1, that is, not a steel bar or a steel bar. This is binary classification, which predicts whether or not something belongs to a given class. The model can also perform multi-class classification, that is, predict several kinds of things; only the final label needs to be changed, for example 0 for unknown, 1 for steel bar, 2 for wall, 3 for column, and so on. The prediction can be carried out with a random forest: a forest containing many decision trees is built in a random manner, and the decision trees of the random forest are independent of one another. Once the forest is obtained, each decision tree judges which class a new input sample belongs to (for a classification algorithm), and the class chosen by the most trees is the predicted class of the sample.
It should be noted that the embodiment of the invention uses a random forest for feature prediction only as an illustration. In practical applications, other existing prediction algorithms may be selected according to the actual project and system requirements, and the invention is not limited in this respect.
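The voting step of the random forest described above can be sketched as follows. The sketch covers prediction only, not training: each trained decision tree is represented by a stand-in callable, and the names `Features`, `Tree` and `predictByMajorityVote` are hypothetical. Breaking ties in favor of the smaller label is an assumption of this sketch, not something the patent states.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

using Features = std::vector<double>;
// Stand-in for a trained decision tree: maps a feature record to a class label
// (for example 0 = unknown, 1 = steel bar, 2 = wall, 3 = column).
using Tree = std::function<int(const Features&)>;

// Every tree votes for a class; the class with the most votes is returned.
// Ties go to the smaller label, and an empty forest yields 0.
int predictByMajorityVote(const std::vector<Tree>& forest, const Features& x) {
    std::map<int, int> votes;  // label -> number of trees voting for it
    for (const Tree& tree : forest) ++votes[tree(x)];
    int bestLabel = 0;
    int bestVotes = 0;
    for (const auto& entry : votes) {
        if (entry.second > bestVotes) {
            bestLabel = entry.first;
            bestVotes = entry.second;
        }
    }
    return bestLabel;
}
```

Switching from binary to multi-class prediction requires no change to the voting code; only the labels returned by the trees differ.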
Specifically, in an embodiment, the step S213 further includes the following steps:
(1) Determine the project classification results of the current project file and each historical project file according to the project attributes. In the embodiment of the invention, the historical project files are all classified by project attribute during recognition or when they are entered; before the user-defined project training model corresponding to the current project file is built, the historical project files similar to the current project file are determined so that the training data set can be determined.
(2) According to the project classification results and the preset recognition rate threshold, determine as the training data set the historical engineering files that have the same classification result as the current engineering file and whose engineering recognition rate meets the preset recognition rate requirement.
In the embodiment of the invention, after the project classification results are determined, the projects in the matching class whose project recognition rate exceeds 80% are determined as the training data set. This excludes project files that were recognized inaccurately and so ensures the accuracy of recognition and model training.
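The two selection steps above can be sketched as a simple filter. This is a minimal illustration, assuming the classification result is available as a string and the recognition rate as a fraction in [0, 1]; the `HistoricalProject` type and `selectTrainingSet` function are hypothetical names, and only the 80% threshold comes from the text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one historical project file.
struct HistoricalProject {
    std::string name;
    std::string category;    // classification result derived from project attributes
    double recognitionRate;  // project recognition rate as a fraction in [0, 1]
};

// Keep the historical projects that share the current project's category and
// whose recognition rate is above the threshold (more than 80% by default).
std::vector<HistoricalProject> selectTrainingSet(
        const std::vector<HistoricalProject>& history,
        const std::string& currentCategory,
        double minRate = 0.8) {
    std::vector<HistoricalProject> selected;
    for (const HistoricalProject& p : history) {
        if (p.category == currentCategory && p.recognitionRate > minRate) {
            selected.push_back(p);
        }
    }
    return selected;
}
```

Filtering by category first and by recognition rate second gives the same result as the reverse order; the filter is written as one pass for brevity.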
Step S214: Perform model training according to the training data set, and determine the engineering training model corresponding to the current engineering file. In the embodiment of the invention, selecting the training data effectively keeps only the necessary engineering information and filters out erroneous and duplicated information; customizing the training data narrows and focuses the training set, so that the model trains quickly and recognizes accurately.
Step S22: Match an adaptive target model among the engineering training models according to the engineering information.
Step S3: Identify the engineering file content by using the target model.
Specifically, in an embodiment, as shown in fig. 4, the step S3 includes the following steps:
Step S31: Acquire the project file. The engineering file may be obtained by the electronic device directly from outside, or it may be an engineering file to be identified that is obtained by a data acquisition device in the electronic device. The way in which the electronic device obtains the engineering file to be identified is not limited, provided that the electronic device is able to obtain it.
Step S32: Input a drawing file in the engineering file into the target model, and determine an engineering recognition result.
The embodiment of the invention provides a method for identifying engineering file content. An adaptive target model is matched according to the engineering information of an engineering file, and the target model is then used to identify the engineering file content. This gives the user a reference for selecting a model, speeds up engineering recognition and improves recognition accuracy, so that the project the user wants to identify obtains a satisfactory recognition result through the matched target model. Determining the project recognition rate and the project attributes of the historical projects gives the user a basis for selecting training data. Selecting the training data set in a user-defined way narrows and focuses the training set, so that the model trains quickly and recognizes accurately. Training the model on the training data set and determining the engineering training model corresponding to the current engineering file provides a recognition model for all subsequent similar projects; this is simple and efficient, increases the speed of model training and the recognition rate, makes model training more targeted, and improves the practicability of the engineering training model.
An embodiment of the present invention further provides a system for identifying engineering document content, as shown in fig. 5, including:
The acquisition module 1 is used for acquiring the project information of the project file. For details, reference is made to the description relating to step S1 in the above method embodiment.
The first processing module 2 is used for matching an adaptive target model according to the engineering information, wherein the target model is a pre-trained prediction model corresponding to different types of engineering files. For details, reference is made to the description relating to step S2 in the above method embodiment.
The second processing module 3 is used for identifying the engineering file content by using the target model. For details, reference is made to the description relating to step S3 in the above method embodiment.
Through the cooperation of the above components, the embodiment of the invention provides an engineering file content identification system that matches an adaptive target model according to the engineering information of an engineering file and identifies the engineering file content by using the target model. This gives the user a reference for selecting a model, speeds up engineering recognition and improves recognition accuracy, so that the project the user wants to identify obtains a satisfactory recognition result through the matched target model.
An embodiment of the present invention further provides an electronic device, as shown in fig. 6, the electronic device may include a processor 901 and a memory 902, where the processor 901 and the memory 902 may be connected by a bus or in another manner, and fig. 6 takes the connection by the bus as an example.
Processor 901 may be a Central Processing Unit (CPU). The Processor 901 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or combinations thereof.
The memory 902, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present invention. The processor 901 executes various functional applications and data processing of the processor, i.e., implements the above-described method, by executing non-transitory software programs, instructions, and modules stored in the memory 902.
The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 901, and the like. Further, the memory 902 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the processor 901 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 902 and, when executed by the processor 901, perform the methods described above.
The specific details of the electronic device may be understood by referring to the corresponding related descriptions and effects in the above method embodiments, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, and the program can be stored in a computer readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD) or a Solid State Drive (SSD), etc.; the storage medium may also comprise a combination of memories of the kind described above.
The above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (14)

1. A method for identifying the content of a project file is characterized by comprising the following steps:
acquiring engineering information of an engineering file;
matching an adaptive target model according to the engineering information, wherein the target model is a pre-trained prediction model corresponding to different types of engineering files;
and identifying the engineering file content by using the target model.
2. The method for identifying the contents of the engineering document according to claim 1, wherein the matching an adaptive target model according to the engineering information comprises:
searching a model consistent with the first attribute of the engineering information in a preset model by using the first attribute of the engineering information;
and when a model consistent with the first attribute of the engineering information is found in a preset model, determining the found model as a target model.
3. The method for identifying the contents of the engineering document according to claim 2, wherein the matching an adaptive target model according to the engineering information comprises:
when a model which is consistent with the first attribute of the engineering information is not found in a preset model, searching a model meeting a first preset searching condition in the preset model by using the second attribute of the engineering information;
when a model meeting a first preset searching condition is searched in a preset model, searching a model meeting a second preset searching condition in the searched model by using a third attribute of the engineering information, and determining an identification model;
determining the engineering similarity between the engineering information and each preset historical engineering information according to a preset similarity algorithm;
and determining a target model in the recognition model according to the engineering similarity result.
4. The method for identifying the engineering document content according to claim 3, wherein the determining the target model in the identification model according to the engineering similarity result comprises:
determining each candidate model from each recognition model according to the sequencing result of the engineering similarity from large to small;
determining the candidate model as a target model in response to the candidate model selection operation.
5. The method for identifying the content of the project file as claimed in claim 4, further comprising: when the candidate model selection operation is not acquired within a preset time, determining the first-ranked candidate model in the sequencing result as the target model.
6. The method for identifying the contents of the engineering document according to claim 1, wherein the matching an adaptive target model according to the engineering information comprises:
training prediction models of different types of engineering files to determine engineering training models;
and matching an adaptive target model in the engineering training model according to the engineering information.
7. The method for identifying the contents of the engineering documents as claimed in claim 6, wherein the training of the prediction models of the engineering documents of different types to determine the engineering training model comprises:
acquiring a current project file and various historical project files;
determining the project identification rate of the drawing files in the historical project files according to a preset identification algorithm;
determining a training data set according to the project recognition rate corresponding to each historical project file and the project attributes corresponding to the current project file and each historical project file;
and performing model training according to the training data set, and determining the engineering training model corresponding to the current engineering file.
8. The method for recognizing the contents of the engineering documents as claimed in claim 7, wherein the determining the engineering recognition rate of the drawing documents in the historical engineering documents according to a preset recognition algorithm further comprises: determining the project identification rate of the drawing files in the historical project files based on the primitive reference information in the historical project files.
9. The method for identifying the contents of the project files according to claim 8, wherein the determining the project identification rate of the drawing files in the historical project files based on the primitive reference information in the historical project files comprises:
determining the primitives contained in the drawing file in the historical engineering file according to a preset primitive recognition algorithm;
determining primitive identification information of each primitive according to a preset service rule;
determining the recognition rate corresponding to each primitive according to the primitive recognition information of each primitive and the primitive reference information corresponding to each primitive in the historical project file;
and determining the project recognition rate according to the recognition rate corresponding to each primitive.
10. The method for identifying the contents of the engineering documents according to claim 8, wherein the determining the training data set according to the engineering identification rate corresponding to each historical engineering document and the project attributes corresponding to the current engineering document and each historical engineering document comprises:
determining the project classification results of the current project file and each historical project file according to the project attributes;
and determining the historical project files which are the same as the current project file classification result and have the project recognition rate meeting the preset recognition rate requirement as the training data set according to the project classification result and the preset recognition rate threshold requirement.
11. The method for identifying the engineering document content according to claim 1, wherein the identifying the engineering document content by using the target model comprises:
acquiring an engineering file;
and inputting the drawing file in the engineering file into the target model, and determining an engineering recognition result.
12. A system for identifying project file content, comprising:
the acquisition module is used for acquiring the project information of the project file;
the first processing module is used for matching an adaptive target model according to the engineering information, wherein the target model is a pre-trained prediction model corresponding to different types of engineering files;
and the second processing module is used for identifying the engineering file content by utilizing the target model.
13. A non-transitory computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method of identifying project file content of any of claims 1-11.
14. An electronic device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the method of identifying project file content according to any one of claims 1 to 11.
CN202110240632.5A 2021-03-04 2021-03-04 Method and system for identifying engineering file content Active CN112906717B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110240632.5A CN112906717B (en) 2021-03-04 2021-03-04 Method and system for identifying engineering file content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110240632.5A CN112906717B (en) 2021-03-04 2021-03-04 Method and system for identifying engineering file content

Publications (2)

Publication Number Publication Date
CN112906717A true CN112906717A (en) 2021-06-04
CN112906717B CN112906717B (en) 2024-05-28

Family

ID=76108213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110240632.5A Active CN112906717B (en) 2021-03-04 2021-03-04 Method and system for identifying engineering file content

Country Status (1)

Country Link
CN (1) CN112906717B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553454A (en) * 2021-07-21 2021-10-26 广联达科技股份有限公司 Primitive data processing method and device and electronic equipment

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020129001A1 (en) * 2000-12-12 2002-09-12 Levkoff Jonathan S. Method and system for assimilation, integration and deployment of architectural, engineering and construction information technology
WO2014176456A1 (en) * 2013-04-25 2014-10-30 Brigham Young University Method and apparatus for concurrent multi-user toolpath creation
CN105302931A (en) * 2014-06-30 2016-02-03 上海神机软件有限公司 Recognition system and method for construction engineering drawing, and template arrangement system and method
CN105976077A (en) * 2016-03-04 2016-09-28 国家电网公司 Power transmission and transformation project cost dynamic control target calculating system and calculating method
US20180157769A1 (en) * 2016-03-21 2018-06-07 Brigham Young University Multi-reference interface inheritance for concurrent cad interoperability applications
EP3575966A1 (en) * 2018-05-28 2019-12-04 Siemens Aktiengesellschaft Method and system for handling engineering data in a multi- engineering system environment
CN110543946A (en) * 2018-05-29 2019-12-06 百度在线网络技术(北京)有限公司 method and apparatus for training a model
CN110688445A (en) * 2018-06-19 2020-01-14 中国石化工程建设有限公司 Digital archive construction method
CN109670267A (en) * 2018-12-29 2019-04-23 北京航天数据股份有限公司 A kind of data processing method and device
WO2020147395A1 (en) * 2019-01-17 2020-07-23 平安科技(深圳)有限公司 Emotion-based text classification method and device, and computer apparatus
CN110765891A (en) * 2019-09-30 2020-02-07 万翼科技有限公司 Engineering drawing identification method, electronic equipment and related product
CN111143923A (en) * 2019-12-17 2020-05-12 万翼科技有限公司 Model processing method and related device
CN111177445A (en) * 2019-12-30 2020-05-19 湖南特能博世科技有限公司 Standard primitive determining method, primitive identifying method and device and electronic equipment
CN112000860A (en) * 2020-07-27 2020-11-27 宁夏宁电电力设计有限公司 Construction standard visualization application method and device, equipment and storage medium
CN112100422A (en) * 2020-09-24 2020-12-18 武汉百家云科技有限公司 Engineering drawing processing method, device, equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ABHILASH MANE ET AL.: "Investigating Application of Machine Learning in Identification of Polygon Shapes for Recognition of Mechanical Engineering Drawings", 《2019 INTERNATIONAL CONFERENCE ON NASCENT TECHNOLOGIES IN ENGINEERING (ICNTE)》, pages 1 - 6 *
LIANGRUI PENG: "Multilingual document recognition research and its application in China", SECOND INTERNATIONAL CONFERENCE ON DOCUMENT IMAGE ANALYSIS FOR LIBRARIES (DIAL\'06), 8 May 2006 (2006-05-08), pages 1 - 7 *
路通等: "建筑工程图识别与理解——模型与算法", 《计算机研究与发展》, vol. 42, no. 01, pages 144 - 152 *
陈晓勿: "工程设计档案数字化管理探析", 《兰台内外》, no. 15, 31 May 2020 (2020-05-31), pages 75 - 76 *

Also Published As

Publication number Publication date
CN112906717B (en) 2024-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant