CN115329903B - Spatial data integration method and system applied to digital twin city - Google Patents


Info

Publication number
CN115329903B
Authority
CN
China
Legal status
Active
Application number
CN202211249273.0A
Other languages
Chinese (zh)
Other versions
CN115329903A (en)
Inventor
周春煦
张建平
陈梨春
姜显贵
李丹
谢云飞
施峰
吉顺莉
赵苏政
倪飞
施小飞
戴雨
王兆能
曹野
郑玉能
仲文正
陆丁炜
鲍志鹏
成海峰
闵钰强
丁杨
曹毅
沈媛
Current Assignee
Fujian Meifang Times Technology Co ltd
Original Assignee
Fujian Meifang Times Technology Co ltd
Application filed by Fujian Meifang Times Technology Co ltd
Priority to CN202211249273.0A
Publication of CN115329903A
Application granted
Publication of CN115329903B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The invention discloses a spatial data integration method and system applied to a digital twin city, in the technical field of digital twin cities. The method comprises the following steps. Data are received from all data sources and uploaded to a server, and the multi-source data are retrieved and analyzed to determine the quantity and format of the data in each library. The topics of the data in each library are identified, and the classified data are marked with the topics as labels. Several libraries are quantified to obtain a standard value BZ for each library, and the libraries are evaluated by that value. A format difficulty value GsN is obtained for each library from the difficulty of converting between data formats, and GsN is associated with BZ to obtain a library evaluation value PG. All libraries are sorted by PG, and data are acquired from the libraries in that order. Because the ranking by PG combines format conversion difficulty with topic-based library evaluation, the resulting data acquisition strategy takes multiple factors into account when spatial data are acquired.

Description

Spatial data integration method and system applied to digital twin city
Technical Field
The invention relates to the technical field of digital twin cities, and in particular to a spatial data integration method and system applied to a digital twin city.
Background
A digital twin makes full use of data such as physical models, sensors and operation histories. It integrates multi-disciplinary, multi-physical-quantity, multi-scale and multi-probability simulation processes and completes a mapping in virtual space that reflects the full life cycle of the corresponding physical equipment.
A digital twin city is built on building information models (BIM) and a city-scale three-dimensional geographic information system (3D GIS). Internet of Things technology is used to digitize all elements of the physical city, such as people, objects, events, water, electricity and gas. A virtual city that fully corresponds to the physical city is then constructed in cyberspace, so that the physical city in the physical dimension and the digital city in the information dimension coexist, with the virtual and the real fused together.
Building a digital twin city requires acquiring a large amount of spatial data. Spatial data are large in volume and come from multiple sources in multiple formats. As a result, the number of source libraries grows, the volume of data to acquire is large, and acquisition is difficult. Effective data are therefore hard to obtain during integration, and the efficiency of spatial data integration drops.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a spatial data integration method and system applied to a digital twin city. First, data are received from all data sources, and the multi-source data are retrieved and analyzed to determine the quantity and format of the data in each library. Next, the classified data are marked with topics as labels. A standard value is then obtained for each library, and an evaluation value is derived for each library from the conversion difficulty between data formats. Finally, all libraries are sorted by their evaluation values, and data are acquired in sequence. Because the data acquisition strategy is determined by sorting the libraries by their evaluation values PG, spatial data acquisition takes multiple factors into account, which solves the problems described in the background.
To achieve the above purpose, the invention adopts the following technical solution: a spatial data integration method applied to a digital twin city, comprising the following steps:
receiving data from all data sources, uploading the data to a server, retrieving and analyzing the multi-source data, and determining the quantity and format of the data in each library; identifying the topics of the data in each library, and marking the classified data with the topics as labels;
quantifying several libraries to obtain a standard value for each library, and quantitatively evaluating the libraries by their standard values; associating the conversion difficulty between data formats with the standard value of each library to obtain the evaluation value of the library;
obtaining the library evaluation value PG, and sorting all libraries by PG; during data acquisition, acquiring data from the libraries in the sorted order.
Further, the formats in the multi-source data are identified, and the total data amount in each library and the format information of the data in the library are determined; the data in the library are classified according to their format information to form different data categories;
the data categories are sorted by the data volume under each format; after the total amount of data in each format in the library is obtained, the most frequent format in each category is determined and the other formats are converted into it, so that the formats within the library are unified;
a data filter is trained, and the trained filter is used to remove invalid data from the library, reducing the noise caused by invalid or blank data and its interference with normal data.
Further, several data items are selected from one of the libraries to serve as a topic-extraction training set and a topic-model test set, respectively; an LDA topic model is trained on the training set, and the trained model is checked on the test set to verify that it is free of errors;
topics are extracted from the data in a library with the trained LDA model, yielding a number of data topics;
the similarity between different topics is judged with a similarity model, and the extracted topics are classified by similarity;
a topic label is generated from each topic name and added to the corresponding data category, so that the label characterizes the category.
Further, the number of topic labels in all libraries is obtained, and the liveness of the topic labels is calculated; the number of topic labels in all libraries is obtained, and the contribution degree of the topic labels is calculated; the similarity between the topic labels in each library and the topics of the data acquisition and analysis strategy is calculated to obtain similarity data;
the contribution degree GxL, the similarity Xs and the total liveness ZhY are obtained, normalized, and then associated and combined to form the standard value of the library, denoted BZ;
BZ is calculated by the formula in formula image SMS_1 (not reproduced in this text), which combines the normalized GxL, Xs and ZhY with weight coefficients subject to the constraints given in formula images SMS_2 to SMS_6. The user can adjust the specific weight values based on practical experience, and can correct the standard value BZ of the library by changing the parameter in formula image SMS_7.
Further, the quantities in formula images SMS_8, SMS_9 and SMS_10 denote, respectively, the most recent liveness of a topic label in the library, the total number of occurrences of data topics in the data category represented by that label, and the total data volume of the library. They are related by the expressions in formula images SMS_11 and SMS_12 (not reproduced in this text). These expressions include a correction coefficient whose value is set by the user as required, so that the liveness of the library's topic labels can be corrected;
the number of topic labels in each data category and their liveness values are determined and combined to form the total liveness, denoted ZhY.
Further, under the topic label of a data category, the number of topics is denoted LtS, and the percentage of the library's total data that falls under the topic label is denoted Zb; the contribution degree of the topic label is denoted GxL;
GxL is calculated by the expression in formula image SMS_13 (not reproduced in this text), where the quantity in formula image SMS_14 is a correction coefficient whose value is set by the user as required to correct the contribution degree of the library's topic labels.
Further, the data acquisition and analysis strategy is obtained, and its topics are extracted with the trained LDA topic model; the topic labels of a library are obtained, the similarity between each topic label and the strategy topics is judged with a similarity model, and the similarity is quantified;
the quantified similarity values of all topic labels are obtained and sorted to form ranking information; the similarities of all topic labels in the library are collected, and their maximum is taken as the similarity Xs.
Further, the data formats in each library and the corresponding data quantities are obtained to determine format-quantity data; the difficulty of converting between formats is scored, the average score for conversions between different formats is obtained, and each format is marked with its average score;
the product of the average score and the data quantity in a library is taken as the format difficulty value GsN of the library, and the libraries are sorted by GsN; the data formats of the other libraries are converted into the format of the library with the lowest GsN, so that the formats across libraries are unified.
Further, the format difficulty value GsN and the standard value BZ of each library are obtained and associated to determine the library evaluation value PG. PG is calculated by the formula in formula image SMS_15 (not reproduced in this text). The quantity in formula image SMS_18 is a correction coefficient used to correct PG. Formula images SMS_16, SMS_17 and SMS_19 to SMS_23 give the weight coefficients and their constraints, whose values are set by the user.
A spatial data integration system applied to a digital twin city, comprising:
a data analysis module, configured to receive the data of all data sources and retrieve and analyze the multi-source data to determine the quantity and format of the data in each library;
a theme marking module, configured to identify the topics of the data in each library and mark the classified data with the topics as labels;
a data quantization module, configured to quantify several libraries to obtain the standard value of each library and evaluate the libraries by their standard values;
a data association module, configured to associate the conversion difficulty between data formats with the standard value of each library to obtain the evaluation value of the library;
a sorting module, configured to obtain the library evaluation value PG and sort all libraries by PG; during data acquisition, data are acquired from the libraries in the sorted order.
The invention provides a spatial data integration method and system applied to a digital twin city, with the following beneficial effects:
Data formats are converted according to the distribution of formats within each library, which reduces the format-conversion workload. The conversion strategy is formulated by judging conversion difficulty, which improves the efficiency and speed of converting each library's data and reduces the difficulty of identifying multi-format, multi-source data during acquisition and identification.
By identifying and ranking library topics, data with high similarity to the acquisition strategy can be acquired first. Acquisition requirements can therefore be met quickly, and acquisition efficiency improves.
The data acquisition strategy is determined by obtaining each library's format difficulty value GsN and standard value BZ, deriving the library evaluation value PG, and sorting the libraries by PG. The strategy therefore takes multiple factors into account and improves the efficiency of spatial data acquisition.
Drawings
FIG. 1 is a schematic structural diagram of the library evaluation value according to the present invention;
FIG. 2 is a schematic flow chart of the spatial data integration method of the present invention;
FIG. 3 is a schematic diagram of a spatial data integration system according to the present invention.
In the figure:
10. data analysis module; 20. theme marking module; 30. data quantization module; 40. data association module; 50. sorting module.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Referring to FIGS. 1-3, the present invention provides a spatial data integration method applied to a digital twin city, comprising the following steps:
step 1, receiving data from all data sources, uploading the data to a server, retrieving and analyzing the multi-source data, and determining the quantity and format of the data in each library; this specifically comprises:
step 101, identifying the formats in the multi-source data, and determining the total data amount in each library and the format information of the data in the library;
step 102, classifying the data in the library according to their format information to form different data categories;
step 103, sorting the data categories by the data volume under each format;
step 104, after the total amount of data in each format in the library is obtained, determining the most frequent format in each category and converting the other formats into it, so that the formats within the library are unified;
step 105, training a data filter and using the trained filter to remove invalid data from the library, thereby reducing the noise caused by invalid or blank data and its interference with normal data.
In use, the most frequent format in the library is determined, and the other data in the library are converted into that format. Steps 101 to 104 thus define the format-conversion approach and reduce the conversion workload. Once all data in the library share a unified format, they are easier to identify and analyze.
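Steps 101 to 104 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The format-to-category map `FORMAT_CATEGORY`, the format names, and the function `plan_format_unification` are hypothetical, and data volume is approximated by record count. Step 105 (the trained filter) is not shown because the source does not specify the filter.

```python
from collections import Counter, defaultdict

# Hypothetical format-to-category map; the patent does not list concrete formats.
FORMAT_CATEGORY = {
    "shp": "vector", "geojson": "vector", "kml": "vector",
    "tif": "raster", "img": "raster",
}


def plan_format_unification(records):
    """Plan intra-library format unification (sketch of steps 101-104).

    records: iterable of (record_id, data_format) pairs from one library.
    Returns a list of (category, target_format, ids_to_convert), ordered by
    the category's data volume (approximated here by record count).
    """
    format_counts = defaultdict(Counter)  # step 102: category -> format counts
    members = defaultdict(list)           # category -> [(record_id, format)]
    for record_id, fmt in records:        # step 101: identify each record's format
        category = FORMAT_CATEGORY.get(fmt, "other")
        format_counts[category][fmt] += 1
        members[category].append((record_id, fmt))

    plan = []
    for category, counts in format_counts.items():
        # Step 104: the most frequent format becomes the category's target.
        target = counts.most_common(1)[0][0]
        to_convert = [rid for rid, fmt in members[category] if fmt != target]
        plan.append((category, target, to_convert))

    # Step 103: order categories by data volume, largest first.
    plan.sort(key=lambda item: sum(format_counts[item[0]].values()), reverse=True)
    return plan
```

Consider a library holding two `shp` records, one `geojson`, one `kml` and one `tif`. The plan converts the `geojson` and `kml` records to `shp` and leaves the raster category unchanged.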
Step 2, identifying the topics of the data in the library, and marking the classified data with the topics as labels; this specifically comprises:
step 201, selecting several data items from one of the libraries to serve as a topic-extraction training set and a topic-model test set, respectively;
step 202, training an LDA topic model on the training set, and checking the trained model on the test set to verify that it is free of errors;
step 203, extracting topics from the data in a library with the trained LDA model to obtain a number of data topics;
step 204, judging the similarity between different topics with a similarity model, and classifying the extracted topics by similarity;
step 205, generating a topic label from each topic name, adding the label to the corresponding data category, and characterizing the category by the label.
In step 2, the data of multiple formats in a single library are first brought into the same format. Topics are then extracted and classified by their mutual similarity to obtain several topic classes, which completes the classification of the data in the library. Finally, the topics are used as labels to mark the data classes.
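Steps 204 and 205 can be illustrated with a short sketch. It assumes the trained LDA model (for example scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel`) has already produced each topic as a set of top words. The source does not name its similarity model, so Jaccard similarity with a threshold, greedy grouping, and the helper names `jaccard` and `group_topics` are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two word sets (assumed stand-in for the similarity model)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_topics(topics, threshold=0.5):
    """Group LDA topics by similarity and label each group (sketch of steps 204-205).

    topics: dict mapping topic name -> set of top words.
    A topic joins the first existing group whose founding topic is at least
    `threshold` similar; otherwise it founds a new group. The founding topic's
    name serves as the group's topic label. Returns {label: [topic names]}.
    """
    groups = {}
    for name, words in topics.items():
        for label in groups:
            if jaccard(words, topics[label]) >= threshold:
                groups[label].append(name)
                break
        else:  # no sufficiently similar group was found
            groups[name] = [name]
    return groups
```

Greedy grouping depends on the order in which topics arrive. A proper clustering method would remove that dependence, at the cost of more code.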
Step 3, quantifying several libraries to obtain the standard value of each library, and quantitatively evaluating the libraries by their standard values; this specifically comprises:
step T1, obtaining the number of topic labels in all libraries and calculating the liveness of the topic labels, as follows:
the liveness of a topic label is defined as the frequency with which the label occurs in each library within the data category it represents;
the quantities in formula images SMS_24, SMS_25 and SMS_26 denote, respectively, the most recent liveness of a topic label in the library, the total number of occurrences of data topics in the data category represented by that label, and the total data volume of the library. They are related by the expressions in formula images SMS_27 and SMS_28 (not reproduced in this text). These expressions include a correction coefficient whose value is set by the user as required, so that the liveness of the library's topic labels can be corrected;
the number of topic labels in each data category and their liveness values are determined and combined to form the total liveness, denoted ZhY;
in this step, the liveness of a topic label is quantified from its frequency of occurrence, which makes liveness simple to determine, and the topic labels in the library are quantified to support further evaluation.
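The liveness formula is only available as an image. The following sketch therefore assumes the simplest form consistent with the definition above: liveness equals a user-set correction coefficient `k` times the label's occurrence count divided by the library's total data volume, and ZhY is the sum over labels. Both the form and the function names `label_liveness` and `total_liveness` are hypothetical.

```python
def label_liveness(occurrences, total_volume, k=1.0):
    """Liveness of one topic label.

    ASSUMED form (the source formula is an image): correction coefficient k
    times the label's occurrence count divided by the library's total data volume.
    """
    if total_volume <= 0:
        raise ValueError("total_volume must be positive")
    return k * occurrences / total_volume


def total_liveness(label_occurrences, total_volume, k=1.0):
    """Total liveness ZhY, ASSUMED here to be the sum of per-label liveness values."""
    return sum(label_liveness(n, total_volume, k) for n in label_occurrences.values())
```

For example, `total_liveness({"roads": 30, "water": 10}, total_volume=200)` gives about 0.2.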
Step T2, obtaining the number of topic labels in all libraries and calculating the contribution degree of the topic labels, as follows:
the contribution degree is defined on the topic label with the largest number of topics in the library. It relates the share of the library's total data represented by the topics under that label to the number of topics under the label. Under the topic label of a data category, the number of topics is denoted LtS,
the percentage of the library's total data that falls under the topic label is denoted Zb, and the contribution degree of the topic label is denoted GxL;
GxL is calculated by the expression in formula image SMS_29 (not reproduced in this text), where the quantity in formula image SMS_30 is a correction coefficient for the contribution degree. Its value is set by the user as required, so that the contribution degree of the library's topic labels can be corrected.
In this step, once the topic labels in the library are determined, their contribution degrees are used to characterize the library. The contribution degree reflects how important a topic label is within the library: the topic label with the highest contribution degree is the most important one and best characterizes the library.
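The contribution formula is likewise not reproduced. Reading the definition as the label's data share spread over its topics, one hypothetical form is GxL = k · Zb / LtS, with `k` the user-set correction coefficient. The sketch below implements that assumed form for the label with the most topics. The function name `contribution` is illustrative.

```python
def contribution(label_topics, label_share, k=1.0):
    """Contribution degree GxL of the topic label with the most topics.

    label_topics: {label: LtS}, the number of topics under each label.
    label_share:  {label: Zb}, the fraction of the library's data under each label.
    ASSUMED form (the source formula is an image): GxL = k * Zb / LtS.
    """
    label = max(label_topics, key=label_topics.get)
    return label, k * label_share[label] / label_topics[label]
```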
Step T3, calculating the similarity between the topic labels in the library and the topics of the data acquisition and analysis strategy to obtain similarity data; step T3 comprises the following:
step T301, obtaining the data acquisition and analysis strategy and extracting the strategy topics with the trained LDA topic model;
step T302, obtaining the topic labels from the library, judging the similarity between each topic label and the strategy topics with a similarity model, and quantifying the similarity;
obtaining the quantified similarity values of all topic labels and sorting them to form ranking information;
step T303, collecting the similarities of all topic labels in the library and taking their maximum as the similarity Xs;
in this step, judging the similarity between topics identifies the label in the library that is most similar to the strategy topics. Because the order of data acquisition is determined by topic, the highest label similarity in a library can be used to characterize that library.
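Steps T302 and T303 reduce to ranking labels by similarity and taking the maximum. The sketch below assumes bag-of-words cosine similarity as the similarity model, which the source does not specify. The functions `cosine` and `library_similarity` and the word lists are illustrative.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters (assumed similarity model)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def library_similarity(label_words, strategy_words):
    """Rank labels by similarity to the strategy topic (T302) and return Xs (T303).

    label_words: {label: [words]}; strategy_words: [words] of the strategy topic.
    Returns (ranking as [(label, score)] in descending order, Xs = best score).
    """
    strategy = Counter(strategy_words)
    scores = {label: cosine(Counter(words), strategy) for label, words in label_words.items()}
    ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranking, (ranking[0][1] if ranking else 0.0)
```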
Step T4, quantitatively evaluating the library with its topic labels as the target to form a standard value; this specifically comprises:
obtaining the contribution degree GxL, the similarity Xs and the total liveness ZhY from steps T1 to T3, normalizing them, and then associating and combining them to form the standard value of the library, denoted BZ;
BZ is calculated by the formula in formula image SMS_31 (not reproduced in this text), which combines the normalized GxL, Xs and ZhY with weight coefficients subject to the constraints given in formula images SMS_32 to SMS_36. The user can adjust the specific weight values based on practical experience, and can correct the standard value BZ of the library by changing the parameter in formula image SMS_37.
In this step, the contribution degree GxL, similarity Xs and total liveness ZhY of the library's topic labels are obtained and combined into the standard value BZ. BZ quantifies the library on the basis of its topic labels and is used to evaluate it. On this basis, several libraries can be sorted by BZ when needed, and multi-source, multi-format data can be collected in that order.
Because the ranking also considers similarity to the acquisition strategy, collecting data in ranked order quickly yields the data best suited to the requirement and reduces the difficulty and time of data collection. Since the most relevant data sources are collected first, the data collected from the top-ranked libraries may already meet the requirement.
Data can be acquired from several source libraries simultaneously or sequentially in ranked order. Even when several libraries are accessed at once, the similarity of the topic labels has already been determined for each library, so acquisition remains fast and efficient.
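The BZ formula itself is an image. The surrounding text, however, describes normalized GxL, Xs and ZhY combined with user-adjustable weights and a correction coefficient. The sketch below assumes min-max normalization across libraries and a weighted sum whose weights add up to 1. The normalization method, the default weights, and the names `min_max` and `standard_values` are all assumptions.

```python
def min_max(values):
    """Min-max normalise a list to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def standard_values(libs, weights=(0.4, 0.4, 0.2), k=1.0):
    """Library standard value BZ for each library.

    libs: {name: (GxL, Xs, ZhY)}. Returns {name: BZ}.
    ASSUMED form (the source formula is an image):
        BZ = k * (a*GxL' + b*Xs' + c*ZhY'), with a + b + c = 1,
    where primes denote values min-max normalised across libraries.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    names = list(libs)
    # One normalised column per indicator: GxL, Xs, ZhY.
    cols = [min_max([libs[n][i] for n in names]) for i in range(3)]
    return {n: k * sum(w * cols[i][j] for i, w in enumerate(weights))
            for j, n in enumerate(names)}
```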
Step 4, associating the conversion difficulty between data formats with the standard value of the library to obtain the evaluation value of the library; this comprises:
step 401, determining format-quantity data from the data formats in each library and the corresponding data quantities;
step 402, scoring the difficulty of converting between formats, obtaining the average score for conversions between different formats, and marking each format with its average score;
step 403, taking the product of the score average value and the data quantity in the library as a format difficulty value GsN of the library, and sorting a plurality of libraries according to the format difficulty value GsN;
step 404, converting the data formats of the other libraries into the format of the library with the lowest format difficulty value GsN, and unifying the formats of the libraries.
In use, step 4 judges the conversion difficulty between data formats and scores the formats, and the strategy for converting the data formats of several libraries is then determined from the scores. This lowers the difficulty and raises the efficiency of unifying multiple data formats. Finally, all data in the libraries are converted into the most universal format, which simplifies data acquisition and later data identification.
Unifying the data formats in this way solves the difficulty of identifying multi-source, multi-format data and improves the efficiency of data identification and analysis.
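Steps 401 to 404 can be sketched as below. The conversion scores in `CONVERSION_SCORE`, the direction of the averaging (converting a library's format into each other format present), and the names `format_score` and `plan_cross_library` are hypothetical. Each library is assumed to have already been unified to a single format by step 104.

```python
from statistics import mean

# Hypothetical difficulty scores for converting format A into format B (1 = easy, 5 = hard).
CONVERSION_SCORE = {
    ("shp", "geojson"): 1, ("geojson", "shp"): 2,
    ("kml", "geojson"): 2, ("geojson", "kml"): 3,
    ("kml", "shp"): 3, ("shp", "kml"): 3,
}


def format_score(fmt, formats):
    """Average score for converting `fmt` into every other format present (assumed direction)."""
    scores = [CONVERSION_SCORE[(fmt, other)] for other in formats if other != fmt]
    return mean(scores) if scores else 0


def plan_cross_library(libraries):
    """Sketch of steps 401-404.

    libraries: {name: (unified_format, data_volume)}.
    Returns (names sorted by GsN ascending, {name: GsN}, target format), where
    GsN = format score * data volume and the target is the format of the
    library with the lowest GsN.
    """
    formats = {fmt for fmt, _ in libraries.values()}
    gsn = {name: format_score(fmt, formats) * volume
           for name, (fmt, volume) in libraries.items()}
    order = sorted(gsn, key=gsn.get)
    return order, gsn, libraries[order[0]][0]
```

In this sketch, ties in GsN keep the libraries' input order, because Python's sort is stable.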
Step 405, obtaining the format difficulty value GsN and the standard value BZ of the library and associating them to determine the library evaluation value PG. PG is calculated by the formula in formula image SMS_38 (not reproduced in this text). The quantity in formula image SMS_39 is a correction coefficient used to correct PG. Formula images SMS_40 to SMS_46 give the weight coefficients and their constraints, whose values are set by the user.
In use, step 405 links the format difficulty value GsN with the standard value BZ to evaluate and quantify each library, covering both the topics and the formats of its data. A data identification and acquisition strategy formulated on the basis of the library evaluation value PG therefore draws on comprehensive reference factors.
Step 5, obtaining the library evaluation value PG and sorting all libraries by PG; during data acquisition, data are acquired from the libraries in the sorted order.
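Step 405 and step 5 can be sketched as follows. The PG formula is not reproduced in the source, so this sketch assumes that PG rewards a high BZ and penalizes a high GsN: PG = k · (w_BZ · BZ − w_GsN · GsN′), with GsN′ min-max normalized across libraries. The form, the default weights, and the function names are assumptions, not the claimed formula.

```python
def evaluation_values(bz, gsn, w_bz=0.7, w_gsn=0.3, k=1.0):
    """Library evaluation value PG for each library.

    bz:  {name: BZ}; gsn: {name: GsN}.
    ASSUMED form (the source formula is an image):
        PG = k * (w_bz * BZ - w_gsn * GsN_norm)
    where GsN_norm is GsN min-max normalised across libraries, so libraries
    whose data are easier to convert score higher.
    """
    lo, hi = min(gsn.values()), max(gsn.values())
    gsn_norm = {n: 0.0 if hi == lo else (g - lo) / (hi - lo) for n, g in gsn.items()}
    return {n: k * (w_bz * bz[n] - w_gsn * gsn_norm[n]) for n in bz}


def acquisition_order(pg):
    """Step 5: libraries in descending order of PG, i.e. the order to acquire data from."""
    return sorted(pg, key=pg.get, reverse=True)
```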
Example 2
Referring to FIGS. 1-3, the present invention provides a spatial data integration system applied to a digital twin city, comprising:
the data analysis module 10, which receives the data of all data sources and retrieves and analyzes the multi-source data to determine the quantity and format of the data in each library;
the theme marking module 20, which identifies the topics of the data in each library and marks the classified data with the topics as labels;
the data quantization module 30, which quantifies several libraries to obtain the standard value of each library and evaluates the libraries by their standard values;
the data association module 40, which associates the conversion difficulty between data formats with the standard value of each library to obtain the evaluation value of the library;
the sorting module 50, which obtains the library evaluation values PG and sorts all libraries by PG; during data acquisition, data are acquired from the libraries in the sorted order.
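The five modules of the system embodiment can be wired into a pipeline roughly as below. This is a hypothetical skeleton: each module is an injected callable, the class and field names are illustrative, and the sorting module 50 is reduced to a descending sort on PG.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SpatialDataIntegrationSystem:
    """Hypothetical wiring of modules 10-50; each module is an injected callable."""
    analyse: Callable[[Dict], Dict]            # data analysis module 10
    mark_topics: Callable[[Dict], Dict]        # theme marking module 20
    quantize: Callable[[Dict], Dict]           # data quantization module 30 -> {library: BZ}
    associate: Callable[[Dict, Dict], Dict]    # data association module 40 -> {library: PG}

    def rank(self, sources: Dict) -> List[str]:
        """Run modules 10-40 in order, then sort libraries by PG (sorting module 50)."""
        libraries = self.analyse(sources)
        labelled = self.mark_topics(libraries)
        bz = self.quantize(labelled)
        pg = self.associate(labelled, bz)
        return sorted(pg, key=pg.get, reverse=True)
```

Injecting the modules as callables keeps the skeleton independent of any particular topic model, similarity model or formula. The earlier step sketches could be plugged in as those callables.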
Combining steps 1 to 5, the present application achieves the beneficial effects stated above. Planning format conversion per library and by conversion difficulty reduces the conversion workload and the difficulty of identifying multi-source, multi-format data. Identifying library topics and ranking them by similarity lets the most relevant data be acquired first. Ranking libraries by the library evaluation value PG, which combines the format difficulty value GsN and the standard value BZ, yields a comprehensive acquisition strategy that improves the efficiency of spatial data acquisition.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a division by logical function, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing is merely specific embodiments of the present application, but the scope of protection of the present application is not limited thereto; any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall be covered by the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.
Finally, the foregoing description is merely of preferred embodiments of the invention and is not intended to limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (8)

1. A spatial data integration method applied to a digital twin city, characterized by comprising the following steps:
receiving data of all data sources, uploading the data sources to a server, retrieving and analyzing multi-source data, and determining the quantity and format of the data in a library;
identifying the subject of the data in the library, and marking the classified data by taking the subject as a label;
obtaining the number of topic labels in all libraries and calculating the liveness of the topic labels; obtaining the number of topic labels in all libraries and calculating the contribution degree of the topic labels; calculating the similarity between the topic labels in the library and the topic of the data acquisition and analysis strategy to obtain similarity data;
obtaining the contribution degree GxL, the similarity Xs and the total liveness ZhY, normalizing them, and then associating and summarizing them to form the library standard value, denoted BZ;
the calculation conforms to the following formula:
[formula QLYQS_1]
wherein [formula QLYQS_2], [formula QLYQS_3], [formula QLYQS_4] and [formula QLYQS_5]; [formula QLYQS_6] are weights whose specific values can be adjusted by the user according to practical experience, and the library standard value BZ is corrected by changing [formula QLYQS_7];
quantifying the numerical values of a plurality of libraries to obtain standard values of the libraries, and quantitatively evaluating the libraries with the standard values;
combining the conversion difficulty between data formats with the library standard value to obtain the library evaluation value;
obtaining the library evaluation value PG and sorting all libraries by PG; when data acquisition is carried out, data are acquired from the libraries sequentially in the sorted order.
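The combination of GxL, Xs and ZhY into BZ, and the ordering of libraries by PG, can be sketched as follows. This is a minimal sketch under stated assumptions: the claimed formulas survive only as images, so the min-max normalization, the linear weighted sum, the equal default weights, and the descending sort direction are all assumptions, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to 0.0 (assumed convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def library_standard_values(
    gxl: Sequence[float],
    xs: Sequence[float],
    zhy: Sequence[float],
    weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
) -> List[float]:
    """Hypothetical BZ: weighted sum of min-max normalized GxL, Xs and ZhY.

    The linear form and the equal default weights are assumptions; the
    patent's own formula is not reproduced in the text.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    columns = [min_max(gxl), min_max(xs), min_max(zhy)]
    return [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(gxl))
    ]


def acquisition_order(pg: Dict[str, float]) -> List[str]:
    """Order library names by evaluation value PG, highest first (assumed direction)."""
    return sorted(pg, key=pg.get, reverse=True)
```

For example, `acquisition_order({"A": 0.2, "B": 0.9, "C": 0.5})` returns `["B", "C", "A"]`, i.e. data would be fetched from library B first. Normalizing before weighting keeps indicators with different units (counts, ratios, similarities) on a comparable scale.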
2. The spatial data integration method applied to a digital twin city according to claim 1, wherein:
identifying formats in the multi-source data, and determining the total data amount in each library and format information of the data in the library; classifying the data in the library according to the format information of the data to form different data categories;
sorting the data categories by the data volume in each format; after the total amount of data in each format in the library is obtained, determining the most frequent format in each category and converting the other formats into that format, so that the formats in the library are unified;
training a data filter and using the trained data filter to filter out invalid data in the library, thereby reducing the noise caused by invalid or blank data and its interference with normal data.
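A sketch of the format unification and filtering steps of claim 2, assuming each record is a dictionary with hypothetical `category`, `format` and `value` keys. The conversion here only relabels the format, and a simple blank-value rule stands in for the trained filter; both are placeholders rather than the claimed components.

```python
from collections import Counter, defaultdict
from typing import Dict, List

Record = Dict[str, object]


def is_valid(record: Record) -> bool:
    """Stand-in for the trained filter: reject missing or blank values."""
    value = record.get("value")
    return value is not None and str(value).strip() != ""


def unify_formats(records: List[Record]) -> List[Record]:
    """Convert each category to its most frequent format, then drop invalid records."""
    formats_by_category: Dict[object, List[str]] = defaultdict(list)
    for r in records:
        formats_by_category[r["category"]].append(str(r["format"]))
    # Most frequent format per category; ties go to the format seen first.
    target = {
        cat: Counter(fmts).most_common(1)[0][0]
        for cat, fmts in formats_by_category.items()
    }
    # Hypothetical conversion: only the format label changes here; a real
    # pipeline would call an actual format converter at this point.
    converted = [{**r, "format": target[r["category"]]} for r in records]
    return [r for r in converted if is_valid(r)]
```

The input records are not modified; new dictionaries are built so the original multi-source data remain available for auditing.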
3. The spatial data integration method applied to a digital twin city according to claim 1, wherein:
selecting a plurality of data items from one of the libraries to serve as a topic-extraction training set and a topic-model test set, respectively; training an LDA topic model with the training set to generate a trained LDA topic model, and testing it with the test set to confirm that the trained LDA topic model is free of errors;
performing topic extraction on a plurality of data in a library by using the LDA model obtained through training to obtain a plurality of data topics;
judging the similarity among different topics by using a similarity model, and classifying a plurality of acquired topics according to the similarity;
generating a theme tag according to the theme name, adding the theme tag into a corresponding data classification category, and characterizing the category by the theme tag.
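The similarity-based classification of extracted topics can be sketched as a greedy grouping. The claim does not specify the similarity model, so this sketch assumes each topic is represented by its set of top keywords (for example, the highest-weight words an LDA model assigns to it), uses Jaccard similarity, and applies a hypothetical threshold of 0.3; the tag of each group is the name of its first topic.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_topics(topics: Dict[str, Set[str]], threshold: float = 0.3) -> Dict[str, List[str]]:
    """Greedily group topics; each group's tag is the name of its first topic."""
    groups: Dict[str, List[str]] = {}
    for name, words in topics.items():
        for tag in groups:
            # Compare against the representative topic that names the group.
            if jaccard(topics[tag], words) >= threshold:
                groups[tag].append(name)
                break
        else:
            # No sufficiently similar group: this topic starts a new tag.
            groups[name] = [name]
    return groups
```

With topics `traffic = {road, vehicle, flow}`, `road network = {road, vehicle, junction}` and `terrain = {elevation, slope}`, the first two share half of their combined keywords and end up under the tag `traffic`, while `terrain` forms its own tag.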
4. The spatial data integration method applied to a digital twin city according to claim 1, wherein:
[formula QLYQS_8] represents the most recent liveness of the topic labels in the library; [formula QLYQS_9] represents the total number of occurrences of data topics in the library within the data category represented by the topic label; [formula QLYQS_10] is the total data volume in the library; the expression is:
[formula QLYQS_11]
[formula QLYQS_12]
where the value of the correction coefficient is set by the user as required, which facilitates correcting the liveness of the library's topic labels;
the number of topic tags in each data category and the corresponding liveness are determined and summarized to form a total liveness, which is denoted as total liveness ZhY.
5. The spatial data integration method applied to a digital twin city according to claim 1, wherein:
under the topic label of the data category, the number of all topics is LtS, and the percentage of the data under the topic label in the total data in the database is Zb; the contribution degree of the theme label is GxL;
the contribution degree GxL is calculated according to the following expression:
[formula QLYQS_13]
wherein [formula QLYQS_14], and the value of the correction coefficient is set by the user as required to correct the contribution degree of the library's topic labels.
6. The spatial data integration method applied to a digital twin city according to claim 1, wherein:
acquiring a strategy for data acquisition and analysis, and extracting a strategy theme through a trained LDA theme model; obtaining a topic label from a library, judging the similarity between the topic label in the library and a strategy topic by using a similarity model, and quantifying the value of the similarity;
obtaining quantized similarity values of all the theme labels, and sorting to form sorting information; and obtaining the similarity of all the topic labels in the library, summarizing, obtaining the maximum value of the similarity of the topic labels of the library, and determining the maximum value as the similarity Xs.
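Claim 6 takes Xs as the largest similarity between a library's topic labels and the strategy topic. A minimal sketch, assuming each topic is a keyword-to-weight mapping and using cosine similarity as a stand-in for the unspecified similarity model; the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse keyword-weight vectors (0.0 if either norm is zero)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def library_similarity(
    tags: Dict[str, Vector], strategy: Vector
) -> Tuple[float, List[Tuple[str, float]]]:
    """Return Xs (the maximum tag similarity) and the tags ranked by similarity."""
    ranked = sorted(
        ((tag, cosine(vec, strategy)) for tag, vec in tags.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    xs = ranked[0][1] if ranked else 0.0
    return xs, ranked
```

The ranked list corresponds to the "sorting information" of the claim, and its first entry supplies Xs.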
7. The spatial data integration method applied to a digital twin city according to claim 1, wherein:
obtaining the data formats in each library and the corresponding data quantities to determine format-quantity data; scoring the conversion difficulty between formats, obtaining the average score for conversion between different formats, and labeling each format with its average score;
taking the product of the score average value and the data quantity in the library as a format difficulty value GsN of the library, and sorting a plurality of libraries according to the format difficulty value GsN; and converting the data formats of the other libraries into the format of the library with the lowest format difficulty value GsN, and unifying the formats of the libraries.
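The format difficulty value of claim 7 can be sketched as below. Assumptions: pairwise conversion difficulty scores are supplied by the user (the claim does not fix a scale), each format's average score is taken over its conversions to the other formats, GsN for a library holding several formats is the sum of average score times data quantity over its formats (which reduces to the claimed product for a single-format library), and the target format is the most common format in the lowest-GsN library. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]


def average_scores(scores: Scores) -> Dict[str, float]:
    """Mean difficulty of converting each source format to the other formats."""
    totals: Dict[str, List[float]] = defaultdict(list)
    for (src, dst), score in scores.items():
        if src != dst:
            totals[src].append(score)
    return {fmt: sum(vals) / len(vals) for fmt, vals in totals.items()}


def format_difficulty(library: Dict[str, int], avg: Dict[str, float]) -> float:
    """GsN: sum over formats of (average score x data quantity in that format)."""
    return sum(avg[fmt] * count for fmt, count in library.items())


def rank_libraries(
    libraries: Dict[str, Dict[str, int]], scores: Scores
) -> Tuple[List[str], str]:
    """Sort libraries by GsN ascending; return that order and the target format."""
    avg = average_scores(scores)
    gsn = {name: format_difficulty(lib, avg) for name, lib in libraries.items()}
    order = sorted(gsn, key=gsn.get)
    easiest = libraries[order[0]]
    target = max(easiest, key=easiest.get)  # dominant format of the lowest-GsN library
    return order, target
```

Every format held by a library must appear as a source format in the score table; otherwise `format_difficulty` raises `KeyError`, which surfaces missing scores early rather than silently treating them as zero.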
8. The spatial data integration method applied to a digital twin city according to claim 1, wherein:
the format difficulty value GsN and the standard value BZ of the library are obtained and associated to determine the library evaluation value PG;
the library evaluation value PG is calculated as follows:
[formula QLYQS_15]
wherein [formula QLYQS_16] is a correction coefficient used to correct the library evaluation value PG; [formula QLYQS_19] and [formula QLYQS_21] are weight coefficients, [formula QLYQS_17], [formula QLYQS_20] and [formula QLYQS_22]; the values of [formula QLYQS_23] and [formula QLYQS_18] are set by the user.
CN202211249273.0A 2022-10-12 2022-10-12 Spatial data integration method and system applied to digital twin city Active CN115329903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211249273.0A CN115329903B (en) 2022-10-12 2022-10-12 Spatial data integration method and system applied to digital twin city

Publications (2)

Publication Number Publication Date
CN115329903A CN115329903A (en) 2022-11-11
CN115329903B true CN115329903B (en) 2023-05-30

Family

ID=83914865

Country Status (1)

Country Link
CN (1) CN115329903B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250513A (en) * 2016-08-02 2016-12-21 西南石油大学 A kind of event personalization sorting technique based on event modeling and system
CN108090048A (en) * 2018-01-12 2018-05-29 安徽大学 A kind of colleges and universities' evaluation system based on multivariate data analysis
CN111339423A (en) * 2020-03-04 2020-06-26 携程计算机技术(上海)有限公司 User-based travel city pushing method, system, equipment and storage medium
CN111723469A (en) * 2020-05-25 2020-09-29 贵州华泰智远大数据服务有限公司 Digital portrait modeling method and system based on twin data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818484A (en) * 2021-01-29 2021-05-18 山东大学 Physical entity digital twin comprehensive implementation capability assessment method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230509

Address after: Office Building 4-21, No. 14 Longteng South Road, Donghua Community, Dongxiao Town, Xinluo District, Longyan City, Fujian Province, 364000

Applicant after: Fujian Meifang Times Technology Co.,Ltd.

Address before: 226000 No.185 Tongsheng Avenue, Nantong Economic and Technological Development Zone, Jiangsu Province

Applicant before: JIANGSU VOCATIONAL & TECHNICAL SHIPPING College

GR01 Patent grant