WO2023137700A1 - Multi-scale information catalog construction system for urban decision-making and evaluation - Google Patents

Multi-scale information catalog construction system for urban decision-making and evaluation

Info

Publication number
WO2023137700A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
information
module
urban
search
Prior art date
Application number
PCT/CN2022/073175
Other languages
English (en)
French (fr)
Inventor
李攀
周婵
孙立群
张涌
宁立
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Priority to PCT/CN2022/073175
Publication of WO2023137700A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Definitions

  • This application relates to the field of data analysis, and specifically relates to a multi-scale information catalog construction system for urban decision-making and evaluation.
  • current solutions merely aggregate the data of each department and do not account for the differences in, and the timeliness of, data collected by different departments at different times;
  • existing data preprocessing simply merges the collected data and handles missing values and outliers by nearest-neighbor completion, which cannot organically mix or intelligently fill multi-field data;
  • existing technologies and methods can only match keywords and key characters, and cannot perform searches based on higher-level semantic understanding. Specialist terminology cannot be interpreted generally, so several terms share one meaning, which makes searching harder; in data access and user feedback, the existing technology cannot iteratively revise search suggestions according to the user's access history.
  • the embodiment of the present application provides a multi-scale information catalog construction system oriented to urban decision-making and evaluation, so as to improve the query convenience and index accuracy of data information.
  • a multi-scale information catalog construction system for urban decision-making and evaluation is provided, comprising:
  • a data collection module, used to record the collected raw data by time and field during raw data collection;
  • a data preprocessing module, used to intelligently process outliers and missing values in the raw data and to cross-fuse multi-scale, multi-field data, filling in missing values and reducing redundancy;
  • a search tool module, used to perform high-dimensional refinement on the data, uncover the correlations behind the data and link them, and cluster highly correlated information;
  • a data access and feedback module, used to automatically learn previously undiscovered data correlations through artificial intelligence, based on the access history and feedback of different users, and to feed the newly learned correlations back to the data collection module;
  • the data collection module, the data preprocessing module, the search tool module, and the data access and feedback module are connected in sequence, and the data access and feedback module is connected to the data collection module.
  • the data collection module includes:
  • the time recording sub-module is used to record the collected raw data submitted by a single department as a single time series during the raw data collection process;
  • the multi-information collaboration sub-module is used for simple labeling of various data in the same field submitted by a single department.
  • the technical solution adopted in the embodiment of the present application also includes: recording the raw data submitted by a single department during raw data collection as a single time series specifically comprises:
  • building an information data timetable that contains the data, the time information, and reserved positions for the data's labels;
  • the multi-information collaboration sub-module obtains simple labels by scanning the data and fills them into the timetable.
  • the data preprocessing module includes:
  • the outlier correction sub-module is used to automatically detect outliers and missing values in the raw data, mark such data, and temporarily fill them using the neighboring-mean method;
  • the data fusion sub-module is used to cross-fuse multi-scale, multi-field data with similar labels, filling in missing values and reducing redundancy.
  • the technical solution adopted in the embodiment of the present application also includes: marking such data and temporarily filling them using the neighboring-mean method specifically comprises performing the automatic detection using a one-class support vector machine.
  • the technical solution adopted in the embodiment of the present application also includes: performing high-dimensional refinement on the data, uncovering the correlations behind the data and linking them, and clustering highly correlated information specifically comprises using knowledge graph technology to find and extract the intrinsic relationships in the data.
  • the data access and feedback module includes:
  • the historical access information module is used to save the history of the user's accesses and update the current search results in light of that history.
  • the technical solution adopted in the embodiment of the present application also includes: designing the historical access information module by using a long short-term memory network;
  • the long short-term memory network produces precise search suggestions for each individual user, and its search algorithm is refined by recording the feedback from every search.
  • the technical solution adopted in the embodiment of the present application further includes: the long short-term memory network judges the relevance of a search suggestion by the length of time the user spends viewing it.
  • the technical solution adopted in the embodiment of the present application also includes: the data collection module, the data preprocessing module, the search tool module and the data access and feedback module are connected through the main board.
  • the beneficial effect of the embodiment of the present application is as follows. The multi-scale information catalog construction system for urban decision-making and evaluation in the embodiment of the present application includes: a data collection module, used to record the collected raw data by time and field during raw data collection; a data preprocessing module, used to intelligently process outliers and missing values in the raw data and to cross-fuse multi-scale, multi-field data, filling in missing values and reducing redundancy; a search tool module, used to perform high-dimensional refinement on the data, uncover the correlations behind the data and link them, and cluster highly correlated information; and a data access and feedback module, used to automatically learn previously undiscovered data correlations through artificial intelligence, based on the access history and feedback of different users, and to feed the newly learned correlations back to the data collection module.
  • This application automatically processes and completes the collected original information, then integrates multi-scale and multi-field data in a unified manner, and finally feeds back the correlation of the obtained data information to the data collection module, so that the entire system has the ability of self-learning, and the output results become more and more accurate with the training of large amounts of data and long-term optimization, thereby improving the convenience of data information query and the accuracy of indexing.
  • Figure 1 is a block diagram of the multi-scale information catalog construction system for urban decision-making and evaluation of the present application;
  • Figure 2 is a functional schematic diagram of the multi-scale information catalog construction system for urban decision-making and evaluation of the present application;
  • Figure 3 is a structural diagram of the information data timetable of the present application;
  • Figure 4 is a schematic diagram of the one-class support vector machine of the present application;
  • Figure 5 is a schematic diagram of the knowledge graph structure of the present application;
  • Figure 6 is a schematic diagram of the long short-term memory network structure of the present application;
  • Figure 7 is a schematic diagram of the feedback optimization of the search mechanism of the present application.
  • a multi-scale information catalog construction system oriented to urban decision-making and evaluation including:
  • the data collection module 100 is used to record the collected raw data according to time and field during the process of collecting raw data;
  • the data preprocessing module 200 is used to intelligently process the outliers and missing values of the original data, and perform cross fusion on multi-scale and multi-field data to fill in missing values and reduce redundancy;
  • the search tool module 300 is used for high-dimensional refinement of data, expanding the correlation behind the data and correlating them, and clustering information with high correlation;
  • the data access and feedback module 400 is used to automatically learn the undiscovered data information correlation through artificial intelligence according to the access information and feedback information of different user histories, and feed back the newly learned data information correlation to the data collection module 100;
  • the data collection module 100 , the data preprocessing module 200 , the search tool module 300 and the data access and feedback module 400 are sequentially connected, and the data access and feedback module 400 is connected to the data collection module 100 .
  • This application automatically processes and completes the collected original information, then performs unified fusion of multi-scale and multi-field data, and finally feeds back the correlation of the obtained data information to the data collection module 100, so that the entire system has the ability of self-learning, and the output results become more and more accurate with the training of a large amount of data and long-term optimization, thereby improving the convenience of querying data information and the accuracy of indexing.
  • This application discloses a multi-scale information catalog construction system for urban decision-making and evaluation.
  • the system consists of a data collection module, a data preprocessing module 200, a search tool module 300, and a data access and feedback module 400 to realize the construction of a multi-scale information catalog, and apply this information catalog to urban decision-making and evaluation.
  • This application can be applied in the field of data analysis. Through the fusion of multi-scale, multi-scenario, and multi-source data, macro information indicators for urban decision-making can be obtained to ensure the accuracy and long-term nature of urban decision-making.
  • the data collection module 100, the data preprocessing module 200, the search tool module 300, and the data access and feedback module 400 are connected through the mainboard, and can transmit data through the mainboard.
  • the data collection module 100 includes:
  • the time recording sub-module is used to record the collected raw data submitted by a single department as a single time series during the raw data collection process;
  • the multi-information collaboration sub-module is used for simple labeling of various data in the same field submitted by a single department.
  • the data collection module 100 includes a time recording sub-module and a multi-information collaboration sub-module.
  • the time recording sub-module records the raw data submitted by a single government department during raw data collection as a single time series;
  • the multi-information collaboration sub-module attaches simple labels to the various data in the same field submitted by a single government department (such as the traffic congestion time and traffic accident rate submitted by the transportation department).
  • this application designs a time recording sub-module for the collection stage, which can pinpoint the required time dimension when dealing with multi-department, long-time-span, high-dimensional data, and handles data updating and caching.
  • the specific implementation is to build an information data timetable.
  • besides the data and the time information, the table includes reserved positions for the data's labels, in preparation for the subsequent multi-information collaboration step.
  • simple labels are obtained by scanning the data and are filled into the timetable; labels can be determined with reference to historical data.
  • the information data timetable is shown in Figure 3. A single government department submits multiple information tables, and each submission is shown on its own timeline (Figure 3 shows only the data table of a single submission). Each data table reserves several label positions so they can be adjusted later as needed.
  • the data collection module 100 designed in this application makes it easy to quickly locate each department's precise historical records during later data processing, and provides a good basis for later data fusion.
  • the data preprocessing module 200 includes:
  • the outlier correction sub-module is used to automatically detect the outliers and missing values of the original data, mark this type of data and use the method close to the mean value for temporary filling;
  • the data fusion sub-module is used for cross-fusion of data with similar multi-scale and multi-field labels, filling missing values and reducing redundancy.
  • the data preprocessing module 200 includes an outlier correction sub-module and a data fusion sub-module.
  • the outlier correction sub-module automatically detects outliers and missing values in the raw data using a one-class support vector machine (One-class SVM), and marks such data to be temporarily filled using the neighboring-mean method.
  • the data fusion sub-module cross-fuses multi-scale, multi-field data with similar labels, filling in missing values and reducing redundancy.
  • this application designs an outlier correction sub-module that finds the outliers in the data using a one-class support vector machine and fills them with the mean of neighboring values, keeping the data reasonable. This operation runs automatically, and the range within which outliers are selected can be controlled manually to suit a variety of application scenarios.
  • a schematic of the one-class support vector machine is shown in Figure 4.
  • the dots are normal data, and the coordinate origin can be regarded as the outlier, that is, the anomalous data.
  • the support vector machine algorithm finds a hyperplane that separates essentially all data points from the coordinate origin in the feature space (the space in which the dots lie) while maximizing the distance from the separating hyperplane to the origin (that is, the solid line; the dashed line also separates the data, but the solid line works best and is the most robust).
  • multi-department, multi-scenario, and multi-scale data are inspected and preliminarily fused to lay the foundation for the construction of the information catalog.
  • performing high-dimensional refinement on the data, uncovering the correlations behind the data and linking them, and clustering highly correlated information is carried out as follows:
  • the search tool module 300 mainly uses knowledge graph technology to perform high-dimensional refinement on the data, uncover the correlations behind the data and link them, and cluster highly correlated information, in order to provide better search suggestions and search links.
  • Knowledge graphs can also be used to assist data analysis and decision-making; knowledge from different sources is integrated through knowledge fusion, and the associations between data are strengthened through knowledge graphs and semantic technology, so that users can analyze data more intuitively.
  • a simplified diagram of the knowledge graph structure is shown in Figure 5.
  • this application performs high-dimensional refinement on the post-processed data, clusters related entries, and displays them as related search suggestions.
  • this application achieves high-dimensional refinement of massive, finely subdivided data and obtains the intrinsic relationships within the data.
  • when a search term is entered, the system can intelligently select suitable related information, so the user does not need to look through links to related terms manually.
  • the data access and feedback module 400 includes:
  • the historical access information module is used to save the history of the user's accesses and update the current search results in light of that history; in this embodiment, the historical access information module is designed using a long short-term memory network; the long short-term memory network produces precise search suggestions for each individual user, and its search algorithm is refined by recording the feedback from every search.
  • the data access and feedback module 400 mainly uses a long short-term memory network (LSTM) to automatically learn previously undiscovered data correlations from the access history and feedback of different users, saves the learned correlations and applies them to the initial data extraction, and thereby optimizes the operation of the entire method.
  • This application designs the historical access information module using a long short-term memory network; the module saves the history of the user's accesses and updates the current search results in light of that history.
  • the long short-term memory network is shown in Figure 6.
  • the long short-term memory network judges the relevance of a search suggestion by the length of time the user spends viewing it.
  • the term relevance obtained from the user feedback can also be fed back to the data collection module 100, which is beneficial for labeling.
  • the continuous optimization of the user feedback mechanism enables the entire process to have the ability of self-learning, and the output results become more and more accurate with the training of large amounts of data and long-term optimization.
  • this application designs search suggestions for individual users, and can learn to adjust the next search suggestion based on historical information and the current search results.
  • a multi-scale information catalog construction method is designed, which can be applied to information collection and processing in the era of big data to support urban decision-making and evaluation.
  • this application designs an information data timetable to store data, uses a one-class support vector machine to check the correctness of the data, and completes the data using the neighboring-mean method.
  • This application uses knowledge graphs in information search to extract the intrinsic relationships in large-scale, fine-grained, multi-scale data, and links strongly correlated abstract high-dimensional information through clustering to make searching for nearby items easier.
  • This application takes the user's access and feedback into account, and uses the long short-term memory network to incorporate a specific user's search habits and search history so that searches can be made precisely for that user.
  • the feedback optimization mechanism also ensures that search results better match the user's search habits.

Landscapes

  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

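This application belongs to the field of data analysis and specifically relates to a multi-scale information catalog construction system for urban decision-making and evaluation, comprising: a data collection module, configured to record the raw data collected during raw data collection; a data preprocessing module, configured to intelligently process outliers and missing values in the raw data, filling in missing values and reducing redundancy; a search tool module, configured to perform high-dimensional refinement on the data and cluster highly correlated information; and a data access and feedback module, configured to automatically learn previously undiscovered data correlations through artificial intelligence, based on the access history and feedback of different users, and to feed them back to the data collection module. By feeding the learned data correlations back to the data collection module, the application gives the whole system a self-learning capability: its output becomes increasingly accurate with training on large amounts of data and long-term optimization, which improves the convenience of querying data and the accuracy of indexing.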

Description

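Multi-scale information catalog construction system for urban decision-making and evaluation
Technical Field
This application relates to the field of data analysis and, in particular, to a multi-scale information catalog construction system for urban decision-making and evaluation.
Background
The convergence of information technology with the economy and society has triggered rapid data growth. Data has become a fundamental strategic national resource, and big data increasingly influences global production, circulation, distribution, and consumption, as well as economic operating mechanisms, social lifestyles, and national governance capabilities. Large-scale data is one of the important basic information resources and strategic resources of the 21st century. Handling data information catalogs well supports macro-level decision-making and scientific research across departments, helps accelerate national economic construction and development, and enables information catalogs to serve society, national economic construction, and scientific research.
At present, China is preparing to integrate data of different scales and dimensions, preprocessing the raw data submitted by each department to obtain an information catalog that covers the vast majority of currently available information. However, the data is redundant and scattered, and there is no data catalog that simply describes the data held in the databases and shows the overall state of the stored data. The data therefore cannot be fully used to meet the needs of scientific research and society, researchers have difficulty retrieving it, and they must spend valuable time analyzing and studying the characteristics of large-scale data.
In data collection, current solutions merely aggregate the data of each department, without considering that data collected by different departments at different times may differ and vary in timeliness. Existing data preprocessing simply merges the collected data and handles missing values and outliers by nearest-neighbor completion, and cannot organically mix or intelligently fill data from multiple fields. In data search tools, existing techniques can only match keywords and key characters; they cannot perform searches based on higher-level semantic understanding, and they cannot interpret specialist terminology generally, so several terms share one meaning, which makes searching harder. In data access and user feedback, the prior art cannot iteratively revise search suggestions as a user's access history grows. What is needed is an information catalog method that automatically processes and completes the collected raw information, then uniformly fuses multi-scale, multi-field data, and finally improves search suggestions from user feedback, in order to support urban decision-making. To improve the convenience of querying data and the accuracy of indexing across the board, there is an urgent need for a multi-scale information catalog construction system for urban decision-making and evaluation.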
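Summary
The embodiments of this application provide a multi-scale information catalog construction system for urban decision-making and evaluation, to improve the convenience of querying data and the accuracy of indexing.
According to an embodiment of this application, a multi-scale information catalog construction system for urban decision-making and evaluation is provided, comprising:
a data collection module, configured to record the collected raw data by time and field during raw data collection;
a data preprocessing module, configured to intelligently process outliers and missing values in the raw data and to cross-fuse multi-scale, multi-field data, filling in missing values and reducing redundancy;
a search tool module, configured to perform high-dimensional refinement on the data, uncover the correlations behind the data and link them, and cluster highly correlated information;
a data access and feedback module, configured to automatically learn previously undiscovered data correlations through artificial intelligence, based on the access history and feedback of different users, and to feed the newly learned correlations back to the data collection module;
the data collection module, the data preprocessing module, the search tool module, and the data access and feedback module are connected in sequence, and the data access and feedback module is connected to the data collection module.
The technical solution adopted in the embodiments of this application further includes: the data collection module comprises:
a time recording sub-module, configured to record the raw data submitted by a single department during raw data collection as a single time series;
a multi-information collaboration sub-module, configured to attach simple labels to the various data in the same field submitted by a single department.
The technical solution adopted in the embodiments of this application further includes: recording the raw data submitted by a single department during raw data collection as a single time series specifically comprises:
building an information data timetable that contains the data, the time information, and reserved positions for the data's labels;
the multi-information collaboration sub-module obtains simple labels by scanning the data and fills them into the timetable.
The technical solution adopted in the embodiments of this application further includes: the data preprocessing module comprises:
an outlier correction sub-module, configured to automatically detect outliers and missing values in the raw data, mark such data, and temporarily fill them using the neighboring-mean method;
a data fusion sub-module, configured to cross-fuse multi-scale, multi-field data with similar labels, filling in missing values and reducing redundancy.
The technical solution adopted in the embodiments of this application further includes: marking such data and temporarily filling them using the neighboring-mean method specifically comprises:
performing the automatic detection using a one-class support vector machine.
The technical solution adopted in the embodiments of this application further includes: performing high-dimensional refinement on the data, uncovering the correlations behind the data and linking them, and clustering highly correlated information specifically comprises:
using knowledge graph technology to find and extract the intrinsic relationships in the data.
The technical solution adopted in the embodiments of this application further includes: the data access and feedback module comprises:
a historical access information module, configured to save the history of a user's accesses and update the current search results in light of that history.
The technical solution adopted in the embodiments of this application further includes: the historical access information module is designed using a long short-term memory network;
the long short-term memory network produces precise search suggestions for each individual user, and its search algorithm is refined by recording the feedback from every search.
The technical solution adopted in the embodiments of this application further includes: the long short-term memory network judges the relevance of a search suggestion by the length of time the user spends viewing it.
The technical solution adopted in the embodiments of this application further includes: the data collection module, the data preprocessing module, the search tool module, and the data access and feedback module are connected through a mainboard.
Compared with the prior art, the embodiments of this application have the following beneficial effects. The multi-scale information catalog construction system for urban decision-making and evaluation in the embodiments of this application comprises: a data collection module, configured to record the collected raw data by time and field during raw data collection; a data preprocessing module, configured to intelligently process outliers and missing values in the raw data and to cross-fuse multi-scale, multi-field data, filling in missing values and reducing redundancy; a search tool module, configured to perform high-dimensional refinement on the data, uncover the correlations behind the data and link them, and cluster highly correlated information; and a data access and feedback module, configured to automatically learn previously undiscovered data correlations through artificial intelligence, based on the access history and feedback of different users, and to feed the newly learned correlations back to the data collection module. This application automatically processes and completes the collected raw information, then uniformly fuses multi-scale, multi-field data, and finally feeds the learned data correlations back to the data collection module. This gives the whole system a self-learning capability: its output becomes increasingly accurate with training on large amounts of data and long-term optimization, which improves the convenience of querying data and the accuracy of indexing.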
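Brief Description of the Drawings
The drawings described here are provided for a further understanding of this application and form a part of it. The illustrative embodiments of this application and their descriptions are used to explain this application and do not unduly limit it. In the drawings:
Figure 1 is a block diagram of the multi-scale information catalog construction system for urban decision-making and evaluation of this application;
Figure 2 is a functional schematic diagram of the multi-scale information catalog construction system for urban decision-making and evaluation of this application;
Figure 3 is a structural diagram of the information data timetable of this application;
Figure 4 is a schematic diagram of the one-class support vector machine of this application;
Figure 5 is a schematic diagram of the knowledge graph structure of this application;
Figure 6 is a schematic diagram of the long short-term memory network structure of this application;
Figure 7 is a schematic diagram of the feedback optimization of the search mechanism of this application.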
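Detailed Description
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application, without creative effort, shall fall within the scope of protection of this application.
It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of this application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data used in this way may be interchanged where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprise" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
Referring to Figures 1 and 2, according to an embodiment of this application, a multi-scale information catalog construction system for urban decision-making and evaluation is provided, comprising:
a data collection module 100, configured to record the collected raw data by time and field during raw data collection;
a data preprocessing module 200, configured to intelligently process outliers and missing values in the raw data and to cross-fuse multi-scale, multi-field data, filling in missing values and reducing redundancy;
a search tool module 300, configured to perform high-dimensional refinement on the data, uncover the correlations behind the data and link them, and cluster highly correlated information;
a data access and feedback module 400, configured to automatically learn previously undiscovered data correlations through artificial intelligence, based on the access history and feedback of different users, and to feed the newly learned correlations back to the data collection module 100;
the data collection module 100, the data preprocessing module 200, the search tool module 300, and the data access and feedback module 400 are connected in sequence, and the data access and feedback module 400 is connected to the data collection module 100.
This application automatically processes and completes the collected raw information, then uniformly fuses multi-scale, multi-field data, and finally feeds the learned data correlations back to the data collection module 100. This gives the whole system a self-learning capability: its output becomes increasingly accurate with training on large amounts of data and long-term optimization, which improves the convenience of querying data and the accuracy of indexing.
This application discloses a multi-scale information catalog construction system for urban decision-making and evaluation. The system builds a multi-scale information catalog by means of the data collection module 100, the data preprocessing module 200, the search tool module 300, and the data access and feedback module 400, and applies this catalog to urban decision-making and evaluation. This application can be applied in the field of data analysis: by fusing multi-scale, multi-scenario, multi-source data, macro-level information indicators for urban decision-making can be obtained, supporting the accuracy and long-term validity of urban decisions.
The data collection module 100, the data preprocessing module 200, the search tool module 300, and the data access and feedback module 400 are connected through a mainboard and can transfer data through the mainboard.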
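In an embodiment, the data collection module 100 comprises:
a time recording sub-module, configured to record the raw data submitted by a single department during raw data collection as a single time series;
a multi-information collaboration sub-module, configured to attach simple labels to the various data in the same field submitted by a single department.
Specifically, the data collection module 100 contains a time recording sub-module and a multi-information collaboration sub-module. The time recording sub-module records the raw data submitted by a single government department during collection as a single time series. The multi-information collaboration sub-module attaches simple labels to the various data in the same field submitted by a single government department (for example, the traffic congestion time and traffic accident rate submitted by the transportation department). Recording data by time and by field prepares for later data iteration and correlation analysis.
To address problems in data collection such as disordered time information and large, unwieldy single-department data that lacks correlation, this application designs a time recording sub-module for the collection stage. When dealing with multi-department, long-time-span, high-dimensional data, it can pinpoint the required time dimension and handle data updating and caching.
It is implemented by building an information data timetable. Besides the data and the time information, the table includes reserved positions for the data's labels, in preparation for the subsequent multi-information collaboration step. The multi-information collaboration sub-module obtains simple labels by scanning the data and fills them into the timetable; labels can be determined with reference to historical data. The information data timetable is shown in Figure 3. A single government department submits multiple information tables, and each submission is shown on its own timeline (Figure 3 shows only the data table of a single submission). Each data table reserves several label positions so they can be adjusted later as needed. The data collection module 100 designed in this application makes it easy to quickly locate each department's precise historical records during later data processing, and provides a good basis for later data fusion.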
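The information data timetable described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and function names (`DataSheet`, `DepartmentTimeline`, `scan_labels`) and the labelling rule (field name plus column names) are hypothetical and are not specified in the application.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, List, Optional


@dataclass
class DataSheet:
    """One table submitted by a department (hypothetical layout)."""
    submitted_on: date
    domain: str                      # the field the data belongs to
    values: Dict[str, Any]           # the submitted data itself
    labels: List[str] = field(default_factory=list)  # reserved label slots


class DepartmentTimeline:
    """All submissions of a single department, kept as one time series."""

    def __init__(self, department: str) -> None:
        self.department = department
        self.sheets: List[DataSheet] = []

    def submit(self, sheet: DataSheet) -> None:
        # Keep the series ordered by submission date.
        self.sheets.append(sheet)
        self.sheets.sort(key=lambda s: s.submitted_on)

    def latest(self, on_or_before: date) -> Optional[DataSheet]:
        # Locate the most recent submission for a requested point in time.
        candidates = [s for s in self.sheets if s.submitted_on <= on_or_before]
        return candidates[-1] if candidates else None


def scan_labels(sheet: DataSheet) -> List[str]:
    """Assumed 'simple labelling': the domain plus the column names."""
    return sorted({sheet.domain, *sheet.values})
```

A department's timeline can then be queried by date, and the reserved `labels` list can be filled with `sheet.labels = scan_labels(sheet)` and adjusted later.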
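In an embodiment, the data preprocessing module 200 comprises:
an outlier correction sub-module, configured to automatically detect outliers and missing values in the raw data, mark such data, and temporarily fill them using the neighboring-mean method;
a data fusion sub-module, configured to cross-fuse multi-scale, multi-field data with similar labels, filling in missing values and reducing redundancy.
Specifically, the data preprocessing module 200 contains an outlier correction sub-module and a data fusion sub-module. The outlier correction sub-module automatically detects outliers, missing values, and the like in the raw data using a one-class support vector machine (One-class SVM), and marks such data to be temporarily filled using the neighboring-mean method. The data fusion sub-module cross-fuses multi-scale, multi-field data with similar labels, filling in missing values and reducing redundancy.
Because the volume of collected data is huge, errors inevitably go unnoticed during collection; missing data and entry errors, for example, give rise to outliers and missing values. To keep the correctness of the data from being disturbed by anomalous points, this application designs an outlier correction sub-module that finds the outliers in the data using a one-class support vector machine and fills them with the mean of neighboring values, keeping the data reasonable. This operation runs automatically, and the range within which outliers are selected can be controlled manually to suit a variety of application scenarios. A schematic of the one-class support vector machine is shown in Figure 4. The dots are normal data, and the coordinate origin can be regarded as the outlier, that is, the anomalous data. The support vector machine algorithm finds a hyperplane that separates essentially all data points from the coordinate origin in the feature space (the space in which the dots lie) while maximizing the distance from the separating hyperplane to the origin (that is, the solid line; the dashed line also separates the data, but the solid line works best and is the most robust).
Once outliers are found, they are recorded and filled with the neighboring mean: the value an outlier should have is estimated by computing the mean of neighboring data. The cleaned data then undergoes multi-department fusion: using the labels obtained by the data collection module 100, related data from multiple departments is combined and cross-validated. The system determines whether any data items have the same meaning and checks whether the data conflicts; if there is a conflict, the most recent entry on the timeline prevails.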
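The detect, mark, and fill sequence can be sketched as follows. Note the substitution: to stay dependency-free, the detector here is a simple median-absolute-deviation rule standing in for the one-class SVM named in the application, and the function names, the `threshold` value, and the `window` size are all assumptions made for illustration.

```python
from statistics import median
from typing import List, Optional, Sequence


def flag_outliers(values: Sequence[Optional[float]],
                  threshold: float = 3.5) -> List[int]:
    """Return indices of missing (None) or outlying values.

    Stand-in detector (NOT a one-class SVM): modified z-score based on
    the median absolute deviation. The threshold is an assumed default.
    """
    present = [v for v in values if v is not None]
    if not present:
        return list(range(len(values)))
    med = median(present)
    mad = median(abs(v - med) for v in present)
    flagged = []
    for i, v in enumerate(values):
        if v is None:
            flagged.append(i)
        elif mad > 0 and 0.6745 * abs(v - med) / mad > threshold:
            flagged.append(i)
    return flagged


def fill_with_neighbour_mean(values: Sequence[Optional[float]],
                             flagged: Sequence[int],
                             window: int = 2) -> List[Optional[float]]:
    """Replace each flagged entry with the mean of nearby unflagged entries."""
    bad = set(flagged)
    filled = list(values)
    for i in flagged:
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        neighbours = [values[j] for j in range(lo, hi) if j not in bad]
        if neighbours:  # leave the entry untouched if no clean neighbour exists
            filled[i] = sum(neighbours) / len(neighbours)
    return filled
```

Widening or narrowing `threshold` plays the role of the manually controlled outlier range mentioned in the text; keeping the list of flagged indices preserves the "mark" step so temporary fills can be revisited later.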
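In the data preprocessing module 200, multi-department, multi-scenario, multi-scale data is verified and then preliminarily fused, laying the foundation for the later construction of the information catalog.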
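The fusion rule stated earlier (combine data that carry the same label, detect conflicts, and let the most recent timeline entry prevail) can be illustrated with a short sketch. The `Record` type and the function names are hypothetical; the application does not define a record layout.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Sequence


class Record(NamedTuple):
    """One labelled value from one department (assumed layout)."""
    department: str
    label: str
    recorded_on: date
    value: float


def conflicting_labels(records: Sequence[Record]) -> List[str]:
    """Labels for which departments reported different values."""
    seen: Dict[str, set] = {}
    for r in records:
        seen.setdefault(r.label, set()).add(r.value)
    return sorted(label for label, vals in seen.items() if len(vals) > 1)


def fuse(records: Sequence[Record]) -> Dict[str, Record]:
    """Keep one record per label; on conflict the newest entry wins."""
    fused: Dict[str, Record] = {}
    for r in records:
        current = fused.get(r.label)
        if current is None or r.recorded_on > current.recorded_on:
            fused[r.label] = r
    return fused
```

Reporting the conflicting labels separately from the fused result keeps the cross-validation step visible instead of silently overwriting older values.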
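In an embodiment, performing high-dimensional refinement on the data, uncovering the correlations behind the data and linking them, and clustering highly correlated information specifically comprises:
using knowledge graph technology to find and extract the intrinsic relationships in the data.
Specifically, the search tool module 300 mainly uses knowledge graph technology to perform high-dimensional refinement on the data, uncover the correlations behind the data and link them, and cluster highly correlated information, in order to provide better search suggestions and search links.
Existing data is large in scale and finely classified, so extracting detailed information from it is fairly easy. Urban decision-making and estimation, however, require finding strongly correlated data within massive high-dimensional data. Suppose a city wants to plan a road. The issues involved include route planning, the impact of traffic congestion, suitability for economic development, and so on. This requires looking up local economic data to determine the road construction budget and the lost working time; local geological conditions to determine the route; local vehicle ownership to determine the road width; and so on. This application therefore uses knowledge graph technology to find and extract the intrinsic relationships in massive data.
Traditional search relies on hyperlinks between web pages, whereas semantic search searches for things directly, such as people, objects, organizations, and places. These things can come from text, images, video, audio, Internet of Things devices, and so on. Knowledge graphs and semantic technology describe the categories, attributes, and relationships of these things, so a search engine can search for the things directly. For example, to find a route plan from place A to place B, the search engine breaks the query down into "place A", "place B", and "route planning", matches these against the entries in the existing knowledge base, and presents the result to the user. In the traditional search mode, such a search usually returns links to web pages containing the keywords, which must then be filtered across multiple pages. Knowledge-graph-based search is clearly more convenient and accurate.
Knowledge graphs can also be used to assist data analysis and decision-making. Knowledge from different sources is integrated through knowledge fusion, and the associations between data are strengthened through knowledge graphs and semantic technology, so that users can analyze data more intuitively. A simplified diagram of the knowledge graph structure is shown in Figure 5.
Using knowledge graph technology, this application performs high-dimensional refinement on the post-processed data, clusters related entries, and displays them as related search suggestions. Through the search tool module 300, this application achieves high-dimensional refinement of massive, finely subdivided data and obtains the intrinsic relationships within the data. When a search term is entered, the system can intelligently select suitable related information, so the user does not need to look through links to related terms manually.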
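As a rough illustration of clustering related entries and surfacing them as search suggestions, the sketch below models the knowledge graph as a weighted, undirected term graph. The `TermGraph` class, the edge weights, and the use of connected components above a weight threshold as "clusters" are assumptions; the application does not specify how relatedness is computed or how clusters are formed.

```python
from collections import defaultdict
from typing import Dict, List


class TermGraph:
    """Undirected graph of terms with relatedness weights in [0, 1]."""

    def __init__(self) -> None:
        self.edges: Dict[str, Dict[str, float]] = defaultdict(dict)

    def relate(self, a: str, b: str, weight: float) -> None:
        self.edges[a][b] = weight
        self.edges[b][a] = weight

    def suggestions(self, term: str, k: int = 3) -> List[str]:
        """The k most strongly related terms, strongest first."""
        neighbours = self.edges.get(term, {})
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, _ in ranked[:k]]

    def clusters(self, min_weight: float) -> List[List[str]]:
        """Connected components using only edges of at least min_weight."""
        seen = set()
        result = []
        for start in sorted(self.edges):
            if start in seen:
                continue
            component, stack = set(), [start]
            while stack:
                node = stack.pop()
                if node in component:
                    continue
                component.add(node)
                stack.extend(other for other, w in self.edges[node].items()
                             if w >= min_weight and other not in component)
            seen |= component
            result.append(sorted(component))
        return result
```

With the road-planning example from the text, relating "road planning" to "local economy", "geology", and "vehicle ownership" lets a query for "road planning" return those entries directly rather than a list of links to filter by hand.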
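In an embodiment, the data access and feedback module 400 comprises:
a historical access information module, configured to save the history of a user's accesses and update the current search results in light of that history. In this embodiment, the historical access information module is designed using a long short-term memory network; the long short-term memory network produces precise search suggestions for each individual user, and its search algorithm is refined by recording the feedback from every search.
Specifically, the data access and feedback module 400 mainly uses a long short-term memory network (LSTM) to automatically learn previously undiscovered data correlations from the access history and feedback of different users, saves the learned correlations and applies them to the initial data extraction, and thereby optimizes the operation of the entire method.
In conventional search, results are generic: entering the same term yields the same results. In urban decision-making data, data security is important, so accessing such information requires specific account permissions. Targeted results should therefore be given based on the search history of each account, which is more precise than generic results. This application designs the historical access information module using a long short-term memory network; the module saves the history of a user's accesses and updates the current search results in light of that history. The long short-term memory network is shown in Figure 6.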
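The application names the long short-term memory network but does not give its equations. For reference, the standard LSTM cell update is reproduced below; the symbols ($x_t$, $h_t$, $c_t$, $W$, $U$, $b$) are the conventional ones and do not appear in the application.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

In the setting described here, $x_t$ would plausibly be an encoding of a user's $t$-th search and $h_t$ a running summary of that user's history; that mapping is an assumption, not something the application states.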
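In an embodiment, the long short-term memory network judges the relevance of a search suggestion by the length of time the user spends viewing it.
Specifically, producing precise search suggestions for an individual user through the long short-term memory network uses only historical information. The search suggestions given by this application cannot fully match the needs of the current search, so the application refines its own search algorithm by recording the feedback from every search. When a user enters a term and the search tool returns results, a suggestion that the user views only briefly is treated as weakly relevant and the weight of the related data is reduced; a long viewing time is treated as fairly strongly relevant; and exiting directly after viewing an item is treated as strongly relevant (a satisfactory answer was obtained). The user feedback optimization mechanism is shown in Figure 7.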
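The three feedback rules above can be written as a per-user weight update. This sketch covers only the rule-based feedback signal, not the LSTM itself, and every number in it (the 10-second cutoff, the step size, the default weight of 0.5, the precedence given to a direct exit) is an assumption chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DEFAULT_WEIGHT = 0.5  # assumed starting relevance for unseen items


def update_weight(weight: float, dwell_seconds: float, exited_after: bool,
                  short: float = 10.0, step: float = 0.1) -> float:
    """Adjust a relevance weight from one viewing event, clamped to [0, 1]."""
    if exited_after:              # user left right after viewing: strong
        delta = 2 * step
    elif dwell_seconds < short:   # brief view: weak, reduce the weight
        delta = -step
    else:                         # long view: fairly strong
        delta = step
    return min(1.0, max(0.0, weight + delta))


class FeedbackStore:
    """Keeps separate weights per (user, item) so suggestions stay personal."""

    def __init__(self) -> None:
        self._weights: Dict[Tuple[str, str], float] = defaultdict(
            lambda: DEFAULT_WEIGHT)

    def record(self, user: str, item: str,
               dwell_seconds: float, exited_after: bool) -> None:
        key = (user, item)
        self._weights[key] = update_weight(
            self._weights[key], dwell_seconds, exited_after)

    def rank(self, user: str, items: List[str]) -> List[str]:
        # Highest weight first; ties keep their original order.
        return sorted(items, key=lambda it: -self._weights[(user, it)])
```

Because weights are keyed by user, two accounts issuing the same query can receive differently ordered suggestions, which matches the account-specific behavior the text describes.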
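The term correlations obtained from user feedback can also be fed back to the data collection module 100, which helps with labeling. Continuous optimization of the user feedback mechanism gives the whole process a self-learning capability, and the output becomes increasingly accurate with training on large amounts of data and long-term optimization.
Shortcomings of the prior art include:
1. In the prior art, when handling large-scale government data, the data sources are scattered, the data is relatively fragmented, and applications and services are piecemeal. It is therefore difficult to build an information catalog for all of a city's data; only the data of a single department can be handled.
2. Prior-art functionality is relatively limited, and raw data is neither well preserved nor updated promptly, which leaves the data disordered. Processing by the data collection module 100 and the data preprocessing module 200 of this application allows data to be stored and updated in an orderly way.
3. When building a data catalog or information catalog, the prior art can only find relevant hyperlinks from the search term, and the usability and relevance of the links must be screened manually. This application organizes the correlated data and presents it directly; the data can be used immediately and covers multiple fields, which saves manual search time and improves search precision.
4. To address the confidentiality of urban decision-making, this application designs search suggestions for individual users and can learn to adjust the next search suggestion based on historical information and the current search results.
The beneficial effects of this application are:
1. This application designs a multi-scale information catalog construction method that can be applied to information collection and processing in the era of big data, supporting urban decision-making and evaluation.
2. For data collection and processing, this application designs an information data timetable to store data, uses a one-class support vector machine to check the correctness of the data, and completes the data using the neighboring-mean method.
3. For information search, this application uses knowledge graphs to extract the intrinsic relationships in large-scale, fine-grained, multi-scale data, and links strongly correlated abstract high-dimensional information through clustering to make searching for nearby items easier.
4. This application takes the user's access and feedback into account and uses the long short-term memory network to incorporate a specific user's search habits and search history so that searches can be made precisely for that user; the feedback optimization mechanism also ensures that search results better match the user's search habits.
The above are only preferred embodiments of this application. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of this application, and these improvements and refinements shall also be regarded as falling within the scope of protection of this application.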

Claims (10)

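  1. A multi-scale information catalog construction system for urban decision-making and evaluation, characterized by comprising:
    a data collection module, configured to record the collected raw data by time and field during raw data collection;
    a data preprocessing module, configured to intelligently process outliers and missing values in the raw data and to cross-fuse multi-scale, multi-field data, filling in missing values and reducing redundancy;
    a search tool module, configured to perform high-dimensional refinement on the data, uncover the correlations behind the data and link them, and cluster highly correlated information;
    a data access and feedback module, configured to automatically learn previously undiscovered data correlations through artificial intelligence, based on the access history and feedback of different users, and to feed the newly learned correlations back to the data collection module;
    wherein the data collection module, the data preprocessing module, the search tool module, and the data access and feedback module are connected in sequence, and the data access and feedback module is connected to the data collection module.
  2. The multi-scale information catalog construction system for urban decision-making and evaluation according to claim 1, characterized in that the data collection module comprises:
    a time recording sub-module, configured to record the raw data submitted by a single department during the raw data collection as a single time series;
    a multi-information collaboration sub-module, configured to attach simple labels to the various data in the same field submitted by a single department.
  3. The multi-scale information catalog construction system for urban decision-making and evaluation according to claim 2, characterized in that recording the raw data submitted by a single department during the raw data collection as a single time series specifically comprises:
    building an information data timetable, the timetable containing the data, the time information, and reserved positions for the data's labels;
    the multi-information collaboration sub-module obtaining simple labels by scanning the data and filling them into the timetable.
  4. The multi-scale information catalog construction system for urban decision-making and evaluation according to claim 1, characterized in that the data preprocessing module comprises:
    an outlier correction sub-module, configured to automatically detect outliers and missing values in the raw data, mark such data, and temporarily fill them using the neighboring-mean method;
    a data fusion sub-module, configured to cross-fuse multi-scale, multi-field data with similar labels, filling in missing values and reducing redundancy.
  5. The multi-scale information catalog construction system for urban decision-making and evaluation according to claim 4, characterized in that marking such data and temporarily filling them using the neighboring-mean method specifically comprises:
    performing the automatic detection using a one-class support vector machine.
  6. The multi-scale information catalog construction system for urban decision-making and evaluation according to claim 1, characterized in that performing high-dimensional refinement on the data, uncovering the correlations behind the data and linking them, and clustering highly correlated information specifically comprises:
    using knowledge graph technology to find and extract the intrinsic relationships in the data.
  7. The multi-scale information catalog construction system for urban decision-making and evaluation according to claim 1, characterized in that the data access and feedback module comprises:
    a historical access information module, configured to save the history of a user's accesses and update the current search results in light of that history.
  8. The multi-scale information catalog construction system for urban decision-making and evaluation according to claim 7, characterized in that the historical access information module is designed using a long short-term memory network;
    the long short-term memory network produces precise search suggestions for each individual user, and the search algorithm of the long short-term memory network is refined by recording the feedback from every search.
  9. The multi-scale information catalog construction system for urban decision-making and evaluation according to claim 8, characterized in that the long short-term memory network judges the relevance of the search suggestion by the length of time the user spends viewing it.
  10. The multi-scale information catalog construction system for urban decision-making and evaluation according to any one of claims 1 to 9, characterized in that the data collection module, the data preprocessing module, the search tool module, and the data access and feedback module are connected through a mainboard.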
PCT/CN2022/073175 2022-01-21 2022-01-21 Multi-scale information catalog construction system for urban decision-making and evaluation WO2023137700A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/073175 WO2023137700A1 (zh) 2022-01-21 2022-01-21 Multi-scale information catalog construction system for urban decision-making and evaluation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/073175 WO2023137700A1 (zh) 2022-01-21 2022-01-21 Multi-scale information catalog construction system for urban decision-making and evaluation

Publications (1)

Publication Number Publication Date
WO2023137700A1 (zh) 2023-07-27

Family

ID=87347700

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/073175 WO2023137700A1 (zh) 2022-01-21 2022-01-21 Multi-scale information catalog construction system for urban decision-making and evaluation

Country Status (1)

Country Link
WO (1) WO2023137700A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
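CN105095281A (zh) * 2014-05-13 2015-11-25 南京理工大学 Website classification catalog optimization and analysis method based on log mining
CN111400369A (zh) * 2020-03-06 2020-07-10 湖南城市学院 Policy information service system and method based on big data analysis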
US20200364407A1 (en) * 2019-05-14 2020-11-19 Korea University Research And Business Foundation Method and server for text classification using multi-task learning
US11138200B1 (en) * 2019-12-04 2021-10-05 Tubular Labs, Inc. Efficient aggregation of time series data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
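CN117370582A (zh) * 2023-11-02 2024-01-09 广州蓝图地理信息技术有限公司 Three-dimensional entity modeling method for natural resource elements based on multi-data fusion
CN117370582B (zh) * 2023-11-02 2024-06-04 广州蓝图地理信息技术有限公司 Three-dimensional entity modeling method for natural resource elements based on multi-data fusion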

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22921143

Country of ref document: EP

Kind code of ref document: A1