WO2020207252A1 - Data storage method and device, storage medium, and electronic apparatus - Google Patents


Info

Publication number
WO2020207252A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
feature
list
database
time
Application number
PCT/CN2020/081158
Other languages
French (fr)
Chinese (zh)
Inventor
He Ming (何明)
Chen Zhongming (陈仲铭)
Xu Xin (徐鑫)
Liu Yaoyong (刘耀勇)
Chen Yan (陈岩)
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Priority date
2019-04-09
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Publication of WO2020207252A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Abstract

A data storage method and device, a storage medium, and an electronic apparatus. The method comprises: obtaining multiple pieces of basic data respectively belonging to multiple categories (110); summarizing and integrating the multiple pieces of basic data according to their respective categories, and then performing a first storage operation (120); performing feature extraction on the basic data in each of the databases to obtain feature data corresponding to each database, and performing a second storage operation (130); and fusing the feature data to obtain fused feature data, and performing a third storage operation (140).

Description

Data storage method, device, storage medium and electronic apparatus
This application claims priority to Chinese Patent Application No. 201910282158.5, entitled "Data storage method, device, storage medium and electronic apparatus" and filed with the Chinese Patent Office on April 9, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of electronic technology, and in particular to a data storage method, device, storage medium and electronic apparatus.
Background
With the development of electronic technology, electronic devices such as smartphones are becoming increasingly intelligent. An electronic device can process data through various algorithm models to provide users with a variety of functions. For electronic devices that need to collect large amounts of data, both the security of system data and the security of users' private data are important.
Summary
Embodiments of this application provide a data storage method, device, storage medium and electronic apparatus that can address both the security of system data and the security of users' private data.
In a first aspect, an embodiment of this application provides a data storage method applied to an electronic device, the data storage method including:
acquiring multiple pieces of basic data, the multiple pieces of basic data belonging to multiple categories;
summarizing and integrating the multiple pieces of basic data according to their respective categories, and performing a first storage operation on the summarized and integrated data, storing it in the database of the corresponding category;
performing feature extraction on the basic data in each database to obtain feature data corresponding to each database, and performing a second storage operation on the feature data;
fusing the feature data to obtain fused feature data, and performing a third storage operation on the fused feature data.
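The three storage stages above can be sketched end to end as follows. This is a minimal illustration, not the applicant's implementation: the `BasicData` record, the category labels, and the choice of count and mean as the per-database features are all hypothetical, and each "storage" is an in-memory dict standing in for the first, second and third storage modules.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class BasicData:
    category: str   # hypothetical labels, e.g. "sensor", "system"
    timestamp: int  # acquisition time, e.g. seconds since epoch
    value: float    # a single numeric reading, for illustration only


def first_storage(records):
    """Stage 1: group raw basic data into one store per category."""
    databases = defaultdict(list)
    for rec in records:
        databases[rec.category].append(rec)
    return dict(databases)


def second_storage(databases):
    """Stage 2: keep only per-database features, not the raw records."""
    features = {}
    for category, recs in databases.items():
        values = [r.value for r in recs]
        features[category] = {"count": len(values), "mean": mean(values)}
    return features


def third_storage(features):
    """Stage 3: fuse the per-database features into one flat record."""
    fused = {}
    for category, feats in sorted(features.items()):
        for name, val in feats.items():
            fused[f"{category}.{name}"] = val
    return fused


records = [
    BasicData("sensor", 1, 0.5),
    BasicData("sensor", 2, 1.5),
    BasicData("system", 1, 40.0),
]
dbs = first_storage(records)
feats = second_storage(dbs)
fused = third_storage(feats)
print(fused)  # {'sensor.count': 2, 'sensor.mean': 1.0, 'system.count': 1, 'system.mean': 40.0}
```

Only the stage-2 and stage-3 outputs need to leave the first storage, which is where the privacy argument later in the description comes from.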
In a second aspect, an embodiment of this application provides a data storage device, including:
an acquisition module, configured to acquire multiple pieces of basic data belonging to multiple categories;
a first storage module, configured to summarize and integrate the multiple pieces of basic data according to their respective categories, and to perform a first storage operation on the summarized and integrated data, storing it in the database of the corresponding category;
a second storage module, configured to perform feature extraction on the basic data in each database to obtain feature data corresponding to each database, and to perform a second storage operation on the feature data;
a third storage module, configured to fuse the feature data to obtain fused feature data, and to perform a third storage operation on the fused feature data.
In a third aspect, an embodiment of this application provides a storage medium storing a computer program which, when run on a computer, causes the computer to perform:
acquiring multiple pieces of basic data, the multiple pieces of basic data belonging to multiple categories;
summarizing and integrating the multiple pieces of basic data according to their respective categories, and performing a first storage operation on the summarized and integrated data, storing it in the database of the corresponding category;
performing feature extraction on the basic data in each database to obtain feature data corresponding to each database, and performing a second storage operation on the feature data;
fusing the feature data to obtain fused feature data, and performing a third storage operation on the fused feature data.
In a fourth aspect, an embodiment of this application provides an electronic apparatus including a processor and a memory, the memory storing a computer program, and the processor being configured to invoke the computer program stored in the memory to perform:
acquiring multiple pieces of basic data, the multiple pieces of basic data belonging to multiple categories;
summarizing and integrating the multiple pieces of basic data according to their respective categories, and performing a first storage operation on the summarized and integrated data, storing it in the database of the corresponding category;
performing feature extraction on the basic data in each database to obtain feature data corresponding to each database, and performing a second storage operation on the feature data;
fusing the feature data to obtain fused feature data, and performing a third storage operation on the fused feature data.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario of the data storage method according to an embodiment of this application.
FIG. 2 is a first schematic flowchart of the data storage method according to an embodiment of this application.
FIG. 3 is a schematic diagram of another application scenario of the data storage method according to an embodiment of this application.
FIG. 4 is a second schematic flowchart of the data storage method according to an embodiment of this application.
FIG. 5 is a schematic structural diagram of a data storage device according to an embodiment of this application.
FIG. 6 is another schematic structural diagram of the data storage device according to an embodiment of this application.
FIG. 7 is yet another schematic structural diagram of the data storage device according to an embodiment of this application.
FIG. 8 is a first schematic structural diagram of an electronic apparatus according to an embodiment of this application.
FIG. 9 is a second schematic structural diagram of the electronic apparatus according to an embodiment of this application.
Detailed Description
Referring to the drawings, in which like reference numerals denote like components, the principles of this application are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of this application and should not be regarded as limiting other specific embodiments not detailed herein.
An embodiment of this application provides a data storage method, including:
acquiring multiple pieces of basic data, the multiple pieces of basic data belonging to multiple categories;
summarizing and integrating the multiple pieces of basic data according to their respective categories, and performing a first storage operation on the summarized and integrated data, storing it in the database of the corresponding category;
performing feature extraction on the basic data in each database to obtain feature data corresponding to each database, and performing a second storage operation on the feature data;
fusing the feature data to obtain fused feature data, and performing a third storage operation on the fused feature data.
In an embodiment, the categories of the basic data include at least behavior data of the user operating the terminal, sensor data, and system operation data.
In an embodiment, before performing feature extraction on the basic data in each database to obtain the feature data corresponding to each database, the method further includes:
collecting the basic data of each database;
extracting feature data from the basic data by using a data processing algorithm;
training and optimizing a machine learning model based on the feature data;
when new basic data is acquired, inputting the new basic data into the machine learning model to obtain new feature data.
In an embodiment, fusing the feature data includes:
fusing the feature data by means of a multi-table join;
fusing the feature data by means of time-series alignment.
In an embodiment, fusing the feature data by means of a multi-table join includes:
acquiring a first list and a second list (i.e., tables), the first list and the second list respectively containing two sets of feature data of different types, and the data source of the first list being smaller than that of the second list;
building a hash table for the data source of the first list using the join key;
extracting the column data of the first list and storing it in the hash table;
scanning the second list to obtain the row data in the second list that matches the hash table, and combining each matching row with the corresponding content of the first list into a record that is placed in a result set.
In an embodiment, scanning the second list to obtain the row data in the second list that matches the hash table includes:
scanning the second list, hashing the join key, and probing the hash table;
when it is detected that the second list contains row data matching the hash table, acquiring that row data from the second list, the row data matching the column data of the first list.
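The build-and-probe steps above correspond to the classic hash join. A minimal sketch, assuming each list is a list of dicts sharing a hypothetical join field `t`; the field names `app` and `accel` are illustrative only:

```python
from collections import defaultdict


def hash_join(first, second, key):
    """Hash join: build on the smaller list `first`, probe with `second`.

    Each list is a list of dict rows; `key` names the join field.
    """
    # Build phase: hash every row of the smaller list by its join key.
    table = defaultdict(list)
    for row in first:
        table[row[key]].append(row)
    # Probe phase: scan the larger list once and look up each key.
    result = []
    for row in second:
        for match in table.get(row[key], []):
            # Combine the matching rows into one record of the result set.
            result.append({**match, **row})
    return result


app_usage = [{"t": 1, "app": "maps"}, {"t": 2, "app": "chat"}]
motion = [{"t": 1, "accel": 0.2}, {"t": 2, "accel": 1.1}, {"t": 3, "accel": 0.0}]
print(hash_join(app_usage, motion, key="t"))
# [{'t': 1, 'app': 'maps', 'accel': 0.2}, {'t': 2, 'app': 'chat', 'accel': 1.1}]
```

Building on the smaller list keeps the in-memory hash table small, and the larger list is read only once.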
In an embodiment, fusing the feature data by means of time-series alignment includes:
acquiring two feature databases and two pieces of time-series information respectively corresponding to them, each feature database containing all the feature data of its corresponding database;
arranging the feature data in each of the two feature databases according to its time-series information;
identifying the time points common to both pieces of time-series information, and aligning the feature data corresponding to those common time points.
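A minimal sketch of time-aligned fusion, assuming each feature database is a list of `(timestamp, feature)` pairs with unique integer timestamps. The sample data is made up. After sorting, a two-pointer merge finds the common timestamps in linear time:

```python
def align_by_time(feats_a, feats_b):
    """Align two feature series on the timestamps they share.

    Each input is a list of (timestamp, feature) pairs in any order.
    Returns (timestamp, feature_a, feature_b) tuples for common timestamps.
    """
    a = sorted(feats_a, key=lambda p: p[0])  # arrange by time-series info
    b = sorted(feats_b, key=lambda p: p[0])
    aligned, i, j = [], 0, 0
    # Walk both sorted series, advancing whichever timestamp is behind.
    while i < len(a) and j < len(b):
        ta, tb = a[i][0], b[j][0]
        if ta == tb:
            aligned.append((ta, a[i][1], b[j][1]))
            i += 1
            j += 1
        elif ta < tb:
            i += 1
        else:
            j += 1
    return aligned


light = [(3, 120.0), (1, 80.0), (2, 95.0)]
screen = [(2, "on"), (3, "off"), (4, "on")]
print(align_by_time(light, screen))  # [(2, 95.0, 'on'), (3, 120.0, 'off')]
```

Exact timestamp matching is the simplest reading of "the same time points". Sensors sampled at different rates would likely need a tolerance window or resampling, and the text does not specify either.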
In an embodiment, acquiring multiple pieces of basic data includes:
collecting basic data in real time through multiple different sensors.
In an embodiment, after the third storage operation is performed on the fused feature data, the method further includes:
backing up the fused feature data on the terminal in real time.
Referring to FIG. 1, FIG. 1 is a schematic diagram of an application scenario of the data storage method according to an embodiment of this application. The data storage method is applied to an electronic device in which a panoramic perception architecture is provided. The panoramic perception architecture is the integration of hardware and software in the electronic device used to implement the data storage method.
The panoramic perception architecture includes an information perception layer, a data processing layer, a feature extraction layer, a scenario modeling layer and an intelligent service layer.
The information perception layer is used to acquire information about the electronic device itself and/or information about the external environment. The information perception layer may include multiple sensors, for example a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a Hall sensor, a position sensor, a gyroscope, an inertial sensor, an attitude sensor, a barometer and a heart rate sensor.
The distance sensor can be used to detect the distance between the electronic device and an external object. The magnetic field sensor can be used to detect magnetic field information of the environment in which the electronic device is located. The light sensor can be used to detect light information of that environment. The acceleration sensor can be used to detect acceleration data of the electronic device. The fingerprint sensor can be used to collect the user's fingerprint information. The Hall sensor is a magnetic field sensor based on the Hall effect and can be used to implement automatic control of the electronic device. The position sensor can be used to detect the current geographic location of the electronic device. The gyroscope can be used to detect the angular velocity of the electronic device in each direction. The inertial sensor can be used to detect motion data of the electronic device. The attitude sensor can be used to sense attitude information of the electronic device. The barometer can be used to detect the air pressure of the environment in which the electronic device is located. The heart rate sensor can be used to detect the user's heart rate information.
The data processing layer is used to process the data acquired by the information perception layer. For example, it can perform data cleaning, data integration, data transformation and data reduction on that data.
Data cleaning refers to cleaning the large amount of data acquired by the information perception layer to remove invalid and duplicate data. Data integration refers to combining multiple single-dimensional data items acquired by the information perception layer into a higher or more abstract dimension so that they can be processed together. Data transformation refers to converting the data type or format of the acquired data so that the transformed data meets processing requirements. Data reduction means reducing the data volume as far as possible while preserving the original character of the data.
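The cleaning, transformation and reduction steps can be illustrated on a list of `(timestamp, value)` readings. This sketch rests on several assumptions. "Invalid" is taken to mean `None` or NaN. "Duplicate" means an identical `(timestamp, value)` pair. Reduction averages fixed time windows, which is only one of many possible reduction methods. Data integration is omitted.

```python
import math


def clean(readings):
    """Data cleaning: drop invalid readings (None or NaN) and exact duplicates."""
    seen, out = set(), []
    for ts, value in readings:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # invalid data
        if (ts, value) in seen:
            continue  # duplicate data
        seen.add((ts, value))
        out.append((ts, value))
    return out


def transform(readings):
    """Data transformation: normalise types (int timestamps, float values)."""
    return [(int(ts), float(v)) for ts, v in readings]


def reduce_by_window(readings, window):
    """Data reduction: replace each `window`-second bucket with its mean."""
    buckets = {}
    for ts, v in readings:
        buckets.setdefault(ts // window, []).append(v)
    return [(k * window, sum(vs) / len(vs)) for k, vs in sorted(buckets.items())]


raw = [(0, 1), (0, 1), (1, None), (2, 3), (5, float("nan")), (6, 8)]
print(reduce_by_window(transform(clean(raw)), window=5))  # [(0, 2.0), (5, 8.0)]
```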
The feature extraction layer performs feature extraction on the data processed by the data processing layer to extract the features contained in the data. The extracted features can reflect the state of the electronic device itself, the state of the user, or the state of the environment in which the electronic device is located.
The feature extraction layer can extract features, or process the extracted features, by methods such as the filter method, the wrapper method and the ensemble (integration) method.
The filter method filters the extracted features to delete redundant feature data. The wrapper method is used to screen the extracted features. The ensemble method combines multiple feature extraction methods to construct a more efficient and accurate feature extraction method.
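As one concrete instance of the filter method, the sketch below drops near-constant features and features that are highly correlated with a feature already kept. The variance threshold, the Pearson-correlation criterion and the 0.95 cut-off are assumptions chosen for illustration. The text does not say which redundancy measure is used.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def filter_features(columns, min_var=1e-9, max_corr=0.95):
    """Filter-method sketch: drop near-constant and redundant feature columns.

    `columns` maps feature name -> list of values (all the same length).
    Returns the names of the features that are kept, in input order.
    """
    kept = {}
    for name, values in columns.items():
        if pvariance(values) <= min_var:
            continue  # carries (almost) no information
        if any(abs(pearson(values, v)) >= max_corr for v in kept.values()):
            continue  # redundant with a feature already kept
        kept[name] = values
    return list(kept)


columns = {
    "light": [1, 2, 3, 4],
    "light_x2": [2, 4, 6, 8],  # perfectly correlated with "light"
    "const": [5, 5, 5, 5],     # zero variance
    "noise": [3, 1, 4, 1],
}
print(filter_features(columns))  # ['light', 'noise']
```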
The scenario modeling layer builds models from the features extracted by the feature extraction layer. The resulting models can represent the state of the electronic device, the state of the user, the state of the environment, and so on. For example, the scenario modeling layer can build key-value models, pattern identification models, graph models, entity-relationship models, object-oriented models, etc. from the extracted features.
The intelligent service layer provides users with intelligent services based on the models built by the scenario modeling layer. For example, it can provide basic application services, perform intelligent system optimization for the electronic device, and provide personalized intelligent services.
In addition, the panoramic perception architecture may include multiple algorithms, each of which can be used to analyze and process data, and together they can form an algorithm library. For example, the algorithm library may include Markov algorithms, latent Dirichlet allocation, Bayesian classification, support vector machines, K-means clustering, K-nearest neighbors, conditional random fields, residual networks, long short-term memory networks, convolutional neural networks and recurrent neural networks.
An embodiment of this application provides a data storage method that can be applied to an electronic device. The electronic device may be a smartphone, tablet computer, gaming device, AR (augmented reality) device, automobile, data storage device, audio playback device, video playback device, laptop, desktop computing device, or a wearable device such as a watch, glasses, helmet, electronic bracelet, electronic necklace or electronic clothing.
Referring to FIG. 2, FIG. 2 is a first schematic flowchart of the data storage method according to an embodiment of this application. The data storage method includes the following steps:
110: Acquire multiple pieces of basic data belonging to multiple categories.
The basic data may include operating information of the electronic device, configuration information of the electronic device, user information, current environment information, and so on. Specifically, the basic data can be collected through one or more sensors, and may be collected in real time. For example, current environment information and information related to the electronic device are acquired through at least one of a distance sensor, magnetic field sensor, light sensor, acceleration sensor, fingerprint sensor, Hall sensor, position sensor, gyroscope, inertial sensor, attitude sensor, barometer, blood pressure sensor, pulse sensor, heart rate sensor, etc. The current environment information includes the user's physical information, such as blood pressure, pulse and heart rate. The information related to the electronic device includes its operating information, its configuration information, and user information stored on it. The user information includes human-computer interaction information such as the user's identity, personal hobbies, browsing history and personal favorites. The operating information of the electronic device includes power-on time, power-off time, standby time, memory usage at each point in time, main chip utilization at each point in time, information about currently running programs, information about background programs, the running time of each program, the download volume of each program, and so on. In some embodiments, the basic data may also include behavior data of the user operating the terminal, sensor data, and system operation data.
120: Summarize and integrate the multiple pieces of basic data according to their respective categories, and perform a first storage operation on the summarized and integrated data, storing it in the database of the corresponding category.
After the multiple pieces of basic data are obtained, they are stored in the first storage module; for example, multiple pieces of panoramic perception information can be stored on a hard disk. Multiple databases can be set up, and the basic data is stored in the corresponding database according to its category.
All the basic data is clustered. The multiple pieces of basic data are summarized and integrated according to their respective categories, and basic data of the same kind is aggregated into a data set, yielding one data set per category. Basic data can be classified by hardware attribute, for example data related to the main chip, the display screen, the hard disk, the memory, or the various sensors. Basic data can also be classified by the corresponding application, for example data related to system applications and data related to installed applications. Data related to installed applications can be further classified by specific application, such as data related to instant messaging, map, or shopping applications. Storing basic data in the database for its category effectively isolates unrelated data, so that each kind of data is stored independently. In some embodiments, a time-series index is also obtained for each database, which makes the basic data easier to index.
Basic data of the same type is stored in the same database. A piece of basic data can be stored in a single database; for example, acceleration sensor data is stored only in the acceleration sensor database. A piece of basic data can also be stored in multiple databases. For example, when it belongs to two categories, it can be copied, and the copy and the original are stored in the two databases corresponding to those two categories. Note that a database can store not only the currently acquired basic data but also previously acquired basic data.
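Per-category storage with copying can be sketched with Python's built-in `sqlite3` module. Each category gets its own in-memory database, and an index on the timestamp column serves as the time-series index. The category names, table layout and the function name `store_by_category` are hypothetical:

```python
import sqlite3

# Hypothetical categories: one hardware-based, two application/display-based.
CATEGORIES = ["accelerometer", "display", "chat_app"]


def open_category_dbs():
    """Open one in-memory SQLite database per category, each with a time index."""
    dbs = {}
    for cat in CATEGORIES:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE basic_data (ts INTEGER, payload TEXT)")
        # Time-series index so records can be looked up by acquisition time.
        conn.execute("CREATE INDEX idx_ts ON basic_data (ts)")
        dbs[cat] = conn
    return dbs


def store_by_category(dbs, ts, payload, categories):
    """First storage: write the record into every category it belongs to.

    A record with two categories is written (copied) into both databases.
    """
    for cat in categories:
        dbs[cat].execute("INSERT INTO basic_data VALUES (?, ?)", (ts, payload))
        dbs[cat].commit()


dbs = open_category_dbs()
store_by_category(dbs, 100, "ax=0.1", ["accelerometer"])
store_by_category(dbs, 101, "screen on in chat", ["display", "chat_app"])
for cat, conn in dbs.items():
    count = conn.execute("SELECT COUNT(*) FROM basic_data").fetchone()[0]
    print(cat, count)  # each category database holds one row
```

Separate connections keep unrelated data physically isolated, which mirrors the isolation argument in the text. A single database with one table per category would also work.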
130: Perform feature extraction on the basic data in each database to obtain the feature data corresponding to each database, and perform a second storage operation on the feature data.
Feature extraction is performed separately on the data in each database to obtain the feature data corresponding to that database. A feature extraction layer can be provided to extract features from the basic data in multiple ways, and different kinds of data can use different feature extraction methods, since the data format and content of each type can differ. For example, WIFI connection information in the sensor data is very limited: when no WIFI signal is connected, no WIFI information is recorded. IMU data, by contrast, is reported continuously at a fixed sampling frequency, so up to gigabytes of data may be stored in a single day. Extracting features from the basic data in each database reduces redundant information and saves storage space, and it also captures the important meaning of the basic data. Take audio information as an example. Audio is time-series information whose data volume grows continuously over time, so features need to be extracted to reduce the data volume. For audio with two microphone channels, a 32-bit sample width and a 44,100 Hz sampling rate, five minutes of recording produces roughly 106 MB (about 0.85 Gbit, i.e. on the order of 1 G). After feature extraction, the important features of each time window are obtained and can be stored as vectors, compressing this raw data to a few hundred kilobytes.
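The storage figures for the audio example can be checked with a few lines of arithmetic, assuming uncompressed PCM samples. The feature layout, one 40-dimensional float32 vector per 100 ms window, is a hypothetical choice used only to show how a few hundred kilobytes can arise:

```python
channels = 2
bits_per_sample = 32
sample_rate_hz = 44_100
seconds = 5 * 60

# Raw PCM size: channels x bytes per sample x samples per second x duration.
raw_bytes = channels * (bits_per_sample // 8) * sample_rate_hz * seconds
print(raw_bytes)             # 105840000 bytes, about 106 MB
print(raw_bytes * 8 / 1e9)   # 0.84672, i.e. about 0.85 Gbit

# Hypothetical features: one 40-dim float32 vector per 100 ms window.
window_ms = 100
feature_dim = 40
n_windows = seconds * 1000 // window_ms          # integer math avoids float rounding
feature_bytes = n_windows * feature_dim * 4      # 4 bytes per float32
print(feature_bytes)         # 480000 bytes, about 480 KB
```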
In addition, the first and second storage operations can use a trigger-based data reporting method. That is, when the multiple pieces of basic data are acquired in step 110, the data can be reported back in a triggered manner. For example, when the WIFI function of the network module is turned on, the module searches for nearby available networks and transmits the detected data to the system. When collecting basic data, the system monitors and collects system notification messages.
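Trigger-based reporting is essentially a publish/subscribe pattern: the collector stores data only when a module emits a notification, instead of polling. The sketch below uses a hypothetical in-process `NotificationBus` and made-up event names. It does not model any specific mobile OS broadcast API.

```python
from collections import defaultdict


class NotificationBus:
    """Hypothetical system notification bus: modules publish, collectors listen."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event, handler):
        self._handlers[event].append(handler)

    def publish(self, event, data):
        for handler in self._handlers[event]:
            handler(data)


collected = []  # stands in for the first storage

bus = NotificationBus()
# The collector stores data only when a notification is triggered,
# rather than polling the network module on a fixed schedule.
bus.subscribe("wifi_scan_result", lambda data: collected.append(("wifi", data)))

# Simulate the network module turning WIFI on and reporting nearby networks.
bus.publish("wifi_scan_result", ["HomeAP", "Cafe"])
bus.publish("battery_low", 15)  # no subscriber, so nothing is stored
print(collected)  # [('wifi', ['HomeAP', 'Cafe'])]
```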
In some embodiments, feature extraction is performed on each database using manually preset rules, with the important features of each category of basic data specified in advance. After the basic data is clustered and stored in the corresponding databases, the same important features are identified for all basic data in a given database. The values of those preset features are then extracted from each piece of basic data as the feature data, and the feature data is stored for the second time.
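A minimal sketch of the manually preset approach: a hypothetical table `PRESET_FEATURES` lists the important fields for each category, and extraction simply keeps those fields from each raw record. Category and field names are invented for illustration.

```python
# Hypothetical preset: the "important features" chosen by hand per category.
PRESET_FEATURES = {
    "battery": ["level", "is_charging"],
    "app_usage": ["package", "duration_s"],
}


def extract_preset(category, record):
    """Keep only the preset fields of a raw record; all other fields are dropped."""
    return {field: record[field] for field in PRESET_FEATURES[category] if field in record}


raw = {"level": 62, "is_charging": False, "voltage_mv": 3850, "temp_c": 31.5}
print(extract_preset("battery", raw))  # {'level': 62, 'is_charging': False}
```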
In some embodiments, features are extracted from the basic data in each database using a pre-trained machine learning model. Specifically, a machine learning model matched to the basic data is trained in advance. The basic data is then input into the model, and the model's output is taken as the feature data.
The procedure is as follows. First, collect the basic data of each database. Next, extract feature data from the basic data using a data processing algorithm. Then train and optimize the machine learning model on the feature data. Finally, when new basic data is acquired, input it into the machine learning model to obtain new feature data.
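The four steps can be illustrated with a deliberately tiny stand-in. Here, the "data processing algorithm" is a hypothetical summary (the mean and range of a raw reading list). The "model" only learns the per-feature mean and standard deviation and outputs z-scores. A real system would substitute an actual trained model.

```python
import statistics
from typing import List, Sequence


def preprocess(raw: Sequence[float]) -> List[float]:
    """Hypothetical data processing step: mean and range of one raw reading list."""
    return [statistics.fmean(raw), max(raw) - min(raw)]


class StandardizerModel:
    """Stand-in for a trained model: learns per-feature mean/std, outputs z-scores."""

    def fit(self, rows: List[List[float]]) -> "StandardizerModel":
        columns = list(zip(*rows))
        self.means = [statistics.fmean(c) for c in columns]
        # Guard against zero variance so transform never divides by zero.
        self.stds = [statistics.pstdev(c) or 1.0 for c in columns]
        return self

    def transform(self, row: List[float]) -> List[float]:
        return [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]


# Steps 1-3: collect historical basic data, extract features, train the model.
history = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
model = StandardizerModel().fit([preprocess(r) for r in history])
# Step 4: new basic data goes through the same pipeline to yield feature data.
print(model.transform(preprocess([5.0, 6.0, 7.0])))  # [3.0, 0.0]
```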
Once the feature data corresponding to each database is obtained, it can be stored for the second time in the second storage module. The second storage module does not need to hold large volumes of raw basic data, only the corresponding feature data. Feature extraction captures the important features of the basic data, removes redundant information from the raw data, and saves storage space. As a result, the second storage holds far less data than the first storage in step 120. Storing only the extracted feature data also avoids keeping the data in its raw format, which tightens control over information security and protects user privacy. In effect, feature extraction desensitizes the source data. What is recorded is user data that has passed through the feature layer, with less redundancy and in a form convenient for later use.
In some embodiments, a time-series index may also be obtained for each database and stored in the second storage module (for example, memory). Other modules of the system can then use the index to locate the corresponding basic data in the database. Clustering the multi-source heterogeneous basic data by time series compresses the raw basic data and reduces its redundant information, while also enabling real-time indexing and access. Because the computing and storage resources of an electronic device are limited, organizing access to and allocation of the basic data in this way speeds up retrieval of panoramic perception information.
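One way to picture the per-database time-series index is as a sorted list of timestamps mapped to record offsets and searched by binary search. This structure is an assumption for illustration; the patent does not specify the index format.

```python
import bisect
from typing import List


class TimeIndex:
    """Sorted timestamps mapped to record offsets in one category database."""

    def __init__(self) -> None:
        self._ts: List[float] = []
        self._offsets: List[int] = []

    def add(self, ts: float, offset: int) -> None:
        i = bisect.bisect_right(self._ts, ts)  # keep timestamps sorted
        self._ts.insert(i, ts)
        self._offsets.insert(i, offset)

    def lookup_range(self, start: float, end: float) -> List[int]:
        """Offsets of all records with start <= ts <= end, in time order."""
        lo = bisect.bisect_left(self._ts, start)
        hi = bisect.bisect_right(self._ts, end)
        return self._offsets[lo:hi]


index = TimeIndex()
for offset, ts in enumerate([10.0, 5.0, 20.0, 12.5]):
    index.add(ts, offset)
print(index.lookup_range(5.0, 12.5))  # [1, 0, 3]
```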
140. Fuse the feature data to obtain fused feature data, and store the fused feature data for the third time.
Before the third storage, the feature data held in the second storage is fused. Specifically, the feature data may be fused by multi-table joins, by time-series alignment, or by a combination of both. Most of the data on a terminal is time-series data: the user's operations and the terminal's context differ from one point in time to another. Fusing the feature data therefore further reduces asymmetry between data sources and compresses the data volume.
The fused feature data is then stored for the third time, for example in a third storage module. In some embodiments, once the fused feature data is obtained, the third storage module holds the fused panoramic feature information. The cascaded storage scheme provides effective disaster-recovery backup of the data and avoids storing and transmitting plaintext data. The dedicated feature extraction step extracts high-dimensional features from the basic data, which serves a purpose similar to encrypting it and effectively protects users' private information.
In some embodiments, the method may further include transmitting the fused feature data to an application service layer or a data processing layer, where it is used in computation. In some embodiments, the method may further include uploading the fused feature data to the cloud so that it can be provided to a server for data analysis.
In some embodiments, the method may further include backing up the fused feature data on the terminal to increase data redundancy. For example, when a photo is taken at a party venue, the audio information can be used to characterize the current environment (for example, happy, lively, or rowdy). Combined with the image information, this allows the end user's current location to be determined at a finer granularity. Thus, after the audio signal passes through steps 110, 120, 130, and 140 and its features are fused, slightly more redundant information is produced than before, and this redundancy can fill gaps in the data.
Data security on the terminal is important. The embodiments of this application address both the security of the system data itself and the security of users' private data, and the specific steps described above resolve the following shortcomings. First, a terminal that must collect large amounts of data (especially for panoramic perception) is prone to data loss; the cascaded database storage scheme provides effective disaster-recovery backup. Second, extracting and storing feature data greatly reduces the pressure of data backup and storage, and effectively lowers disk and I/O (input/output) overhead on the terminal. Finally, feature extraction avoids storing and transmitting plaintext data. The dedicated feature extraction step extracts high-dimensional features from the data, which serves a purpose similar to encryption and effectively protects users' private information.
Referring to FIG. 3, FIG. 3 is a diagram of another application scenario of the data storage method provided by an embodiment of this application. User behavior data, sensor data, …, system operation data, and so on are the sources of basic data; specifically, the basic data may be acquired through sensors and similar means. The multiple pieces of basic data are then clustered and placed in primary storage. The primary storage layer holds basic data such as user behavior data, sensor data, …, and system operation data.
Next, the feature extraction module extracts features from the basic data in the primary storage layer and takes the important features of the basic data as feature data for secondary storage. The secondary storage layer holds feature data such as behavior features, sensor features, …, and system features.
In tertiary storage, the feature data from the secondary storage layer is fused to obtain fused panoramic features, and these fused feature data are what tertiary storage holds.
Once obtained, the fused feature data can be uploaded to the cloud and provided to a server for data analysis, or transmitted to the application service layer or data processing layer for computation. In addition, the fused panoramic feature database can be redundantly backed up to increase data redundancy and help prevent data loss.
Referring to FIG. 4, FIG. 4 is a schematic diagram of a second flow of the data storage method provided by an embodiment of this application. The data storage method includes the following steps:
210. Acquire multiple pieces of basic data belonging to multiple categories.
The basic data may include operating information of the electronic device, configuration information of the electronic device, user information, current environment information, and so on. Specifically, the basic data may be collected by one or more sensors, and may be collected in real time. For example, current environment information and information related to the electronic device may be acquired through at least one of the following: a distance sensor, magnetic field sensor, light sensor, acceleration sensor, fingerprint sensor, Hall sensor, position sensor, gyroscope, inertial sensor, attitude sensor, barometer, blood pressure sensor, pulse sensor, heart rate sensor, and the like. The current environment information includes the user's physiological information, such as blood pressure, pulse, and heart rate. The information related to the electronic device includes its operating information, its configuration information, and the user information stored on it. The user information includes human-computer interaction information such as the user's identity information, personal interests, browsing history, and personal favorites. The operating information of the electronic device includes power-on time, power-off time, standby time, memory usage at each point in time, main chip utilization at each point in time, information on the currently running program, information on background programs, the running time of each program, the download count of each program, and so on. In some embodiments, the basic data may also include behavior data of the user operating the terminal, sensor data, and system operation data.
220. Determine the category of each piece of basic data and, according to the determined categories, group and integrate the multiple pieces of basic data by category.
Grouping and integration is also called clustering, which means dividing a set of physical or abstract objects into multiple classes of similar objects. A cluster produced by clustering is a set of data objects that are similar to the other objects in the same cluster and dissimilar to objects in other clusters.
Clustering all the basic data in the first storage module aggregates basic data of the same kind into a data set, yielding one data set for each type of basic data. The basic data may be classified by hardware attribute, for example data related to the main chip, the display screen, the hard disk, the memory, or the various sensors. It may also be classified by the corresponding application, for example data related to system applications and data related to installed applications. Data related to installed applications can be further subdivided by specific application, such as instant messaging, map, or shopping applications. Storing the basic data in per-category databases isolates unrelated data so that each category can be stored independently. In some embodiments, a time-series index is also obtained for each database, which makes the basic data easier to index.
230. Store the grouped and integrated data for the first time, in the database of the corresponding category.
Basic data of the same type is stored in the same database. An item of basic data may be stored in a single database; for example, acceleration sensor data is stored only in the acceleration sensor database. An item may also be stored in multiple databases. For example, when an item belongs to two categories, it can be copied, and the copy and the original are stored in the two databases corresponding to those categories. Note that a database can store not only the panoramic perception information currently acquired but also previously stored panoramic perception information.
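The grouping rule, including copying an item that belongs to two categories, can be sketched as follows. The `categorize` function and the record fields are hypothetical. In practice, the category would come from the hardware or application attributes described above.

```python
import copy
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Set

Record = Dict[str, Any]


def store_by_category(records: Iterable[Record],
                      categorize: Callable[[Record], Set[str]]) -> Dict[str, List[Record]]:
    """First storage: put each record into the database of every category it belongs to."""
    databases: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        for i, category in enumerate(sorted(categorize(record))):
            # The first category keeps the original; further categories get copies.
            databases[category].append(record if i == 0 else copy.deepcopy(record))
    return dict(databases)


def categorize(record: Record) -> Set[str]:
    """Hypothetical rule: the source sensor plus any extra category tags."""
    return {record["source"], *record.get("tags", [])}


dbs = store_by_category(
    [{"source": "accel", "v": 9.8}, {"source": "gyro", "v": 0.1, "tags": ["motion"]}],
    categorize,
)
print(sorted(dbs))  # ['accel', 'gyro', 'motion']
```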
240. Train a machine learning model in advance, use it to extract features from the basic data in each database to obtain the feature data corresponding to each database, and store the feature data for the second time.
Machine learning studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures, continually improving their own performance. It is central to artificial intelligence, is the fundamental means of making computers intelligent, and is applied throughout the field. Machine learning studies how the performance of specific algorithms can be improved through experience, so that computer algorithms improve automatically.
The basic data is input into the machine learning model, the model output is taken as the feature data, and the feature data is stored for the second time.
The scenario modeling layer uses the historical basic data stored in step 230 as training samples and trains the machine learning model on them. The trained model can also serve as a prediction model. The procedure is the same as before: first, collect the basic data of each database; next, extract feature data from the basic data using a data processing algorithm; then train and optimize the machine learning model on the feature data; finally, when new basic data is acquired, input it into the model to obtain new feature data.
In some embodiments, training the machine learning model also yields an importance level for each type of historical basic data. The sampling frequency of each type is then set according to its importance level.
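The importance-to-frequency step might look like the following. Both the level scale and the frequency table are hypothetical placeholders, because the patent does not give concrete values.

```python
from typing import Dict

# Hypothetical table: importance level (0 = lowest) -> sampling frequency in Hz.
LEVEL_TO_HZ: Dict[int, float] = {0: 0.1, 1: 1.0, 2: 10.0, 3: 50.0}


def sampling_plan(importance: Dict[str, int]) -> Dict[str, float]:
    """Map each data type's importance level to a sampling frequency, clamping the level."""
    top = max(LEVEL_TO_HZ)
    return {name: LEVEL_TO_HZ[min(max(level, 0), top)]
            for name, level in importance.items()}


print(sampling_plan({"imu": 3, "wifi": 0, "light": 1}))
# {'imu': 50.0, 'wifi': 0.1, 'light': 1.0}
```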
In some embodiments, the trained machine learning model is used to extract feature information from the basic data. The basic data is input into the model, the model output is taken as the feature data, and the feature data is stored for the second time.
Pre-training yields a machine learning model matched to the basic data, which simplifies later processing of that data. Because the learning algorithm is updated automatically, this approach avoids the tedium and inflexibility of manually preset algorithms.
251. Fuse the feature data by multi-table joins to obtain fused feature data.
In database terms, a JOIN statement combines two or more tables in a database. The result of a join can be saved as a table or used as a table, and a multi-table join combines several tables in this way.
In some embodiments, the multi-table join may be implemented as a hash join, which is a common method for joining large data sets. The optimizer takes the smaller of the two tables, builds an in-memory hash table on the join key (JOIN KEY), and stores that table's column data in it. It then scans the larger table, hashes the join key of each row, and probes the hash table to find matching rows. Which data sets are paired for a multi-table join is determined by a preset program. For example, gyroscope and accelerometer data are complementary sensor streams reported at different frequencies, so they can be combined by a multi-table join. Likewise, accelerometer and gravity sensor data can serve as input sources for a multi-table join.
Fusing the feature data by multi-table joins may specifically include fusing it by hash join. In some embodiments, this may include the following steps:
- Obtain a first list and a second list containing two different types of feature data, where the data source of the first list is smaller than that of the second list.
- Build a hash table for the data source of the first list using the join key.
- Extract the column data of the first list and store it in the hash table.
- Scan the second list to obtain the rows that match the hash table, combine each matching row with the corresponding content of the first list into a record, and place the record in the result set.
Scanning the second list to obtain the rows that match the hash table may include scanning the second list, hashing the join key of each row, and probing the hash table. When rows matching the hash table are found in the second list, those rows are retrieved. Note that each matched row also corresponds to the column data of the first list.
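The build and probe phases of the hash join described above can be sketched in Python, with a `dict` serving as the in-memory hash table. The row format and the shared join key `t` are assumptions. Raw sensor timestamps rarely coincide exactly, so a real join would likely key on a quantized time bucket instead.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner equi-join of two row lists on `key`, building the hash table on the smaller one."""
    build_is_left = len(left) <= len(right)
    build, probe = (left, right) if build_is_left else (right, left)

    # Build phase: hash the smaller input on the join key.
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in build:
        table[row[key]].append(row)

    # Probe phase: scan the larger input, hash its key, and look up matches.
    result: List[Row] = []
    for prow in probe:
        for brow in table.get(prow[key], ()):
            l_row, r_row = (brow, prow) if build_is_left else (prow, brow)
            result.append({**l_row, **r_row})  # combine into one record
    return result


gyro = [{"t": 0, "gx": 0.01}, {"t": 2, "gx": 0.03}]
accel = [{"t": 0, "ax": 9.81}, {"t": 1, "ax": 9.79}, {"t": 2, "ax": 9.80}]
print(hash_join(gyro, accel, "t"))
# [{'t': 0, 'gx': 0.01, 'ax': 9.81}, {'t': 2, 'gx': 0.03, 'ax': 9.8}]
```

Building on the smaller input keeps the in-memory table small, which matches the rationale in the text. Because of the argument-order handling, swapping the inputs changes only the key order within each merged record, not its content.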
252. Fuse the feature data by time-series alignment to obtain fused feature data.
A time series here means chronological order, and time-series alignment uses the timestamps to align data from different sources.
In some embodiments, fusing the feature data by time-series alignment may include acquiring two feature databases and the time-series information of each, and sorting the feature data in each database by its time-series information. The timestamps common to both are then identified, and the feature data at those timestamps is aligned.
Acquiring the two feature databases and their time-series information means acquiring one feature database together with its time-series information, and another feature database together with its time-series information. Each feature database contains all the feature data of its corresponding database.
In some embodiments, the following steps may be performed before the common timestamps are identified and the feature data aligned. When the timestamps in the two sets of time-series information do not match completely, the unmatched timestamps are collected as timestamps to be processed. The method then determines whether the data (including feature data) for those timestamps can be filled in, for example by an interpolation algorithm. If so, the data for those timestamps is filled in; if not, those timestamps are deleted.
For example, suppose one data set has timestamps A, B, D, F and another has A, B, C, D, E, F. To make the two data sets match, the missing values are interpolated from neighboring timestamps. If some values cannot be obtained by interpolation, the extra timestamps are deleted. Time-series alignment further reduces asymmetry between the data sets and compresses the data volume.
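The alignment procedure can be sketched as follows, using the example above with A to F mapped to 0 to 5. Linear interpolation between neighboring timestamps of the same series is an assumed choice, since the patent only says "an interpolation algorithm". A timestamp is dropped when either series has no neighbor on one side of it.

```python
import bisect
from typing import Dict, List, Optional, Sequence, Tuple

Series = Dict[float, float]  # timestamp -> feature value


def _value_at(series: Series, keys: Sequence[float], t: float) -> Optional[float]:
    """Exact value, else linear interpolation between neighbors, else None."""
    if t in series:
        return series[t]
    i = bisect.bisect_left(keys, t)
    if i == 0 or i == len(keys):
        return None  # no neighbor on one side: cannot interpolate
    t0, t1 = keys[i - 1], keys[i]
    return series[t0] + (series[t1] - series[t0]) * (t - t0) / (t1 - t0)


def align(a: Series, b: Series) -> List[Tuple[float, float, float]]:
    """Align two feature series on the union of their timestamps."""
    keys_a, keys_b = sorted(a), sorted(b)
    aligned = []
    for t in sorted(set(a) | set(b)):
        va, vb = _value_at(a, keys_a, t), _value_at(b, keys_b, t)
        if va is not None and vb is not None:  # otherwise drop timestamp t
            aligned.append((t, va, vb))
    return aligned


# Timestamps A..F mapped to 0..5: series a lacks C (2) and E (4).
a = {0: 0.0, 1: 10.0, 3: 30.0, 5: 50.0}
b = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}
print(align(a, b)[2])  # (2, 20.0, 1.0)
```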
253. Fuse the feature data by both multi-table joins and time-series alignment to obtain fused feature data.
When fusing the feature data, either multi-table joins or time-series alignment may be used alone, or the two may be combined. In some embodiments, the feature data is fused using both multi-table joins and time-series alignment.
260. Store the fused feature data for the third time.
The third storage unit holds the fused feature data. The cascaded storage scheme provides effective disaster-recovery backup and avoids storing and transmitting plaintext data. The dedicated feature extraction step extracts high-dimensional features from the basic data, which serves a purpose similar to encrypting it and effectively protects users' private information.
270. Back up the fused feature data on the terminal in real time.
To keep the data to be processed safe, the basic data in the first storage module, the feature data in the second storage module, and the fused feature data in the third storage module can all be backed up in real time on the terminal.
Specifically, the redundant backup may be kept in another storage module, or in a different location within the first, second, or third storage module.
If the first storage module is a hard disk, the basic data, feature data, or fused feature data can be redundantly backed up within it. In that case, the disk is divided into at least two areas, with the data stored in one area and the redundant backup kept in another.
If the first storage module is a hard disk and the electronic device has at least two hard disks, the redundant backup can be kept on the other disk. The two disks may be of the same type, such as two mechanical hard disks, two solid-state drives, or two hybrid drives. They may also be of different types, such as any two of a mechanical hard disk, a solid-state drive, and a hybrid drive.
Note that the redundant backup in this embodiment may consist of one copy or multiple copies. Multiple copies may be made using the same method or different methods.
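Below is a minimal sketch of keeping checksum-verified replicas and restoring from any surviving copy. It assumes that directories stand in for the separate disk areas or disks described above; actual partitioning or multi-disk placement is outside the scope of this example.

```python
import hashlib
import os
import shutil
from typing import List


def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def backup(src: str, replica_dirs: List[str]) -> List[str]:
    """Copy `src` into each replica directory and verify every copy by checksum."""
    digest = sha256_of(src)
    copies = []
    for d in replica_dirs:
        os.makedirs(d, exist_ok=True)
        dst = shutil.copy2(src, os.path.join(d, os.path.basename(src)))
        if sha256_of(dst) != digest:
            raise OSError(f"checksum mismatch in replica {dst}")
        copies.append(dst)
    return copies


def restore(name: str, replica_dirs: List[str]) -> bytes:
    """Return the contents of the first replica of `name` that still exists."""
    for d in replica_dirs:
        path = os.path.join(d, name)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return f.read()
    raise FileNotFoundError(name)
```

Passing more directories to `backup` produces more copies, which corresponds to the one-copy or multi-copy choice described above.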
Backing up the fused feature data on the terminal in real time increases data redundancy and helps fill in missing data. For example, when a photo is taken at a party venue, the audio information can be used to characterize the current environment (for example, happy, lively, or rowdy). Combined with the image information, this allows the end user's current location to be determined at a finer granularity. Thus, after the audio signal passes through steps 110, 120, 130, and 140 and its features are fused, slightly more redundant information is produced than before, and this redundancy can fill gaps in the data. Moreover, if basic data is later lost, these redundant backups can be used to restore the source data.
It should be understood that in the embodiments of this application, terms such as "first" and "second" are used only to distinguish similar objects and do not necessarily describe a particular order or sequence. Objects described this way are interchangeable where appropriate.
In specific implementations, this application is not limited by the order in which the described steps are executed. Where no conflict arises, some steps may be performed in a different order or simultaneously.
In summary, the data storage method provided by the embodiments of this application proceeds in four stages. First, it acquires multiple pieces of basic data belonging to multiple categories. Second, it groups and integrates them by category and stores the result for the first time in the databases of the corresponding categories. Third, it extracts features from the basic data in each database to obtain the feature data of each database, and stores the feature data for the second time. Finally, it fuses the feature data and stores the fused feature data for the third time. Extracting and fusing the key features of the basic data through this three-level storage scheme reduces redundant information. Because only the extracted and fused feature data is stored, operations work on this derived data rather than directly on plaintext data, which helps protect both system data and users' private data.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a data storage device provided by an embodiment of this application. The data storage device 300 may be integrated into an electronic device and includes an acquisition module 301, a first storage module 302, a second storage module 303, and a third storage module 304.
The acquisition module 301 is configured to acquire multiple pieces of basic data belonging to multiple categories;
The first storage module 302 is configured to group and integrate the multiple pieces of basic data by category, and to store the grouped and integrated data for the first time in the database of the corresponding category;
第二存储模块303,用于分别对各数据库进行基础数据的特征提取,得到每一个数据库对应的特征数据,将特征数据进行第二次存储;The second storage module 303 is configured to perform feature extraction of basic data on each database, obtain feature data corresponding to each database, and store the feature data for the second time;
第三存储模块304,用于将特征数据进行融合,得到融合特征数据,将融合特征数据进行第三次存储。The third storage module 304 is configured to fuse the feature data to obtain the fusion feature data, and store the fusion feature data for the third time.
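The three storage levels handled by modules 302 to 304 can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the record layout `(category, timestamp, value)`, the helper names, and the use of Python dicts in place of real databases are all assumptions. The fusion step here simply flattens per-category features into one record; the hash join or time alignment described later would take its place.

```python
from collections import defaultdict
from statistics import mean


def first_storage(records):
    """Level 1: group (category, timestamp, value) records into per-category stores."""
    stores = defaultdict(list)
    for category, ts, value in records:
        stores[category].append((ts, value))
    return dict(stores)


def second_storage(stores, extract):
    """Level 2: apply a feature extractor to each per-category store."""
    return {category: extract(rows) for category, rows in stores.items()}


def third_storage(features):
    """Level 3: fuse per-category features into one flat record.

    Simplified fusion (assumption): keys are prefixed with their category.
    """
    fused = {}
    for category, feats in sorted(features.items()):
        for name, value in feats.items():
            fused[f"{category}.{name}"] = value
    return fused


def simple_extract(rows):
    """Hypothetical extractor: count and mean of the values in one store."""
    values = [value for _, value in rows]
    return {"count": len(values), "mean": mean(values)}


records = [("sensor", 1, 2.0), ("sensor", 2, 4.0), ("behavior", 1, 1.0)]
fused = third_storage(second_storage(first_storage(records), simple_extract))
# fused == {"behavior.count": 1, "behavior.mean": 1.0,
#           "sensor.count": 2, "sensor.mean": 3.0}
```

Keeping each level as a separate function mirrors the separate modules, so any one stage (for example the extractor) can be swapped without touching the others.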
In some embodiments, the acquisition module 301 acquires the basic data by collecting it in real time through multiple different sensors.
In some embodiments, the categories of basic data include at least behavior data of the user operating the terminal, sensor data, and system operation data.
Please also refer to FIG. 6, which is another schematic structural diagram of the data storage device provided by an embodiment of the present application.
In some embodiments, the second storage module 303 may extract features from the basic data in the databases by a machine learning method. In this case, the second storage module 303 may include a training unit 3031 and a feature acquisition unit 3032.
The training unit 3031 is configured to train a machine learning model in advance to obtain a model matched to the basic data. Specifically, the training unit 3031 may be configured to: collect the basic data of each database; extract feature data from the basic data using a data processing algorithm; and train and optimize the machine learning model based on the feature data.
The feature acquisition unit 3032 is configured to, when new basic data is acquired, input the new basic data into the machine learning model to obtain new feature data; and to extract features from the basic data of each database, obtaining the feature data corresponding to each database, and perform the second storage of the feature data.
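A minimal sketch of the train-then-extract flow of units 3031 and 3032 follows. The text names neither the data processing algorithm nor the model, so both are assumptions here: hypothetical window statistics stand in for the data processing algorithm, and a per-feature standardizer fitted on the training features stands in for the machine learning model.

```python
from statistics import mean, pstdev


def window_features(values):
    """Hypothetical data processing algorithm: summary statistics of one window."""
    return [mean(values), pstdev(values), min(values), max(values)]


class StandardizingModel:
    """Stand-in for the unspecified machine learning model.

    fit() learns the per-feature mean and standard deviation of the
    training features; transform() maps a new window of basic data to a
    standardized feature vector (the "new feature data").
    """

    def fit(self, windows):
        feats = [window_features(w) for w in windows]
        columns = list(zip(*feats))
        self.mu = [mean(c) for c in columns]
        # A zero spread would cause division by zero, so fall back to 1.0.
        self.sigma = [pstdev(c) or 1.0 for c in columns]
        return self

    def transform(self, window):
        feats = window_features(window)
        return [(x - m) / s for x, m, s in zip(feats, self.mu, self.sigma)]


model = StandardizingModel().fit([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
new_features = model.transform([5, 6, 7])
```

Any real model (a neural encoder, PCA, and so on) could replace `StandardizingModel` as long as it keeps the same fit-once, transform-new-data shape.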
Please also refer to FIG. 7, which is yet another schematic structural diagram of the data storage device provided by an embodiment of the present application. In some embodiments, the third storage module 304 may include a multi-table join unit 3041 and/or a time-alignment unit 3042.
The multi-table join unit 3041 is configured to fuse the feature data by means of a multi-table join, specifically a hash join, with the following steps:
Acquire a first list and a second list that respectively contain two different types of feature data, where the data source of the first list is smaller than that of the second list;
Build a hash table on the data source of the first list using the join key;
Extract the column data of the first list and store it in the hash table;
Scan the second list to obtain the rows in the second list that match the hash table, combine each matching row with the corresponding content of the first list into a record, and put the record into the result set.
When scanning the second list to obtain the rows that match the hash table, the multi-table join unit 3041 is further configured to:
Scan the second list, hash the join key, and probe the hash table;
When a row in the second list is found to match the hash table, obtain the matching row from the second list; this row data matches the column data of the first list.
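The build and probe steps above follow the classic in-memory hash join. A minimal sketch, assuming each list is a sequence of dicts sharing a join key (the field names `ts`, `lux`, and `steps` are hypothetical); as the text specifies, the smaller first list is used for the build side.

```python
from collections import defaultdict


def hash_join(first, second, key):
    """In-memory hash join of two lists of dicts on a shared join key.

    `first` should be the smaller input: it is loaded into the hash
    table (build phase); `second` is scanned once (probe phase).
    """
    # Build: index each row of the first list by its join key.
    table = defaultdict(list)
    for row in first:
        table[row[key]].append(row)

    # Probe: scan the second list and look up each row's join key.
    result = []
    for row in second:
        for match in table.get(row[key], []):
            # Combine the matching rows into one record in the result set.
            result.append({**match, **row})
    return result


light = [{"ts": 1, "lux": 300}, {"ts": 2, "lux": 310}]
steps = [{"ts": 1, "steps": 5}, {"ts": 2, "steps": 7}, {"ts": 3, "steps": 9}]
joined = hash_join(light, steps, "ts")
# joined == [{"ts": 1, "lux": 300, "steps": 5}, {"ts": 2, "lux": 310, "steps": 7}]
```

Building on the smaller input keeps the hash table small, and each list is read only once, which is why the text requires the first list to be the smaller one.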
The time-alignment unit 3042 is configured to fuse the feature data by time-series alignment, with the following steps:
Acquire two feature databases and the two pieces of timing information corresponding to them, where each feature database contains all the feature data of its corresponding database;
Sort the feature data in each of the two feature databases according to its timing information;
Find the timestamps common to both pieces of timing information and align the feature data corresponding to each common timestamp.
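Sorting both series and keeping only the timestamps present in each can be done with a single merge-style pass. A minimal sketch, assuming each feature database is a list of `(timestamp, feature)` pairs with unique timestamps within each list:

```python
def align_by_time(a, b):
    """Align two (timestamp, feature) series on the timestamps they share.

    Both series are sorted by timestamp, then walked in step like a merge.
    Assumes timestamps are unique within each series.
    """
    a = sorted(a, key=lambda pair: pair[0])
    b = sorted(b, key=lambda pair: pair[0])
    i = j = 0
    aligned = []
    while i < len(a) and j < len(b):
        ta, tb = a[i][0], b[j][0]
        if ta == tb:
            aligned.append((ta, a[i][1], b[j][1]))
            i += 1
            j += 1
        elif ta < tb:
            i += 1  # timestamp only in a: skip it
        else:
            j += 1  # timestamp only in b: skip it
    return aligned


print(align_by_time([(3, "c"), (1, "a"), (2, "b")], [(2, 20), (4, 40), (1, 10)]))
# prints [(1, 'a', 10), (2, 'b', 20)]
```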
In some embodiments, before finding the common timestamps and aligning the corresponding feature data, the time-alignment unit 3042 is further configured to:
Determine whether the timestamps in the two pieces of timing information match completely;
If they match completely, align the feature data corresponding to the same timestamps;
If they do not match completely, obtain the timestamps in the two pieces of timing information that cannot be matched (the timestamps to be processed);
Determine whether the data for the timestamps to be processed can be filled in, where the data includes the feature data and the filling method includes an interpolation algorithm;
If the data can be filled in, fill in the data corresponding to the timestamps to be processed;
If the data cannot be filled in, delete the timestamps to be processed.
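The gap-handling steps can be sketched as below, assuming numeric features keyed by timestamp. Linear interpolation is used as one example of the interpolation algorithm the text mentions, and "cannot be filled" is taken to mean that the timestamp lies outside the range of the series that lacks it; both choices are assumptions.

```python
from bisect import bisect_left


def interpolate(series, t):
    """Linearly interpolate a {timestamp: value} series at t.

    Returns None when t lies outside the series' range, i.e. when the gap
    cannot be filled from neighbours on both sides.
    """
    times = sorted(series)
    k = bisect_left(times, t)
    if k == 0 or k == len(times):
        return None
    t0, t1 = times[k - 1], times[k]
    v0, v1 = series[t0], series[t1]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def align_with_fill(a, b):
    """Align two {timestamp: value} series, filling or dropping unmatched timestamps."""
    filled_a, filled_b = dict(a), dict(b)
    # Timestamps present in only one series; empty when the match is complete.
    for t in sorted(set(a) ^ set(b)):
        if t in a:  # b lacks t: interpolate from the original b
            value = interpolate(b, t)
            if value is None:
                del filled_a[t]  # cannot fill: delete the timestamp
            else:
                filled_b[t] = value  # fill the gap in b
        else:  # a lacks t: interpolate from the original a
            value = interpolate(a, t)
            if value is None:
                del filled_b[t]
            else:
                filled_a[t] = value
    return [(t, filled_a[t], filled_b[t]) for t in sorted(filled_a)]


a = {0: 0.0, 2: 2.0, 4: 4.0}
b = {0: 10.0, 1: 11.0, 4: 14.0, 5: 15.0}
aligned = align_with_fill(a, b)
# aligned == [(0, 0.0, 10.0), (1, 1.0, 11.0), (2, 2.0, 12.0), (4, 4.0, 14.0)]
```

Interpolating from the unmodified input series keeps the result independent of the order in which unmatched timestamps are processed.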
In some embodiments, a complete match means that the timestamps in the two pieces of timing information are exactly the same.
In some embodiments, the device may further include a backup module and a transmission module. The backup module is configured to back up the fused feature data on the terminal in real time. The transmission module is configured to transmit the fused feature data to the application service layer or the data processing layer, so that that layer can perform calculations using the fused features; alternatively, the transmission module may transmit the fused feature data to the cloud for data analysis by a cloud server.
It can be seen from the above that an embodiment of the present application provides a data storage device. First, the acquisition module 301 acquires multiple pieces of basic data belonging to multiple categories; then the first storage module 302 groups and integrates the basic data by category and stores the integrated data for the first time in the database of the corresponding category; next, the second storage module 303 extracts features from the basic data of each database to obtain the feature data corresponding to each database and stores the feature data for the second time; finally, the third storage module 304 fuses the feature data to obtain fused feature data and stores it for the third time. This three-level storage scheme extracts and fuses the key features of the basic data, which reduces redundant information. Because the extracted feature data and the fused feature data are what gets stored, operations on the data do not need to act directly on plaintext data, effectively protecting the security of system data and of users' private data.
An embodiment of the present application further provides an electronic device. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (Augmented Reality) device, a car, a data storage device, an audio playback device, a video playback device, a notebook computer, a desktop computing device, or a wearable device such as a watch, glasses, a helmet, an electronic bracelet, an electronic necklace, or electronic clothing.
Referring to FIG. 8, FIG. 8 is a schematic diagram of a first structure of an electronic device 800 provided by an embodiment of the present application. The electronic device 800 includes a processor 801 and a memory 802, and the processor 801 is electrically connected to the memory 802.
The processor 801 is the control center of the electronic device 800. It connects all parts of the electronic device through various interfaces and lines, and performs the device's functions and processes data by running or invoking the computer programs stored in the memory 802 and invoking the data stored in the memory 802, thereby monitoring the electronic device as a whole.
In this embodiment, the processor 801 in the electronic device 800 loads the instructions corresponding to the processes of one or more computer programs into the memory 802 according to the following steps, and runs the computer programs stored in the memory 802 to implement various functions:
Acquire multiple pieces of basic data belonging to multiple categories;
Group and integrate the basic data by category, and store the integrated data for the first time in the database of the corresponding category;
Extract features from the basic data of each database to obtain the feature data corresponding to each database, and store the feature data for the second time;
Fuse the feature data to obtain fused feature data, and store the fused feature data for the third time.
In some embodiments, the categories of basic data include at least behavior data of the user operating the terminal, sensor data, and system operation data.
In some embodiments, before extracting features from the basic data of each database to obtain the feature data corresponding to each database, the processor 801 performs the following steps:
Collect the basic data of each database;
Extract feature data from the basic data using a data processing algorithm;
Train and optimize a machine learning model based on the feature data;
When new basic data is acquired, input the new basic data into the machine learning model to obtain new feature data.
In some embodiments, when fusing the feature data, the processor 801 performs the following steps:
Fuse the feature data by means of a multi-table join;
Fuse the feature data by time-series alignment.
When fusing the feature data by means of a multi-table join, the processor 801 performs the following steps:
Acquire a first list and a second list that respectively contain two different types of feature data, where the data source of the first list is smaller than that of the second list;
Build a hash table on the data source of the first list using the join key;
Extract the column data of the first list and store it in the hash table;
Scan the second list to obtain the rows in the second list that match the hash table, combine each matching row with the corresponding content of the first list into a record, and put the record into the result set.
In some embodiments, when scanning the second list to obtain the rows that match the hash table, the processor 801 performs the following steps:
Scan the second list, hash the join key, and probe the hash table;
When a row in the second list is found to match the hash table, obtain the matching row from the second list; this row data matches the column data of the first list.
In some embodiments, when fusing the feature data by time-series alignment, the processor 801 performs the following steps:
Acquire two feature databases and the two pieces of timing information corresponding to them, where each feature database contains all the feature data of its corresponding database;
Sort the feature data in each of the two feature databases according to its timing information;
Find the timestamps common to both pieces of timing information and align the feature data corresponding to each common timestamp.
In some embodiments, before finding the common timestamps and aligning the corresponding feature data, the processor 801 performs the following steps:
Determine whether the timestamps in the two pieces of timing information match completely;
If they match completely, align the feature data corresponding to the same timestamps;
If they do not match completely, obtain the timestamps in the two pieces of timing information that cannot be matched (the timestamps to be processed);
Determine whether the data for the timestamps to be processed can be filled in, where the data includes the feature data and the filling method includes an interpolation algorithm;
If the data can be filled in, fill in the data corresponding to the timestamps to be processed;
If the data cannot be filled in, delete the timestamps to be processed.
In some embodiments, when acquiring the multiple pieces of basic data, the processor 801 performs the following step:
Collect basic data in real time through multiple different sensors.
In some embodiments, after storing the fused feature data for the third time, the processor 801 performs the following step:
Back up the fused feature data on the terminal in real time.
In some embodiments, referring to FIG. 9, FIG. 9 is a schematic diagram of a second structure of the electronic device 800 provided by an embodiment of the present application.
The electronic device 800 further includes a display screen 803, a control circuit 804, an input unit 805, a sensor 806, and a power supply 807. The processor 801 is electrically connected to each of the display screen 803, the control circuit 804, the input unit 805, the sensor 806, and the power supply 807.
The display screen 803 may be used to display information entered by the user or provided to the user, as well as the various graphical user interfaces of the electronic device, which may be composed of images, text, icons, videos, or any combination thereof.
The control circuit 804 is electrically connected to the display screen 803 and is used to control the display screen 803 to display information.
The input unit 805 may be used to receive input digits, character information, or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input unit 805 may include a fingerprint recognition module.
The sensor 806 is used to collect information about the electronic device itself, about the user, or about the external environment. For example, the sensor 806 may include multiple sensors such as a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a Hall sensor, a position sensor, a gyroscope, an inertial sensor, a posture sensor, a barometer, and a heart rate sensor.
The power supply 807 supplies power to the components of the electronic device 800. In some embodiments, the power supply 807 may be logically connected to the processor 801 through a power management system, so that charging, discharging, and power consumption are managed through the power management system.
Although not shown in FIG. 9, the electronic device 800 may also include a camera, a Bluetooth module, and the like, which are not described further here.
It can be seen from the above that an embodiment of the present application provides an electronic device whose processor performs the following steps: first, acquiring multiple pieces of basic data belonging to multiple categories; then grouping and integrating the basic data by category and storing the integrated data for the first time in the database of the corresponding category; next, extracting features from the basic data of each database to obtain the feature data corresponding to each database and storing the feature data for the second time; finally, fusing the feature data to obtain fused feature data and storing it for the third time. This three-level storage scheme extracts and fuses the key features of the basic data, which reduces redundant information. Because the extracted feature data and the fused feature data are what gets stored, operations on the data do not need to act directly on plaintext data, effectively protecting the security of system data and of users' private data.
An embodiment of the present application further provides a storage medium storing a computer program. When the computer program runs on a computer, the computer executes the data storage method of any of the foregoing embodiments.
For example, in some embodiments, when the computer program runs on the computer, the computer performs the following steps:
Acquire multiple pieces of basic data belonging to multiple categories;
Group and integrate the basic data by category, and store the integrated data for the first time in the database of the corresponding category;
Extract features from the basic data of each database to obtain the feature data corresponding to each database, and store the feature data for the second time;
Fuse the feature data to obtain fused feature data, and store the fused feature data for the third time.
It should be noted that a person of ordinary skill in the art will understand that all or some of the steps of the methods in the foregoing embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium, which may include, but is not limited to, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The data storage method and device, storage medium, and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help understand the method and core idea of the present application. Meanwhile, a person skilled in the art may make changes to the specific implementations and scope of application according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (20)

  1. A data storage method, comprising:
    acquiring multiple pieces of basic data, the multiple pieces of basic data belonging to multiple categories;
    grouping and integrating the multiple pieces of basic data by category, and storing the integrated data for the first time in a database of the corresponding category;
    extracting features from the basic data of each database to obtain feature data corresponding to each database, and storing the feature data for the second time;
    fusing the feature data to obtain fused feature data, and storing the fused feature data for the third time.
  2. The data storage method according to claim 1, wherein the categories of the basic data include at least behavior data of the user operating the terminal, sensor data, and system operation data.
  3. The data storage method according to claim 2, wherein before extracting features from the basic data of each database to obtain the feature data corresponding to each database, the method further comprises:
    collecting the basic data of each database;
    extracting feature data from the basic data using a data processing algorithm;
    training and optimizing a machine learning model based on the feature data;
    when new basic data is acquired, inputting the new basic data into the machine learning model to obtain new feature data.
  4. The data storage method according to claim 1, wherein fusing the feature data comprises:
    fusing the feature data by means of a multi-table join;
    fusing the feature data by time-series alignment.
  5. The data storage method according to claim 4, wherein fusing the feature data by means of a multi-table join comprises:
    acquiring a first list and a second list, the first list and the second list respectively containing two different types of feature data, the data source of the first list being smaller than the data source of the second list;
    building a hash table for the data source of the first list using a join key;
    extracting column data of the first list and storing the column data of the first list in the hash table;
    scanning the second list to obtain row data in the second list that matches the hash table, combining the rows that match the hash table with the corresponding content of the first list into records, and placing the records in a result set.
  6. The data storage method according to claim 5, wherein scanning the second list to obtain row data in the second list that matches the hash table comprises:
    scanning the second list, hashing the join key, and probing the hash table;
    when row data matching the hash table is detected in the second list, obtaining the row data in the second list that matches the hash table, the row data matching the column data of the first list.
  7. The data storage method according to claim 4, wherein fusing the feature data by time-series alignment comprises:
    acquiring two feature databases and two pieces of timing information respectively corresponding to the two feature databases, each feature database containing all the feature data of its corresponding database;
    sorting the feature data in each of the two feature databases according to the timing information;
    obtaining the timestamps common to the two pieces of timing information, and aligning the feature data corresponding to the common timestamps.
  8. The data storage method according to claim 1, wherein acquiring multiple pieces of basic data comprises:
    collecting basic data in real time through multiple different sensors.
  9. The data storage method according to claim 1, wherein after storing the fused feature data for the third time, the method further comprises:
    backing up the fused feature data on the terminal in real time.
  10. A data storage device, comprising:
    an acquisition module, configured to acquire multiple pieces of basic data, the multiple pieces of basic data belonging to multiple categories;
    a first storage module, configured to group and integrate the multiple pieces of basic data by category, and store the integrated data for the first time in a database of the corresponding category;
    a second storage module, configured to extract features from the basic data of each database to obtain feature data corresponding to each database, and store the feature data for the second time;
    a third storage module, configured to fuse the feature data to obtain fused feature data, and store the fused feature data for the third time.
  11. A storage medium storing a computer program, wherein when the computer program runs on a computer, the computer is caused to perform:
    acquiring multiple pieces of basic data, the multiple pieces of basic data belonging to multiple categories;
    grouping and integrating the multiple pieces of basic data by category, and storing the integrated data for the first time in a database of the corresponding category;
    extracting features from the basic data of each database to obtain feature data corresponding to each database, and storing the feature data for the second time;
    fusing the feature data to obtain fused feature data, and storing the fused feature data for the third time.
  12. An electronic device, comprising a processor and a memory, the memory storing a computer program, wherein the processor, by invoking the computer program, is configured to perform:
    acquiring multiple pieces of basic data, the multiple pieces of basic data belonging to multiple categories;
    grouping and integrating the multiple pieces of basic data by category, and storing the integrated data for the first time in a database of the corresponding category;
    extracting features from the basic data of each database to obtain feature data corresponding to each database, and storing the feature data for the second time;
    fusing the feature data to obtain fused feature data, and storing the fused feature data for the third time.
  13. The electronic device according to claim 11, wherein the categories of the basic data include at least behavior data of the user operating the terminal, sensor data, and system operation data.
  14. The electronic device according to claim 13, wherein, before performing feature extraction on the basic data in each database to obtain the feature data corresponding to each database, the processor is further configured to execute:
    collecting the basic data of each database;
    extracting feature data from the basic data by using a data processing algorithm;
    training and optimizing a machine learning model based on the feature data;
    when new basic data is acquired, inputting the new basic data into the machine learning model to obtain new feature data.
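One way to read claim 14 is as follows. A conventional algorithm first labels raw data with features. A model is then trained on those pairs, so that later raw data can be turned into features by the model alone. The sketch below assumes that reading. The "data processing algorithm" is a hypothetical window mean, and the "machine learning model" is an ordinary least-squares linear model fitted with `numpy.linalg.lstsq`. Neither choice comes from the claims.

```python
import numpy as np

WINDOW = 4  # hypothetical number of raw samples per window

def hand_crafted_feature(window):
    """Stand-in 'data processing algorithm': the mean of a raw data window."""
    return float(np.mean(window))

# Collect raw windows from a database (random stand-in data) and extract features.
rng = np.random.default_rng(0)
train_x = rng.normal(size=(20, WINDOW))
train_y = np.array([hand_crafted_feature(w) for w in train_x])

# Train the model: least squares with an appended bias column.
design = np.hstack([train_x, np.ones((len(train_x), 1))])
weights, *_ = np.linalg.lstsq(design, train_y, rcond=None)

def model_feature(window):
    """Apply the trained model to newly acquired raw data to obtain a feature."""
    return float(np.dot(np.append(window, 1.0), weights))

new_window = np.array([1.0, 2.0, 3.0, 6.0])
print(model_feature(new_window))
```

Because the window mean is exactly linear in its inputs, the fitted weights converge to 1/4 per sample, plus a bias of essentially zero. A real deployment would use whatever non-linear extractor and model the device actually needs.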
  15. The electronic device according to claim 13, wherein fusing the feature data comprises:
    fusing the feature data by means of a multi-table join;
    fusing the feature data by means of time-sequence alignment.
  16. The electronic device according to claim 15, wherein, when fusing the feature data by means of a multi-table join, the processor is configured to execute:
    acquiring a first list and a second list, the first list and the second list respectively containing two sets of feature data of different types, the data source of the first list being smaller than the data source of the second list;
    building a hash table for the data source of the first list by using a join key;
    extracting the column data of the first list and storing the column data of the first list in the hash table;
    scanning the second list to obtain row data in the second list that matches the hash table, and combining each matching row with the corresponding content of the first list into a record placed in a result set.
  17. The electronic device according to claim 16, wherein, when scanning the second list to obtain the row data in the second list that matches the hash table, the processor is configured to execute:
    scanning the second list, hash-mapping the join key, and probing the hash table;
    when it is detected that the second list contains row data matching the hash table, acquiring the row data in the second list that matches the hash table, the row data matching the column data of the first list.
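Claims 16 and 17 describe a classic hash join. The smaller input is the build side, hashed on the join key. The larger input is the probe side, scanned once, with each row's key looked up in the hash table. The sketch below assumes each list is a list of Python dicts and that the hypothetical join key is a shared timestamp field `ts`. As a simplification, it stores whole rows of the first list in the hash table rather than selected columns.

```python
from collections import defaultdict

def hash_join(first, second, key):
    """Inner-join two lists of dict rows on `key`; `first` should be the smaller list."""
    # Build phase: hash every row of the smaller list by its join key.
    table = defaultdict(list)
    for row in first:
        table[row[key]].append(row)
    # Probe phase: scan the larger list once and look up each key in the hash table.
    result = []
    for row in second:
        for match in table.get(row[key], []):
            # Combine the matching rows into one record in the result set.
            result.append({**match, **row})
    return result

# Hypothetical feature lists from two different databases.
sensor_feats = [{"ts": 1, "light": 0.2}, {"ts": 2, "light": 0.9}]
behavior_feats = [{"ts": 1, "app": "mail"}, {"ts": 2, "app": "maps"}, {"ts": 3, "app": "camera"}]
joined = hash_join(sensor_feats, behavior_feats, "ts")
print(joined)
```

Building on the smaller list keeps the in-memory hash table small. Each list is traversed only once, so the join runs in roughly linear time rather than comparing every pair of rows.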
  18. The electronic device according to claim 15, wherein, when fusing the feature data by means of time-sequence alignment, the processor is configured to execute:
    acquiring two feature databases and two pieces of time-sequence information respectively corresponding to the two feature databases, each feature database containing all the feature data of its corresponding database;
    arranging the feature data in each of the two feature databases according to its time-sequence information;
    identifying the time points common to both pieces of time-sequence information, and aligning the feature data corresponding to those common time points.
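The alignment in claim 18 can be sketched as follows: sort both series by time, then walk them together, keeping only the time points that appear in both. This sketch assumes each feature database is a list of hypothetical `(timestamp, feature)` pairs. It also assumes that timestamps are unique within each list and must match exactly. The claims do not say how near-but-unequal timestamps are handled.

```python
def align_by_time(series_a, series_b):
    """Align two lists of (timestamp, feature) pairs on their shared timestamps."""
    # Arrange each feature database by its time-sequence information.
    a_sorted = sorted(series_a, key=lambda item: item[0])
    b_sorted = sorted(series_b, key=lambda item: item[0])
    # Two-pointer walk over both sorted lists, emitting only common timestamps.
    i = j = 0
    aligned = []
    while i < len(a_sorted) and j < len(b_sorted):
        ta, fa = a_sorted[i]
        tb, fb = b_sorted[j]
        if ta == tb:
            aligned.append((ta, fa, fb))
            i += 1
            j += 1
        elif ta < tb:
            i += 1
        else:
            j += 1
    return aligned

aligned = align_by_time([(3, "a3"), (1, "a1"), (2, "a2")], [(2, "b2"), (4, "b4"), (3, "b3")])
print(aligned)
```

After sorting, the merge-style walk costs linear time in the combined length of the two series.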
  19. The electronic device according to claim 12, wherein, when acquiring the plurality of basic data, the processor is configured to execute:
    collecting basic data in real time through a plurality of different sensors.
  20. The electronic device according to claim 12, wherein, after storing the fused feature data for the third time, the processor is further configured to execute:
    backing up the fused feature data on the terminal in real time.
PCT/CN2020/081158 2019-04-09 2020-03-25 Data storage method and device, storage medium, and electronic apparatus WO2020207252A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910282158.5A CN111797175B (en) 2019-04-09 2019-04-09 Data storage method and device, storage medium and electronic equipment
CN201910282158.5 2019-04-09

Publications (1)

Publication Number Publication Date
WO2020207252A1 (en) 2020-10-15

Family

ID=72750970

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/081158 WO2020207252A1 (en) 2019-04-09 2020-03-25 Data storage method and device, storage medium, and electronic apparatus

Country Status (2)

Country Link
CN (1) CN111797175B (en)
WO (1) WO2020207252A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116048883A (en) * 2023-02-20 2023-05-02 李红亮 Big data disaster recovery analysis method and server adopting artificial intelligence

Families Citing this family (1)

CN113701819A (en) * 2021-08-31 2021-11-26 四川省建筑科学研究院有限公司 Building structure monitoring method, monitoring device, monitoring system and storage medium

Citations (5)

CN104933296A (en) * 2015-05-28 2015-09-23 汤海京 Big data processing method based on multi-dimensional data fusion and big data processing equipment based on multi-dimensional data fusion
CN107103094A (en) * 2017-05-18 2017-08-29 前海梧桐(深圳)数据有限公司 Data among enterprises incidence relation method for catching and its system based on mass data
US20180018391A1 (en) * 2016-07-13 2018-01-18 Yahoo Japan Corporation Data classification device, data classification method, and non-transitory computer readable storage medium
CN108764372A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Construction method and device, mobile terminal, the readable storage medium storing program for executing of data set
CN108961302A (en) * 2018-07-16 2018-12-07 Oppo广东移动通信有限公司 Image processing method, device, mobile terminal and computer readable storage medium

Also Published As

Publication number Publication date
CN111797175B (en) 2023-12-19
CN111797175A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
JP7201730B2 (en) Intention recommendation method, device, equipment and storage medium
Pouyanfar et al. Multimedia big data analytics: A survey
US10719759B2 (en) System for building a map and subsequent localization
CN104239501B (en) Mass video semantic annotation method based on Spark
US20130106685A1 (en) Context-sensitive query enrichment
US11900688B2 (en) People and vehicle analytics on the edge
US11586667B1 (en) Hyperzoom attribute analytics on the edge
WO2020207252A1 (en) Data storage method and device, storage medium, and electronic apparatus
WO2023168998A1 (en) Video clip identification method and apparatus, device, and storage medium
CN115114395B (en) Content retrieval and model training method and device, electronic equipment and storage medium
CN111930964A (en) Content processing method, device, equipment and storage medium
CN111797851A (en) Feature extraction method and device, storage medium and electronic equipment
CN111798259A (en) Application recommendation method and device, storage medium and electronic equipment
US10609442B2 (en) Method and apparatus for generating and annotating virtual clips associated with a playable media file
CN111767880B (en) Living body identity recognition method and device based on facial features and storage medium
CN111178455B (en) Image clustering method, system, device and medium
CN111797856B (en) Modeling method and device, storage medium and electronic equipment
KR20190101692A (en) Video watch method based on transfer of learning
CN111797422A (en) Data privacy protection query method and device, storage medium and electronic equipment
CN109446356A (en) A kind of multimedia document retrieval method and device
CN114973352A (en) Face recognition method, device, equipment and storage medium
US20180293299A1 (en) Query processing
CN115546516A (en) Personnel gathering method and device, computer equipment and storage medium
WO2020207297A1 (en) Information processing method, storage medium, and electronic device
CN111797227A (en) Information processing method, information processing apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description

121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20787391; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 20787391; Country of ref document: EP; Kind code of ref document: A1