WO2020207252A1

WO2020207252A1 - Data storage method and device, storage medium, and electronic apparatus

Info

Publication number: WO2020207252A1
Application number: PCT/CN2020/081158
Authority: WO
Inventors: 何明; 陈仲铭; 徐鑫; 刘耀勇; 陈岩
Original assignee: Oppo广东移动通信有限公司
Priority date: 2019-04-09
Filing date: 2020-03-25
Publication date: 2020-10-15
Also published as: CN111797175B; CN111797175A

Abstract

A data storage method and device, a storage medium, and an electronic apparatus. The method comprises: obtaining multiple pieces of basic data respectively belonging to multiple categories (110); summarizing and integrating the multiple pieces of basic data according to their respective categories, and then performing a first storage operation (120); performing feature extraction on the basic data in each database so as to obtain feature data corresponding to each database, and performing a second storage operation (130); and fusing the feature data so as to obtain fused feature data, and performing a third storage operation (140).

Description

Data storage method, device, storage medium and electronic equipment

This application claims priority to Chinese patent application No. 201910282158.5, entitled "Data storage method, device, storage medium and electronic equipment", filed with the Chinese Patent Office on April 9, 2019, the entire contents of which are incorporated herein by reference.

Technical field

This application relates to the field of electronic technology, in particular to a data storage method, device, storage medium and electronic equipment.

Background art

With the development of electronic technology, electronic devices such as smart phones are becoming more and more intelligent. Electronic equipment can process data through various algorithm models to provide users with various functions. For electronic devices that need to collect large amounts of data, the security of system data and the security of user privacy data are both important.

Summary of the invention

The embodiments of the present application provide a data storage method, device, storage medium, and electronic equipment, which can take into account the security of system data and the security of user privacy data.

In the first aspect, an embodiment of the present application provides a data storage method applied to an electronic device, wherein the data storage method includes:

Acquiring multiple pieces of basic data, the multiple pieces of basic data belonging to multiple categories;

Summarizing and integrating the multiple pieces of basic data according to their respective categories, and storing the summarized and integrated data for the first time in a database of the corresponding category;

Performing feature extraction on the basic data in each database to obtain feature data corresponding to each database, and storing the feature data for the second time;

Fusing the feature data to obtain fused feature data, and storing the fused feature data for the third time.

In the second aspect, an embodiment of the present application provides a data storage device, including:

An obtaining module, configured to obtain multiple pieces of basic data, the multiple pieces of basic data belonging to multiple categories;

A first storage module, configured to summarize and integrate the multiple pieces of basic data according to their respective categories, and store the summarized and integrated data for the first time in a database of the corresponding category;

A second storage module, configured to perform feature extraction on the basic data in each database to obtain feature data corresponding to each database, and store the feature data for the second time;

A third storage module, configured to fuse the feature data to obtain fused feature data, and store the fused feature data for the third time.

In a third aspect, embodiments of the present application provide a storage medium, in which a computer program is stored in the storage medium, and when the computer program runs on a computer, the computer executes:

In a fourth aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor and a memory, and a computer program is stored in the memory, and the processor calls the computer program stored in the memory to execute:

Description of the drawings

FIG. 1 is a schematic diagram of an application scenario of a data storage method provided by an embodiment of the application.

FIG. 2 is a schematic diagram of the first flow of a data storage method provided by an embodiment of this application.

FIG. 3 is a schematic diagram of another application scenario of the data storage method provided by an embodiment of the application.

FIG. 4 is a schematic diagram of a second flow of a data storage method provided by an embodiment of the application.

FIG. 5 is a schematic structural diagram of a data storage device provided by an embodiment of the application.

FIG. 6 is a schematic diagram of another structure of a data storage device provided by an embodiment of the application.

FIG. 7 is a schematic diagram of another structure of a data storage device provided by an embodiment of the application.

FIG. 8 is a schematic diagram of the first structure of an electronic device provided by an embodiment of this application.

FIG. 9 is a schematic diagram of a second structure of an electronic device provided by an embodiment of the application.

Detailed description

Please refer to the drawings, in which the same component symbols represent the same components, and the principle of the present application is implemented in an appropriate computing environment for illustration. The following description is based on the exemplified specific embodiments of the present application, which should not be regarded as limiting other specific embodiments that are not described in detail herein.

The embodiment of the present application provides a data storage method, including:

In an embodiment, the categories of the basic data include at least behavior data of the user operating the terminal, sensor data, and system operation data.

In an embodiment, before the feature extraction of the basic data is performed on each database to obtain the feature data corresponding to each database, the method further includes:

Collecting the basic data of each database;

Extracting feature data from the basic data by using a data processing algorithm;

Training and optimizing a machine learning model based on the feature data;

When new basic data is acquired, inputting the new basic data into the machine learning model to obtain new feature data.

In an embodiment, fusing the feature data includes:

Fusing the feature data in a multi-table connection manner; and/or

Fusing the feature data in a time-aligned manner.

In an embodiment, fusing the feature data in a multi-table connection manner includes:

Acquiring a first list and a second list, the first list and the second list respectively containing two sets of feature data of different types, the data source of the first list being smaller than the data source of the second list;

Establishing a hash table for the data source of the first list by using the connection key (join key);

Extracting the column data of the first list, and storing the column data of the first list in the hash table;

Scanning the second list to obtain row data in the second list that matches the hash table, combining each matching row with the corresponding content in the first list into a record, and placing the record in a result set.

In an embodiment, scanning the second list to obtain row data in the second list that matches the hash table includes:

Scanning the second list, performing hash mapping on the connection key, and probing the hash table;

When it is detected that row data in the second list matches the hash table, acquiring the row data in the second list that matches the hash table, the row data matching the column data of the first list.
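The build-and-probe procedure described in the two embodiments above can be sketched as a minimal in-memory hash join. This is an illustration only, not the implementation of the application: the function name `hash_join`, the representation of each list as a Python list of dictionaries, and the sample field names are assumptions.

```python
from collections import defaultdict


def hash_join(small_rows, large_rows, key):
    """Join two lists of dict rows on `key`, hashing the smaller list."""
    # Build phase: hash the smaller (first) list by the connection key.
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)

    # Probe phase: scan the larger (second) list and look up each key.
    result = []
    for row in large_rows:
        for match in table.get(row[key], []):
            # Combine the matching rows into one record of the result set.
            result.append({**match, **row})
    return result


# Hypothetical feature data: behavior features (smaller) and sensor features.
behavior = [{"ts": 1, "app": "map"}, {"ts": 2, "app": "chat"}]
sensor = [{"ts": 1, "accel": 0.2}, {"ts": 2, "accel": 0.9}, {"ts": 3, "accel": 0.1}]
joined = hash_join(behavior, sensor, "ts")
```

Hashing the smaller list keeps the hash table small, while the larger list is only scanned once.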

In an embodiment, the fusion of the feature data in a time-aligned manner includes:

Acquiring two feature databases and two pieces of time series information corresponding to the two feature databases, each feature database containing all the feature data of its corresponding database;

Arranging the feature data in the two feature databases according to the time series information;

Obtaining the sequence entries that are the same in the two pieces of time series information, and aligning the feature data corresponding to those entries.
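The time-aligned fusion above can be sketched as follows. The sketch assumes that each feature database is represented as a mapping from a timestamp to a feature value and that "the same sequence" means timestamps present in both mappings; the function name `align_by_time` and the sample values are hypothetical.

```python
def align_by_time(series_a, series_b):
    """Align two {timestamp: feature} mappings on their shared timestamps."""
    # Arrange both feature sets in time order.
    ordered_a = dict(sorted(series_a.items()))
    ordered_b = dict(sorted(series_b.items()))
    # Keep only the timestamps present in both, in ascending order.
    common = [t for t in ordered_a if t in ordered_b]
    return [(t, ordered_a[t], ordered_b[t]) for t in common]


# Hypothetical audio and motion features keyed by timestamp.
audio = {3: "loud", 1: "quiet", 2: "quiet"}
motion = {2: "walking", 3: "running", 4: "still"}
aligned = align_by_time(audio, motion)
```

Timestamps that appear in only one of the two feature databases are dropped in this sketch; a real system might instead interpolate or pad them.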

In an embodiment, the acquiring multiple basic data includes:

Collect basic data through multiple different sensors in real time.

In an embodiment, after storing the fused feature data for the third time, the method further includes:

The fusion feature data is backed up in real time at the terminal.

Referring to FIG. 1, FIG. 1 is a schematic diagram of an application scenario of a data storage method provided by an embodiment of the application. The data storage method is applied to electronic equipment. A panoramic perception architecture is provided in the electronic device. The panoramic perception architecture is the integration of hardware and software used to implement data storage methods in electronic devices.

Among them, the panoramic perception architecture includes an information perception layer, a data processing layer, a feature extraction layer, a scenario modeling layer, and an intelligent service layer.

The information perception layer is used to obtain information about the electronic device itself and/or information about the external environment. The information perception layer may include multiple sensors, such as distance sensors, magnetic field sensors, light sensors, acceleration sensors, fingerprint sensors, Hall sensors, position sensors, gyroscopes, inertial sensors, attitude sensors, barometers, and heart rate sensors.

The distance sensor can be used to detect the distance between the electronic device and an external object. The magnetic field sensor can be used to detect the magnetic field information of the environment in which the electronic device is located. The light sensor can be used to detect the light information of the environment in which the electronic device is located. The acceleration sensor can be used to detect the acceleration data of the electronic device. The fingerprint sensor can be used to collect the user's fingerprint information. The Hall sensor is a magnetic field sensor based on the Hall effect, which can be used to realize automatic control of the electronic device. The position sensor can be used to detect the current geographic location of the electronic device. The gyroscope can be used to detect the angular velocity of the electronic device in all directions. The inertial sensor can be used to detect movement data of the electronic device. The attitude sensor can be used to sense the attitude information of the electronic device. The barometer can be used to detect the air pressure of the environment in which the electronic device is located. The heart rate sensor can be used to detect the user's heart rate information.

The data processing layer is used to process the data obtained by the information perception layer. For example, the data processing layer can perform data cleaning, data integration, data transformation, and data reduction on the data acquired by the information perception layer.

Among them, data cleaning refers to cleaning up a large amount of data obtained by the information perception layer to eliminate invalid data and duplicate data. Data integration refers to the integration of multiple single-dimensional data acquired by the information perception layer into a higher or more abstract dimension to comprehensively process multiple single-dimensional data. Data transformation refers to the data type conversion or format conversion of the data acquired by the information perception layer, so that the transformed data meets the processing requirements. Data reduction means to minimize the amount of data while maintaining the original appearance of the data as much as possible.
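Three of the operations above (cleaning, transformation, and reduction) can be sketched on a small list of readings. This is a minimal sketch under stated assumptions: the record layout with `ts` and `value` fields, the function names, and the choice of per-bucket averaging as the reduction are all hypothetical, and data integration is omitted.

```python
def clean(readings):
    """Drop invalid (None) readings and exact duplicates, keeping order."""
    seen, kept = set(), []
    for r in readings:
        if r["value"] is None or (r["ts"], r["value"]) in seen:
            continue
        seen.add((r["ts"], r["value"]))
        kept.append(r)
    return kept


def transform(readings):
    """Convert string values to floats so later steps can do arithmetic."""
    return [{"ts": r["ts"], "value": float(r["value"])} for r in readings]


def reduce_mean(readings, bucket):
    """Reduce the data volume by averaging values per `bucket`-second span."""
    groups = {}
    for r in readings:
        groups.setdefault(r["ts"] // bucket, []).append(r["value"])
    return {b * bucket: sum(v) / len(v) for b, v in groups.items()}


raw = [
    {"ts": 0, "value": "1.0"},
    {"ts": 0, "value": "1.0"},  # duplicate
    {"ts": 1, "value": None},   # invalid
    {"ts": 2, "value": "3.0"},
    {"ts": 5, "value": "4.0"},
]
reduced = reduce_mean(transform(clean(raw)), bucket=4)
```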

The feature extraction layer is used to perform feature extraction on the data processed by the data processing layer to extract the features included in the data. The extracted features can reflect the state of the electronic device itself or the state of the user or the environmental state of the environment in which the electronic device is located.

The feature extraction layer can extract features, or process the features already extracted, through methods such as the filtering method, the packaging method, and the integration method.

The filtering method refers to filtering the extracted features to delete redundant feature data. The packaging method is used to screen the extracted features. The integration method refers to the integration of multiple feature extraction methods to construct a more efficient and accurate feature extraction method for feature extraction.

The scenario modeling layer is used to construct a model based on the features extracted by the feature extraction layer, and the obtained model can be used to represent the state of the electronic device or the state of the user or the environment. For example, the scenario modeling layer can construct key value models, pattern identification models, graph models, entity connection models, object-oriented models, etc. based on the features extracted by the feature extraction layer.

The intelligent service layer is used to provide users with intelligent services based on the model constructed by the scenario modeling layer. For example, the intelligent service layer can provide users with basic application services, can perform system intelligent optimization for electronic devices, and can also provide users with personalized intelligent services.

In addition, the panoramic perception architecture can also include multiple algorithms, each of which can be used to analyze and process data, and the multiple algorithms can form an algorithm library. For example, the algorithm library can include algorithms such as the Markov algorithm, the latent Dirichlet allocation algorithm, the Bayesian classification algorithm, support vector machines, the K-means clustering algorithm, the K-nearest neighbor algorithm, conditional random fields, residual networks, long short-term memory networks, convolutional neural networks, and recurrent neural networks.

The embodiment of the present application provides a data storage method, and the data storage method can be applied to an electronic device. The electronic device can be a smart phone, a tablet computer, a gaming device, an AR (Augmented Reality) device, a car, a data storage device, an audio playback device, a video playback device, a notebook, a desktop computing device, a wearable device such as a watch, glasses, a helmet, an electronic bracelet, an electronic necklace, or electronic clothing, or other equipment.

Referring to FIG. 2, FIG. 2 is a schematic diagram of the first flow of a data storage method provided by an embodiment of the application. The data storage method includes the following steps:

110. Obtain multiple basic data, which belong to multiple categories.

The basic data may include operating information of the electronic device, configuration information of the electronic device, user information, current environment information, and so on. Specifically, the basic data can be collected through one or more sensors, and may be collected in real time. For example, current environment information and related information of the electronic device can be obtained through at least one of a distance sensor, magnetic field sensor, light sensor, acceleration sensor, fingerprint sensor, Hall sensor, position sensor, gyroscope, inertial sensor, attitude sensor, barometer, blood pressure sensor, pulse sensor, heart rate sensor, and the like. The current environment information includes the user's physical information, such as blood pressure, pulse, and heart rate. The related information of the electronic device includes the operating information of the electronic device, the configuration information of the electronic device, and the user information stored in the electronic device. The user information includes the user's identity information, personal hobbies, browsing history, personal collections, and other human-computer interaction information. The operating information of the electronic device includes power-on time, power-off time, standby time, memory usage at each point in time, main chip utilization at each point in time, information on currently running programs, information on programs running in the background, the running time of each program, the download volume of each program, and so on. In some embodiments, the basic data may also include behavior data of the user operating the terminal, sensor data, and system operation data.

120. Summarize and integrate the multiple basic data according to their respective categories, and store the summarized and integrated data for the first time in the database of the corresponding category.

After the multiple basic data are obtained, they are stored in the first storage module. For example, multiple pieces of panoramic perception information can be stored on the hard disk. Multiple databases can be set up, and the basic data can be stored in the corresponding databases according to category.

All the basic data is clustered: the multiple basic data are summarized and integrated according to their respective categories, and basic data of the same kind is aggregated to form a data set, thereby obtaining multiple data sets for multiple types of basic data. The basic data can be classified according to the hardware attributes of the data, such as data related to the main chip, the display screen, the hard disk, the memory, and the various sensors. The basic data can also be classified according to the corresponding applications, such as data related to system applications and data related to installed applications; data related to installed applications can be further classified by specific application, such as data related to instant messaging applications, map applications, and shopping applications. Storing the basic data in the corresponding database according to category effectively isolates unrelated data, so that the data can be stored independently. In some embodiments, obtaining the time series index corresponding to each database can also facilitate the indexing of basic data.

Basic data of the same type is stored in the same database. A piece of basic data can be stored in a single database; for example, acceleration sensor data is stored only in the acceleration sensor database. A piece of basic data can also be stored in multiple databases. For example, when a piece of basic data falls into two categories, the basic data can be copied, and the copy and the original can be stored in two databases that correspond to the two categories to which the basic data belongs. It should be noted that not only the currently acquired basic data but also previously acquired basic data can be stored in the database.
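The category-based first storage, including the case in which one piece of basic data belongs to two categories, can be sketched as follows. The sketch assumes that each piece of basic data already carries its category labels and that a database is a plain in-memory list; the names `store_by_category`, `categories`, and `payload`, and the sample categories, are hypothetical.

```python
from collections import defaultdict


def store_by_category(items, databases=None):
    """Append each item to the database of every category it belongs to."""
    databases = defaultdict(list) if databases is None else databases
    for item in items:
        for category in item["categories"]:
            # Store a separate copy per category so databases stay independent.
            databases[category].append(dict(item["payload"]))
    return databases


items = [
    {"categories": ["acceleration"], "payload": {"ts": 1, "x": 0.1}},
    {"categories": ["map_app", "location"], "payload": {"ts": 2, "lat": 22.5}},
]
dbs = store_by_category(items)
```

Passing an existing `databases` mapping lets previously stored basic data and newly acquired basic data accumulate in the same per-category databases.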

130. Perform feature extraction of basic data on each database, obtain feature data corresponding to each database, and store the feature data for a second time.

Feature extraction is performed separately on the data in each database to obtain the feature data corresponding to each database. A feature extraction layer can be set up to perform feature extraction on the basic data in a variety of ways, and different feature extraction methods can be used for different data, since the data format and data content of each type of data can differ. For example, the Wi-Fi connection information in the sensor data is very limited: when no Wi-Fi signal is connected, no Wi-Fi information is stored or recorded. IMU data, by contrast, is returned at a frequency measured in hertz, and gigabytes of data can be stored in one day. Extracting features from the basic data in the database helps, on the one hand, to reduce redundant information and save storage space and, on the other hand, to extract the important meaning in the basic data effectively. Taking audio information as an example, audio information is time series information, and its data volume keeps growing over time, so feature extraction is needed to reduce the amount of data. Take audio information with dual microphone channels, a 32-bit bit width, and a sampling frequency of 44100 as an example: the data generated in 5 minutes is about 1G. After feature extraction, the important features of each time window are obtained. The features can then be stored in vector form, and 1G of data can be compressed to hundreds of k.
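The idea of keeping one feature vector per time window instead of the raw samples can be sketched as follows. The application does not say which features are extracted from audio; the two features used here (RMS energy and the number of zero crossings), the non-overlapping windows, and the name `window_features` are assumptions chosen for illustration.

```python
import math


def window_features(samples, window):
    """Return one (rms, zero_crossings) vector per non-overlapping window."""
    features = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        # Root-mean-square energy of the window.
        rms = math.sqrt(sum(s * s for s in chunk) / window)
        # Number of sign changes between consecutive samples.
        crossings = sum(
            1 for a, b in zip(chunk, chunk[1:]) if (a < 0) != (b < 0)
        )
        features.append((rms, crossings))
    return features


# Hypothetical single-channel signal; real audio would have far more samples.
signal = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
vectors = window_features(signal, window=4)
```

Each window of `window` samples is replaced by a vector of two numbers, so the stored volume shrinks roughly in proportion to the window length.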

In addition, the first storage and the second storage may use a triggered data return method; that is, when the multiple basic data are acquired in step 110, the data may be returned when triggered. For example, when the Wi-Fi function of the network module is turned on, the module searches for nearby available networks, and the data it detects is transmitted to the system. The system collects basic data by monitoring and collecting system notification messages.

In some embodiments, feature extraction of the basic data in the database is performed by a manual preset method, in which the important features of each category of basic data are preset. The basic data is clustered and stored in the corresponding database, the basic data in the same database is identified as having the same important features, and the specific data of each piece of basic data that corresponds to the preset important features is extracted as the feature data and stored for the second time.
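The manual preset method can be sketched as a per-category list of field names that are kept while everything else is discarded. The categories, field names, and the helper `extract_preset` below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical preset: which fields count as important for each category.
PRESET_FEATURES = {
    "system": ["memory_usage", "cpu_usage"],
    "wifi": ["ssid_hash", "rssi"],
}


def extract_preset(category, record):
    """Keep only the fields preset as important for this category."""
    wanted = PRESET_FEATURES[category]
    return {name: record[name] for name in wanted if name in record}


record = {
    "ts": 10,
    "memory_usage": 0.62,
    "cpu_usage": 0.31,
    "process_list": ["a", "b"],
}
feature = extract_preset("system", record)
```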

In some embodiments, a pre-trained machine learning model is used to extract the features of the basic data in the database. Specifically, the machine learning model is trained in advance to obtain a machine learning model that matches the basic data; the basic data is then input into the machine learning model to obtain the model output results, and the model output results are used as the feature data.

First, the basic data of each database is collected; feature data is extracted from the basic data using data processing algorithms; the machine learning model is trained and optimized based on the feature data; and when new basic data is obtained, the new basic data is input into the machine learning model to obtain new feature data.

The characteristic data corresponding to each database is obtained, and the characteristic data can be stored in the second storage module for the second storage. The second storage module does not need to store a large amount of original basic data, only the corresponding characteristic data needs to be stored. The feature extraction of the basic data effectively extracts the important features of the basic data, reduces the redundant information of the original basic data, and saves storage space. Compared with the first storage in step 120, the amount of data stored in the second storage is greatly reduced. It should be noted that the feature extraction of the basic data of the database and the storage of the extracted feature data can avoid directly storing the original data format, strictly control information security, and protect user privacy. By extracting the features of the basic data from the database, the source data can be desensitized, and the user data desensitized by the feature layer can be effectively recorded, reducing data redundancy and facilitating subsequent use.

In some embodiments, the time series index corresponding to each database can also be obtained and stored in the second storage module (such as memory), so that other modules of the system can find the corresponding basic data in the database according to the time series index. Through clustering, multi-source heterogeneous basic data is clustered in time series, which effectively compresses the original basic data, reduces the redundant information in the basic data, and enables real-time indexing of and access to the basic data. The computing and storage resources of electronic devices are limited, and reasonable access to and distribution of the basic data can speed up the retrieval of panoramic perception information.
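A time series index of this kind can be sketched as a sorted list of timestamps that is searched by bisection. The application does not describe the index structure, so the class `TimeIndex`, its methods, and the use of Python's standard `bisect` module are assumptions.

```python
import bisect


class TimeIndex:
    """Sorted timestamp index over the records of one database."""

    def __init__(self):
        self._timestamps = []
        self._records = []

    def add(self, ts, record):
        # Insert while keeping both lists ordered by timestamp.
        pos = bisect.bisect_right(self._timestamps, ts)
        self._timestamps.insert(pos, ts)
        self._records.insert(pos, record)

    def between(self, start, end):
        """Return the records with start <= ts <= end, in time order."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_right(self._timestamps, end)
        return self._records[lo:hi]


index = TimeIndex()
for ts, rec in [(30, "c"), (10, "a"), (20, "b"), (40, "d")]:
    index.add(ts, rec)
hits = index.between(15, 30)
```

A range query costs two binary searches plus a slice, which is the kind of lookup other modules would need to retrieve data for a given period.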

140. Fuse the feature data to obtain fused feature data, and store the fused feature data for the third time.

Before the third storage, feature data fusion is performed on the content of the second storage. Specifically, the feature data can be fused by multi-table connection, by time series alignment, or by a combination of the two. Most of the data on the terminal is time series data; that is, the user's operations and the terminal scene differ at different points in time and change over time. Fusing the feature data can therefore further reduce the asymmetry between data and compress the amount of data.

The feature data is fused to obtain the fused feature data, and the fused feature data is stored for the third time, for example in the third storage module. In some embodiments, after the fused feature data is obtained, the fused panoramic feature information is stored in the third storage module. The cascaded storage method provides effective disaster recovery backup of the data and avoids storing and transmitting plaintext data; the unique feature extraction step extracts high-dimensional features from the basic data (equivalent to encrypting the basic data), effectively protecting user privacy information.

In some embodiments, the method may further include: transmitting the fused feature data to an application service layer or a data processing layer, and using the fused feature data to perform calculations. In some embodiments, the method may further include: uploading the fused feature data to the cloud, so that it can be provided to the server for data analysis.

In some embodiments, the method may further include: performing terminal backup of the fused feature data to increase data redundancy. For example, when pictures are taken at a gathering place, the audio information can be used to judge whether the current environment is happy, lively, troubled, and so on, and, combined with image information, the place where the terminal user is located can be determined at a finer granularity. Therefore, after the audio signal undergoes steps 110, 120, 130, and 140 and the features are fused, slightly more redundant information is generated than before, and this redundant information can compensate for missing data.

The security of the data in the terminal is very important. The embodiment of the application takes care of both the security of the system data itself and the security of the user's private data, and addresses these problems through specific steps. First, a terminal (especially one performing panoramic perception, which needs to collect a large amount of terminal data) is prone to data loss when collecting a large amount of data, so the cascaded database storage method can provide effective disaster recovery backup of the data. Second, for the terminal, extracting features and storing the feature data greatly reduces the pressure of data backup and storage and effectively reduces hard disk and I/O (Input/Output) overhead. Finally, feature extraction effectively avoids the storage and transmission of plaintext data, and the unique feature extraction step extracts high-dimensional features from the data (equivalent to a data encryption operation), effectively protecting user privacy information.

Referring to FIG. 3, FIG. 3 is a schematic diagram of another application scenario of the data storage method provided by an embodiment of the application. User behavior data, sensor data, ..., system operation data, and the like are the sources of basic data. Specifically, basic data can be obtained through sensors. After the multiple basic data are clustered, primary storage is performed. The primary storage layer stores basic data such as user behavior data, sensor data, ..., and system operation data.

Subsequently, the feature extraction module performs feature extraction on the basic data in the primary storage layer, and extracts important features of the basic data as feature data for secondary storage. The secondary storage layer stores characteristic data such as behavior characteristics, sensor characteristics,..., system characteristics, etc.

In the tertiary storage, the feature data of the secondary storage layer is fused to obtain the fused panoramic feature, and the tertiary storage is the storage of the fused feature data.

After obtaining the fusion feature data, the fusion feature data can be uploaded to the cloud and provided to the server for data analysis, or the fusion feature data can be transmitted to the application service layer or data processing layer for calculation. In addition, redundant backup of the integrated panoramic feature database can be performed to increase data redundancy and effectively prevent data loss.
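The three storage levels of FIG. 3 can be sketched end to end as three small functions. This is a toy illustration under stated assumptions: each piece of basic data is a `(category, timestamp, value)` tuple, the extracted feature is simply the mean value per timestamp, fusion keeps only timestamps shared by all categories, and all function names are hypothetical.

```python
from collections import defaultdict


def primary_store(basic_data):
    """Level 1: group raw records into one database per category."""
    dbs = defaultdict(list)
    for category, ts, value in basic_data:
        dbs[category].append((ts, value))
    return dbs


def secondary_store(dbs):
    """Level 2: keep one feature per database per timestamp (here, the mean)."""
    features = {}
    for category, rows in dbs.items():
        by_ts = defaultdict(list)
        for ts, value in rows:
            by_ts[ts].append(value)
        features[category] = {ts: sum(v) / len(v) for ts, v in by_ts.items()}
    return features


def tertiary_store(features):
    """Level 3: fuse the features of all categories on shared timestamps."""
    shared = set.intersection(*(set(f) for f in features.values()))
    return {ts: {c: f[ts] for c, f in features.items()} for ts in sorted(shared)}


basic = [
    ("sensor", 1, 2.0), ("sensor", 1, 4.0), ("sensor", 2, 6.0),
    ("behavior", 1, 1.0), ("behavior", 3, 5.0),
]
fused = tertiary_store(secondary_store(primary_store(basic)))
```

Only the output of each level is passed to the next, so the fused store never contains the raw values held in the primary store.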

Referring to FIG. 4, FIG. 4 is a schematic diagram of a second flow of a data storage method provided by an embodiment of this application. Among them, the data storage method includes the following steps:

210. Obtain multiple basic data, and the multiple basic data belong to multiple categories.

220. Determine the category of each basic data, and according to the determined category of each basic data, summarize and integrate multiple basic data according to their respective categories.

Summarizing and integrating can also be called clustering, which refers to dividing a collection of physical or abstract objects into multiple classes composed of similar objects. A cluster generated by clustering is a set of data objects that are similar to one another within the same cluster and different from the objects in other clusters.

By clustering all the basic data in the first storage module, basic data of the same kind can be aggregated to form a data set, thereby obtaining multiple data sets for multiple types of basic data. The basic data can be classified according to the hardware attributes of the data, such as data related to the main chip, the display screen, the hard disk, the memory, and the various sensors. The basic data can also be classified according to the corresponding applications, such as data related to system applications and data related to installed applications; data related to installed applications can be further classified by specific application, such as data related to instant messaging applications, map applications, and shopping applications. Storing the basic data in the corresponding database according to category effectively isolates unrelated data, so that the data can be stored independently. In some embodiments, obtaining the time series index corresponding to each database can also facilitate the indexing of basic data.

230. Store the summarized and integrated multiple data for the first time, and store them in the database of the corresponding category.

Basic data of the same type is stored in the same database. A piece of basic data can be stored in a single database; for example, acceleration sensor data is stored only in the acceleration sensor database. A piece of basic data can also be stored in multiple databases. For example, when a piece of basic data falls into two categories, the basic data can be copied, and the copy and the original can be stored in two databases that correspond to the two categories to which the basic data belongs. It should be noted that not only the currently acquired panorama perception information but also the previously stored panorama perception information can be stored in the database.
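The first storage operation of steps 220 and 230 can be illustrated with a minimal sketch. It assumes each piece of basic data already carries its list of categories; the record layout (`categories`, `payload`), the category names, and the in-memory dictionaries standing in for databases are all hypothetical and are not specified by this application.

```python
from collections import defaultdict
from copy import deepcopy


def store_by_category(records):
    """Group records into one in-memory 'database' (a list) per category."""
    databases = defaultdict(list)
    for record in records:
        for category in record["categories"]:
            # A record that belongs to several categories is copied,
            # so every category database holds its own independent copy.
            databases[category].append(deepcopy(record["payload"]))
    return dict(databases)


# Hypothetical sample data: the second record falls into two categories.
records = [
    {"categories": ["acceleration_sensor"], "payload": {"t": 0, "ax": 0.1}},
    {"categories": ["gyroscope", "motion"], "payload": {"t": 0, "wx": 0.02}},
]
databases = store_by_category(records)
```

Copying rather than sharing one object keeps the category databases isolated from each other, which matches the stated goal of storing unrelated data independently.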

240. Train a machine learning model in advance, perform feature extraction of basic data on each database according to the machine learning model, obtain feature data corresponding to each database, and store the feature data for a second time.

Machine learning refers to computer simulation or realization of human learning behaviors to acquire new knowledge or skills, and reorganize the existing knowledge structure to continuously improve its own performance. It is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence. Machine learning studies how to improve the performance of specific algorithms in experience learning, and can automatically improve computer algorithms through experience.

Input the basic data into the machine learning model, obtain the model output result, use the model output result as the feature data, and store the feature data for the second time.

The scenario modeling layer uses the historical basic data stored in step 230 as training samples and trains the machine learning model on these samples to obtain a trained machine learning model, which can be used as a prediction model. The procedure is as follows: collect the basic data of each database; use data processing algorithms to extract feature data from the basic data; train and optimize the machine learning model based on the feature data; and, when new basic data is obtained, input the new basic data into the machine learning model to obtain new feature data.

In some embodiments, while obtaining the trained machine learning model, the importance levels corresponding to various types of historical basic data are obtained, and then the sampling frequency of various types of historical basic data is set according to the importance levels.

In some embodiments, the trained machine learning model is used to extract the feature information of the basic data: the basic data is input into the machine learning model, the model output result is obtained and used as the feature data, and the feature data is stored for the second time.

By pre-training the machine learning model, a machine learning model that matches the basic data can be obtained, which is convenient for further processing of the basic data. The machine automatically updates the learning algorithm, effectively avoiding the cumbersome and inflexible preset manual algorithm.

251. Fuse the feature data in a multi-table connection manner to obtain fused feature data.

In programming terms, the "JOIN" statement is used to combine two or more tables in the database. The collection generated by "connection" can be saved as a table, or used as a table, and a multi-table connection is a way of connecting between tables.

In some embodiments, the multi-table connection can be implemented with a hash join. Hash join is a common technique for joining large data sets. The optimizer takes the smaller of the two tables, uses the join key (JOINKEY) to build a hash table in memory, and stores the column data in the hash table. It then scans the larger table, hashes its JOINKEY in the same way, and probes the hash table to find the matching rows. It is worth noting that which data are to be joined across tables is determined by a program set in advance. For example, the gyroscope and the acceleration sensor are complementary sensors, but they report data at different frequencies, so their data can be joined across tables. As another example, acceleration and gravity sensors can also serve as input sources for a multi-table connection.

Fusing the feature data in a multi-table connection manner may specifically include fusing the feature data in a hash connection manner. In some embodiments, the step of fusing the feature data in a hash connection manner may specifically include: obtaining a first list and a second list, the first list and the second list respectively containing two sets of different types of feature data, the data source of the first list being smaller than the data source of the second list; using the connection key to build a hash table for the data source of the first list; extracting the column data of the first list and storing it in the hash table; and scanning the second list to obtain the row data in the second list that matches the hash table, then combining each row that matches the hash table with the corresponding content in the first list into a record and placing it in the result set.

The step of scanning the second list to obtain row data matching the hash table may include: scanning the second list, performing hash mapping on the connection key, and probing the hash table; and, when it is detected that the second list contains row data matching the hash table, obtaining that row data. It should be noted that the row data also matches the column data of the first list.
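The build and probe phases described above can be sketched as follows. This is an illustrative in-memory hash join, not the implementation of this application; the function name `hash_join`, the use of a timestamp `t` as the connection key, and the sample sensor rows are assumptions made for the example.

```python
from collections import defaultdict


def hash_join(small_rows, large_rows, key):
    """Join two lists of dict rows on `key` using a hash table."""
    # Build phase: hash the smaller list on the connection key.
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)

    # Probe phase: scan the larger list and look up each key in the table.
    result = []
    for row in large_rows:
        for match in table.get(row[key], []):
            # Combine the matching rows into one record of the result set.
            result.append({**match, **row})
    return result


# Hypothetical feature rows: the gyroscope reports less often than the
# acceleration sensor, so it is the smaller data source.
gyro = [{"t": 1, "wx": 0.02}, {"t": 3, "wx": 0.05}]
accel = [{"t": 1, "ax": 0.1}, {"t": 2, "ax": 0.2}, {"t": 3, "ax": 0.3}]
joined = hash_join(gyro, accel, key="t")
```

Building the table on the smaller list keeps the memory footprint low, while the larger list is only scanned once.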

252. Fuse the feature data in a time-aligned manner to obtain fused feature data.

Timing is time sequence, and timing alignment is about using timing to align data.

In some embodiments, the step of fusing the feature data in a time-aligned manner may include: acquiring two feature databases and the two pieces of timing information corresponding to the two feature databases; arranging the feature data in the two feature databases according to their respective timing information; and obtaining the timings that are the same in the two pieces of timing information and aligning the feature data corresponding to the same timing.

It should be noted that acquiring two feature databases and the two pieces of timing information corresponding to them specifically means acquiring one feature database and its corresponding timing information, and acquiring another feature database and its corresponding timing information. Each feature database contains all the feature data of its corresponding database.

In some embodiments, before obtaining the same timing in the two pieces of timing information and aligning the feature data corresponding to the same timing, the method may further include: when it is detected that the timings in the two pieces of timing information cannot be completely matched, acquiring the to-be-operated timings that cannot be matched; determining whether data can be completed for a to-be-operated timing, where the data includes feature data and the data completion methods include an interpolation algorithm; if data can be completed for the to-be-operated timing, completing the data corresponding to that timing; and if data cannot be completed for the to-be-operated timing, deleting that timing.

Specifically, for example, the timing information of one type of data is A, B, D, F, and the timing information of another type of data is A, B, C, D, E, F. To match the two types of data, the data missing at some timings is obtained by interpolation so that the two are aligned; if some data cannot be obtained by the interpolation algorithm, the redundant timing is deleted. Timing alignment further reduces the asymmetry between the data and can compress the amount of data.
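The alignment procedure can be sketched as follows. The sketch assumes numeric timestamps and linear interpolation between the nearest earlier and later samples; this application only states that an interpolation algorithm may be used, so the choice of linear interpolation, the function name `align_series`, and the sample values are assumptions.

```python
def align_series(a, b):
    """Align two {timestamp: value} series on a common set of timestamps."""

    def fill(series, t):
        if t in series:
            return series[t]
        earlier = [s for s in series if s < t]
        later = [s for s in series if s > t]
        if not earlier or not later:
            return None  # no neighbours on both sides: cannot interpolate
        t0, t1 = max(earlier), min(later)
        weight = (t - t0) / (t1 - t0)
        return series[t0] + weight * (series[t1] - series[t0])

    aligned = {}
    for t in sorted(set(a) | set(b)):
        va, vb = fill(a, t), fill(b, t)
        # Timings that cannot be completed in either series are deleted.
        if va is not None and vb is not None:
            aligned[t] = (va, vb)
    return aligned


# Timings 1, 2, 4, 6 play the role of A, B, D, F; the dense series also
# has a trailing timing 7 that the sparse series cannot interpolate.
sparse = {1: 10.0, 2: 20.0, 4: 40.0, 6: 60.0}
dense = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0, 6: 6.0, 7: 7.0}
aligned = align_series(sparse, dense)
```

Here the values at timings 3 and 5 of the sparse series are filled in by interpolation, while timing 7 is dropped because it lies outside the range of the sparse series.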

253. Fuse the feature data by both multi-table connection and timing alignment to obtain fused feature data.

When the feature data is fused, either multi-table connection or timing alignment can be used alone, or the two can be combined. In some embodiments, the feature data is fused using both multi-table connection and timing alignment.

260. Store the fused feature data for the third time.

The third storage unit stores the fused feature data. Cascaded storage enables effective disaster recovery and backup of the data and avoids storing and transmitting plaintext data. The unique feature extraction step extracts high-dimensional features from the basic data (which is equivalent to encrypting the basic data), effectively protecting user privacy information.

270. Perform real-time backup of the fusion feature data on the terminal.

In order to ensure the security of the data to be processed, the basic data in the first storage module, the feature data in the second storage module, and the fusion feature data in the third storage module can all be backed up in real time at the terminal.

Specifically, it can be redundantly backed up in another storage module, or another place of the first storage module, the second storage module, or the third storage module.

If the first storage module is a hard disk, when the basic data, characteristic data or fusion characteristic data are redundantly backed up in the first storage module, the hard disk can be divided into at least two areas, and the basic data is stored in one of the areas. Backup in another area.

If the first storage module is a hard disk, and the electronic device includes at least two hard disks, it can be redundantly backed up in another hard disk. Among them, the two hard drives can be the same type of hard drives, such as mechanical hard drives, solid state drives, hybrid hard drives, and so on. The two hard disks can also be different types of hard disks, such as mechanical hard disks, solid state hard disks, and hybrid hard disks.

It should be noted that the redundant backup in this embodiment can be backed up for one copy or multiple copies. Among them, multiple backups can be backed up in the same way or in different ways.
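A minimal sketch of a multi-copy redundant backup is given below. It assumes the fused feature data is held in a file and that the backup areas are directories; the file name, the directory names `region_b` and `disk_2`, and the checksum verification are illustrative assumptions, and the temporary directory merely stands in for real storage areas.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def backup(source, target_dirs):
    """Copy `source` into every target directory and return the copies."""
    copies = []
    for target_dir in target_dirs:
        target_dir.mkdir(parents=True, exist_ok=True)
        copies.append(Path(shutil.copy2(source, target_dir / source.name)))
    return copies


def digest(path):
    """SHA-256 checksum used to verify that a copy matches the original."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


# Hypothetical layout: one copy in another area of the same disk,
# one copy on a second disk.
root = Path(tempfile.mkdtemp())
source = root / "fused_features.bin"
source.write_bytes(b"fused feature data")
copies = backup(source, [root / "region_b", root / "disk_2"])
```

Comparing checksums after copying is one simple way to confirm that each backup can later be used to restore the source data.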

Real-time backup of the fusion feature data at the terminal can increase data redundancy and supplement the data. For example, when pictures are taken at a gathering place, the audio information can be used to judge whether the current environment is happy, lively, troubled, and so on; combined with the image information, it allows a more fine-grained determination of the place where the end user is located. Therefore, after the audio signal undergoes steps 110, 120, 130, and 140 and the features are fused, slightly more redundant information is generated than before, and this redundant information can compensate for missing data. In addition, if the basic data is lost in the future, the redundant backup data can be used to supplement the source data.

It should be understood that in the embodiments of the present application, the terms "first" and "second" are only used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. The objects described in this way can be interchanged under appropriate circumstances.

During specific implementation, this application is not limited by the order of execution of the various steps described, and certain steps may also be carried out in other order or carried out simultaneously without conflict.

It can be seen from the above that the data storage method provided by the embodiment of the present application first obtains multiple basic data, and the multiple basic data belong to multiple categories; then the multiple basic data are summarized and integrated according to the categories they belong to, and the summarized and integrated data are stored for the first time in the database of the corresponding category; then feature extraction of the basic data is performed on each database to obtain the feature data corresponding to each database, and the feature data is stored for the second time; finally, the feature data is fused to obtain the fused feature data, and the fused feature data is stored for the third time. Through the three-level storage method, the key features of the basic data are extracted and merged, which can reduce redundant information. Storing the extracted feature data and the fused feature data obtained by further fusion can avoid directly operating on the plaintext data when operating the data, and effectively protect the security of system data and the security of user privacy data.
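The three storage levels summarized above can be sketched end to end. In this sketch, simple summary statistics stand in for the machine learning feature extractor, and a flat dictionary stands in for the fused feature store; both substitutions, as well as the function names and sample values, are assumptions made only to keep the example self-contained.

```python
from statistics import mean


def first_storage(records):
    """Level 1: store each (category, value) pair in its category database."""
    databases = {}
    for category, value in records:
        databases.setdefault(category, []).append(value)
    return databases


def second_storage(databases):
    """Level 2: extract feature data per database (stand-in statistics)."""
    return {
        category: {"mean": mean(values), "count": len(values)}
        for category, values in databases.items()
    }


def third_storage(features):
    """Level 3: fuse the per-database features into one record."""
    fused = {}
    for category, feature in features.items():
        for name, value in feature.items():
            fused[f"{category}.{name}"] = value
    return fused


# Hypothetical basic data from two sensor categories.
records = [("accel", 1.0), ("accel", 3.0), ("gyro", 0.5)]
level1 = first_storage(records)
level2 = second_storage(level1)
level3 = third_storage(level2)
```

Only `level2` and `level3` need to be passed to later processing, so the raw values in `level1` do not have to be exposed.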

Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a data storage device provided by an embodiment of the application. The data storage device 300 may be integrated in an electronic device. The data storage device 300 includes an acquisition module 301, a first storage module 302, a second storage module 303, and a third storage module 304.

The obtaining module 301 is used to obtain multiple basic data, and the multiple basic data belong to multiple categories;

The first storage module 302 is used to summarize and integrate multiple basic data according to their respective categories, and store the summarized and integrated multiple data for the first time in a database of the corresponding category;

The second storage module 303 is configured to perform feature extraction of basic data on each database, obtain feature data corresponding to each database, and store the feature data for the second time;

The third storage module 304 is configured to fuse the feature data to obtain the fusion feature data, and store the fusion feature data for the third time.

In some embodiments, the step of acquiring a plurality of basic data by the acquiring module 301 includes: acquiring the basic data through a plurality of different sensors in real time.

In some embodiments, the types of basic data include at least behavior data of the user operating terminal, sensor data, and system operation data.

Please refer to FIG. 6 together. FIG. 6 is a schematic diagram of another structure of a data storage device according to an embodiment of the application.

In some embodiments, the second storage module 303 performs feature extraction of basic data on the database, which may be performed by a machine learning method. At this time, the second storage module 303 may include a training unit 3031 and a feature acquisition unit 3032.

The training unit 3031 is used for pre-training the machine learning model to obtain a machine learning model matching the basic data. The training unit 3031 may be specifically used for: collecting basic data of each database; extracting characteristic data from the basic data using a data processing algorithm; training and optimizing a machine learning model based on the characteristic data.

The feature acquisition unit 3032 is used to input new basic data into the machine learning model to obtain new feature data when the new basic data is acquired; that is, to perform feature extraction of the basic data on each database, obtain the feature data corresponding to each database, and store the feature data for the second time.

Please also refer to FIG. 7. FIG. 7 is a schematic diagram of another structure of the data storage device provided by an embodiment of the application. In some embodiments, the third storage module 304 may include a multi-table connection unit 3041 and/or a timing alignment unit 3042.

The multi-table connection unit 3041 is used to fuse the feature data in a multi-table connection manner, and specifically may be combined in a hash connection manner, and the steps include:

Acquire a first list and a second list, the first list and the second list respectively contain two different types of characteristic data, and the data source of the first list is smaller than the data source of the second list;

Use the connection key to create a hash table for the data source of the first list;

Scan the second list to obtain row data in the second list that matches the hash table, and combine the rows that match the hash table with the corresponding content in the first list into a record and put it in the result set.

Wherein, when scanning the second list to obtain row data matching the hash table in the second list, the multi-table connection unit 3041 is also used to:

When it is detected that there is row data matching the hash table in the second list, the row data matching the hash table in the second list is obtained, and the row data matches the column data of the first list.

The timing alignment unit 3042 is used for fusing the feature data in a timing alignment manner, and the steps include:

Acquire two feature databases and two time series information corresponding to the two feature databases, each feature database contains all the feature data of its corresponding database;

Obtain the same timing in the two timing information, and align the feature data corresponding to the same timing.

In some embodiments, before acquiring the same timing in the two timing information and aligning the feature data corresponding to the same timing, the timing alignment unit 3042 is further configured to:

Determine whether the timing in the two timing information can be completely matched;

When it is judged that the timings in the two timing information can be completely matched, align the characteristic data corresponding to the same timing;

When it is detected that the timings in the two timing information cannot be completely matched, obtain the to-be-operated timings that cannot be matched in the two timing information;

Judge whether data can be completed for the to-be-operated timing, where the data includes feature data and the data completion methods include an interpolation algorithm;

If it is judged that data can be completed for the to-be-operated timing, complete the data corresponding to the to-be-operated timing;

If it is judged that data cannot be completed for the to-be-operated timing, delete the to-be-operated timing.

In some embodiments, a perfect match means that the timings in the two timing information are completely the same.

In some embodiments, the device may also include a backup module and a transmission module. The backup module is used to back up the fusion feature data in real time at the terminal. The transmission module is used to transmit the fusion feature data to the application service layer or the data processing layer, so that the application service layer or the data processing layer uses the fusion feature data for calculation; alternatively, the transmission module can be used to transmit the fusion feature data to the cloud so that a cloud server can perform data analysis.

It can be seen from the above that an embodiment of the present application provides a data storage device. First, the obtaining module 301 obtains multiple basic data, and the multiple basic data belong to multiple categories; then, the first storage module 302 summarizes and integrates the multiple basic data according to the categories they belong to, and stores the summarized and integrated data for the first time in the database of the corresponding category; then the second storage module 303 performs feature extraction of the basic data on each database to obtain the feature data corresponding to each database, and stores the feature data for the second time; finally, the third storage module 304 fuses the feature data to obtain the fused feature data, and stores the fused feature data for the third time. Through the three-level storage method, the key features of the basic data are extracted and merged, which can reduce redundant information. Storing the extracted feature data and the fused feature data obtained by further fusion can avoid directly operating on the plaintext data when operating the data, and effectively protect the security of system data and the security of user privacy data.

The embodiment of the application also provides an electronic device. The electronic device can be a smart phone, tablet computer, gaming device, AR (Augmented Reality) device, car, data storage device, audio playback device, video playback device, notebook, desktop computing device, a wearable device such as a watch, glasses, helmet, electronic bracelet, electronic necklace, or electronic clothing, or other equipment.

Referring to FIG. 8, FIG. 8 is a schematic diagram of a first structure of an electronic device 800 according to an embodiment of the application. The electronic device 800 includes a processor 801 and a memory 802. The processor 801 is electrically connected to the memory 802.

The processor 801 is the control center of the electronic device 800. It uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or calling the computer program stored in the memory 802 and calling the data stored in the memory 802, thereby monitoring the electronic device as a whole.

In this embodiment, the processor 801 in the electronic device 800 loads the instructions corresponding to the processes of one or more computer programs into the memory 802 according to the following steps, and runs the instructions stored in the memory 802 to implement various functions:

Obtain multiple basic data, which belong to multiple categories;

Summarize and integrate multiple basic data according to their respective categories, and store the summarized and integrated multiple data for the first time in the database of the corresponding category;

Perform feature extraction of basic data for each database, obtain feature data corresponding to each database, and store the feature data for the second time;

The feature data is fused to obtain the fused feature data, and the fused feature data is stored for the third time.

In some embodiments, before performing feature extraction of basic data on each database to obtain feature data corresponding to each database, the processor 801 performs the following steps:

Collect basic data of each database;

Use data processing algorithms to extract feature data from basic data;

Based on feature data, train and optimize machine learning models;

When new basic data is obtained, the new basic data is input to the machine learning model to obtain new feature data.

In some embodiments, when fusing the feature data, the processor 801 performs the following steps:

Combine characteristic data in a multi-table connection mode;

The feature data is fused in a time-aligned manner.

Wherein, when the feature data is merged in a multi-table connection manner, the processor 801 performs the following steps:

In some embodiments, when scanning the second list to obtain row data in the second list that matches the hash table, the processor 801 performs the following steps:

In some embodiments, when the feature data is merged in a time-aligned manner, the processor 801 performs the following steps:

In some embodiments, before acquiring the same timing in the two timing information and aligning the feature data corresponding to the same timing, the processor 801 performs the following steps:

In some embodiments, when acquiring multiple basic data, the processor 801 performs the following steps:

Collect basic data through multiple different sensors in real time.

In some embodiments, after storing the fused feature data for the third time, the processor 801 performs the following steps:

The fusion feature data is backed up in real time at the terminal.

In some embodiments, referring to FIG. 9, FIG. 9 is a schematic diagram of a second structure of an electronic device 800 provided in an embodiment of this application.

Wherein, the electronic device 800 further includes: a display screen 803, a control circuit 804, an input unit 805, a sensor 806, and a power supply 807. The processor 801 is electrically connected to the display screen 803, the control circuit 804, the input unit 805, the sensor 806, and the power source 807, respectively.

The display screen 803 can be used to display information input by the user or information provided to the user and various graphical user interfaces of the electronic device. These graphical user interfaces can be composed of images, text, icons, videos, and any combination thereof.

The control circuit 804 is electrically connected to the display screen 803 and is used for controlling the display screen 803 to display information.

The input unit 805 can be used to receive inputted numbers, character information or user characteristic information (such as fingerprints), and generate keyboard, mouse, joystick, optical or trackball signal input related to user settings and function control. Wherein, the input unit 805 may include a fingerprint recognition module.

The sensor 806 is used to collect information of the electronic device itself or information of the user or external environment information. For example, the sensor 806 may include multiple sensors such as a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a Hall sensor, a position sensor, a gyroscope, an inertial sensor, a posture sensor, a barometer, and a heart rate sensor.

The power supply 807 is used to supply power to various components of the electronic device 800. In some embodiments, the power supply 807 may be logically connected to the processor 801 through a power management system, so that functions such as charging, discharging, and power consumption management can be managed through the power management system.

Although not shown in FIG. 9, the electronic device 800 may also include a camera, a Bluetooth module, etc., which will not be repeated here.

It can be seen from the above that an embodiment of the present application provides an electronic device, and the processor in the electronic device performs the following steps: first, multiple basic data are obtained, and the multiple basic data belong to multiple categories; then the multiple basic data are summarized and integrated according to the categories they belong to, and the summarized and integrated data are stored for the first time in the database of the corresponding category; then feature extraction of the basic data is performed on each database to obtain the feature data corresponding to each database, and the feature data is stored for the second time; finally, the feature data is fused to obtain the fused feature data, and the fused feature data is stored for the third time. Through the three-level storage method, the key features of the basic data are extracted and merged, which can reduce redundant information. Storing the extracted feature data and the fused feature data obtained by further fusion can avoid directly operating on the plaintext data when operating the data, and effectively protect the security of system data and the security of user privacy data.

An embodiment of the present application also provides a storage medium in which a computer program is stored. When the computer program is run on a computer, the computer executes the data storage method of any of the foregoing embodiments.

For example, in some embodiments, when the computer program runs on the computer, the computer performs the following steps:

Obtain multiple basic data, which belong to multiple categories;

It should be noted that those of ordinary skill in the art can understand that all or part of the steps in the various methods of the foregoing embodiments can be completed by instructing relevant hardware through a computer program, which can be stored in a computer-readable storage medium. The storage medium may include, but is not limited to: read only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), magnetic disks or optical disks, etc.

The data storage method, device, storage medium, and electronic equipment provided by the embodiments of the application are described in detail above. Specific examples are used in this document to illustrate the principles and implementation of the application. The description of the above embodiments is only intended to help understand the methods and core ideas of this application; at the same time, for those skilled in the art, there will be changes in the specific implementation and the scope of application according to the ideas of this application. In summary, the content of this specification should not be construed as a limitation on this application.

Claims

A data storage method, which includes:

Acquiring multiple basic data, the multiple basic data belonging to multiple categories;

Summarize and integrate the multiple basic data according to their respective categories, and store the summarized and integrated data for the first time in a database of the corresponding category;

Perform feature extraction of basic data on each database, obtain feature data corresponding to each database, and store the feature data for a second time;

The feature data is fused to obtain fused feature data, and the fused feature data is stored for the third time.
The data storage method according to claim 1, wherein the categories of the basic data include at least behavior data of a user operating terminal, sensor data, and system operation data.
The data storage method according to claim 2, wherein, before the feature extraction of the basic data of each database is performed to obtain the feature data corresponding to each database, the method further comprises:

Collect basic data of each database;

Extracting characteristic data from the basic data by using a data processing algorithm;

Based on the feature data, train and optimize a machine learning model;

When new basic data is acquired, the new basic data is input to the machine learning model to obtain new feature data.
The data storage method according to claim 1, wherein said fusing the characteristic data comprises:

Fuse the characteristic data in a multi-table connection manner;

The feature data is fused in a time-aligned manner.
The data storage method according to claim 4, wherein said fusing the characteristic data in a multi-table connection manner comprises:

Acquiring a first list and a second list, the first list and the second list respectively containing two sets of different types of characteristic data, the data source of the first list is smaller than the data source of the second list;

Establishing a hash table for the data source of the first list by using the connection key;

Extract the column data of the first list, and store the column data of the first list in a hash table;

The second list is scanned to obtain row data in the second list that matches the hash table, and the rows that match the hash table and the corresponding content in the first list are combined into a record and placed in a result set.
The data storage method according to claim 5, wherein the scanning the second list to obtain row data in the second list that matches the hash table comprises:

Scan the second list, perform hash mapping on the connection key, and detect the hash table;

When it is detected that the second list contains row data matching the hash table, the row data in the second list that matches the hash table is acquired, wherein the row data matches the column data of the first list.
The data storage method according to claim 4, wherein said fusing the characteristic data in a time-aligned manner comprises:

Acquiring two feature databases and two pieces of time series information corresponding to the two feature databases, each of the feature databases containing all the feature data of its corresponding database;

Arrange the feature data in the two feature databases according to time sequence information;

Obtain the time sequence common to the two pieces of time series information, and align the characteristic data corresponding to that common sequence.
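As a non-limiting illustration, time-aligned fusion might be sketched as sorting both feature sets by timestamp and keeping only the timestamps they share. The sketch assumes each feature set is a list of `(timestamp, feature)` pairs with at most one entry per timestamp; the function name `align_by_time` and the sample values are hypothetical.

```python
def align_by_time(features_a, features_b):
    """Align two feature sets on the timestamps they have in common."""
    # Arrange each feature set according to its time sequence information.
    a_sorted = sorted(features_a, key=lambda item: item[0])
    b_sorted = sorted(features_b, key=lambda item: item[0])

    aligned = []
    i = j = 0
    # Walk both sorted sequences together, advancing whichever side is behind.
    while i < len(a_sorted) and j < len(b_sorted):
        time_a, feat_a = a_sorted[i]
        time_b, feat_b = b_sorted[j]
        if time_a == time_b:
            # Same point in time: place the two features side by side.
            aligned.append((time_a, feat_a, feat_b))
            i += 1
            j += 1
        elif time_a < time_b:
            i += 1
        else:
            j += 1
    return aligned


# Hypothetical feature data, invented for illustration.
sensor = [(3, 0.5), (1, 0.1), (2, 0.3)]
system = [(2, "cpu=40%"), (4, "cpu=10%"), (3, "cpu=55%")]
aligned = align_by_time(sensor, system)
```

Only timestamps 2 and 3 occur in both inputs, so only those two records are aligned. An implementation that must tolerate slightly different sampling times would need a matching tolerance, which the claim does not specify.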
The data storage method according to claim 1, wherein said acquiring a plurality of basic data comprises:

Collect basic data through multiple different sensors in real time.
The data storage method according to claim 1, wherein, after storing the fused feature data for the third time, the method further comprises:

The fusion feature data is backed up in real time at the terminal.
A data storage device, which includes:

An obtaining module, used to obtain a plurality of basic data, the plurality of basic data belong to a plurality of categories;

The first storage module is configured to summarize and integrate the multiple basic data according to their respective categories, and store the summarized and integrated multiple data for the first time in a database of the corresponding category;

The second storage module is used to perform feature extraction of basic data for each database to obtain feature data corresponding to each database, and store the feature data for the second time;

The third storage module is used to fuse the feature data to obtain the fusion feature data, and store the fusion feature data for the third time.
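As a non-limiting illustration, the three storage operations performed by the modules above might be sketched with in-memory dictionaries standing in for the databases. The class name `DataStore`, the count and mean features, and the flat concatenation used for fusion are all assumptions made for brevity; the dependent claims describe fusion by multi-table connection or time alignment instead.

```python
from collections import defaultdict
from statistics import mean


class DataStore:
    """Hypothetical three-stage store; dictionaries stand in for databases."""

    def __init__(self):
        self.category_dbs = defaultdict(list)  # first storage: one database per category
        self.feature_db = {}                   # second storage: features per database
        self.fused_db = {}                     # third storage: fused feature data

    def store_basic(self, records):
        # Group basic data by category and store it in that category's database.
        for category, value in records:
            self.category_dbs[category].append(value)

    def store_features(self):
        # Extract simple features from each category database and store them.
        for category, values in self.category_dbs.items():
            self.feature_db[category] = {"count": len(values), "mean": mean(values)}

    def store_fused(self):
        # Fuse the per-category features into one flat record and store it.
        fused = {}
        for category, features in self.feature_db.items():
            for name, value in features.items():
                fused[f"{category}.{name}"] = value
        self.fused_db = fused


store = DataStore()
store.store_basic([("sensor", 1.0), ("behavior", 3.0), ("sensor", 3.0)])
store.store_features()
store.store_fused()
```

Keeping the three stores separate means raw basic data, per-category features, and fused features can each be read or protected independently.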
A storage medium on which a computer program is stored, wherein when the computer program runs on a computer, the computer is caused to execute:

Acquiring multiple basic data, the multiple basic data belonging to multiple categories;

Summarize and integrate the multiple basic data according to their respective categories, and store the summarized and integrated data for the first time in a database of the corresponding category;

Perform feature extraction of basic data on each database, obtain feature data corresponding to each database, and store the feature data for a second time;

The feature data is fused to obtain fused feature data, and the fused feature data is stored for the third time.
An electronic device includes a processor and a memory, the memory stores a computer program, wherein the processor is configured to execute:

Acquiring multiple basic data, the multiple basic data belonging to multiple categories;

Summarize and integrate the multiple basic data according to their respective categories, and store the summarized and integrated data for the first time in a database of the corresponding category;

Perform feature extraction of basic data on each database, obtain feature data corresponding to each database, and store the feature data for a second time;

The feature data is fused to obtain fused feature data, and the fused feature data is stored for the third time.
The electronic device according to claim 12, wherein the categories of the basic data include at least behavior data of a user operating a terminal, sensor data, and system operation data.
The electronic device according to claim 13, wherein, before the feature extraction of the basic data is performed on each database to obtain the feature data corresponding to each database, the processor is further configured to execute:

Collect basic data of each database;

Extracting characteristic data from the basic data by using a data processing algorithm;

Based on the feature data, train and optimize a machine learning model;

When new basic data is acquired, the new basic data is input to the machine learning model to obtain new feature data.
The electronic device according to claim 13, wherein said fusing the characteristic data comprises:

Fuse the characteristic data in a multi-table connection manner;

The feature data is fused in a time-aligned manner.
The electronic device according to claim 15, wherein when the characteristic data is merged in a multi-table connection manner, the processor is configured to execute:

Acquiring a first list and a second list, the first list and the second list respectively containing two sets of different types of characteristic data, wherein the data source of the first list is smaller than the data source of the second list;

Establishing a hash table for the data source of the first list by using the connection key;

Extract the column data of the first list, and store the column data of the first list in a hash table;

The second list is scanned to obtain row data in the second list that matches the hash table, and the rows that match the hash table and the corresponding content in the first list are combined into a record and placed in a result set.
The electronic device according to claim 16, wherein, when scanning the second list to obtain row data in the second list that matches the hash table, the processor is configured to execute:

Scan the second list, perform hash mapping on the connection key, and detect the hash table;

When it is detected that the second list contains row data matching the hash table, the row data in the second list that matches the hash table is acquired, wherein the row data matches the column data of the first list.
The electronic device according to claim 15, wherein, when the feature data is merged in a time-aligned manner, the processor is configured to execute:

Acquiring two feature databases and two pieces of time series information corresponding to the two feature databases, each of the feature databases containing all the feature data of its corresponding database;

Arrange the feature data in the two feature databases according to time sequence information;

Obtain the time sequence common to the two pieces of time series information, and align the characteristic data corresponding to that common sequence.
The electronic device according to claim 12, wherein when acquiring a plurality of basic data, the processor is configured to execute:

Collect basic data through multiple different sensors in real time.
The electronic device according to claim 12, wherein, after storing the fused feature data for the third time, the processor is further configured to execute:

The fusion feature data is backed up in real time at the terminal.