Disclosure of Invention
In order to overcome the above-mentioned drawbacks of the prior art, embodiments of the present invention provide a tree-structure-based columnar storage method and server for stock exchange data.
According to one aspect of the present invention, there is provided a tree-structure-based stock exchange data columnar storage method including the steps of:
S1: collecting raw stock trading data carrying timestamps, preprocessing the raw stock trading data to obtain first target data, determining the subdivision domain of the first target data based on a fuzzy function, and storing the first target data of each subdivision domain on a corresponding storage node;
S2: constructing a target tree structure for the first target data of each subdivision domain, wherein the storage node on which the first target data is stored is the target tree structure of the corresponding subdivision domain, the subdivision domain is the root node of that target tree structure, and the first target data is stored in the corresponding target tree structure to obtain the stock node data of each node; the stock node data comprises node position information, node data information, and backup node data information;
S3: generating stock sequence data from the stock node data in the temporal order given by their timestamps; counting the stock support degree and the stock co-occurrence degree of all the stock sequence data, and obtaining a frequent positive sequence pattern set based on the stock support degree and the stock co-occurrence degree; and mining temporal sequence patterns in the frequent positive sequence pattern set to obtain a stock temporal positive sequence pattern set;
S4: sequentially performing a hash operation and data merging on the stock node data in the stock temporal positive sequence pattern set to obtain simplified stock node data, and storing the simplified stock node data in the target tree structure according to the node position information.
In a preferred embodiment, the subdivision domain corresponding to the first target data is determined as follows:
performing data preprocessing on the raw stock trading data to obtain the first target data, the preprocessing including, but not limited to, one or more of: filtering erroneous data, filtering duplicate data, and normalizing the data;
extracting subdivision domain features from the first target data, and converting the subdivision domain features into fuzzy domain variables;
calculating, through a membership function, the degree to which each fuzzy domain variable belongs to the subdivision domain in which the stock trading data lies, and partitioning the membership degrees to obtain the fuzzy numerical interval corresponding to the fuzzy domain variable;
characterizing the subdivision domain of the stock exchange data by the fuzzy numerical intervals, based on a mapping from the subdivision domain features to those intervals.
In a preferred embodiment, acquiring the subdivision domain features includes:
extracting associated domain features of the subdivision domain corresponding to the first target data using a preset feature extraction network model;
and obtaining the subdivision domain features by applying a cross-validation model to the associated domain features and the first target data.
In a preferred embodiment, the backup node data information is a backup of the corresponding node data information.
In a preferred embodiment, the node position information of the root node is the position number of the root node; first target data subdivided under the root node forms child nodes, and the node position information of a child node consists of the position number of its parent node followed by its own position number, where a node's position number is a symbol identifying its position among the nodes of the same layer.
In a preferred embodiment, mining temporal sequence patterns in the frequent positive sequence pattern set to obtain the stock temporal positive sequence pattern set includes:
counting the stock support degree and the stock co-occurrence degree of each temporal sequence pattern, wherein the stock support degree represents the total number of occurrences of the temporal sequence pattern across all stock sequence data, the stock co-occurrence degree represents the number of stock sequence data in which the temporal sequence pattern occurs, the temporal sequence pattern denotes that a second time node occurs after a first time node, and the first and second time nodes are any two time nodes in the frequent positive sequence pattern set;
and when, for any two time nodes, the stock support degree is greater than a preset stock support threshold and the stock co-occurrence degree is greater than a preset stock co-occurrence threshold, taking the corresponding stock sequence data as elements of the stock temporal positive sequence pattern set.
In a preferred embodiment, the simplified stock node data is obtained as follows:
grouping the stock node data in the stock temporal positive sequence pattern set according to preset node data items, and calculating a weighted hash value for each stock node data using a hash algorithm and the weight values of the stock node data;
comparing the weighted hash values of all stock node data, and placing any two stock node data whose hash values differ by a Hamming distance smaller than a preset Hamming threshold into the same similar stock data set;
combining identical items within each similar stock data set into new stock node data, which is recorded as simplified stock node data; updating the node position information of the stock node data; and storing the simplified stock node data at the updated node position information.
In a preferred embodiment, updating the node position information of the stock node data includes:
acquiring the node position information corresponding to the node data information in the same similar stock data set;
obtaining the distance between each current node and the root node based on the node position information, the distance being the sum of a first distance and a second distance, wherein the first distance is the distance from the parent node of the current node to the root node, and the second distance is the ratio of the current node's position number to the total number of node positions under its parent node;
comparing the distances of all the node position information, and taking the node position information with the smallest distance from the root node as the updated node position information.
In a preferred embodiment, the weight value of the stock node data is obtained as follows:
the weight value comprises a fixed weight value and a fluctuation weight value, wherein the fixed weight value is the reciprocal of the distance from the corresponding stock node data to the root node, and the fluctuation weight value is the difference between the maximum data fluctuation amplitude and a preset amplitude threshold; the fluctuation weight value is applied in one of the following ways:
when the data fluctuation amplitude of the stock node data is greater than or equal to the preset amplitude threshold, the weight value corresponding to the node data information is increased;
when the data fluctuation amplitude of the stock node data is smaller than the preset amplitude threshold, the weight value corresponding to the node data information is reduced.
According to another aspect of the present invention, there is provided a stock exchange data storage server comprising:
a data acquisition module, which collects raw stock trading data carrying timestamps, preprocesses the raw stock trading data to obtain first target data, determines the subdivision domain of the first target data based on a fuzzy function, and stores the first target data of each subdivision domain on a corresponding storage node;
a tree structure construction module, which constructs a target tree structure for the first target data of each subdivision domain, wherein the storage node on which the first target data is stored is the target tree structure of the corresponding subdivision domain, the subdivision domain is the root node of that target tree structure, and the first target data is stored in the corresponding target tree structure to obtain the stock node data of each node; the stock node data comprises node position information, node data information, and backup node data information;
a simplified sequence generation module, which generates stock sequence data from the stock node data in the temporal order given by their timestamps, counts the stock support degree and the stock co-occurrence degree of all the stock sequence data, obtains a frequent positive sequence pattern set based on the stock support degree and the stock co-occurrence degree, and mines temporal sequence patterns in the frequent positive sequence pattern set to obtain a stock temporal positive sequence pattern set;
and a data storage module, which sequentially performs a hash operation and data merging on the stock node data in the stock temporal positive sequence pattern set to obtain simplified stock node data, and stores the simplified stock node data in the target tree structure according to the node position information.
According to still another aspect of the present invention, there is provided an electronic device including a processor and a memory, wherein the memory stores a computer program to be invoked by the processor;
the processor performs the above-described tree-structure-based columnar storage method for stock exchange data by invoking the computer program stored in the memory.
According to yet another aspect of the present invention, there is provided a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the above-described tree-structure-based stock exchange data columnar storage method.
The tree-structure-based columnar storage method for stock exchange data has the following technical effects and advantages:
by storing data in a tree structure, the invention organizes large volumes of stock trading data effectively, and because data are stored by subdivision domain and node position information, specific stock node data can be located and retrieved quickly, improving storage efficiency. Counting the stock support and co-occurrence of the stock sequence data allows frequent positive sequence patterns to be mined, revealing important trends and patterns in stock trading that are useful for formulating investment strategies and forecasting market trends. The method also captures temporal correlations in stock trading, helping analysts and investors find important temporal sequence patterns for market forecasting and decision-making. Finally, hashing and data merging simplify the original stock node data, which reduces the storage space occupied and hence the storage cost. Together, these improve the storage efficiency and data-mining capability for stock exchange data and provide more effective support for stock analysis and decision-making.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Example 1
Referring to fig. 1, the present embodiment is a stock exchange data storage server comprising a data acquisition module 1, a tree structure construction module 2, a simplified sequence generation module 3, and a data storage module 4, which are connected by wired and/or wireless links for data transmission between the modules;
The data acquisition module 1 collects raw stock trading data carrying timestamps, preprocesses the raw stock trading data to obtain first target data, determines the subdivision domain of the first target data based on a fuzzy function, and stores the first target data of each subdivision domain on a corresponding storage node;
It should be noted that stock trading data may come from stock exchange data sources, financial data providers, or third-party APIs. Acquiring stock trading data from multiple sources improves the accuracy and completeness of the collected data to a certain extent; the drawback is substantial overlap of identical data, so the raw stock trading data must subsequently be preprocessed to obtain relatively accurate and uniform first target data.
The subdivision domain of the first target data is determined based on the fuzzy function. Subdivision domains may be defined by, but are not limited to, one or more of: industry, market, security type (such as stocks, options, or futures), or other custom classification criteria, and each stock trading record is stored on the storage node of its subdivision domain. Timestamp information is retained during storage so that the data can later be analyzed, queried, and appended to, which also facilitates subsequent periodic updating and maintenance of the stock exchange data.
The subdivision domain corresponding to the first target data is determined as follows:
performing data preprocessing on the raw stock trading data to obtain the first target data, the preprocessing including, but not limited to, one or more of: filtering erroneous data, filtering duplicate data, and normalizing the data;
extracting subdivision domain features from the first target data, and converting the subdivision domain features into fuzzy domain variables;
calculating, through a membership function, the degree to which each fuzzy domain variable belongs to the subdivision domain in which the stock trading data lies, and partitioning the membership degrees to obtain the fuzzy numerical interval corresponding to the fuzzy domain variable;
characterizing the subdivision domain of the stock exchange data by the fuzzy numerical intervals, based on a mapping from the subdivision domain features to those intervals.
It should be noted that the collected raw stock trading data is first preprocessed to ensure its accuracy and consistency.
Preprocessing comprises the following steps:
Filtering erroneous data: data points containing errors or outliers, such as negative prices or abnormally high trading volumes, are identified and excluded.
Filtering duplicate data: duplicate transaction records are detected and deleted so that they do not distort the analysis results by being counted more than once.
Normalizing the data: the data are normalized so that values on different scales are mapped into a common range, eliminating bias caused by differences in scale.
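The three preprocessing steps above could be sketched as follows. This is a minimal illustration under assumptions not stated in the source: records are dictionaries with hypothetical fields `code`, `ts`, `price`, and `volume`; "abnormally high" volume is approximated by a caller-supplied `max_volume` cap; duplicates are records with identical field values; and normalization is min-max scaling to [0, 1].

```python
def preprocess(records, max_volume):
    """Filter erroneous and duplicate records, then min-max normalize price and volume.

    The record layout and the max_volume cap are illustrative assumptions.
    """
    seen = set()
    cleaned = []
    for r in records:
        # Filtering erroneous data: negative prices or abnormally high volumes.
        if r["price"] < 0 or r["volume"] > max_volume:
            continue
        # Filtering duplicate data: same stock code, timestamp, price and volume.
        key = (r["code"], r["ts"], r["price"], r["volume"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(dict(r))
    # Normalizing: map each numeric field into [0, 1].
    for name in ("price", "volume"):
        values = [r[name] for r in cleaned]
        if not values:
            break
        lo, hi = min(values), max(values)
        span = hi - lo
        for r in cleaned:
            r[name] = (r[name] - lo) / span if span else 0.0
    return cleaned
```

Min-max scaling is only one possible normalization; z-score scaling would serve the same purpose of removing scale differences.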
Features associated with the selected subdivision domain are then extracted from the preprocessed stock trading data; the feature extraction method depends on the subdivision domain selected. For example, for a market index subdivision domain, daily index values may be extracted; for an individual stock subdivision domain, features such as the stock's price and transaction amount may be extracted; and for an industry subdivision domain, industry features such as electronics, agriculture, biology, chemistry, and catering may be extracted. The definitions and categories of the subdivision domains are set by the current server.
The subdivision domain features are converted into fuzzy domain variables, and the membership degree of each fuzzy domain variable in each subdivision domain is calculated; by defining membership functions, the fuzzy domain variables quantify the relationship between the subdivision domain features and the subdivision domains. The membership function may be a Gaussian function, a triangular function, or another suitable functional form.
A mapping table can be established that maps subdivision domain features to fuzzy numerical intervals, so that the meaning of each interval can be consulted during subsequent analysis and decision-making. The selection, definition, and application scenarios of the subdivision domains can be adjusted and extended according to the actual situation and requirements.
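A minimal sketch of the membership calculation and the interval mapping table described above, assuming a single scalar fuzzy domain variable per record. The two domains, their membership-function parameters, and the interval boundaries (0.3 and 0.7) are hypothetical placeholders, not values from the source.

```python
import bisect
import math


def gaussian(x, c, sigma):
    """Gaussian membership function centred at c with spread sigma."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))


def triangular(x, a, b, c):
    """Triangular membership function rising from a to a peak at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical membership functions for two subdivision domains.
DOMAINS = {
    "catering": lambda x: triangular(x, 0.0, 0.2, 0.5),
    "electronics": lambda x: gaussian(x, 0.8, 0.1),
}

# Hypothetical mapping table: membership degree -> fuzzy numerical interval.
BOUNDS = [0.3, 0.7]
LABELS = ["weak [0, 0.3)", "moderate [0.3, 0.7)", "strong [0.7, 1]"]


def classify(x, domains=DOMAINS):
    """Return (domain with the highest membership, its degree, its fuzzy interval)."""
    degrees = {name: mu(x) for name, mu in domains.items()}
    best = max(degrees, key=degrees.get)
    return best, degrees[best], LABELS[bisect.bisect_right(BOUNDS, degrees[best])]
```

For example, `classify(0.8)` returns `("electronics", 1.0, "strong [0.7, 1]")`, because the Gaussian peaks at 0.8 while the triangular function is zero beyond 0.5.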
Acquiring the subdivision domain features includes:
extracting associated domain features of the subdivision domain corresponding to the first target data using a preset feature extraction network model;
and obtaining the subdivision domain features by applying a cross-validation model to the associated domain features and the first target data.
It should be noted that a pre-trained feature extraction network model is used, trained on data from domains related to the subdivision domain. Feeding the first target data into this model extracts the associated domain features of the subdivision domain.
The associated domain features of the subdivision domain and the first target data are then trained and evaluated with a cross-validation model. Cross-validation divides the data into training and validation sets and repeats training and validation several times over different splits. It can thus be used to evaluate how well each associated domain feature performs in the corresponding subdivision domain, and the features that perform well on the first target data are selected.
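The source does not specify the cross-validation model, so the following is only a hypothetical stand-in that shows the selection idea: each candidate feature (here a plain list of values, standing in for an output of the feature extraction network) is scored by k-fold cross-validation of a simple nearest-centroid classifier against assumed subdivision domain labels, and features whose mean validation accuracy reaches a chosen `min_accuracy` are kept.

```python
def interleaved_folds(n, k):
    """Assign sample i to fold i % k (a simple, deterministic split)."""
    return [list(range(f, n, k)) for f in range(k)]


def cv_accuracy(values, labels, k=3):
    """Mean k-fold accuracy of a nearest-centroid classifier that uses one feature."""
    scores = []
    for test_idx in interleaved_folds(len(values), k):
        held_out = set(test_idx)
        sums, counts = {}, {}
        for i, (v, y) in enumerate(zip(values, labels)):
            if i not in held_out:  # fit centroids on the training folds only
                sums[y] = sums.get(y, 0.0) + v
                counts[y] = counts.get(y, 0) + 1
        centroids = {y: sums[y] / counts[y] for y in sums}
        # Validate: assign each held-out sample to the nearest label centroid.
        correct = sum(
            min(centroids, key=lambda y: abs(values[i] - centroids[y])) == labels[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def select_features(features, labels, k=3, min_accuracy=0.8):
    """Keep candidate features whose cross-validated accuracy reaches min_accuracy."""
    return [name for name, values in features.items()
            if cv_accuracy(values, labels, k) >= min_accuracy]
```

A feature that separates the labels cleanly scores 1.0, while a constant feature scores about chance level and is dropped. Any classifier and scoring rule could replace the nearest-centroid choice made here for brevity.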
The tree structure construction module 2 constructs a target tree structure for the first target data of each subdivision domain, the target tree structure comprising a root node and at least one group of child nodes; the storage node on which the first target data is stored is the target tree structure of the corresponding subdivision domain, the subdivision domain is the root node of that target tree structure, and the first target data is stored in the corresponding target tree structure to obtain the stock node data of each node; the stock node data includes node position information, node data information, and multiple pieces of backup node data information.
It should be noted that in this embodiment a target tree structure is constructed first. The target tree structure may be a multi-way tree comprising a root node and multiple groups of child nodes; the root node is the storage node and the initial parent node, and represents all first target data in the corresponding subdivision domain, while each child node is first target data subdivided under its parent node;
the node position information of the root node is the root node's position number; the node position information of a child node consists of the position number of its parent node and its own position number, where a node's position number is a symbol identifying its position among the nodes of the same layer; and the distance from the current node to its parent node is recorded as the step length of the current node.
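The node layout just described could be modelled roughly as below. This is a sketch under assumptions: position numbers are 1-based sibling indices, node position information is stored as a tuple that extends the parent's tuple with the node's own position number, and the class and field names (`TreeNode`, `data`, `backups`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """One node of a target tree structure (names are illustrative assumptions)."""
    name: str
    position: tuple                                # node position information
    data: dict = field(default_factory=dict)       # node data information
    backups: list = field(default_factory=list)    # backup node data information
    children: list = field(default_factory=list)

    def add_child(self, name, data=None):
        """Create a child whose position is the parent's position plus its own number."""
        own_number = len(self.children) + 1        # 1-based position within the layer
        child = TreeNode(name, self.position + (own_number,), dict(data or {}))
        self.children.append(child)
        return child


root = TreeNode("catering", (1,))                  # the subdivision domain is the root
fast_food = root.add_child("fast food")
cafe = root.add_child("cafe")
stock = cafe.add_child("stock node", {"code": "HYPOTHETICAL-001"})
print(stock.position)  # (1, 2, 1)
```

Encoding the full path in the position tuple means a node's layer and ancestors can be read directly from its position information without walking the tree.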
Further, the backup node data information is a backup of the corresponding node data information. Storing backup node data information at the corresponding node position serves the following purpose: when the node data information at that position is missing or invalid, the backup node data information of the node is read directly; if that backup is also missing or invalid, the next backup is read, and the missing historical data is restored from the backup node data information. If multiple pieces of backup node data information are all missing or invalid, the server has very likely been attacked and requires maintenance.
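The fallback order described above can be sketched as follows, assuming node data copies are dictionaries (or `None` when missing). The "corrupt" flag used as the validity test and the `ServerMaintenanceRequired` exception are hypothetical; a real system would apply its own integrity check.

```python
class ServerMaintenanceRequired(Exception):
    """Raised when the primary copy and every backup copy are missing or invalid."""


def is_valid(value):
    # Hypothetical validity rule: the copy exists and is not flagged as corrupt.
    return value is not None and not value.get("corrupt", False)


def read_node_data(primary, backups):
    """Return the primary node data if valid, otherwise the first valid backup, in order."""
    if is_valid(primary):
        return primary
    for backup in backups:
        if is_valid(backup):
            return backup  # the caller can restore the missing primary copy from this
    # Every copy failed: per the description, the server has likely been attacked.
    raise ServerMaintenanceRequired("all copies of the node data are missing or invalid")
```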
The simplified sequence generation module 3 generates stock sequence data from the stock node data in the temporal order given by their timestamps, counts the stock support degree and the stock co-occurrence degree of all the stock sequence data, obtains a frequent positive sequence pattern set based on them, and mines temporal sequence patterns in the frequent positive sequence pattern set to obtain a stock temporal positive sequence pattern set;
It should be noted that the stock sequence data are created by mapping each stock node to its temporal position according to its timestamp, and the temporal order for each timestamp must be correct for subsequent analysis. All stock sequence data are then traversed to count the support degree of each stock (its frequency of occurrence in the sequences) and the co-occurrence degree between stocks (how often they occur in the same sequence);
based on the stock support and stock co-occurrence, association rule mining can be used to obtain the frequent positive sequence pattern set; mining algorithms such as Apriori or FP-growth find the frequent positive sequence patterns in the stock sequence data, i.e., sequences that occur frequently and are of interest. Further mining of the frequent positive sequence pattern set yields a more compact set of stock sequence patterns, for which data mining and machine learning techniques such as sequential pattern mining, cluster analysis, or association rule mining can be used to find more useful and valuable stock sequence patterns.
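A minimal sketch of building time-ordered stock sequences and counting the two statistics named above. It assumes each stock node event carries a hypothetical session identifier (for example a trading day) that determines which sequence it belongs to; support is counted as total occurrences of a stock, and co-occurrence as the number of sequences in which two stocks both appear.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_sequences(events):
    """events: (session_id, timestamp, stock_code) tuples -> {session: codes in time order}."""
    grouped = defaultdict(list)
    for session, ts, code in events:
        grouped[session].append((ts, code))
    return {s: [code for _, code in sorted(items)] for s, items in grouped.items()}


def count_support_and_cooccurrence(sequences):
    """Support: occurrences of each stock; co-occurrence: sequences containing both stocks."""
    support = Counter()
    cooccurrence = Counter()
    for seq in sequences.values():
        support.update(seq)
        for pair in combinations(sorted(set(seq)), 2):
            cooccurrence[pair] += 1
    return support, cooccurrence
```

For the events `("d1", 2, "B")`, `("d1", 1, "A")`, `("d1", 3, "C")`, `("d2", 1, "A")`, `("d2", 2, "C")`, `("d2", 3, "A")`, the sequences are `["A", "B", "C"]` and `["A", "C", "A"]`, so stock A has support 3 and the pair (A, C) has co-occurrence 2.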
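To illustrate the Apriori idea applied to sequence data, here is a small level-wise search for ordered patterns (gaps allowed) contained in at least `min_support` sequences. It is an Apriori-style sketch, not FP-growth and not necessarily the exact procedure intended by the source; `max_len` is a hypothetical cap on pattern length.

```python
def is_subsequence(pattern, seq):
    """True if the items of pattern appear in seq in the same order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def frequent_patterns(sequences, min_support, max_len=3):
    """Apriori-style level-wise search; returns {pattern: number of sequences containing it}."""
    def support(pattern):
        return sum(is_subsequence(pattern, s) for s in sequences)

    items = sorted({x for s in sequences for x in s})
    level = [(x,) for x in items if support((x,)) >= min_support]
    frequent_items = [p[0] for p in level]
    result = {p: support(p) for p in level}
    while level and len(level[0]) < max_len:
        # Extend each frequent pattern by one frequent item.
        candidates = [p + (x,) for p in level for x in frequent_items]
        # Apriori pruning: every pattern obtained by dropping one element must be frequent.
        candidates = [c for c in candidates
                      if all(c[:i] + c[i + 1:] in result for i in range(len(c)))]
        counted = {c: support(c) for c in candidates}
        level = [c for c, n in counted.items() if n >= min_support]
        result.update((c, counted[c]) for c in level)
    return result
```

The pruning step is valid because any subsequence of a frequent sequence is at least as frequent, so a candidate with an infrequent sub-pattern can be discarded without counting it.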
Mining temporal sequence patterns in the frequent positive sequence pattern set to obtain the stock temporal positive sequence pattern set includes:
counting the stock support degree and the stock co-occurrence degree of each temporal sequence pattern, wherein the stock support degree represents the total number of occurrences of the temporal sequence pattern across all stock sequence data, the stock co-occurrence degree represents the number of stock sequence data in which the temporal sequence pattern occurs, the temporal sequence pattern denotes that a second time node occurs after a first time node, and the first and second time nodes are any two time nodes in the frequent positive sequence pattern set;
and when, for any two time nodes, the stock support degree is greater than a preset stock support threshold and the stock co-occurrence degree is greater than a preset stock co-occurrence threshold, taking the corresponding stock sequence data as elements of the stock temporal positive sequence pattern set.
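The two-threshold rule above could be sketched as follows. Assumptions: each sequence is a time-ordered list of items, an "occurrence" of the pattern (a, b) is any index pair i < j with a at i and b at j (the source does not define how repeated items are counted), and both thresholds must be strictly exceeded, as the text states.

```python
from collections import Counter


def pair_statistics(sequences):
    """For each ordered pair (a, b) with b after a: total occurrences and sequence count."""
    support = Counter()       # total occurrences across all sequences
    cooccurrence = Counter()  # number of sequences containing the pair
    for seq in sequences:
        seen_here = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                support[(a, b)] += 1
                seen_here.add((a, b))
        cooccurrence.update(seen_here)
    return support, cooccurrence


def temporal_positive_patterns(sequences, min_support, min_cooccurrence):
    """Return the qualifying pairs and the sequences that contain at least one of them."""
    support, cooc = pair_statistics(sequences)
    strong = {p for p in support
              if support[p] > min_support and cooc[p] > min_cooccurrence}
    selected = [seq for seq in sequences
                if {(a, b) for i, a in enumerate(seq) for b in seq[i + 1:]} & strong]
    return strong, selected
```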
It should be noted that every element of the stock temporal positive sequence pattern set obtained in this way satisfies the preset stock support and stock co-occurrence thresholds, which ensures the importance and co-occurrence of the temporal sequence patterns. The two thresholds are determined mainly by expert analysis and are adjusted for the stock sequence data of different subdivision domains.
The data storage module 4 sequentially performs a hash operation and data merging on the stock node data in the stock temporal positive sequence pattern set to obtain simplified stock node data, and stores the simplified stock node data in the target tree structure according to the node position information.
It should be noted that each stock node data in the stock temporal positive sequence pattern set is hashed with a hash function, which maps the node data to a fixed-length hash value, ideally unique to that data, for subsequent data merging and storage.
The hash operation may use a conventional hash function such as MD5, SHA-1, or SHA-256; the selected hash function should distribute values well so as to minimize hash collisions.
The hashed stock node data are then merged. Depending on the actual requirements and the design of the target tree structure, different merging strategies may be chosen to serve subsequent data analysis and query operations, such as simple merging of adjacent nodes, merging rules based on node attributes, or time-window-based merging.
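A minimal sketch of hashing stock node data with Python's standard `hashlib`, assuming node data is a JSON-serializable dictionary. Serializing with sorted keys makes the digest independent of key order, so identical node data always yields the same digest; the `merge_exact_duplicates` helper is a hypothetical illustration that only collapses exact duplicates.

```python
import hashlib
import json


def node_digest(node_data, algorithm="sha256"):
    """Hash node data deterministically (sorted keys) with a hashlib algorithm."""
    payload = json.dumps(node_data, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.new(algorithm, payload).hexdigest()


def merge_exact_duplicates(nodes):
    """Keep one node per digest; only byte-identical node data is merged here."""
    unique = {}
    for node in nodes:
        unique.setdefault(node_digest(node), node)
    return list(unique.values())
```

The algorithm name can be switched to `"md5"` or `"sha1"`; for deduplication of trusted data these are adequate, while SHA-256 is the safer default where deliberate collisions are a concern.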
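Of the merging strategies listed, time-window-based merging is the simplest to illustrate. The sketch assumes records with hypothetical `code`, `ts`, `price`, and `volume` fields, fixed windows of `window` time units, and an aggregation rule (keep the first timestamp, the last price, and the summed volume) chosen purely for illustration.

```python
def merge_by_time_window(records, window):
    """Merge records of the same stock whose timestamps fall in the same fixed window.

    Hypothetical aggregation rule: keep the first timestamp, the last price,
    and the summed volume of the records in each window.
    """
    merged = {}
    for r in sorted(records, key=lambda r: (r["code"], r["ts"])):
        key = (r["code"], r["ts"] // window)
        if key not in merged:
            merged[key] = dict(r)
        else:
            merged[key]["price"] = r["price"]
            merged[key]["volume"] += r["volume"]
    return list(merged.values())
```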
The simplified stock node data is obtained as follows:
grouping the stock node data in the stock temporal positive sequence pattern set according to preset node data items, and calculating a weighted hash value for each stock node data using a hash algorithm and the weight values of the stock node data;
comparing the weighted hash values of all stock node data, and placing any two stock node data whose hash values differ by a Hamming distance smaller than a preset Hamming threshold into the same similar stock data set;
combining identical items within each similar stock data set into new stock node data, which is recorded as simplified stock node data; updating the node position information of the stock node data; and storing the simplified stock node data at the updated node position information.
It should be noted that data grouping and weighted hash value calculation are performed on the preset node data items, similar stock data sets are then merged according to the Hamming distance and the Hamming threshold, and the resulting simplified stock node data is stored at the updated node position information.
In addition, because the strings of stock node data stored in the target tree structure are of equal length, the Hamming distance can be used to measure the degree of difference between two stock node data: the two equal-length strings are compared position by position and the number of positions at which the characters differ is counted; alternatively, the two values are combined with an exclusive-or operation to produce a new binary string, and the number of 1s in that string is the Hamming distance.
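The weighted hash and Hamming grouping steps resemble SimHash, so the sketch below uses a SimHash-style construction as an assumed stand-in; the source does not name the algorithm. It assumes each stock node data is a mapping from node data items (tokens) to weights, derives a 64-bit token hash from SHA-256, and places nodes in the same similar stock data set when their hashes differ by fewer than `threshold` bits, using union-find so that grouping is transitive. Both Hamming distance methods from the text are shown.

```python
import hashlib

BITS = 64


def token_hash(token):
    """64-bit hash of a token: the first 8 bytes of its SHA-256 digest."""
    return int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:8], "big")


def weighted_hash(items):
    """SimHash-style weighted hash; items maps a node data item (token) to its weight."""
    acc = [0.0] * BITS
    for token, weight in items.items():
        h = token_hash(token)
        for bit in range(BITS):
            acc[bit] += weight if (h >> bit) & 1 else -weight
    return sum(1 << bit for bit in range(BITS) if acc[bit] > 0)


def hamming_xor(x, y):
    """Hamming distance of two integers: the number of 1 bits in x XOR y."""
    return bin(x ^ y).count("1")


def hamming_str(a, b):
    """Hamming distance of two equal-length strings, compared position by position."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def group_similar(nodes, threshold):
    """Group nodes whose weighted hashes differ by fewer than threshold bits."""
    hashes = {name: weighted_hash(items) for name, items in nodes.items()}
    parent = {name: name for name in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    names = list(nodes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming_xor(hashes[a], hashes[b]) < threshold:
                parent[find(a)] = find(b)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Combining the identical items of each resulting group into one simplified node is left out, since the merge rule depends on the node data items chosen.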
Updating the node position information of the stock node data includes:
acquiring the node position information corresponding to the node data information in the same similar stock data set;
obtaining the distance between each current node and the root node based on the node position information, the distance being the sum of a first distance and a second distance, wherein the first distance is the distance from the parent node of the current node to the root node, and the second distance is the ratio of the current node's position number to the total number of node positions under its parent node;
comparing the distances of all the node position information, and taking the node position information with the smallest distance from the root node as the updated node position information.
An example is given here. The root node is a storage node that acts as a directory node and is denoted $R$. The child nodes of the root node are denoted $R_i$, where $R_i$ is the $i$-th child node, $i = 1, 2, \ldots, n$, and $n$ is the total number of child node positions under the root node. The current stock node data is denoted $S$; its parent node is $R_i$, and $S$ is the $j$-th node under $R_i$, where $j = 1, 2, \ldots, m$ and $m$ is the total number of child node positions under the parent node of the current stock node data. The distance from the current stock node data to its parent node is therefore the second distance $d_2 = j/m$; by analogy, the first distance, from $R_i$ to the root node, is $d_1 = i/n$, so the distance from the current stock node data to the root node is $D(S) = d_1 + d_2 = i/n + j/m$. The node closest to the root node is set as the updated node position information.
The above formulas are dimensionless and evaluated numerically. They were obtained by software simulation over a large amount of collected data to approximate real conditions, and the preset parameters and thresholds in them are set by those skilled in the art according to the actual situation.
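The distance rule described above (first distance plus second distance, where each layer contributes the ratio of a node's position number to the total positions under its parent) can be computed as follows. This sketch assumes the rule continues layer by layer for deeper trees, that position tuples list position numbers below the root (excluding the root's own number), and that `totals` gives the total positions under each corresponding parent.

```python
def distance_to_root(position, totals):
    """Sum of position-number / sibling-total ratios from the first layer down to the node.

    position: position numbers below the root, e.g. (i, j) for the j-th child of the i-th child.
    totals:   total node positions under each corresponding parent, e.g. (n, m).
    """
    if len(position) != len(totals):
        raise ValueError("position and totals must have the same length")
    return sum(p / n for p, n in zip(position, totals))


def updated_position(candidates):
    """From (position, totals) pairs in one similar stock data set, pick the one nearest the root."""
    return min(candidates, key=lambda c: distance_to_root(c[0], c[1]))[0]
```

For example, the first child of the second of four root children, with three siblings, lies at 2/4 + 1/3 ≈ 0.833 from the root, so it loses to the third first-layer node (3/4 = 0.75) when both fall in the same similar stock data set.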
The weight value of the stock node data is obtained as follows:
the weight value comprises a fixed weight value and a fluctuation weight value, wherein the fixed weight value is the reciprocal of the distance from the corresponding stock node data to the root node, and the fluctuation weight value is the difference between the maximum data fluctuation amplitude and a preset amplitude threshold; the fluctuation weight value is applied in one of the following ways:
when the data fluctuation amplitude of the stock node data is greater than or equal to the preset amplitude threshold, the weight value corresponding to the node data information is increased;
when the data fluctuation amplitude of the stock node data is smaller than the preset amplitude threshold, the weight value corresponding to the node data information is reduced.
It should be noted that the preset amplitude threshold should be adjusted to the specific requirements and data characteristics in order to control how much the fluctuation weight value varies. The fixed weight value and the fluctuation weight value are combined to obtain the final weight value of the stock node data. The weight value of each node's data thus comprises a fixed part and a fluctuation part that is adjusted according to the data's fluctuation, reflecting both the position of the node data in the tree structure and the degree of fluctuation of the data.
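A sketch of the weight calculation under stated assumptions: the fluctuation amplitude is taken as the largest relative change between consecutive prices, and the fixed and fluctuation weights are combined additively with a hypothetical damping factor `alpha`, in the spirit of reducing the fluctuation weight's influence on the fixed weight. Neither choice is specified by the source.

```python
def node_weight(distance_to_root, prices, amplitude_threshold, alpha=0.5):
    """Fixed weight (1 / distance to root) plus a damped fluctuation weight.

    Assumptions: amplitude is the largest relative change between consecutive
    prices, and alpha is a hypothetical damping factor for the fluctuation term.
    """
    if distance_to_root <= 0:
        raise ValueError("distance to the root must be positive (the root has no reciprocal)")
    fixed = 1.0 / distance_to_root
    amplitudes = [abs(b - a) / a for a, b in zip(prices, prices[1:]) if a]
    fluctuation = max(amplitudes, default=0.0) - amplitude_threshold
    # fluctuation >= 0 (amplitude at or above the threshold) raises the weight; < 0 lowers it.
    return fixed + alpha * fluctuation
```

With a distance of 0.5 and prices 10.0, 11.0, 10.45, the maximum amplitude is 0.1, so a threshold of 0.05 raises the weight above the fixed value 2.0, while a threshold of 0.2 lowers it.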
It should also be noted that in the prior art, weight values of stock node data are typically assigned directly: each weight is pre-assigned according to preset node data items, based on prior knowledge or the judgment of domain experts regarding the importance, influence, or experience associated with the data item in its domain.
Statistical features such as the mean, variance, or standard deviation may also be used as weight values, with larger statistical feature values given higher weights to indicate the importance of the attribute in the data;
in addition, many fields currently calculate weight values with machine learning: a model is trained on existing data and target variables using a machine learning method (such as regression or decision trees) to obtain the weight value of each node data item, the model determining the weights from the relationship between the data features and the target variables.
A suitable weight calculation method is selected according to the specific application scenario and data characteristics, and the weight values are adjusted and optimized according to the complexity of the problem and the data characteristics; this adjustment and optimization aims to reduce the influence of the fluctuation weight value relative to the fixed weight value.
The following explanation is given with reference to a specific embodiment:
first, a mapping table is created to characterize the relationship between the first target data and the subdivision domains. Target tree structures are then created for the subdivision domains, each target tree structure corresponding to one subdivision domain, and the first target data of each subdivision domain is stored on the corresponding storage node, that is, on the target tree structure of that subdivision domain. Each target tree structure represents a particular domain, such as the electronic, agricultural, biological, chemical, or dining domain;
for each target tree structure, the tree may include a root node and multiple groups of child nodes, as shown in fig. 3. The root node is the starting storage point of the target tree structure. A first-level type division of the root node yields a group of first-level child nodes, whose parent node is the root node; a second-level type division of the first-level child nodes yields a group of second-level child nodes, whose parent nodes are the first-level child nodes; and so on, until the target tree structure is generated. The storage node on which the first target data is stored is the target tree structure of the corresponding subdivision domain, the subdivision domain is the root node of that tree, and the first target data is stored in the corresponding target tree structure to obtain the stock node data of the corresponding nodes.
Specific example: the root node is "dining"; the first-level child nodes are "fast food", "high-end restaurant", and "cafe"; and the second-level child nodes hold the stock code, name, historical data, and so on of the dining enterprises under each first-level child node. With this configuration, the stock domain can be identified. To acquire the stocks associated with the dining domain, the tree structures are traversed to locate the dining-domain node, and the stock nodes among its child nodes are collected. Stock information related to other domains is obtained in the same way. In practical applications, child nodes may be added or modified according to the actual situation.
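The dining example above can be sketched as a small in-memory tree with a depth-first traversal that collects the stock records under a domain's root. This is only a minimal sketch under assumptions: the `TreeNode` class, the `forest` dictionary standing in for the mapping table, the `collect_stock_nodes` helper, and the stock codes `FF001`/`CF001` are all hypothetical and not defined by the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TreeNode:
    """A node in a subdivision-domain tree (hypothetical layout)."""
    name: str
    payload: Optional[dict] = None  # stock record, set only on stock-level nodes
    children: List["TreeNode"] = field(default_factory=list)

    def add_child(self, child: "TreeNode") -> "TreeNode":
        self.children.append(child)
        return child


def collect_stock_nodes(root: TreeNode) -> List[dict]:
    """Depth-first traversal returning every stock record stored under root."""
    found: List[dict] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.payload is not None:
            found.append(node.payload)
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(node.children))
    return found


# Mapping table: one target tree per subdivision domain, keyed by domain name.
forest: Dict[str, TreeNode] = {"dining": TreeNode("dining")}
dining = forest["dining"]
fast_food = dining.add_child(TreeNode("fast food"))
dining.add_child(TreeNode("high-end restaurant"))
cafe = dining.add_child(TreeNode("cafe"))
# Second-level nodes hold per-enterprise stock records (hypothetical codes).
fast_food.add_child(TreeNode("FF001", {"code": "FF001", "name": "Example Burger Co."}))
cafe.add_child(TreeNode("CF001", {"code": "CF001", "name": "Example Coffee Co."}))

print([p["code"] for p in collect_stock_nodes(forest["dining"])])  # prints ['FF001', 'CF001']
```

Keying the forest by domain name makes locating the "dining" tree a single lookup, so only that tree needs to be traversed.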
Example 2
Referring to fig. 2, for details not described in this embodiment, reference may be made to the description of the preceding embodiment. This embodiment provides a tree-structure-based stock exchange data columnar storage method, which comprises the following steps:
S1: collecting raw stock trading data carrying time stamps, preprocessing the raw stock trading data to obtain first target data, analyzing the subdivision domain of the first target data based on a fuzzy function, and storing the first target data of each subdivision domain on the corresponding storage node;
The specific process of analyzing the subdivision domain corresponding to the first target data is as follows:
performing data preprocessing on the raw stock trading data to obtain the first target data, where the data preprocessing includes, but is not limited to, one or more of: filtering erroneous data, filtering duplicate data, and normalizing the data;
extracting subdivision domain features from the first target data and converting them into fuzzy domain variables;
calculating, through a membership function, the degree to which each fuzzy domain variable belongs to the subdivision domain in which the stock trading data is located, and dividing the membership degrees to obtain the fuzzy numerical interval corresponding to each fuzzy domain variable;
characterizing the subdivision domain of the stock trading data by mapping the subdivision domain features onto the fuzzy numerical intervals.
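The fuzzy classification steps above can be illustrated with a minimal sketch, assuming a single scalar subdivision domain feature, min-max normalization as the preprocessing step, triangular membership functions, and an alpha-cut of the winning membership function as the "fuzzy numerical interval". None of these choices is specified by the embodiment; the `DOMAINS` prototypes and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Triangle = Tuple[float, float, float]  # (a, b, c) with a <= b <= c


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale raw feature values into [0, 1] (one possible normalization step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def triangular(x: float, tri: Triangle) -> float:
    """Triangular membership function; handles shoulder shapes where b == a or b == c."""
    a, b, c = tri
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def alpha_cut(tri: Triangle, alpha: float) -> Tuple[float, float]:
    """Interval of feature values whose membership is at least alpha."""
    a, b, c = tri
    return (a + alpha * (b - a), c - alpha * (c - b))


def classify(x: float, domains: Dict[str, Triangle], alpha: float = 0.5):
    """Return (best domain, its membership degree, its alpha-cut interval)."""
    degrees = {name: triangular(x, tri) for name, tri in domains.items()}
    best = max(degrees, key=degrees.get)
    return best, degrees[best], alpha_cut(domains[best], alpha)


# Hypothetical membership functions over the normalized feature.
DOMAINS: Dict[str, Triangle] = {
    "agricultural": (0.0, 0.0, 0.5),
    "dining": (0.0, 0.5, 1.0),
    "electronic": (0.5, 1.0, 1.0),
}

scores = min_max_normalize([10.0, 30.0, 20.0])  # [0.0, 1.0, 0.5]
print(classify(scores[2], DOMAINS))  # prints ('dining', 1.0, (0.25, 0.75))
```

In practice each subdivision domain would likely use several features and tuned membership shapes; the scalar case only shows how a membership degree maps a value to a domain and an interval.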
Acquiring the subdivision domain features includes:
extracting, with a preset feature extraction network model, the associated domain features of the subdivision domain corresponding to the first target data;
and obtaining the subdivision domain features by applying a cross-validation model to the associated domain features and the first target data.
S2: constructing a target tree structure for the first target data of each subdivision domain, wherein the storage node on which the first target data is stored is the target tree structure of the corresponding subdivision domain, the subdivision domain is the root node of the current target tree structure, and the first target data is stored in the corresponding target tree structure to obtain the stock node data of the corresponding nodes; the stock node data comprises node position information, node data information, and backup node data information;
The backup node data information is a backup of the corresponding node data information.
The node position information of the root node is the position number of the root node. The node position information of a child node consists of the position number of its parent node followed by its own position number, where the position number of a node is the symbol identifying the node's position among the nodes of the same layer.
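A minimal sketch of this position encoding follows. It assumes that a child's position information is its parent's full position information followed by its own number, joined with a dot (for a first-level child this equals the root's position number plus its own), and that position numbers are 1-based in order of insertion among a parent's children. The `PositionedNode` class and the dotted string format are hypothetical.

```python
from typing import List, Optional


class PositionedNode:
    """Tree node with path-style position information (assumed encoding)."""

    def __init__(self, name: str, parent: Optional["PositionedNode"] = None,
                 root_number: str = "1") -> None:
        self.name = name
        self.parent = parent
        self.children: List["PositionedNode"] = []
        if parent is None:
            # Root: its position information is just its own position number.
            self.position_number = root_number
            self.position_info = root_number
        else:
            parent.children.append(self)
            # Position number: 1-based order among the parent's children.
            self.position_number = str(len(parent.children))
            # Child: parent's position information followed by its own number.
            self.position_info = f"{parent.position_info}.{self.position_number}"


root = PositionedNode("dining")
fast_food = PositionedNode("fast food", root)
cafe = PositionedNode("cafe", root)
ff001 = PositionedNode("FF001", fast_food)
print(cafe.position_info, ff001.position_info)  # prints 1.2 1.1.1
```

Carrying the full parent path keeps every position string unique across the tree, which the later distance and update steps rely on.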
S3: generating stock sequence data from the stock node data according to the temporal order given by the time stamps; counting the stock support degree and stock co-occurrence degree over all the stock sequence data, and obtaining a frequent positive sequence pattern set based on the stock support degree and stock co-occurrence degree; and mining the temporal sequence patterns in the frequent positive sequence pattern set to obtain a stock time positive sequence pattern set;
Mining the temporal sequence patterns in the frequent positive sequence pattern set to obtain the stock time positive sequence pattern set comprises:
counting the stock support degree and stock co-occurrence degree of each temporal sequence pattern, where the stock support degree is the total number of occurrences of the pattern across all stock sequence data, the stock co-occurrence degree is the number of stock sequence data in which the pattern occurs, a temporal sequence pattern means that a second time node occurs after a first time node, and the first and second time nodes are any two time nodes in the frequent positive sequence pattern set;
and, when the stock support degree of any two time nodes exceeds a preset stock support degree threshold and their stock co-occurrence degree exceeds a preset stock co-occurrence degree threshold, taking the corresponding stock sequence data as elements of the stock time positive sequence pattern set.
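The counting and thresholding above can be sketched as follows, assuming that each stock's events are (timestamp, node identifier) pairs, that a pattern is an ordered pair (first, second) in which the second node appears later in the same sequence, and that every such index pair counts as one occurrence toward support. The event labels and all function names are hypothetical; the strict ">" comparisons follow the text.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Event = Tuple[int, str]    # (timestamp, node identifier)
Pattern = Tuple[str, str]  # second node occurs after first node


def build_sequences(events_by_stock: Dict[str, Iterable[Event]]) -> Dict[str, List[str]]:
    """Order each stock's node events by timestamp to form its sequence."""
    return {stock: [node for _, node in sorted(events)]
            for stock, events in events_by_stock.items()}


def count_patterns(sequences: Dict[str, List[str]]) -> Tuple[Counter, Counter]:
    """Return (support, co_occurrence) counters for ordered node pairs."""
    support: Counter = Counter()
    co_occurrence: Counter = Counter()
    for seq in sequences.values():
        seen: Set[Pattern] = set()
        # combinations() preserves index order, so (a, b) means a came first.
        for first, second in combinations(seq, 2):
            if first == second:
                continue
            support[(first, second)] += 1   # every occurrence counts
            seen.add((first, second))
        co_occurrence.update(seen)          # at most once per sequence
    return support, co_occurrence


def frequent_patterns(sequences: Dict[str, List[str]],
                      min_support: int, min_co: int) -> Set[Pattern]:
    """Keep patterns whose support and co-occurrence both exceed the thresholds."""
    support, co_occurrence = count_patterns(sequences)
    return {p for p in support
            if support[p] > min_support and co_occurrence[p] > min_co}


def select_sequences(sequences: Dict[str, List[str]],
                     patterns: Set[Pattern]) -> Dict[str, List[str]]:
    """Stock sequences that contain at least one frequent pattern."""
    return {stock: seq for stock, seq in sequences.items()
            if set(combinations(seq, 2)) & patterns}


events = {
    "A": [(3, "vol_up"), (1, "price_up"), (5, "price_down")],
    "B": [(2, "price_up"), (4, "vol_up")],
    "C": [(1, "vol_up"), (2, "price_up")],
}
sequences = build_sequences(events)
patterns = frequent_patterns(sequences, min_support=1, min_co=1)
selected = select_sequences(sequences, patterns)
print(patterns, sorted(selected))  # prints {('price_up', 'vol_up')} ['A', 'B']
```

Enumerating all ordered pairs is quadratic in sequence length, which is acceptable for a sketch; a production miner would more likely use an established sequential-pattern algorithm.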
S4: sequentially performing a hash operation and data merging on the stock node data in the stock time positive sequence pattern set to obtain simplified stock node data, and storing the simplified stock node data in the target tree structure according to the node position information.
The acquisition logic of the simplified stock node data is as follows:
grouping the stock node data in the stock time positive sequence pattern set according to preset node data items, and calculating a weighted hash value for each stock node data item using a hash algorithm and the weight values of the stock node data;
comparing the weighted hash values of all stock node data, and placing any two stock node data items whose hash values differ by a Hamming distance smaller than a preset Hamming threshold into the same similar stock data set;
merging the entries of each similar stock data set into new stock node data, denoted the simplified stock node data; updating the node position information of the stock node data; and storing the simplified stock node data at the updated node position.
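One way to realize a "weighted hash value" compared by Hamming distance is a weighted SimHash; the sketch below uses that technique as an assumption, since the embodiment does not name the hash algorithm. Each "item=value" token votes on the 64 hash bits with its node data item's weight, similar records are grouped with union-find (which, as an added assumption, makes similarity transitive), and a group is merged by keeping only the field values all members share. MD5 truncation, the item weights, and all names are hypothetical.

```python
import hashlib
from typing import Dict, List

BITS = 64


def token_hash(token: str) -> int:
    """64-bit hash of a token, taken from the first 8 bytes of its MD5 digest."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def weighted_simhash(record: Dict[str, str], item_weights: Dict[str, float]) -> int:
    """SimHash in which each 'item=value' token votes with its item's weight."""
    acc = [0.0] * BITS
    for item, value in record.items():
        h = token_hash(f"{item}={value}")
        w = item_weights.get(item, 1.0)
        for bit in range(BITS):
            acc[bit] += w if (h >> bit) & 1 else -w
    return sum(1 << bit for bit in range(BITS) if acc[bit] > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def group_similar(hashes: Dict[str, int], threshold: int) -> List[List[str]]:
    """Union-find grouping of node ids whose hashes differ in < threshold bits."""
    parent = {node_id: node_id for node_id in hashes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ids = list(hashes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if hamming(hashes[a], hashes[b]) < threshold:
                parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for node_id in ids:
        groups.setdefault(find(node_id), []).append(node_id)
    return list(groups.values())


def merge_group(records: Dict[str, Dict[str, str]], group: List[str]) -> dict:
    """Collapse one similar stock data set into a single simplified record."""
    first = records[group[0]]
    shared = {k: v for k, v in first.items()
              if all(records[m].get(k) == v for m in group[1:])}
    return {"fields": shared, "merged_from": sorted(group)}


records = {
    "n1": {"sector": "dining", "trend": "up", "volume": "high"},
    "n2": {"sector": "dining", "trend": "up", "volume": "low"},
    "n3": {"sector": "electronic", "trend": "down", "volume": "high"},
}
weights = {"sector": 1.0, "trend": 0.6, "volume": 0.1}
hashes = {nid: weighted_simhash(r, weights) for nid, r in records.items()}
simplified = [merge_group(records, g) for g in group_similar(hashes, threshold=3)]
```

With these weights the low-weight "volume" item can never flip a bit, so n1 and n2 hash identically and merge into one simplified record that keeps only their shared "sector" and "trend" values.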
The acquisition logic of the weight value of the stock node data is as follows:
the weight value comprises a fixed weight value and a fluctuation weight value, where the fixed weight value is the reciprocal of the distance from the corresponding stock node data to the root node, and the fluctuation weight value is the difference between the maximum data fluctuation amplitude and a preset amplitude threshold; the fluctuation weight value is applied according to the following cases:
when the data fluctuation amplitude of the stock node data is greater than or equal to the preset amplitude threshold, the weight value corresponding to the node data information is increased;
and when the data fluctuation amplitude of the stock node data is smaller than the preset amplitude threshold, the weight value corresponding to the node data information is reduced.
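The weight rule above can be sketched as fixed weight plus fluctuation weight. The sketch assumes the fluctuation amplitude is the absolute relative change between consecutive prices, combines the two parts by addition, and adds a hypothetical `damping` factor as one way to limit the fluctuation weight's influence on the fixed weight; none of these details is fixed by the embodiment.

```python
from typing import Sequence


def max_fluctuation(prices: Sequence[float]) -> float:
    """Assumed amplitude: largest absolute relative change between consecutive prices."""
    if len(prices) < 2:
        raise ValueError("need at least two prices to measure fluctuation")
    return max(abs(b - a) / a for a, b in zip(prices, prices[1:]))


def node_weight(distance_to_root: float, prices: Sequence[float],
                amplitude_threshold: float, damping: float = 1.0) -> float:
    """Fixed weight (1 / distance) plus a damped fluctuation weight.

    The fluctuation weight is (max amplitude - threshold): positive, raising
    the total, when the amplitude exceeds the threshold; zero when equal;
    negative, lowering the total, when it is below the threshold.
    """
    if distance_to_root <= 0:
        raise ValueError("distance must be positive; the root has no fixed weight")
    fixed = 1.0 / distance_to_root
    fluctuation = max_fluctuation(prices) - amplitude_threshold
    return fixed + damping * fluctuation


prices = [10.0, 11.0, 10.45]          # largest step change is 10% (10 -> 11)
print(node_weight(2.0, prices, 0.05))  # about 0.55: 0.5 fixed + 0.05 fluctuation
```

Setting `damping` below 1 is one simple way to realize the tuning goal stated earlier of keeping the fluctuation weight from dominating the fixed weight.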
Updating node location information of stock node data includes:
acquiring the node position information corresponding to the node data information in the same similar stock data set;
obtaining the distance between the current node and the root node based on the node position information, where this distance is the sum of a first distance and a second distance: the first distance is the distance from the parent node of the current node to the root node, and the second distance is the ratio of the current node's position number to the total number of node positions under its parent node;
comparing the node position information of all nodes in the set with each other, and updating the node position information to that of the node with the smallest distance to the root node.
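The distance rule and the position update can be sketched as follows, assuming dotted path-style position strings (for example "1.2.1", where "1" is the root) and a hypothetical `child_counts` table giving the number of children under each position. `Fraction` keeps the ratios exact so that comparisons are not affected by rounding.

```python
from fractions import Fraction
from typing import Dict, List


def distance_to_root(position_info: str, child_counts: Dict[str, int]) -> Fraction:
    """Root distance = parent's distance + position number / children under parent."""
    parts = position_info.split(".")
    distance = Fraction(0)  # the root itself is at distance 0
    parent = parts[0]
    for number in parts[1:]:
        distance += Fraction(int(number), child_counts[parent])
        parent = f"{parent}.{number}"
    return distance


def updated_position(group_positions: List[str], child_counts: Dict[str, int]) -> str:
    """Pick the member position closest to the root for the merged node."""
    return min(group_positions, key=lambda p: distance_to_root(p, child_counts))


# Hypothetical tree: root "1" has 3 children; "1.1" has 2; "1.2" has 4.
child_counts = {"1": 3, "1.1": 2, "1.2": 4}
group = ["1.1.2", "1.2.1", "1.3"]
print(updated_position(group, child_counts))  # prints 1.2.1
```

The distances here are 4/3, 11/12, and 1, so "1.2.1" is chosen. Note that under this definition a deeper node can score a smaller distance than a shallower node in another branch, as "1.2.1" does relative to "1.3".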
Through tree-structured storage, the invention can effectively organize and store large volumes of stock trading data, and storing the data by subdivision domain and node position information allows specific stock node data to be located and retrieved quickly, which improves data storage efficiency. By counting the stock support degree and co-occurrence degree of the stock sequence data, frequent positive sequence patterns can be mined to reveal important trends and patterns in stock trading, which is significant for formulating investment strategies and predicting market trends. The method also captures temporal correlations in stock trading, helping analysts and investors discover important time-series patterns for market prediction and decision making. Finally, the hash operation and data merging simplify the original stock node data, which reduces storage space occupation and storage cost. In this way, both the storage efficiency and the data mining capability for stock trading data are improved, providing more effective support for stock analysis and decision making.
Example 3
An electronic device according to an exemplary embodiment comprises a processor and a memory, the memory storing a computer program to be called by the processor;
the processor performs the above tree-structure-based stock exchange data columnar storage method by calling the computer program stored in the memory.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPUs) and one or more memories, where the memories store at least one computer program that is loaded and executed by the processors to implement the tree-structure-based stock exchange data columnar storage method provided in the foregoing method embodiments. The electronic device may also include other components for implementing device functions, for example a wired or wireless network interface and an input/output interface. Details are not repeated here.
Example 4
FIG. 5 is a schematic diagram of a computer-readable storage medium 100 according to an embodiment of the present application. Computer-readable instructions are stored on the computer-readable storage medium 100. When the computer-readable instructions are executed by a processor, the tree-structure-based stock exchange data columnar storage method according to the embodiments of the present application described with reference to the above drawings may be performed. The storage medium 100 includes, but is not limited to, volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, and flash memory.
In addition, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the present application provides a non-transitory machine-readable storage medium storing machine-readable instructions that are executable by a processor to perform the steps of the method provided by the present application, such as: collecting raw stock trading data carrying time stamps, preprocessing the raw stock trading data to obtain first target data, analyzing the subdivision domain of the first target data based on a fuzzy function, and storing the first target data of each subdivision domain on the corresponding storage node; constructing a target tree structure for the first target data of each subdivision domain, wherein the storage node on which the first target data is stored is the target tree structure of the corresponding subdivision domain, the subdivision domain is the root node of the current target tree structure, and the first target data is stored in the corresponding target tree structure to obtain the stock node data of the corresponding nodes, the stock node data comprising node position information, node data information, and backup node data information; generating stock sequence data from the stock node data according to the temporal order given by the time stamps; counting the stock support degree and stock co-occurrence degree over all the stock sequence data, and obtaining a frequent positive sequence pattern set based on them; mining the temporal sequence patterns in the frequent positive sequence pattern set to obtain a stock time positive sequence pattern set; and sequentially performing a hash operation and data merging on the stock node data in the stock time positive sequence pattern set to obtain simplified stock node data, and storing the simplified stock node data in the target tree structure according to the node position information.
The methods, apparatus, and devices of the present application may be implemented in numerous ways, for example by software, hardware, firmware, or any combination thereof. The above sequence of method steps is for illustration only; unless specifically stated otherwise, the steps of the method of the present application are not limited to the sequence described above. Furthermore, in some embodiments, the present application may also be embodied as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the methods according to the present application. Thus, the present application also covers a recording medium storing a program for executing the methods according to the present application.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
It should be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless (e.g., infrared, microwave) means. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium such as a solid state disk.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Finally: the foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.