CN117081602B - Capital settlement data optimization processing method based on blockchain - Google Patents

Capital settlement data optimization processing method based on blockchain

Info

Publication number
CN117081602B
Authority
CN
China
Prior art keywords
target
category
character string
dimension
settlement data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311321416.9A
Other languages
Chinese (zh)
Other versions
CN117081602A (en)
Inventor
贾庆佳
刘永峰
张志勇
张磊
王晓琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Off Site Market Clearing Center Co ltd
Original Assignee
Qingdao Off Site Market Clearing Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Off Site Market Clearing Center Co ltd
Priority to CN202311321416.9A
Publication of CN117081602A
Application granted
Publication of CN117081602B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3088 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 Protocols for data compression, e.g. ROHC
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of data compression, and in particular to a blockchain-based fund settlement data optimization processing method, which comprises the following steps: acquiring a fund settlement data set corresponding to a target blockchain; classifying the fund settlement data in the fund settlement data set; screening candidate character strings in each target dimension from each target category; determining the representative degree corresponding to each candidate character string; performing similarity analysis on all character strings of each target category in each target dimension; screening target character strings from all candidate character strings in each target category; constructing an initial dictionary corresponding to each target category; and compressing all fund settlement data in each target category with the LZW algorithm according to the initial dictionary corresponding to each target category. The invention compresses the fund settlement data of the blockchain and improves the efficiency of that compression.

Description

Capital settlement data optimization processing method based on blockchain
Technical Field
The invention relates to the technical field of data compression, and in particular to a blockchain-based fund settlement data optimization processing method.
Background
In a blockchain-based fund settlement scenario, each participant may become a node, and each node usually maintains a complete copy of the blockchain. Each node therefore stores a large amount of fund settlement data, which often needs to be compressed in order to save blockchain memory. At present, data is generally compressed with the LZW algorithm, and the initial dictionary in the commonly used LZW algorithm is often empty.
However, compressing fund settlement data with the LZW algorithm starting from an empty initial dictionary often has the following technical problem:
Because the initial dictionary is empty, the LZW algorithm often has to perform many insertion operations on the dictionary, which incurs overhead and requires additional calculation steps. Meanwhile, because the data volume of a blockchain is huge, the dictionary in the LZW algorithm often becomes very large, which increases the load on the blockchain nodes and in turn makes fund settlement data compression inefficient.
Disclosure of Invention
The summary of the invention is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. The summary of the invention is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In order to solve the technical problem of poor efficiency of fund settlement data compression, the invention provides a blockchain-based fund settlement data optimization processing method.
The invention provides a blockchain-based fund settlement data optimization processing method, which comprises the following steps:
acquiring a fund settlement data set corresponding to a target blockchain, wherein the fund settlement data in the fund settlement data set comprises a character string under each target dimension, and the time dimension is one of the target dimensions;
classifying the fund settlement data in the fund settlement data set according to settlement time included in the fund settlement data, and determining each category obtained by classification as a target category, wherein the settlement time is a character string in a time dimension;
screening candidate character strings in each target dimension from each target category according to the number of times that each character string of the target category in the target dimension appears in that target category;
determining the representative degree corresponding to each candidate character string according to the length corresponding to each candidate character string and the occurrence times of each candidate character string in the belonging target category;
performing similarity analysis processing on all character strings of each target category in each target dimension to obtain target similarity of each target category in each target dimension;
screening target character strings from all candidate character strings in each target category according to the representative degree corresponding to all candidate character strings in each target category and the target similarity degree of each target category under all target dimensions, and obtaining a target character string set corresponding to each target category;
constructing an initial dictionary corresponding to each target category according to the target character string set corresponding to each target category;
and compressing all fund settlement data in each target category through an LZW algorithm according to the initial dictionary corresponding to each target category.
Optionally, the classifying the fund settlement data in the fund settlement data set according to settlement time included in the fund settlement data, and determining each category obtained by classification as a target category includes:
segmenting a preset time period, and determining each sub-time period obtained by segmentation as a target time period;
and dividing all fund settlement data with settlement time belonging to the same target time period into the same target category.
Optionally, the step of screening candidate character strings in each target dimension from each target category according to the number of times that all character strings in each target dimension appear in the target category, includes:
and when the occurrence frequency of the character string in the target dimension in the target category is greater than a preset frequency threshold, determining the character string as a candidate character string in the target dimension.
Optionally, the length corresponding to the candidate character string and the number of times of occurrence of the candidate character string in the belonging target category are positively correlated with the corresponding representative degree.
Optionally, the determining the representative degree corresponding to each candidate character string according to the length corresponding to each candidate character string and the frequency of occurrence of each candidate character string in the target category includes:
and determining the product of the length corresponding to each candidate character string and the frequency of occurrence of the candidate character string in the target category as the representative degree corresponding to each candidate character string.
Optionally, the analyzing the similarity degree of all the character strings of each target category in each target dimension to obtain the target similarity degree of each target category in each target dimension includes:
and determining the target similarity degree of the target category in the target dimension according to ASCII codes corresponding to all characters in all character strings of the target category in the target dimension.
Optionally, the target similarity of a target category in a target dimension is given by a formula defined over the following quantities (the formula image and its symbol glyphs are not reproduced in this text): the target similarity of the i-th target category in the b-th target dimension, where i is the sequence number of the target category and b is the sequence number of the target dimension; the number of different character strings of the i-th target category in the b-th target dimension; the number of combinations of 2 character strings taken from those different character strings; the similarity index corresponding to the a-th combination of the i-th target category in the b-th target dimension, where a is the sequence number of the combination; the smaller of the numbers of characters included in the two character strings of the a-th combination; the absolute value function; m, the sequence number of a character within a character string; the ASCII codes corresponding to the m-th character of the 1st character string and of the 2nd character string in the a-th combination; and a ternary operation expression.
Optionally, the step of screening the target strings from all the candidate strings in each target category according to the representative degrees corresponding to all the candidate strings in each target category and the target similarity degree of each target category under all the target dimensions to obtain a target string set corresponding to each target category includes:
determining the target representative degree corresponding to each candidate character string according to the representative degree corresponding to each candidate character string and the target similarity degree of the target category to which the candidate character string belongs under the target dimension;
when the target representing degree corresponding to the candidate character string is larger than a preset representing threshold value, determining the candidate character string as a target character string;
and combining all the target character strings in each target category into a target character string set.
Optionally, the representative degree corresponding to the candidate character string and the target similarity degree of the target category to which the candidate character string belongs under the target dimension are positively correlated with the corresponding target representative degree.
Optionally, the constructing an initial dictionary corresponding to each target category according to the target string set corresponding to each target category includes:
and determining the target character string set corresponding to the target category as an initial dictionary corresponding to the target category.
The invention has the following beneficial effects:
The blockchain-based fund settlement data optimization processing method of the invention compresses the fund settlement data of a blockchain, solves the technical problem of inefficient fund settlement data compression, and improves the efficiency of that compression. First, acquiring the fund settlement data set corresponding to the target blockchain facilitates the subsequent compression of all fund settlement data corresponding to the target blockchain. Second, because the characteristics of fund settlement data are often related to settlement time, classifying the fund settlement data in the set by the settlement time it includes groups relatively similar fund settlement data into one category, which facilitates the subsequent accurate compression of the fund settlement data in each target category. Third, because a character string that appears more often in its target category tends to be more representative and more useful for constructing the initial dictionary, the representative degree of each candidate character string can be quantified from its length and the number of times it appears in its target category. Fourth, a larger target similarity of a target category in a target dimension tends to indicate that the character strings of that target category in that target dimension are more similar to one another, hence relatively more representative, and hence more suitable for constructing the initial dictionary.
Therefore, by jointly considering the representative degrees of all candidate character strings in each target category and the target similarity of each target category in all target dimensions, the target character string set used to construct the initial dictionary of each target category can be screened out. Finally, all fund settlement data in each target category is compressed with the LZW algorithm based on the initial dictionary of that target category, so that the fund settlement data in each target category is compressed accurately. Compared with compressing data from an empty initial dictionary, the invention considers several indexes relevant to constructing the initial dictionary, such as the representative degree and the target similarity, objectively screens relatively representative target character strings out of each target category, and constructs an initial dictionary that already contains these target character strings. This reduces dictionary insertion operations to a certain extent, reduces the occupation of computing resources, and improves the efficiency of fund settlement data compression. In addition, compared with directly compressing the whole fund settlement data set of the target blockchain, compressing the fund settlement data within each target category reduces, to a certain extent, the dictionary size during LZW compression, which reduces the load on the blockchain nodes and further improves compression efficiency.
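The effect of a pre-populated initial dictionary can be illustrated with a minimal LZW sketch. This is not the patented implementation: the function names (build_seed, lzw_compress, lzw_decompress) are hypothetical, and two assumptions are made so that the sketch runs and can be decoded. The "empty" baseline is modeled as a table holding only the single characters of the data, and every prefix of each target character string is also placed in the seed table, because plain LZW extends its match one character at a time and would otherwise never reach a seeded multi-character entry.

```python
def build_seed(alphabet, target_strings):
    """Build an initial LZW table (string -> code).

    Assumption: single characters come first, then every prefix
    (length >= 2) of each target character string.
    """
    table = {}
    for ch in sorted(set(alphabet)):
        table[ch] = len(table)
    for s in target_strings:
        for end in range(2, len(s) + 1):
            prefix = s[:end]
            if prefix not in table:
                table[prefix] = len(table)
    return table


def lzw_compress(text, seed):
    """Return (codes, inserts): the LZW output and the number of
    dictionary insertions performed while compressing."""
    table = dict(seed)  # copy so the seed can be reused by the decoder
    codes, inserts = [], 0
    w = ""
    for ch in text:
        if w + ch in table:
            w += ch
        else:
            codes.append(table[w])
            table[w + ch] = len(table)
            inserts += 1
            w = ch
    if w:
        codes.append(table[w])
    return codes, inserts


def lzw_decompress(codes, seed):
    """Invert lzw_compress; the decoder must start from the same seed."""
    table = {code: s for s, code in seed.items()}
    if not codes:
        return ""
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry the encoder had just created.
            entry = w + w[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = w + entry[0]
        w = entry
    return "".join(out)
```

Compressing the same text once with build_seed(text, []) and once with build_seed(text, ["Qingdao"]) lets the two code counts and insertion counts be compared directly; both the encoder and the decoder must be given the identical seed table.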
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a blockchain-based funds settlement data optimization processing method of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description is given below of the specific implementation, structure, features and effects of the technical solution according to the present invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The invention provides a blockchain-based fund settlement data optimization processing method, which comprises the following steps:
acquiring a fund settlement data set corresponding to a target blockchain, wherein the fund settlement data in the fund settlement data set comprises a character string under each target dimension, and the time dimension is one of the target dimensions;
classifying the fund settlement data in the fund settlement data set according to the settlement time included in the fund settlement data, and determining each category obtained by classification as a target category, wherein the settlement time is a character string in a time dimension;
screening candidate character strings in each target dimension from each target category according to the number of times that each character string of the target category in the target dimension appears in that target category;
determining the representative degree corresponding to each candidate character string according to the length corresponding to each candidate character string and the occurrence times of each candidate character string in the belonging target category;
performing similarity analysis processing on all character strings of each target category in each target dimension to obtain target similarity of each target category in each target dimension;
screening target character strings from all candidate character strings in each target category according to the representative degree corresponding to all candidate character strings in each target category and the target similarity degree of each target category under all target dimensions to obtain a target character string set corresponding to each target category;
constructing an initial dictionary corresponding to each target category according to the target character string set corresponding to each target category;
and compressing all fund settlement data in each target category through an LZW algorithm according to the initial dictionary corresponding to each target category.
Each step is described in detail below:
Referring to FIG. 1, a flow chart of some embodiments of the blockchain-based fund settlement data optimization processing method according to the invention is shown. The method comprises the following steps:
Step S1, acquiring a fund settlement data set corresponding to the target blockchain.
In some embodiments, a set of funds settlement data corresponding to the target blockchain may be obtained.
Here, the target blockchain may be the blockchain whose data is to be compressed. The fund settlement data may be data related to fund settlement. For example, the fund settlement data may be a transaction record, that is, a record associated with a transaction. A transaction record may include, but is not limited to: transaction time, transaction location, transaction price, and transaction commodity. The fund settlement data in the fund settlement data set may include a character string in each target dimension. A target dimension may be a dimension to which a character string included in the fund settlement data belongs. For example, the fund settlement data may include a settlement time, which is a character string in the time dimension, and the time dimension may be one of the target dimensions. The settlement time may be, for example, a transaction time. The character strings in the same target dimension may use the same unit. For example, the settlement times included in all the fund settlement data may be times in the same time zone.
It should be noted that, acquiring the fund settlement data set corresponding to the target blockchain can facilitate the subsequent compression of all the fund settlement data corresponding to the target blockchain.
As an example, all transaction records recorded by the target blockchain within the period whose data is to be compressed may be obtained from a database, and each transaction record may be used as one piece of fund settlement data. The period whose data is to be compressed may be set in advance; for example, its duration may be one year. If a transaction record includes a transaction time, a transaction location, a transaction price and a transaction commodity, then the transaction time is the settlement time and the target dimension to which it belongs may be the time dimension; the target dimension to which the transaction location belongs may be a location dimension; the target dimension to which the transaction price belongs may be a price dimension; and the target dimension to which the transaction commodity belongs may be a commodity dimension.
Step S2, classifying the fund settlement data in the fund settlement data set according to the settlement time included in the fund settlement data, and determining each category obtained by classification as a target category.
In some embodiments, the fund settlement data in the fund settlement data set may be classified according to the settlement time included in the fund settlement data, and each classification obtained by classification may be determined as a target classification.
It should be noted that the characteristics of fund settlement data are often related to settlement time; for example, a company or an individual may tend to conduct transactions within a specific period. For example, two companies may conduct most of their fund transactions between 8:00 and 10:00 each day. Therefore, classifying the fund settlement data in the fund settlement data set based on the settlement time it includes groups relatively similar fund settlement data into one category, which facilitates the subsequent accurate compression of the fund settlement data in each target category.
As an example, this step may include the following sub-steps:
First, segmenting a preset time period, and determining each sub-time period obtained by segmentation as a target time period.
The preset time period may be a time period set in advance. The duration corresponding to the preset time period may be 1 day.
For example, the preset time period may be divided into a plurality of sub-time periods with a preset duration as the division step, and each sub-time period may be taken as a target time period. The preset duration may be a duration set in advance, for example 2 hours. If the preset duration is 2 hours, 0:00 to 2:00 may be one target time period, 2:00 to 4:00 may be another, and so on, so that 12 target time periods are obtained.
Second, dividing all fund settlement data whose settlement time belongs to the same target time period into the same target category.
For example, the target time period to which a settlement time belongs may be determined as follows: convert the settlement time into the time unit used by the target time periods, take the converted settlement time as the target settlement time, and take the target time period containing the target settlement time as the target time period to which the settlement time belongs. For example, if a settlement time is 17:09:02 on September 25, 2023, and the target time periods are, in order, 0:00 to 6:00, 6:00 to 12:00, 12:00 to 18:00, and 18:00 to 24:00, then converting the settlement time into the unit of the target time periods (hours) may give a target settlement time of 17, so the target time period to which the settlement time belongs is 12:00 to 18:00.
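The two sub-steps of step S2 can be sketched as follows. This is only an illustration under stated assumptions: the function name classify_by_time, the record layout (a dict with a "time" field) and the timestamp format are hypothetical and are not specified by the source; the preset time period is taken to be one day, divided into equal buckets of bucket_hours hours.

```python
from collections import defaultdict
from datetime import datetime


def classify_by_time(records, bucket_hours=2, time_key="time",
                     fmt="%Y-%m-%d %H:%M:%S"):
    """Group records into target categories keyed by (start_hour, end_hour).

    Assumptions: each record is a dict whose time_key field holds the
    settlement time as a string in the given format, and bucket_hours
    divides 24 evenly.
    """
    if bucket_hours <= 0 or 24 % bucket_hours != 0:
        raise ValueError("bucket_hours must be a positive divisor of 24")
    categories = defaultdict(list)
    for record in records:
        # Reduce the settlement time to the unit of the target time periods.
        hour = datetime.strptime(record[time_key], fmt).hour
        start = (hour // bucket_hours) * bucket_hours
        categories[(start, start + bucket_hours)].append(record)
    return dict(categories)
```

With bucket_hours=6, a record whose settlement time is "2023-09-25 17:09:02" falls into the category keyed (12, 18), which matches the 12:00 to 18:00 example in the text.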
Step S3, screening candidate character strings in each target dimension from each target category according to the occurrence times of all character strings in each target dimension in the target category.
In some embodiments, candidate strings for each target dimension may be screened from each target category based on the number of times that all strings for each target category for each target dimension occur in the belonging target category.
Here, all the character strings of a target category in a target dimension may be the character strings in that target dimension included in all the fund settlement data in the target category. Taking the time dimension as an example, all the character strings of a certain target category in the time dimension may be the settlement times included in all the fund settlement data in that target category. The candidate character strings may be character strings that appear repeatedly in the target category.
It should be noted that the more often a character string appears in its target category, the more representative it tends to be and the more useful it is for constructing the initial dictionary. The candidate character strings are therefore character strings obtained by preliminary screening that have a certain possibility of being used to construct the initial dictionary.
As an example, when the number of occurrences of the character string in the target dimension in the belonging target category is greater than a preset number threshold, the character string may be determined as a candidate character string in the target dimension. The preset number of times threshold may be a preset threshold. For example, the preset number of times threshold may be 1.
Optionally, the repeated character strings in each target dimension can be screened from each target category through a suffix tree method to serve as candidate character strings, so that a plurality of candidate character strings in each target dimension of each target category are obtained.
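The threshold-based screening of step S3 can be sketched as follows. This is a simplified illustration, not the suffix tree variant: it assumes each record is a dict mapping a dimension name to one whole character string, and it counts whole field values only. The function name screen_candidates and the field names are hypothetical.

```python
from collections import Counter


def screen_candidates(category, dimensions, count_threshold=1):
    """Return {dimension: {string: count}} for strings whose occurrence
    count in this target category exceeds count_threshold.

    Assumption: `category` is a list of dict records and every record
    contains a string value for each name in `dimensions`.
    """
    candidates = {}
    for dim in dimensions:
        counts = Counter(record[dim] for record in category)
        candidates[dim] = {
            s: n for s, n in counts.items() if n > count_threshold
        }
    return candidates
```

With the example threshold of 1, a string qualifies as a candidate as soon as it appears at least twice in the target category.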
Step S4, determining the representative degree corresponding to each candidate character string according to the length corresponding to each candidate character string and the occurrence times of the candidate character string in the belonging target category.
In some embodiments, the degree of representativeness to which each candidate string corresponds may be determined based on the length to which each candidate string corresponds and the number of times it occurs in the belonging target category.
The length corresponding to the candidate character string and the frequency of occurrence of the candidate character string in the target category can be positively correlated with the corresponding representative degree. The length to which the candidate character string corresponds may be the number of characters that the candidate character string includes.
It should be noted that, based on the length corresponding to each candidate character string and the number of times it appears in the belonging target category, the degree of representativeness corresponding to each candidate character string may be quantified.
As an example, the product of the length corresponding to each candidate character string and the number of times it appears in the belonging target category may be determined as the degree of representativeness corresponding to each candidate character string.
For example, the formula for determining the degree of representativeness corresponding to a candidate character string may be:
$$D_{i,b,j} = L_{i,b,j} \times N_{i,b,j}$$
where $D_{i,b,j}$ is the degree of representativeness corresponding to the $j$-th candidate character string of the $i$-th target category in the $b$-th target dimension; $i$ is the sequence number of the target category; $b$ is the sequence number of the target dimension; $j$ is the sequence number of the candidate character string of the $i$-th target category in the $b$-th target dimension; $L_{i,b,j}$ is the length of the $j$-th candidate character string of the $i$-th target category in the $b$-th target dimension; and $N_{i,b,j}$ is the number of times that candidate character string occurs in the $i$-th target category.
The larger the length $L_{i,b,j}$ is, the longer the $j$-th candidate character string is. The larger the occurrence count $N_{i,b,j}$ is, the more often the $j$-th candidate character string occurs in the $i$-th target category, and the more likely it is to be a high-frequency character string in that category. Therefore, the larger the degree of representativeness $D_{i,b,j}$ is, the more frequent and the longer the $j$-th candidate character string tends to be, the more representative it tends to be of the $i$-th target category, and the more suitable it is for constructing the initial dictionary of the $i$-th target category.
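As a small worked illustration of the product rule stated above (length multiplied by occurrence count), the sketch below scores the candidates of one dimension. The function name `representativeness` and the input mapping of candidate string to occurrence count are assumptions introduced for the example.

```python
def representativeness(candidate_counts):
    """Map each candidate string to length * occurrence count.

    `candidate_counts` is a hypothetical mapping from a candidate string to
    the number of times it occurs in its target category.
    """
    return {s: len(s) * n for s, n in candidate_counts.items()}


# "CNY" has 3 characters and occurs 3 times; "2023-10-13" has 10 characters
# and occurs twice, so the longer string scores higher despite being rarer.
scores = representativeness({"CNY": 3, "2023-10-13": 2})
print(scores)
```

A long string that repeats only a few times can therefore outrank a short string that repeats often, which matches the stated intent of favoring strings that save the most characters per dictionary entry.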
S5, performing similarity analysis on all character strings of each target category in each target dimension to obtain the target similarity degree of each target category in each target dimension.
In some embodiments, similarity analysis may be performed on all character strings of each target category in each target dimension to obtain the target similarity degree of each target category in each target dimension.
It should be noted that performing similarity analysis on all character strings of each target category in each target dimension quantifies the target similarity degree of each target category in each target dimension. Moreover, the larger the target similarity degree of a target category in a target dimension, the more similar the character strings of that category in that dimension tend to be, the more representative those character strings tend to be, and the more suitable they are for constructing the initial dictionary.
As an example, the target similarity degree of the target category in the target dimension may be determined according to ASCII codes corresponding to all characters in all character strings of the target category in the target dimension.
For example, the formula for determining the target similarity degree of a target category in a target dimension according to the American Standard Code for Information Interchange (ASCII) codes corresponding to all characters in all character strings of the target category in the target dimension may be:
$$S_{i,b} = \frac{1}{C_{n_{i,b}}^{2}} \sum_{a=1}^{C_{n_{i,b}}^{2}} s_{i,b,a}, \qquad s_{i,b,a} = \frac{1}{M_{i,b,a}} \sum_{m=1}^{M_{i,b,a}} \left( \left| A^{(1)}_{i,b,a,m} - A^{(2)}_{i,b,a,m} \right| = 0 \;?\; 1 : 0 \right)$$
where $S_{i,b}$ is the target similarity degree of the $i$-th target category in the $b$-th target dimension; $i$ is the sequence number of the target category; $b$ is the sequence number of the target dimension; $C_{n_{i,b}}^{2}$ is the number of combinations of 2 character strings taken from the $n_{i,b}$ different character strings of the $i$-th target category in the $b$-th target dimension; $n_{i,b}$ is the number of different character strings of the $i$-th target category in the $b$-th target dimension; $s_{i,b,a}$ is the similarity index corresponding to the $a$-th combination of the $i$-th target category in the $b$-th target dimension; $a$ is the sequence number of the combination; $M_{i,b,a}$ is the smaller of the numbers of characters in the two character strings of the $a$-th combination; $|\cdot|$ is the absolute value function; $m$ is the sequence number of a character within a character string; $A^{(1)}_{i,b,a,m}$ is the ASCII code corresponding to the $m$-th character of the 1st character string in the $a$-th combination; $A^{(2)}_{i,b,a,m}$ is the ASCII code corresponding to the $m$-th character of the 2nd character string in the $a$-th combination; and $(\,\cdot = 0 \;?\; 1 : 0)$ is a ternary expression: if $\left| A^{(1)}_{i,b,a,m} - A^{(2)}_{i,b,a,m} \right|$ equals 0, its value is 1; otherwise, its value is 0.
The larger the similarity index $s_{i,b,a}$ is, the more similar the two character strings in the $a$-th combination of the $i$-th target category in the $b$-th target dimension tend to be. Therefore, the larger the target similarity degree $S_{i,b}$ is, the higher the similarity among the character strings of the $i$-th target category in the $b$-th target dimension tends to be, the stronger the regularity among those character strings, the more representative they tend to be, and the more suitable they are for constructing the initial dictionary of the $i$-th target category.
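The pairwise comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the similarity index of a pair is taken to be the fraction of positions (up to the shorter length) whose ASCII codes are equal, and the target similarity degree is taken to be the mean of that index over all pairs of distinct strings. Both normalizations, the function names `pair_similarity` and `target_similarity`, and the value returned when fewer than two distinct strings exist are assumptions, not details confirmed by the text.

```python
from itertools import combinations


def pair_similarity(first, second):
    """Fraction of positions, up to the shorter length, with equal codes."""
    shorter = min(len(first), len(second))
    if shorter == 0:
        return 0.0
    # The ternary rule: a position contributes 1 when the absolute
    # difference of the two character codes is 0, and 0 otherwise.
    matches = sum(
        1 if abs(ord(first[m]) - ord(second[m])) == 0 else 0
        for m in range(shorter)
    )
    return matches / shorter


def target_similarity(strings):
    """Mean pair similarity over all pairs of distinct strings."""
    distinct = sorted(set(strings))
    pairs = list(combinations(distinct, 2))
    if not pairs:
        # Assumption: with fewer than two distinct strings there is nothing
        # to compare, so the dimension is treated as fully similar.
        return 1.0
    return sum(pair_similarity(x, y) for x, y in pairs) / len(pairs)


# Two distinct dates that differ only in their last character.
print(target_similarity(["2023-10-13", "2023-10-14", "2023-10-13"]))
```

Dimensions such as dates or account prefixes, whose strings share long common stretches, score high under this measure, while free-form fields score low.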
S6, screening target character strings out of all candidate character strings in each target category according to the degree of representativeness corresponding to all candidate character strings in each target category and the target similarity degree of each target category in all target dimensions, to obtain a target character string set corresponding to each target category.
In some embodiments, the target character strings may be screened out of all candidate character strings in each target category according to the degree of representativeness corresponding to all candidate character strings in each target category and the target similarity degree of each target category in all target dimensions, so as to obtain a target character string set corresponding to each target category.
It should be noted that jointly considering the degree of representativeness corresponding to all candidate character strings in each target category and the target similarity degree of each target category in all target dimensions makes it convenient to screen out the target character string set used to construct the initial dictionary corresponding to each target category.
As an example, this step may include the following sub-steps:
First step, determining the target representativeness degree corresponding to each candidate character string according to the degree of representativeness corresponding to the candidate character string and the target similarity degree, in the target dimension of the candidate character string, of the target category to which it belongs.
Both the degree of representativeness corresponding to a candidate character string and the target similarity degree of its target category in its target dimension may be positively correlated with its target representativeness degree.
For example, the target representativeness degree corresponding to a candidate character string may be determined by a formula in the following quantities: $T_{i,b,j}$, the target representativeness degree corresponding to the $j$-th candidate character string of the $i$-th target category in the $b$-th target dimension; $D_{i,b,j}$, the degree of representativeness corresponding to that candidate character string; and $S_{i,b}$, the target similarity degree of the $i$-th target category in the $b$-th target dimension. Here $i$ is the sequence number of the target category, $b$ is the sequence number of the target dimension, and $j$ is the sequence number of the candidate character string of the $i$-th target category in the $b$-th target dimension.
The larger $D_{i,b,j}$ is, the more representative the $j$-th candidate character string tends to be of the $i$-th target category, and the more suitable it is for constructing the initial dictionary of the $i$-th target category. The larger $S_{i,b}$ is, the higher the similarity among the character strings of the $i$-th target category in the $b$-th target dimension tends to be, the more representative those character strings tend to be, and the more suitable they are for constructing the initial dictionary of the $i$-th target category. Therefore, $T_{i,b,j}$ increases with both quantities, and the larger $T_{i,b,j}$ is, the more suitable the $j$-th candidate character string is for constructing the initial dictionary of the $i$-th target category.
Second step, when the target representativeness degree corresponding to a candidate character string is greater than a preset representativeness threshold, determining the candidate character string as a target character string.
The preset representativeness threshold is set in advance; for example, it may be 0.57.
Third step, combining all target character strings in each target category into a target character string set.
Here, all target character strings in a target category may be the target character strings included in all fund settlement data in that target category.
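The three sub-steps above can be sketched as a single filtering pass. The text only states that the target representativeness degree grows with both the degree of representativeness and the target similarity degree, so the default combination used here (a plain product, passed in as `combine`) is purely an assumption for illustration, as are the function name `select_target_strings`, the input layout and the sample numbers.

```python
def select_target_strings(rep_by_dimension, similarity_by_dimension,
                          threshold=0.57, combine=lambda d, s: d * s):
    """Collect the target string set of one target category.

    rep_by_dimension: {dimension: {candidate string: representativeness}}
    similarity_by_dimension: {dimension: target similarity degree}
    combine: ASSUMED monotone combination rule (product by default); the
             actual formula is not reproduced here.
    """
    targets = set()
    for dimension, reps in rep_by_dimension.items():
        similarity = similarity_by_dimension[dimension]
        for string, rep in reps.items():
            # Second step: keep candidates whose combined score exceeds
            # the preset threshold.
            if combine(rep, similarity) > threshold:
                targets.add(string)
    # Third step: the surviving strings form the target string set.
    return targets


reps = {"time": {"2023-10-13": 20}, "currency": {"CNY": 9}}
sims = {"time": 0.9, "currency": 0.0}
print(select_target_strings(reps, sims))
```

Because the combination rule is assumed, the scale of the threshold is also illustrative; a value such as 0.57 suggests that the real score is bounded, in which case `combine` would need to be replaced by the appropriate normalized rule.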
S7, constructing an initial dictionary corresponding to each target category according to the target character string set corresponding to each target category.
In some embodiments, an initial dictionary corresponding to each target category may be constructed from a set of target strings corresponding to each target category.
The initial dictionary corresponding to the target category may include: each target character string in the target character string set corresponding to the target category.
It should be noted that, constructing the initial dictionary corresponding to each target category can facilitate the subsequent compression of all funds settlement data in each target category.
As an example, the target string set corresponding to the target category may be determined as an initial dictionary corresponding to the target category.
S8, compressing all fund settlement data in each target category by the LZW algorithm according to the initial dictionary corresponding to each target category.
In some embodiments, all fund settlement data in each target category may be compressed by the Lempel-Ziv-Welch (LZW) string-table compression algorithm based on the initial dictionary corresponding to that target category.
As an example, all fund settlement data in a target category may be compressed by the LZW algorithm according to the initial dictionary corresponding to that target category.
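To make the effect of a pre-populated dictionary concrete, the following sketch runs a textbook LZW encoder and decoder whose string table is seeded before compression starts. It is an illustration under assumptions, not the patented procedure: the helper names, the `|` field separator in the sample record and the decision to seed every prefix of each target string (so that the character-by-character matching loop of standard LZW can reach the full string) are all choices made for the example.

```python
def build_seed(target_strings, alphabet):
    """Seed table: single characters first, then all prefixes of each target."""
    entries = sorted(set(alphabet) | set("".join(target_strings)))
    seen = set(entries)
    for target in sorted(target_strings):
        for end in range(2, len(target) + 1):
            prefix = target[:end]
            if prefix not in seen:
                seen.add(prefix)
                entries.append(prefix)
    return entries


def lzw_compress(text, seed):
    """Standard LZW encoding starting from the given seed table."""
    table = {entry: code for code, entry in enumerate(seed)}
    codes, current = [], ""
    for ch in text:
        if current + ch in table:
            current += ch
        else:
            codes.append(table[current])
            table[current + ch] = len(table)  # insertion into the dictionary
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes, seed):
    """Inverse of lzw_compress; must be given the same seed table."""
    if not codes:
        return ""
    table = list(seed)
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):
            entry = previous + previous[0]  # code defined by this very step
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table.append(previous + entry[0])
        previous = entry
    return "".join(output)


record = "CNY|2023-10-13|CNY|2023-10-13"
seeded = build_seed(["CNY", "2023-10-13"], record)
plain = build_seed([], record)
print(len(lzw_compress(record, seeded)), len(lzw_compress(record, plain)))
```

The seeded run emits fewer codes and performs fewer dictionary insertions than the run that starts from single characters only, which is the saving the description attributes to a representative initial dictionary. The decoder needs the same seed table, so in practice the initial dictionary of each target category would have to be stored or transmitted alongside the compressed data.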
In summary, the fund settlement data in each target category are compressed by the LZW algorithm starting from the initial dictionary corresponding to that category. Compared with compressing from an empty initial dictionary, the invention jointly considers several indexes relevant to the construction of the initial dictionary, such as the degree of representativeness and the target similarity degree, objectively screens relatively representative target character strings out of each target category, and constructs a representative initial dictionary from those target character strings. This reduces the number of dictionary insertion operations to a certain extent, reduces the occupation of computing resources, and improves the efficiency of fund settlement data compression. In addition, compared with directly compressing the whole fund settlement data set corresponding to the target blockchain, compressing the fund settlement data category by category reduces the dictionary size during LZW compression to a certain extent, which reduces the load on the blockchain nodes and improves the compression efficiency of the fund settlement data.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention and are intended to be included within the scope of the invention.

Claims (6)

1. The block chain-based fund settlement data optimization processing method is characterized by comprising the following steps of:
acquiring a fund settlement data set corresponding to a target blockchain, wherein the fund settlement data in the fund settlement data set comprises a character string under each target dimension, and the time dimension is one of the target dimensions;
classifying the fund settlement data in the fund settlement data set according to settlement time included in the fund settlement data, and determining each category obtained by classification as a target category, wherein the settlement time is a character string in a time dimension;
according to the number of times that all character strings of each target class in each target dimension appear in the target class, candidate character strings in each target dimension are screened out from each target class;
determining the representative degree corresponding to each candidate character string according to the length corresponding to each candidate character string and the occurrence times of each candidate character string in the belonging target category;
performing similarity analysis processing on all character strings of each target category in each target dimension to obtain target similarity of each target category in each target dimension;
screening target character strings from all candidate character strings in each target category according to the representative degree corresponding to all candidate character strings in each target category and the target similarity degree of each target category under all target dimensions, and obtaining a target character string set corresponding to each target category;
constructing an initial dictionary corresponding to each target category according to the target character string set corresponding to each target category;
compressing all fund settlement data in each target category through an LZW algorithm according to the initial dictionary corresponding to each target category;
the length corresponding to the candidate character string and the frequency of occurrence of the candidate character string in the target category are positively correlated with the corresponding representative degree;
the determining the representative degree corresponding to each candidate character string according to the length corresponding to each candidate character string and the frequency of occurrence of each candidate character string in the target category comprises the following steps:
determining the product of the length corresponding to each candidate character string and the frequency of occurrence of the candidate character string in the target category as the representative degree corresponding to each candidate character string;
the step of screening target character strings from all candidate character strings in each target category according to the representative degree corresponding to all candidate character strings in each target category and the target similarity degree of each target category under all target dimensions to obtain a target character string set corresponding to each target category, comprising the following steps:
determining the target representative degree corresponding to each candidate character string according to the representative degree corresponding to each candidate character string and the target similarity degree of the target category to which the candidate character string belongs under the target dimension;
when the target representing degree corresponding to the candidate character string is larger than a preset representing threshold value, determining the candidate character string as a target character string;
combining all target character strings in each target category into a target character string set;
the representative degree corresponding to the candidate character strings and the target similarity degree of the target category of the candidate character strings under the target dimension are positively correlated with the corresponding target representative degree.
2. The method for optimizing processing of blockchain-based funds settlement data according to claim 1, wherein the classifying the funds settlement data in the funds settlement data set according to settlement time included in the funds settlement data and determining each category obtained by the classifying as a target category comprises:
segmenting a preset time period, and determining each sub-time period obtained by segmentation as a target time period;
and dividing all fund settlement data with settlement time belonging to the same target time period into the same target category.
3. The blockchain-based funds settlement data optimization processing method as in claim 1, wherein the step of screening candidate strings in each target dimension from each target category based on the number of occurrences of all strings in each target dimension in the belonging target category, comprises:
and when the occurrence frequency of the character string in the target dimension in the target category is greater than a preset frequency threshold, determining the character string as a candidate character string in the target dimension.
4. The method for optimizing and processing the funds settlement data based on the blockchain as in claim 1, wherein the step of analyzing and processing the similarity degree of all the character strings of each target category in each target dimension to obtain the target similarity degree of each target category in each target dimension comprises the following steps:
and determining the target similarity degree of the target category in the target dimension according to ASCII codes corresponding to all characters in all character strings of the target category in the target dimension.
5. The blockchain-based funds settlement data optimization processing method as in claim 4, wherein the formula corresponding to the target similarity of the target class in the target dimension is:
$$S_{i,b} = \frac{1}{C_{n_{i,b}}^{2}} \sum_{a=1}^{C_{n_{i,b}}^{2}} s_{i,b,a}, \qquad s_{i,b,a} = \frac{1}{M_{i,b,a}} \sum_{m=1}^{M_{i,b,a}} \left( \left| A^{(1)}_{i,b,a,m} - A^{(2)}_{i,b,a,m} \right| = 0 \;?\; 1 : 0 \right)$$
wherein $S_{i,b}$ is the target similarity degree of the $i$-th target category in the $b$-th target dimension; $i$ is the sequence number of the target category; $b$ is the sequence number of the target dimension; $C_{n_{i,b}}^{2}$ is the number of combinations of 2 character strings taken from the $n_{i,b}$ different character strings of the $i$-th target category in the $b$-th target dimension; $n_{i,b}$ is the number of different character strings of the $i$-th target category in the $b$-th target dimension; $s_{i,b,a}$ is the similarity index corresponding to the $a$-th combination of the $i$-th target category in the $b$-th target dimension; $a$ is the sequence number of the combination of the $i$-th target category in the $b$-th target dimension; $M_{i,b,a}$ is the smaller of the numbers of characters included in the two character strings of the $a$-th combination; $|\cdot|$ is the absolute value function; $m$ is the sequence number of a character in a character string; $A^{(1)}_{i,b,a,m}$ is the ASCII code corresponding to the $m$-th character of the 1st character string in the $a$-th combination; $A^{(2)}_{i,b,a,m}$ is the ASCII code corresponding to the $m$-th character of the 2nd character string in the $a$-th combination; and $(\,\cdot = 0 \;?\; 1 : 0)$ is a ternary expression whose value is 1 when $\left| A^{(1)}_{i,b,a,m} - A^{(2)}_{i,b,a,m} \right|$ equals 0 and 0 otherwise.
6. The method for optimizing and processing the funds settlement data based on the blockchain as in claim 1, wherein the constructing the initial dictionary corresponding to each target category according to the target string set corresponding to each target category comprises:
and determining the target character string set corresponding to the target category as an initial dictionary corresponding to the target category.
Application number: CN202311321416.9A
Title: Capital settlement data optimization processing method based on blockchain
Priority application: CN202311321416.9A
Priority date: 2023-10-13
Filing date: 2023-10-13
Status: Active
Publications: CN117081602A (published 2023-11-17); CN117081602B (published 2024-01-26)
Family ID: 88719733
Country status: CN, CN117081602B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant