CN117081602B

CN117081602B - Capital settlement data optimization processing method based on blockchain

Info

Publication number: CN117081602B
Application number: CN202311321416.9A
Authority: CN
Inventors: 贾庆佳; 刘永峰; 张志勇; 张磊; 王晓琳
Original assignee: Qingdao Off Site Market Clearing Center Co ltd
Current assignee: Qingdao Off Site Market Clearing Center Co ltd
Priority date: 2023-10-13
Filing date: 2023-10-13
Publication date: 2024-01-26
Anticipated expiration: 2043-10-13
Also published as: CN117081602A

Abstract

The invention relates to the technical field of data compression, in particular to a block chain-based fund settlement data optimization processing method, which comprises the following steps: acquiring a fund settlement data set corresponding to a target blockchain; classifying the funds settlement data in the funds settlement data set; screening candidate character strings under each target dimension from each target category; determining the corresponding representative degree of each candidate character string; performing similarity analysis processing on all character strings of each target category under each target dimension; screening target character strings from all candidate character strings in each target category; constructing an initial dictionary corresponding to each target category; and compressing all fund settlement data in each target category through an LZW algorithm according to the initial dictionary corresponding to each target category. The invention realizes the compression of the fund settlement data of the blockchain and improves the efficiency of the fund settlement data compression.

Description

Capital settlement data optimization processing method based on blockchain

Technical Field

The invention relates to the technical field of data compression, and in particular to a blockchain-based fund settlement data optimization processing method.

Background

In a blockchain-based fund settlement scenario, each participant typically acts as a node and maintains a complete copy of the blockchain. Each node therefore stores a large amount of fund settlement data, which usually needs to be compressed to save blockchain storage. At present, such data is commonly compressed with the LZW algorithm, whose initial dictionary is usually empty.

However, when the fund settlement data is compressed by the LZW algorithm according to the empty initial dictionary, there are often the following technical problems:

Because the initial dictionary is empty, the LZW algorithm has to perform many insertion operations on the dictionary, which incurs overhead and additional computation. In addition, because the volume of blockchain data is huge, the dictionary built by the LZW algorithm tends to become very large, which increases the load on the blockchain nodes. As a result, the efficiency of fund settlement data compression is poor.

Disclosure of Invention

The summary of the invention is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. The summary of the invention is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In order to solve the technical problem of poor efficiency of fund settlement data compression, the invention provides a blockchain-based fund settlement data optimization processing method.

The invention provides a blockchain-based fund settlement data optimization processing method, which comprises the following steps:

acquiring a fund settlement data set corresponding to a target blockchain, wherein the fund settlement data in the fund settlement data set comprises a character string under each target dimension, and the time dimension is one of the target dimensions;

classifying the fund settlement data in the fund settlement data set according to settlement time included in the fund settlement data, and determining each category obtained by classification as a target category, wherein the settlement time is a character string in a time dimension;

according to the number of times that all character strings of each target class in each target dimension appear in the target class, candidate character strings in each target dimension are screened out from each target class;

determining the representative degree corresponding to each candidate character string according to the length corresponding to each candidate character string and the occurrence times of each candidate character string in the belonging target category;

performing similarity analysis processing on all character strings of each target category in each target dimension to obtain target similarity of each target category in each target dimension;

screening target character strings from all candidate character strings in each target category according to the representative degree corresponding to all candidate character strings in each target category and the target similarity degree of each target category under all target dimensions, and obtaining a target character string set corresponding to each target category;

constructing an initial dictionary corresponding to each target category according to the target character string set corresponding to each target category;

and compressing all fund settlement data in each target category through an LZW algorithm according to the initial dictionary corresponding to each target category.

Optionally, the classifying the funds settlement data in the funds settlement data set according to settlement time included in the funds settlement data, and determining each category obtained by classification as a target category includes:

segmenting a preset time period, and determining each sub-time period obtained by segmentation as a target time period;

and dividing all fund settlement data with settlement time belonging to the same target time period into the same target category.

Optionally, the step of screening candidate character strings in each target dimension from each target category according to the number of times that all character strings in each target dimension appear in the target category, includes:

and when the occurrence frequency of the character string in the target dimension in the target category is greater than a preset frequency threshold, determining the character string as a candidate character string in the target dimension.

Optionally, the length corresponding to the candidate character string and the number of times of occurrence of the candidate character string in the belonging target category are positively correlated with the corresponding representative degree.

Optionally, the determining the representative degree corresponding to each candidate character string according to the length corresponding to each candidate character string and the frequency of occurrence of each candidate character string in the target category includes:

and determining the product of the length corresponding to each candidate character string and the frequency of occurrence of the candidate character string in the target category as the representative degree corresponding to each candidate character string.

Optionally, the analyzing the similarity degree of all the character strings of each target category in each target dimension to obtain the target similarity degree of each target category in each target dimension includes:

and determining the target similarity degree of the target category in the target dimension according to ASCII codes corresponding to all characters in all character strings of the target category in the target dimension.

Optionally, the formula corresponding to the target similarity degree of the target category in the target dimension is:

$$S_{ib} = \frac{1}{C_{n_{ib}}^{2}} \sum_{a=1}^{C_{n_{ib}}^{2}} s_{iba}, \qquad s_{iba} = \frac{1}{M_{iba}} \sum_{m=1}^{M_{iba}} \left( \left| A_{ibam}^{(1)} - A_{ibam}^{(2)} \right| = 0 \;?\; 1 : 0 \right)$$

wherein $S_{ib}$ is the target similarity degree of the $i$-th target category in the $b$-th target dimension; $i$ is the sequence number of the target category; $b$ is the sequence number of the target dimension; $C_{n_{ib}}^{2}$ is the number of combinations of 2 character strings taken from the $n_{ib}$ different character strings of the $i$-th target category in the $b$-th target dimension; $n_{ib}$ is the number of different character strings of the $i$-th target category in the $b$-th target dimension; $s_{iba}$ is the similarity index corresponding to the $a$-th combination of the $i$-th target category in the $b$-th target dimension; $a$ is the sequence number of the combination of the $i$-th target category in the $b$-th target dimension; $M_{iba}$ is the smaller of the numbers of characters included in the two character strings of the $a$-th combination; $|\cdot|$ is the absolute value function; $m$ is the sequence number of a character in a character string; $A_{ibam}^{(1)}$ is the ASCII code corresponding to the $m$-th character of the 1st character string in the $a$-th combination; $A_{ibam}^{(2)}$ is the ASCII code corresponding to the $m$-th character of the 2nd character string in the $a$-th combination; and $\left( \left| A_{ibam}^{(1)} - A_{ibam}^{(2)} \right| = 0 \;?\; 1 : 0 \right)$ is a ternary operation expression.

Optionally, the step of screening the target strings from all the candidate strings in each target category according to the representative degrees corresponding to all the candidate strings in each target category and the target similarity degree of each target category under all the target dimensions to obtain a target string set corresponding to each target category includes:

determining the target representative degree corresponding to each candidate character string according to the representative degree corresponding to each candidate character string and the target similarity degree of the target category to which the candidate character string belongs under the target dimension;

when the target representing degree corresponding to the candidate character string is larger than a preset representing threshold value, determining the candidate character string as a target character string;

and combining all the target character strings in each target category into a target character string set.

Optionally, the representative degree corresponding to the candidate character string and the target similarity degree of the target category to which the candidate character string belongs under the target dimension are positively correlated with the corresponding target representative degree.
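The screening described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the target representative degree is the product of the representative degree and the target similarity degree, which is one function that is positively correlated with both. The function name `select_target_strings`, its parameters, and the threshold value are hypothetical.

```python
def select_target_strings(representative_degrees, target_similarity, threshold):
    """Screen target strings for one target category in one target dimension.

    representative_degrees: {candidate string: representative degree}
    target_similarity: target similarity degree of the category in this dimension
    threshold: preset representative threshold (hypothetical value chosen by the caller)
    """
    selected = set()
    for candidate, degree in representative_degrees.items():
        # Assumption: target representative degree = representative degree * similarity.
        target_degree = degree * target_similarity
        if target_degree > threshold:
            selected.add(candidate)
    return selected
```

The union of the sets returned for all target dimensions of a category would then form the target character string set of that category.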

Optionally, the constructing an initial dictionary corresponding to each target category according to the target string set corresponding to each target category includes:

and determining the target character string set corresponding to the target category as an initial dictionary corresponding to the target category.

The invention has the following beneficial effects:

The blockchain-based fund settlement data optimization processing method of the invention compresses the fund settlement data of a blockchain, solves the technical problem of poor efficiency of fund settlement data compression, and improves that efficiency. First, acquiring the fund settlement data set corresponding to the target blockchain facilitates the subsequent compression of all fund settlement data corresponding to the target blockchain. Second, because the characteristics of fund settlement data are often related to settlement time, classifying the fund settlement data in the set by the settlement time they include groups relatively similar fund settlement data into the same category, which facilitates the subsequent accurate compression of the fund settlement data in each target category. Third, a character string that appears more often in its target category tends to be more representative and more suitable for constructing the initial dictionary, so the representative degree of each candidate character string can be quantified from its length and the number of times it appears in its target category. Fourth, a larger target similarity degree of a target category in a target dimension indicates that the character strings of that category in that dimension are more similar to one another, hence relatively more representative and more suitable for constructing the initial dictionary. Therefore, jointly considering the representative degrees of all candidate character strings in each target category and the target similarity degrees of each target category in all target dimensions makes it convenient to screen out the target character string set used to construct the initial dictionary of each target category. Finally, all fund settlement data in each target category are compressed by the LZW algorithm on the basis of the initial dictionary corresponding to that category, so that the fund settlement data in each target category are compressed accurately. Compared with compression that starts from an empty initial dictionary, the invention jointly considers several indexes related to constructing the initial dictionary, such as the representative degree and the target similarity degree, objectively screens relatively representative target character strings out of each target category, and builds the initial dictionary from them; this reduces dictionary insertion operations to a certain extent, reduces the occupation of computing resources, and improves the efficiency of fund settlement data compression. Moreover, compared with directly compressing the whole fund settlement data set corresponding to the target blockchain, compressing the fund settlement data category by category reduces, to a certain extent, the size of the dictionary used by the LZW algorithm, which reduces the load on the blockchain nodes and further improves the efficiency of fund settlement data compression.
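The effect of a pre-filled initial dictionary on the number of insertion operations can be illustrated with a minimal Python sketch of LZW. It rests on assumptions that are not taken from the text: the base dictionary holds the 256 single characters with code points 0 to 255, the input contains only such characters, and every prefix of each target character string is also inserted so that the usual one-character-at-a-time matching can reach the full string. All function names are hypothetical.

```python
def build_initial_dictionary(target_strings):
    """Base single-character entries plus all prefixes of each target string (assumption)."""
    dictionary = {chr(i): i for i in range(256)}
    for s in sorted(target_strings):
        for end in range(2, len(s) + 1):
            # setdefault evaluates len(dictionary) before inserting, so codes stay contiguous.
            dictionary.setdefault(s[:end], len(dictionary))
    return dictionary


def lzw_compress(text, initial_dictionary):
    """Return (codes, insertions) for text, starting from a copy of the initial dictionary."""
    dictionary = dict(initial_dictionary)
    codes, insertions, current = [], 0, ""
    for ch in text:
        if current + ch in dictionary:
            current += ch
        else:
            codes.append(dictionary[current])
            dictionary[current + ch] = len(dictionary)  # one dictionary insertion
            insertions += 1
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes, insertions


def lzw_decompress(codes, initial_dictionary):
    """Invert lzw_compress given the same initial dictionary."""
    if not codes:
        return ""
    table = {code: s for s, code in initial_dictionary.items()}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        # A code not yet in the table can only be previous + its own first character.
        entry = table[code] if code in table else previous + previous[0]
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)
```

For a repetitive input such as `"Qingdao,Qingdao,Qingdao"`, calling `lzw_compress` with `build_initial_dictionary({"Qingdao"})` should need fewer insertions and emit fewer codes than calling it with `build_initial_dictionary(set())`, and both outputs decompress back to the input when the matching initial dictionary is supplied.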

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a blockchain-based funds settlement data optimization processing method of the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description is given below of the specific implementation, structure, features and effects of the technical solution according to the present invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

Each step is described in detail below.

Referring to FIG. 1, a flow diagram of some embodiments of the blockchain-based fund settlement data optimization processing method according to the present invention is shown. The blockchain-based fund settlement data optimization processing method comprises the following steps:

Step S1: acquiring a fund settlement data set corresponding to the target blockchain.

In some embodiments, a set of funds settlement data corresponding to the target blockchain may be obtained.

The target blockchain may be the blockchain whose data is to be compressed. The fund settlement data may be data related to fund settlement; for example, it may be a transaction record, that is, a record associated with a transaction. A transaction record may include, but is not limited to: transaction time, transaction location, transaction price, and transaction commodity. Each piece of fund settlement data in the fund settlement data set may include a character string in each target dimension, where a target dimension is a dimension to which a character string included in the fund settlement data belongs. For example, the fund settlement data may include a settlement time, which is a character string in the time dimension, and the time dimension may be one of the target dimensions. The settlement time may be, for example, the transaction time. The character strings in the same target dimension may use the same unit; for example, the settlement times included in all fund settlement data may be times in the same time zone.

It should be noted that, acquiring the fund settlement data set corresponding to the target blockchain can facilitate the subsequent compression of all the fund settlement data corresponding to the target blockchain.

As an example, all transaction records recorded by the target blockchain within the period to be compressed may be obtained from a database, and each transaction record may be used as one piece of fund settlement data. The period to be compressed may be a preset period; for example, its duration may be one year. If a transaction record includes a transaction time, a transaction location, a transaction price, and a transaction commodity, then the transaction time is the settlement time and belongs to the time dimension, the transaction location belongs to the location dimension, the transaction price belongs to the price dimension, and the transaction commodity belongs to the commodity dimension.

Step S2: classifying the fund settlement data in the fund settlement data set according to the settlement time included in the fund settlement data, and determining each category obtained by classification as a target category.

In some embodiments, the fund settlement data in the fund settlement data set may be classified according to the settlement time included in the fund settlement data, and each classification obtained by classification may be determined as a target classification.

It should be noted that the characteristics of fund settlement data are often related to settlement time; for example, a company or an individual may tend to transact within a specific time period. For instance, two companies may conduct most of their fund transactions between 8:00 and 10:00 each day. Therefore, classifying the fund settlement data in the fund settlement data set by the settlement time they include groups relatively similar fund settlement data into the same category, which facilitates the subsequent accurate compression of the fund settlement data in each target category.

As an example, this step may include the steps of:

In the first step, a preset time period is segmented, and each sub-time period obtained by segmentation is determined as a target time period.

The preset time period may be a time period set in advance; its duration may be, for example, 1 day.

For example, the preset time period may be divided into a plurality of sub-time periods using a preset duration as the division step, and each sub-time period may be taken as a target time period. The preset duration may be a duration set in advance, for example 2 hours. If the preset duration is 2 hours, then 0:00 to 2:00 is one target time period, 2:00 to 4:00 is another, and so on, giving 12 target time periods.

In the second step, all fund settlement data whose settlement times belong to the same target time period are divided into the same target category.

For example, the target time period to which a settlement time belongs may be determined as follows: convert the settlement time to the time unit used by the target time periods, take the converted settlement time as the target settlement time, and take the target time period containing the target settlement time as the target time period to which the settlement time belongs. For example, if a settlement time is 17:09:02 on September 25, 2023, and the target time periods are, in order, 0:00 to 6:00, 6:00 to 12:00, 12:00 to 18:00, and 18:00 to 24:00, then converting the settlement time to hours gives a target settlement time of 17, so the settlement time belongs to the target time period 12:00 to 18:00.
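The assignment of settlement times to target time periods can be sketched in Python as follows. The sketch assumes that each piece of fund settlement data is a dictionary whose `"time"` entry is a string in the format `YYYY-MM-DD HH:MM:SS`, and that target time periods are half-open hour ranges; the record layout, the key name, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def period_of(settlement_time, periods, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the (start_hour, end_hour) target time period containing the settlement time."""
    # Unify the unit: reduce the full timestamp to its hour of day.
    hour = datetime.strptime(settlement_time, fmt).hour
    for start, end in periods:
        if start <= hour < end:
            return (start, end)
    raise ValueError(f"no target time period contains hour {hour}")


def classify_by_period(records, periods, time_key="time"):
    """Group records into target categories keyed by their target time period."""
    categories = defaultdict(list)
    for record in records:
        categories[period_of(record[time_key], periods)].append(record)
    return dict(categories)
```

For the settlement time in the example above, `period_of("2023-09-25 17:09:02", [(0, 6), (6, 12), (12, 18), (18, 24)])` selects the range `(12, 18)`.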

Step S3: screening candidate character strings in each target dimension from each target category according to the number of times that all character strings of each target category in each target dimension appear in the target category.

In some embodiments, candidate strings for each target dimension may be screened from each target category based on the number of times that all strings for each target category for each target dimension occur in the belonging target category.

All the character strings of a target category in a target dimension may be the character strings in that target dimension included in all fund settlement data of the target category. Taking the time dimension as an example, all the character strings of a target category in the time dimension may be the settlement times included in all fund settlement data of that category. A candidate character string may be a character string that appears repeatedly in the target category.

It should be noted that a character string that appears more often in its target category tends to be more representative and more suitable for constructing the initial dictionary. A candidate character string is therefore a preliminarily screened character string that has some possibility of being used to construct the initial dictionary.

As an example, when the number of occurrences of the character string in the target dimension in the belonging target category is greater than a preset number threshold, the character string may be determined as a candidate character string in the target dimension. The preset number of times threshold may be a preset threshold. For example, the preset number of times threshold may be 1.

Optionally, the repeated character strings in each target dimension can be screened from each target category through a suffix tree method to serve as candidate character strings, so that a plurality of candidate character strings in each target dimension of each target category are obtained.
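The threshold-based screening can be sketched in Python as follows. The sketch counts whole field values only, so it does not implement the suffix tree variant, which could also find repeated substrings. It assumes each piece of fund settlement data is a dictionary mapping a dimension name to a character string; the record layout and the function name are hypothetical.

```python
from collections import Counter


def candidate_strings(category_records, dimensions, count_threshold=1):
    """Return {dimension: {string: count}} for strings occurring more than the threshold."""
    result = {}
    for dimension in dimensions:
        # Count how often each string of this dimension appears in the target category.
        counts = Counter(record[dimension] for record in category_records)
        result[dimension] = {s: n for s, n in counts.items() if n > count_threshold}
    return result
```

With the default threshold of 1, a string becomes a candidate as soon as it appears at least twice in the target category.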

Step S4: determining the representative degree corresponding to each candidate character string according to the length corresponding to each candidate character string and the number of times it appears in the target category to which it belongs.

In some embodiments, the degree of representativeness to which each candidate string corresponds may be determined based on the length to which each candidate string corresponds and the number of times it occurs in the belonging target category.

The length corresponding to the candidate character string and the frequency of occurrence of the candidate character string in the target category can be positively correlated with the corresponding representative degree. The length to which the candidate character string corresponds may be the number of characters that the candidate character string includes.

It should be noted that, based on the length corresponding to each candidate character string and the number of times it appears in the belonging target category, the degree of representativeness corresponding to each candidate character string may be quantified.

As an example, the product of the length corresponding to each candidate character string and the number of times it appears in the belonging target category may be determined as the degree of representativeness corresponding to each candidate character string.

For example, the formula for determining the representative degree corresponding to a candidate character string may be:

$$R_{ibj} = L_{ibj} \times N_{ibj}$$

where $R_{ibj}$ is the representative degree corresponding to the $j$-th candidate character string of the $i$-th target category in the $b$-th target dimension; $i$ is the sequence number of the target category; $b$ is the sequence number of the target dimension; $j$ is the sequence number of the candidate character string of the $i$-th target category in the $b$-th target dimension; $L_{ibj}$ is the length corresponding to the $j$-th candidate character string of the $i$-th target category in the $b$-th target dimension; and $N_{ibj}$ is the number of times that this candidate character string appears in the $i$-th target category.

The larger $L_{ibj}$ is, the longer the $j$-th candidate character string is. The larger $N_{ibj}$ is, the more times the $j$-th candidate character string appears in the $i$-th target category, and the more likely it is to be a high-frequency character string of that category. Thus, the larger $R_{ibj}$ is, the more frequently the $j$-th candidate character string appears and the longer it is, the more representative it is of the $i$-th target category, and the more suitable it is for constructing the initial dictionary of the $i$-th target category.
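The product of length and number of occurrences described in this step can be sketched in Python as follows. The input is assumed to be a mapping from each candidate character string to its number of occurrences in the target category; the function name is hypothetical.

```python
def representative_degrees(candidate_counts):
    """Map each candidate string to length * number of occurrences in its target category."""
    return {s: len(s) * count for s, count in candidate_counts.items()}
```

A long string that appears often therefore receives a higher representative degree than a short or rare one.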

Step S5: performing similarity analysis processing on all character strings of each target category in each target dimension to obtain the target similarity degree of each target category in each target dimension.

In some embodiments, the similarity analysis process may be performed on all the strings of each target category in each target dimension, so as to obtain the target similarity of each target category in each target dimension.

It should be noted that performing similarity analysis on all character strings of each target category in each target dimension quantifies the target similarity degree of that category in that dimension. Moreover, a larger target similarity degree of a target category in a target dimension indicates that the character strings of that category in that dimension are more similar to one another, hence relatively more representative and more suitable for constructing the initial dictionary.

As an example, the target similarity degree of the target category in the target dimension may be determined according to ASCII codes corresponding to all characters in all character strings of the target category in the target dimension.

For example, according to the American Standard Code for Information Interchange (ASCII) codes corresponding to all characters in all character strings of the target category in the target dimension, the formula for determining the target similarity degree of the target category in the target dimension may be:

$$S_{ib} = \frac{1}{C_{n_{ib}}^{2}} \sum_{a=1}^{C_{n_{ib}}^{2}} s_{iba}, \qquad s_{iba} = \frac{1}{M_{iba}} \sum_{m=1}^{M_{iba}} \left( \left| A_{ibam}^{(1)} - A_{ibam}^{(2)} \right| = 0 \;?\; 1 : 0 \right)$$

where $S_{ib}$ is the target similarity degree of the $i$-th target category in the $b$-th target dimension; $i$ is the sequence number of the target category; $b$ is the sequence number of the target dimension; $C_{n_{ib}}^{2}$ is the number of combinations of 2 character strings taken from the $n_{ib}$ different character strings of the $i$-th target category in the $b$-th target dimension; $n_{ib}$ is the number of different character strings of the $i$-th target category in the $b$-th target dimension; $s_{iba}$ is the similarity index corresponding to the $a$-th combination of the $i$-th target category in the $b$-th target dimension; $a$ is the sequence number of the combination; $M_{iba}$ is the smaller of the numbers of characters included in the two character strings of the $a$-th combination; $|\cdot|$ is the absolute value function; $m$ is the sequence number of a character in a character string; $A_{ibam}^{(1)}$ is the ASCII code corresponding to the $m$-th character of the 1st character string in the $a$-th combination; $A_{ibam}^{(2)}$ is the ASCII code corresponding to the $m$-th character of the 2nd character string in the $a$-th combination; and $\left( \left| A_{ibam}^{(1)} - A_{ibam}^{(2)} \right| = 0 \;?\; 1 : 0 \right)$ is a ternary operation expression: if $\left| A_{ibam}^{(1)} - A_{ibam}^{(2)} \right|$ equals 0, the expression takes the value 1; otherwise it takes the value 0.

The larger $s_{iba}$ is, the more similar the two character strings in the $a$-th combination of the $i$-th target category in the $b$-th target dimension are. Thus, the larger $S_{ib}$ is, the higher the similarity and the regularity among the character strings of the $i$-th target category in the $b$-th target dimension, the more representative those character strings are, and the more suitable they are for constructing the initial dictionary of the $i$-th target category.
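The similarity analysis can be sketched in Python as follows. The sketch assumes that the similarity index of a pair is the fraction of positions, up to the length of the shorter string, at which the two character codes are equal, and that the target similarity degree is the mean of this index over all pairs of distinct strings. Returning 0.0 when fewer than two distinct strings exist is a further assumption, and the function names are hypothetical.

```python
from itertools import combinations


def pair_similarity(first, second):
    """Fraction of aligned positions whose character codes are equal."""
    shorter = min(len(first), len(second))
    if shorter == 0:
        return 0.0
    # ord() returns the code point, which equals the ASCII code for ASCII characters.
    matches = sum(1 for m in range(shorter) if abs(ord(first[m]) - ord(second[m])) == 0)
    return matches / shorter


def target_similarity(strings):
    """Mean pair similarity over all combinations of two distinct strings."""
    distinct = sorted(set(strings))
    pairs = list(combinations(distinct, 2))
    if not pairs:
        return 0.0  # assumption: no pair exists, so no similarity can be measured
    return sum(pair_similarity(a, b) for a, b in pairs) / len(pairs)
```

Strings that share a fixed layout, such as timestamps within the same hour, score close to 1 under this measure, while unrelated strings score close to 0.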

Step S6: screening target character strings from all candidate character strings in each target category according to the representative degrees corresponding to all candidate character strings in each target category and the target similarity degrees of each target category in all target dimensions, to obtain a target character string set corresponding to each target category.

In some embodiments, target character strings may be screened out of all candidate character strings in each target category according to the representative degrees corresponding to all candidate character strings in each target category and the target similarity degree of each target category under all target dimensions, so as to obtain a target character string set corresponding to each target category.

It should be noted that, comprehensively considering the representative degrees corresponding to all candidate character strings in each target category and the target similarity degree of each target category under all target dimensions, the target character string set for constructing the initial dictionary corresponding to each target category can be conveniently screened out.

As an example, this step may include the steps of:

In the first step, the target representative degree corresponding to each candidate character string is determined according to the representative degree corresponding to the candidate character string and the target similarity degree of the target category of the candidate character string under its target dimension.

Both the representative degree corresponding to a candidate character string and the target similarity degree of the target category of the candidate character string under its target dimension may be positively correlated with the target representative degree corresponding to that candidate character string.

For example, the formula for determining the target representative degree corresponding to the candidate character string may be:

In the formula, the result is the target representative degree corresponding to the j-th candidate character string of the i-th target category in the b-th target dimension, where i is the sequence number of the target category, b is the sequence number of the target dimension, and j is the sequence number of the candidate character string of the i-th target category in the b-th target dimension. The formula uses two quantities: the representative degree corresponding to the j-th candidate character string of the i-th target category in the b-th target dimension, and the target similarity of the i-th target category in the b-th target dimension.

The larger the representative degree corresponding to the j-th candidate character string, the more representative the j-th candidate character string tends to be in the i-th target category, and the more suitable it is for constructing the initial dictionary of the i-th target category. The larger the target similarity, the higher the similarity among the character strings of the i-th target category in the b-th target dimension tends to be, the more representative these character strings are, and the more suitable they are for constructing the initial dictionary of the i-th target category. Therefore, the larger the target representative degree, the more suitable the j-th candidate character string tends to be for constructing the initial dictionary of the i-th target category.

In the second step, when the target representative degree corresponding to a candidate character string is greater than a preset representative threshold, the candidate character string is determined as a target character string.

The preset representative threshold may be a threshold set in advance. For example, the preset representative threshold may be 0.57.

In the third step, all target character strings in each target category are combined into a target character string set.

Wherein, all target character strings in the target category can be target character strings included in all fund settlement data in the target category.
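The three steps above can be sketched in Python as follows. This is a hedged illustration for a single target category and target dimension, not the formula of the invention: the representative degree is taken as the product of the length and the frequency of occurrence of a candidate character string, as stated in the claims, while the division by the largest representative degree and the multiplication by the target similarity are assumptions chosen so that the result falls between 0 and 1 and can be compared with a threshold such as 0.57. The names `representative_degree`, `select_target_strings` and `counts` are hypothetical.

```python
def representative_degree(candidate, counts):
    """Length of the candidate multiplied by its frequency of occurrence.

    `counts` is a hypothetical mapping from each candidate character
    string to its number of occurrences in the target category.
    """
    total = sum(counts.values())
    return len(candidate) * counts[candidate] / total


def select_target_strings(counts, similarity, threshold=0.57):
    """Return the candidates whose target representative degree exceeds
    the preset representative threshold.

    Assumption: target representative degree = (representative degree
    divided by the largest representative degree) * target similarity.
    """
    degrees = {s: representative_degree(s, counts) for s in counts}
    largest = max(degrees.values(), default=0.0)
    if largest == 0:
        return []
    return [
        s
        for s, degree in degrees.items()
        if (degree / largest) * similarity > threshold
    ]
```

With the hypothetical counts `{"2023-10-13": 8, "SETTLED": 6, "x": 1}` and a target similarity of 0.9, only "2023-10-13" exceeds the threshold of 0.57 under these assumptions.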

Step S7, constructing an initial dictionary corresponding to each target category according to the target character string set corresponding to each target category.

In some embodiments, an initial dictionary corresponding to each target category may be constructed from a set of target strings corresponding to each target category.

The initial dictionary corresponding to the target category may include: each target character string in the target character string set corresponding to the target category.

It should be noted that, constructing the initial dictionary corresponding to each target category can facilitate the subsequent compression of all funds settlement data in each target category.

As an example, the target string set corresponding to the target category may be determined as an initial dictionary corresponding to the target category.

Step S8, compressing all fund settlement data in each target category through an LZW algorithm according to the initial dictionary corresponding to each target category.

In some embodiments, all funds settlement data in each target category may be compressed by a string table compression (LZW, Lempel-Ziv-Welch encoding) algorithm based on the initial dictionary corresponding to each target category.

As an example, all funds settlement data in the target category may be compressed by the LZW algorithm according to the initial dictionary corresponding to the target category.
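A minimal Python sketch of LZW compression with a pre-filled initial dictionary is given below. It is an illustration under stated assumptions rather than the implementation of the invention. First, the initial dictionary is assumed to contain the 256 single-character entries in addition to the target character strings, so that any ASCII input can be encoded. Second, the encoder is assumed to search for the longest dictionary entry at each position, because the classic character-by-character extension would never reach a pre-filled entry whose prefixes are absent from the dictionary. The function names are hypothetical.

```python
def build_initial_dictionary(target_strings):
    """Single characters first, then the target strings of one category."""
    entries = [chr(code) for code in range(256)]
    seen = set(entries)
    for string in sorted(set(target_strings)):
        if string and string not in seen:
            entries.append(string)
            seen.add(string)
    return entries


def lzw_compress(text, initial_entries):
    """Encode text as a list of dictionary codes."""
    table = {entry: code for code, entry in enumerate(initial_entries)}
    max_len = max(len(entry) for entry in initial_entries)
    codes, pos = [], 0
    while pos < len(text):
        # Longest dictionary entry that matches at the current position.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            match = text[pos:pos + length]
            if match in table:
                break
        else:
            raise ValueError("character outside the initial dictionary")
        codes.append(table[match])
        pos += length
        if pos < len(text):
            # Insert the match extended by the next character.
            new_entry = match + text[pos]
            table[new_entry] = len(table)
            max_len = max(max_len, len(new_entry))
    return codes


def lzw_decompress(codes, initial_entries):
    """Rebuild the text from codes, using the same initial dictionary."""
    table = list(initial_entries)
    if not codes:
        return ""
    previous = table[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry the encoder inserted last.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        pieces.append(entry)
        table.append(previous + entry[0])
        previous = entry
    return "".join(pieces)
```

In this sketch the decoder must be given the same initial dictionary as the encoder, so the target character string set of each target category would have to be stored or transmitted together with the compressed data. On short records that repeat the target character strings, the pre-filled dictionary yields fewer output codes than a dictionary holding only single characters, which matches the motivation stated for the method.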

In summary, all fund settlement data in each target category are compressed through the LZW algorithm on the basis of the initial dictionary corresponding to that target category, so that the fund settlement data in each target category are accurately compressed. First, compared with compressing data using an empty initial dictionary, the invention comprehensively considers a plurality of indexes related to the construction of the initial dictionary, such as the representative degree and the target similarity degree, objectively screens relatively representative target character strings out of each target category, and constructs a representative initial dictionary from these target character strings. This reduces dictionary insertion operations to a certain extent, reduces the occupation of computing resources, and improves the efficiency of fund settlement data compression. Second, compared with directly compressing the whole fund settlement data set corresponding to the target blockchain, the invention compresses the fund settlement data of each target category separately, which can reduce, to a certain extent, the dictionary size during compression through the LZW algorithm, so that the load on the blockchain nodes can be reduced and the compression efficiency of the fund settlement data can be improved.

The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention and are intended to be included within the scope of the invention.

Claims

1. The block chain-based fund settlement data optimization processing method is characterized by comprising the following steps of:

compressing all fund settlement data in each target category through an LZW algorithm according to the initial dictionary corresponding to each target category;

the length corresponding to the candidate character string and the frequency of occurrence of the candidate character string in the target category are positively correlated with the corresponding representative degree;

the determining the representative degree corresponding to each candidate character string according to the length corresponding to each candidate character string and the frequency of occurrence of each candidate character string in the target category comprises the following steps:

determining the product of the length corresponding to each candidate character string and the frequency of occurrence of the candidate character string in the target category as the representative degree corresponding to each candidate character string;

the step of screening target character strings from all candidate character strings in each target category according to the representative degree corresponding to all candidate character strings in each target category and the target similarity degree of each target category under all target dimensions to obtain a target character string set corresponding to each target category, comprising the following steps:

combining all target character strings in each target category into a target character string set;

the representative degree corresponding to the candidate character strings and the target similarity degree of the target category of the candidate character strings under the target dimension are positively correlated with the corresponding target representative degree.

2. The method for optimizing processing of blockchain-based funds settlement data according to claim 1, wherein the classifying the funds settlement data in the funds settlement data set according to settlement time included in the funds settlement data and determining each category obtained by the classifying as a target category comprises:

3. The blockchain-based funds settlement data optimization processing method as in claim 1, wherein the step of screening candidate strings in each target dimension from each target category based on the number of occurrences of all strings in each target dimension in the belonging target category, comprises:

4. The method for optimizing and processing the funds settlement data based on the blockchain as in claim 1, wherein the step of analyzing and processing the similarity degree of all the character strings of each target category in each target dimension to obtain the target similarity degree of each target category in each target dimension comprises the following steps:

5. The blockchain-based funds settlement data optimization processing method as in claim 4, wherein the formula corresponding to the target similarity of the target class in the target dimension is:

in the formula, the result is the target similarity of the i-th target category in the b-th target dimension; i is the sequence number of the target category; b is the sequence number of the target dimension; the formula uses the following quantities: the number of different character strings of the i-th target category in the b-th target dimension; the number of combinations obtained by taking 2 character strings out of those different character strings; the similarity index corresponding to the a-th combination of the i-th target category in the b-th target dimension, a being the sequence number of the combination; the smaller of the numbers of characters included in the two character strings of the a-th combination; the absolute value function; m, the sequence number of a character within a character string; the ASCII code corresponding to the m-th character of the 1st character string in the a-th combination; the ASCII code corresponding to the m-th character of the 2nd character string in the a-th combination; and a ternary operation expression.

6. The method for optimizing and processing the funds settlement data based on the blockchain as in claim 1, wherein the constructing the initial dictionary corresponding to each target category according to the target string set corresponding to each target category comprises: