CN113505273A

CN113505273A - Data sorting method, device, equipment and medium based on repeated data screening

Info

Publication number: CN113505273A
Application number: CN202110566211.1A
Authority: CN
Inventors: 李珊
Original assignee: Ping An Bank Co Ltd
Current assignee: Ping An Bank Co Ltd
Priority date: 2021-05-24
Filing date: 2021-05-24
Publication date: 2021-10-15
Anticipated expiration: 2041-05-24
Also published as: CN113505273B

Abstract

The invention relates to the field of intelligent decision making, and discloses a data sorting method based on repeated data screening, which comprises the following steps: performing relevance screening sorting on a preset resource data set according to a received query request to obtain a query result sequence; performing label classification on the query result sequence to obtain a first classification result sequence; carrying out relevance classification on the first classification result sequence to obtain a second classification result sequence; screening repeated data of the second classification result sequence, and performing index reduction calculation on the screened repeated data to obtain a third classification result sequence; and sequencing all the resource data in the third classification result sequence according to the corresponding relevance scores of all the resource data to obtain a target query result sequence. The invention also relates to a block chain technology, and the query result sequence can be stored in a block chain node. The invention also provides a data sorting device, equipment and medium based on the repeated data screening. The invention can improve the efficiency of data sorting.

Description

Data sorting method, device, equipment and medium based on repeated data screening

Technical Field

The invention relates to the field of intelligent decision making, in particular to a data sorting method and device based on repeated data screening, electronic equipment and a readable storage medium.

Background

At present, data sorting is widely applied in the fields of data retrieval and data recommendation. In such a retrieval and recommendation scenario, the relevance of the retrieved or recommended data is generally scored, and all data are displayed in descending order from high score to low score.

However, because retrieved or recommended data is usually abundant and may even contain duplicates, current data sorting methods tend to display identical or similar data piled together. The piled-up similar content occupies a large amount of display space, which makes it difficult to obtain useful information and lowers the efficiency of data sorting.

Disclosure of Invention

The invention provides a data sorting method, a data sorting device, electronic equipment and a computer readable storage medium based on repeated data screening, and mainly aims to improve the efficiency of data sorting.

In order to achieve the above object, the data sorting method based on repeated data screening provided by the present invention includes:

performing relevance screening sorting on a preset resource data set according to a received query request to obtain a query result sequence;

performing label classification on the query result sequence to obtain a first classification result sequence;

carrying out relevancy classification on the first classification result sequence to obtain a second classification result sequence;

performing repeated data screening on the second classification result sequence, and performing index reduction calculation on the screened repeated data to obtain a third classification result sequence;

sequencing all the resource data in the third classification result sequence according to the corresponding relevance scores of each resource data to obtain a target query result sequence;

and sending the target query result sequence to the terminal equipment corresponding to the query request.

Optionally, the performing relevance screening sorting on a preset resource data set according to the query request to obtain a query result sequence includes:

extracting a query field in the query request, and converting the query field into a vector to obtain a query vector;

converting each resource data in the resource data set into a vector to obtain a corresponding resource vector;

calculating the relevance of the query vector and the resource vector to obtain a corresponding relevance score;

screening the resource data of which the relevancy score is greater than a preset relevancy in the resource data set to obtain the initial query result sequence;

and sequencing all resource data in the initial query result sequence according to the corresponding relevance scores to obtain the query result sequence.

Optionally, the performing correlation classification on the first classification result sequence to obtain a second classification result sequence includes:

constructing a score interval according to the query result sequence;

and classifying the first classification result sequence by using the score interval to obtain the second classification result sequence.

Optionally, the constructing a score interval according to the query result sequence includes:

screening the maximum correlation score of the query result sequence to obtain first interval data;

screening the minimum relevance score of the query result sequence to obtain second interval data;

carrying out average calculation on the first interval data and the second interval data to obtain third interval data;

and constructing two continuous intervals by taking the first interval data, the second interval data and the third interval data as interval end point values to obtain the score interval.

Optionally, the screening the repeated data of the second classification result sequence, and performing exponential score reduction calculation on the screened repeated data to obtain a third classification result sequence includes:

coding each resource data in the second classification result sequence by using a preset algorithm to obtain a corresponding data code;

calculating the text distance of any two data codes in all the data codes corresponding to the second classification result sequence;

determining the text distance smaller than a preset threshold value as a similar text distance;

performing association classification on all resource data corresponding to the similar text distance in the second classification result sequence to obtain a repeated data list;

and performing exponential reduction calculation on the resource data in the repeated data list corresponding to the second classification result sequence to obtain the third classification result sequence.

Optionally, the performing associated classification on all resource data corresponding to the similar text distance in the second classification result sequence to obtain a repeated data list includes:

performing tree classification by taking resource data corresponding to all similar text distances in the second classification result sequence as nodes to obtain a classification tree;

and sequencing all the resource data corresponding to the classification tree according to the corresponding relevance scores of all the resource data to obtain the repeated data list.

Optionally, the performing exponential score reduction calculation on the resource data in the repeated data list corresponding to the second classification result sequence to obtain the third classification result sequence includes:

performing exponential calculation on the relevance scores of the resource data at a preset sorting position, and of all subsequent resource data, in the repeated data list corresponding to the second classification result sequence, to obtain the corresponding updated relevance scores;

and replacing the corresponding relevance score by using the updated relevance score to obtain the third classification result sequence.

In order to solve the above problem, the present invention further provides a data sorting apparatus based on repeated data screening, the apparatus including:

the data classification module is used for performing relevance screening and sorting on a preset resource data set according to the received query request to obtain a query result sequence; performing label classification on the query result sequence to obtain a first classification result sequence; carrying out relevancy classification on the first classification result sequence to obtain a second classification result sequence;

the data screening module is used for screening the repeated data of the second classification result sequence and performing index reduction calculation on the screened repeated data to obtain a third classification result sequence;

the data sorting module is used for sorting all the resource data in the third classification result sequence according to the corresponding relevance scores of all the resource data to obtain a target query result sequence; and sending the target query result sequence to the terminal equipment corresponding to the query request.

In order to solve the above problem, the present invention also provides an electronic device, including:

a memory storing at least one computer program; and

a processor that executes the computer program stored in the memory to implement the data sorting method based on repeated data screening described above.

In order to solve the above problem, the present invention further provides a computer-readable storage medium, in which at least one computer program is stored, and the at least one computer program is executed by a processor in an electronic device to implement the data sorting method based on repeated data filtering described above.

In the embodiment of the invention, a query result sequence is obtained by performing relevance screening and sorting on a preset resource data set according to a received query request. Label classification is performed on the query result sequence to obtain a first classification result sequence; grouping data by label avoids displaying data of the same class bundled together. Relevance classification is performed on the first classification result sequence to obtain a second classification result sequence; the data under each label is divided into high-relevance and low-relevance groups, so that the sorting is more balanced. Repeated data screening is performed on the second classification result sequence, and exponential score reduction is performed on the screened repeated data to obtain a third classification result sequence. All the resource data in the third classification result sequence is then sorted according to the relevance score of each resource data to obtain a target query result sequence; lowering the relevance scores of repeated data keeps similar data from being bundled together, so the sorted data is displayed with more variety and the efficiency of data sorting is improved. Finally, the target query result sequence is sent to the terminal equipment corresponding to the query request. Therefore, the data sorting method, device, electronic equipment and readable storage medium based on repeated data screening provided by the embodiment of the invention improve the efficiency of data sorting.

Drawings

Fig. 1 is a schematic flowchart of a data sorting method based on repeated data screening according to an embodiment of the present invention;

Fig. 2 is a functional block diagram of a data sorting apparatus based on repeated data screening according to an embodiment of the present invention;

Fig. 3 is a schematic internal structural diagram of an electronic device implementing a data sorting method based on repeated data screening according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The embodiment of the invention provides a data sorting method based on repeated data screening. The execution subject of the data sorting method based on repeated data screening includes, but is not limited to, at least one of electronic devices, such as a server, a terminal, and the like, which can be configured to execute the method provided by the embodiments of the present application. In other words, the data sorting method based on the repeated data screening may be performed by software or hardware installed in the terminal device or the server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.

Fig. 1 is a schematic flow chart of a data sorting method based on repeated data screening according to an embodiment of the present invention. In this embodiment, the data sorting method based on repeated data screening includes:

S1, performing relevance screening and sorting on a preset resource data set according to the received query request to obtain a query result sequence;

In this embodiment of the present invention, the query request includes a query field, and the resource data set is a set containing different resource data, where the resource data can be consulting data, product data, activity data, and the like.

In detail, in the embodiment of the present invention, a query field in the query request is extracted, and the query field is converted into a vector to obtain a query vector; converting each resource data in the resource data set into a vector to obtain a corresponding resource vector; calculating the relevance of the query vector and the resource vector to obtain a corresponding relevance score; performing data screening on the resource data set according to the relevancy scores to obtain an initial query result sequence; and sequencing all resource data in the initial query result sequence according to the corresponding relevance scores to obtain the query result sequence.

Optionally, in the embodiment of the present invention, the vector conversion may use a Word2vec model trained by transfer learning on preset texts of professional domain knowledge (e.g., teaching materials, training materials). Further, in the embodiment of the present invention, the resource data in the resource data set whose relevance score is greater than a preset relevance value is screened out to obtain the initial query result sequence.

Optionally, in the embodiment of the present invention, the correlation may be calculated by using the following formula:

where X_i is the i-th element of the query vector X, Y_i is the i-th element of the resource vector Y, n is the number of elements in each vector, and Sim is the relevance score of the query vector X and the resource vector Y.
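The screening and sorting of S1 can be sketched as follows. The formula itself is not reproduced in this text, so the sketch assumes cosine similarity as the relevance measure, which is only one measure consistent with the symbols X_i, Y_i, n and Sim. The function names, the toy vectors and the threshold value are hypothetical and chosen for illustration only.

```python
import math


def cosine_similarity(x, y):
    """Cosine similarity of two equal-length vectors (assumed relevance measure)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a zero vector is treated as unrelated to everything
    return dot / (norm_x * norm_y)


def screen_and_sort(query_vec, resource_vecs, min_score):
    """Keep resources scoring above min_score, ordered from high to low score."""
    scored = [(rid, cosine_similarity(query_vec, vec))
              for rid, vec in resource_vecs.items()]
    kept = [(rid, score) for rid, score in scored if score > min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical two-dimensional vectors standing in for Word2vec output.
resources = {"r1": [1.0, 0.0], "r2": [1.0, 1.0], "r3": [0.0, 1.0]}
result = screen_and_sort([1.0, 0.0], resources, min_score=0.5)
```

Here `r3` is orthogonal to the query vector and falls below the threshold, so only `r1` and `r2` remain in the query result sequence.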

In another embodiment of the present invention, the query result sequence may be stored in a blockchain node, and the access efficiency of data is improved by using the high throughput characteristic of the blockchain.

S2, performing label classification on the query result sequence to obtain a first classification result sequence;

In the embodiment of the present invention, the first classification result sequence includes resource data of different attribute categories. To better sort the resource data in the query result sequence, label classification is performed on the query result sequence to obtain the first classification result sequence.

In detail, in the implementation of the present invention, the performing label classification on the query result sequence includes: performing label marking on each resource data in the query result sequence by using a preset label classification model to obtain a label query result sequence; and classifying all resource data in the label query result sequence according to different labels to obtain the corresponding first classification result sequence.

In the embodiment of the present invention, since the labels corresponding to different resource data in the query result sequence are different, and each type of label corresponds to one first classification result sequence, the query result sequence is subjected to label classification, and a plurality of first classification result sequences are obtained, for example: all resource data in the query result sequence can be labeled by 4 types of labels, and each type of label corresponds to one first classification result sequence, so that 4 first classification result sequences can be obtained in total.
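A minimal sketch of the grouping step, assuming each resource data has already been labeled (by the label classification model or otherwise). The dictionary layout and the field names `id`, `label` and `score` are hypothetical.

```python
from collections import defaultdict


def classify_by_label(query_results):
    """Group labeled resource data into one list per label, preserving input order."""
    groups = defaultdict(list)
    for resource in query_results:
        groups[resource["label"]].append(resource)
    return dict(groups)


# Hypothetical labeled query results, already sorted by relevance score.
query_results = [
    {"id": "r1", "label": "credit card product", "score": 0.92},
    {"id": "r2", "label": "insurance product", "score": 0.88},
    {"id": "r3", "label": "credit card product", "score": 0.75},
]
first_sequences = classify_by_label(query_results)
```

Each key of `first_sequences` corresponds to one first classification result sequence, so two labels yield two sequences here.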

In the embodiment of the present invention, the label classification model may be a deep learning model constructed by a Bert network.

In detail, before each resource data in the query result sequence is labeled by using the preset label classification model in the embodiment of the present invention, the method further includes: acquiring a historical resource data set; marking each historical resource data in the historical resource data set with a preset label to obtain a training set; and performing iterative training on the pre-constructed deep learning model by using the training set to obtain the label classification model. In one application scenario of the present invention, the preset labels may include information item labels, credit card product labels, insurance product labels, shopping category labels, preferential event labels, and the like.

Optionally, in another embodiment of the present invention, each resource data in the query result sequence already carries a corresponding label, so it is not necessary to label each resource data in the query result sequence by using the preset label classification model; the resource data in the query result sequence that share the same resource class label are grouped together to obtain the corresponding first classification result sequence.

S3, carrying out relevance classification on the first classification result sequence to obtain a second classification result sequence;

In this embodiment of the present invention, as can be seen from S1 above, each resource data in the first classification result sequence has a relevance score. To keep the ordering of all resource data in the first classification result sequence balanced, and to prevent resource data with high relevance scores from interfering with resource data with low relevance scores and degrading the accuracy of data sorting, the embodiment of the present invention further performs relevance classification on the first classification result sequence to obtain a second classification result sequence.

In detail, in the embodiment of the present invention, performing relevance classification on the first classification result sequence includes: constructing a score interval according to the query result sequence, and classifying the first classification result sequence by using the score interval to obtain the second classification result sequence.

Further, in the embodiment of the present invention, constructing a score interval according to the query result sequence includes: screening the maximum relevance score and the minimum relevance score of the query result sequence, and averaging the maximum relevance score and the minimum relevance score to obtain an average relevance score; and constructing two continuous intervals by taking the maximum relevance score, the minimum relevance score and the average relevance score as interval endpoint values to obtain the score interval. For example: if the maximum relevance score is 10 and the minimum relevance score is 0, then the average relevance score is (10+0)/2 = 5; the values 0, 5 and 10 are used to construct two continuous intervals, and the score intervals are [0,5] and (5,10]. Optionally, in another embodiment of the present invention, the score intervals can be adjusted according to practical experience or business requirements.

In an embodiment of the present invention, each first classification result sequence corresponds to a plurality of second classification result sequences, and the number of second classification result sequences corresponding to each first classification result sequence is determined by the number of intervals included in the score interval. For example, if the score interval includes two intervals, then the number of second classification result sequences corresponding to each first classification result sequence is 2.

In another embodiment of the present invention, performing relevance classification on the first classification result sequence includes: sorting all resource data in the first classification result sequence according to the corresponding relevance scores to obtain a standard first classification result sequence; and classifying the data in the standard first classification result sequence according to a preset sorting percentage to obtain the second classification result sequence. For example: if the preset sorting percentage is 50% and the standard first classification result sequence has 10 resource data in total, the resource data ranked in the first 50% of the standard first classification result sequence is classified into one class, and the remaining resource data is classified into another class, to obtain the corresponding second classification result sequences.
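Both variants of S3 can be sketched as follows: the score intervals built from the minimum, average and maximum relevance scores, and the alternative split by a preset sorting percentage. Resource data is represented as hypothetical `(name, score)` pairs, and the function names are illustrative only.

```python
def build_score_intervals(scores):
    """Return (lo, mid, hi): the endpoints of the intervals [lo, mid] and (mid, hi]."""
    hi, lo = max(scores), min(scores)
    mid = (hi + lo) / 2
    return lo, mid, hi


def classify_by_interval(items, lo, mid, hi):
    """Split (name, score) pairs into a high group (mid, hi] and a low group [lo, mid]."""
    high = [item for item in items if mid < item[1] <= hi]
    low = [item for item in items if lo <= item[1] <= mid]
    return high, low


def classify_by_percentage(items, fraction):
    """Alternative: sort by score, then split at a preset fraction of the sequence."""
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    cut = int(len(ranked) * fraction)
    return ranked[:cut], ranked[cut:]


# Hypothetical first classification result sequence; the scores span 0 to 10.
items = [("a", 10), ("b", 5), ("c", 0), ("d", 7)]
lo, mid, hi = build_score_intervals([score for _, score in items])
high, low = classify_by_interval(items, lo, mid, hi)
top_half, rest = classify_by_percentage(items, 0.5)
```

With these scores the endpoints are 0, 5 and 10, so `b` (score 5) falls in the closed interval [0,5] and lands in the low group, as in the example in the text. In the method described above the endpoints come from the whole query result sequence; the sketch derives them from `items` only to stay short.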

S4, screening the repeated data of the second classification result sequence, and performing index reduction calculation on the screened repeated data to obtain a third classification result sequence;

In the embodiment of the invention, to prevent repeated or similar resource data from being displayed bundled together and narrowing the variety of displayed data, repeated data screening is performed on the second classification result sequence, and exponential score reduction is performed on the repeated data screened out of the second classification result sequence to obtain a third classification result sequence, where the repeated data includes identical or similar data.

In detail, in the embodiment of the present invention, the screening of the duplicate data of the second classification result sequence, and performing exponential score reduction calculation on the screened duplicate data to obtain a third classification result sequence includes: coding each resource data in the second classification result sequence by using a preset algorithm to obtain a data code corresponding to each resource data; calculating the text distance of any two data codes in all the data codes corresponding to the second classification result sequence; determining the text distance smaller than a preset threshold value as a similar text distance; performing association classification on all resource data corresponding to the similar text distance in the second classification result sequence to obtain a repeated data list; and performing exponential reduction calculation on the resource data in the repeated data list corresponding to the second classification result sequence to obtain the third classification result sequence.

Optionally, in the embodiment of the present invention, the preset algorithm is a simhash algorithm.

In detail, in the embodiment of the present invention, performing association classification on the resource data corresponding to all similar text distances in the second classification result sequence to obtain the repeated data list includes: performing tree classification by taking the resource data corresponding to all similar text distances in the second classification result sequence as nodes to obtain a classification tree; and sorting all the resource data corresponding to the classification tree according to the corresponding relevance scores to obtain the corresponding repeated data list. For example: the text distance between A and B is a similar text distance; the text distance between A and C is a similar text distance; and the text distance between B and E is a similar text distance. A is then taken as the node of the first layer of the classification tree, B and C as the nodes of the second layer, and E as the node of the third layer, to construct the corresponding classification tree.
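The association classification can be sketched as follows, under the assumption that the classification tree is built by a breadth-first traversal over the similar pairs, starting from the first resource data encountered. The function name and the relevance scores are hypothetical; the pairs are those of the A, B, C, E example.

```python
from collections import defaultdict, deque


def build_duplicate_lists(similar_pairs, scores):
    """Group resources linked by similar pairs, then sort each group by score."""
    adjacency = defaultdict(set)
    for u, v in similar_pairs:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    duplicate_lists = []
    for root in adjacency:
        if root in seen:
            continue
        # Breadth-first traversal: each BFS level is one layer of the tree.
        seen.add(root)
        queue = deque([root])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        # The repeated data list is the tree's members ordered by relevance score.
        members.sort(key=lambda name: scores[name], reverse=True)
        duplicate_lists.append(members)
    return duplicate_lists


pairs = [("A", "B"), ("A", "C"), ("B", "E")]
scores = {"A": 9.0, "B": 8.0, "C": 9.5, "E": 7.0}  # hypothetical scores
duplicate_lists = build_duplicate_lists(pairs, scores)
```

All four resources end up in one repeated data list because E is reachable from A through B, even though A and E are not directly similar.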

Further, to avoid bundling the repeated data together, in the embodiment of the present invention, performing exponential score reduction calculation on the resource data in the repeated data list includes: performing exponential calculation on the relevance scores of the resource data at a preset sorting position, and of all subsequent resource data, in the repeated data list corresponding to the second classification result sequence, to obtain the corresponding updated relevance scores.

Further, in the embodiment of the present invention, the updated relevancy score is used to replace the corresponding relevancy score, so as to obtain the third classification result sequence.

Optionally, the preset sorting position is the second position.

Optionally, in the embodiment of the present invention, the following formula is used to perform the index calculation:

$N = a^{\lg i} \cdot C_i$

where a is a preset sorting parameter, preferably a = 0.5; C_i is the relevance score of the i-th resource data in the repeated data list; i is the sorting position of the resource data in the repeated data list; and N is the updated relevance score of the i-th resource data in the repeated data list.
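As a worked sketch of the score reduction, the formula is read here as N = a^(lg i) * C_i, with lg taken to be the base-10 logarithm (an assumption about the notation). The function name and the sample scores are hypothetical.

```python
import math


def reduce_duplicate_scores(scores, a=0.5, start_position=2):
    """Apply N = a**lg(i) * C_i from start_position onward (positions are 1-based).

    `scores` holds the relevance scores of one repeated data list, already
    sorted from high to low. lg is assumed to be the base-10 logarithm.
    """
    updated = []
    for i, c in enumerate(scores, start=1):
        if i >= start_position:
            updated.append((a ** math.log10(i)) * c)
        else:
            updated.append(c)  # positions before start_position keep their score
    return updated


updated = reduce_duplicate_scores([10.0, 9.0, 8.0])
```

With a = 0.5, the second item is scaled by 0.5^(lg 2), roughly 0.81, and the third by 0.5^(lg 3), roughly 0.72, so later duplicates are penalized progressively more. At i = 1 the factor is a^0 = 1, so the top-ranked item would keep its score even without the position check.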

Optionally, in the embodiment of the present invention, the text distance between two data codes is a hamming distance between two corresponding data codes.
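The coding and distance steps can be sketched as follows. This is a minimal simhash variant under stated assumptions: whitespace tokenization, equal token weights, and MD5 as the per-token hash. The source does not specify any of these, and Chinese text would need a proper word segmenter. All function names are hypothetical.

```python
import hashlib


def simhash(text, bits=64):
    """Return a `bits`-wide fingerprint; similar texts tend to share most bits."""
    weights = [0] * bits
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for b in range(bits):
            weights[b] += 1 if (h >> b) & 1 else -1
    fingerprint = 0
    for b in range(bits):
        if weights[b] > 0:
            fingerprint |= 1 << b
    return fingerprint


def hamming_distance(code_a, code_b):
    """Number of bit positions in which two data codes differ."""
    return bin(code_a ^ code_b).count("1")


def similar_pairs(codes, threshold):
    """All pairs of resource names whose Hamming distance is below the threshold."""
    names = list(codes)
    return [
        (names[i], names[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if hamming_distance(codes[names[i]], codes[names[j]]) < threshold
    ]


# Hand-written 4-bit codes keep the example easy to check by eye.
pairs = similar_pairs({"A": 0b1111, "B": 0b1110, "C": 0b0000}, threshold=2)
```

Only A and B differ in fewer than two bits, so they form the single similar pair. The pairwise comparison is quadratic in the number of resource data, which is acceptable for a sketch but may need indexing for large sequences.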

S5, sorting all the resource data in the third classification result sequence according to the corresponding relevance scores of each resource data to obtain a target query result sequence;

In the embodiment of the present invention, as can be seen from the above, there are a plurality of second classification result sequences, and therefore a plurality of third classification result sequences. Further, in the embodiment of the present invention, all resource data in the third classification result sequences are sorted according to the relevance score corresponding to each resource data to obtain a target query result sequence. For example: there are two third classification result sequences, where one includes resource data A and resource data B, with a relevance score of 10 for A and 8 for B, and the other includes resource data C and resource data D, with a relevance score of 9 for C and 7 for D. All the resource data in the third classification result sequences are therefore A, B, C and D, and the target query result sequence obtained by sorting them according to the relevance scores is [A, C, B, D].
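The final merge can be sketched with the same four resource data as in the example, using hypothetical `(name, score)` pairs and an illustrative function name.

```python
def merge_and_sort(third_sequences):
    """Flatten all third classification result sequences and sort by score, high to low."""
    merged = [item for sequence in third_sequences for item in sequence]
    return sorted(merged, key=lambda item: item[1], reverse=True)


sequences = [[("A", 10), ("B", 8)], [("C", 9), ("D", 7)]]
target = [name for name, _ in merge_and_sort(sequences)]
```

Because `sorted` is stable, resource data with equal scores keep the relative order in which their sequences were supplied.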

And S6, sending the target query result sequence to the terminal equipment corresponding to the query request.

In detail, in the embodiment of the present invention, the target query result sequence is sent to the terminal device corresponding to the query request, where the terminal device includes intelligent terminals such as computers, tablets, and mobile phones. For example: if the user initiates a query request on mobile phone A, the target query result sequence is sent to mobile phone A so that the user can conveniently view it.

Fig. 2 is a functional block diagram of a data sorting apparatus based on duplicate data filtering according to the present invention.

The data sorting device 100 based on repeated data screening according to the present invention may be installed in an electronic device. According to the implemented functions, the data sorting device based on repeated data screening may include a data classification module 101, a data screening module 102, and a data sorting module 103. The modules of the present invention, which may also be referred to as units, are series of computer program segments that can be executed by a processor of an electronic device to perform a fixed function, and that are stored in a memory of the electronic device.

In the present embodiment, the functions regarding the respective modules/units are as follows:

the data classification module 101 is configured to perform relevance screening and sorting on a preset resource data set according to a received query request, so as to obtain a query result sequence; performing label classification on the query result sequence to obtain a first classification result sequence; carrying out relevancy classification on the first classification result sequence to obtain a second classification result sequence;

In detail, in the embodiment of the present invention, the data classification module 101 extracts a query field in the query request, and converts the query field into a vector to obtain a query vector; converting each resource data in the resource data set into a vector to obtain a corresponding resource vector; calculating the relevance of the query vector and the resource vector to obtain a corresponding relevance score; performing data screening on the resource data set according to the relevancy scores to obtain an initial query result sequence; and sequencing all resource data in the initial query result sequence according to the corresponding relevance scores to obtain the query result sequence.

Optionally, in the embodiment of the present invention, the data classification module 101 may perform the vector conversion by using a Word2vec model trained by transfer learning on preset texts of professional domain knowledge (such as teaching materials and training materials). Further, in the embodiment of the present invention, the data classification module 101 screens out the resource data in the resource data set whose relevance score is greater than a preset relevance value to obtain the initial query result sequence.

Optionally, the data classification module 101 according to the embodiment of the present invention may calculate the correlation using the following formula:

In the embodiment of the present invention, the first classification result sequence includes resource data of different attribute categories, and in order to better sort the resource data in the query result sequence, the data classification module 101 performs label classification on the query result sequence to obtain a first classification result sequence.

In detail, in the implementation of the present invention, the tag classification of the query result sequence by the data classification module 101 includes: performing label marking on each resource data in the query result sequence by using a preset label classification model to obtain a label query result sequence; and classifying all resource data in the label query result sequence according to different labels to obtain the corresponding first classification result sequence.

In detail, before the data classification module 101 labels each resource data in the query result sequence by using the preset label classification model in the embodiment of the present invention, the following is further performed: acquiring a historical resource data set; marking each historical resource data in the historical resource data set with a preset label to obtain a training set; and performing iterative training on the pre-constructed deep learning model by using the training set to obtain the label classification model. In one application scenario of the present invention, the preset labels may include information item labels, credit card product labels, insurance product labels, shopping category labels, preferential event labels, and the like.

Optionally, in another embodiment of the present invention, each resource data in the query result sequence already carries a corresponding label, so it is not necessary to label each resource data by using the preset label classification model; the data classification module 101 groups the resource data in the query result sequence that share the same resource category label to obtain the corresponding first classification result sequence.

In the embodiment of the present invention, each resource data in the first classification result sequence has a relevance score. To keep the ordering of all resource data in the first classification result sequence balanced, and to prevent resource data with high relevance scores from interfering with resource data with low relevance scores in the subsequent processing and thereby affecting the accuracy of data sorting, the data classification module 101 further performs relevance classification on the first classification result sequence to obtain a second classification result sequence.

In detail, in the embodiment of the present invention, the relevance classification of the first classification result sequence by the data classification module 101 includes: constructing a score interval according to the query result sequence, and classifying the first classification result sequence by using the score interval to obtain the second classification result sequence.

Further, in the embodiment of the present invention, the construction of the score interval according to the query result sequence by the data classification module 101 includes: selecting the maximum relevance score and the minimum relevance score of the query result sequence, and averaging the maximum relevance score and the minimum relevance score to obtain an average relevance score; and constructing two contiguous intervals by taking the maximum relevance score, the minimum relevance score and the average relevance score as interval endpoint values to obtain the score interval. In an embodiment of the present invention, each first classification result sequence corresponds to a plurality of second classification result sequences, and the number of second classification result sequences corresponding to each first classification result sequence is determined by the number of intervals included in the score interval. For example, if the score interval includes two intervals, the number of second classification result sequences corresponding to each first classification result sequence is 2.
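The interval construction can be illustrated with a short sketch. The function names and the item layout are hypothetical, and the text does not say which interval an item scoring exactly the average belongs to; the sketch assumes it goes to the upper interval.

```python
def build_score_intervals(all_scores):
    """Build two contiguous intervals [min, avg] and [avg, max] from the query result scores."""
    lowest, highest = min(all_scores), max(all_scores)
    average = (lowest + highest) / 2  # average of the maximum and minimum scores
    return (lowest, average), (average, highest)


def split_by_intervals(first_sequence, intervals):
    """Split one first classification result sequence into two second classification result sequences."""
    (_, average), _ = intervals
    # Assumption: a score equal to the average is placed in the upper interval.
    upper = [item for item in first_sequence if item["score"] >= average]
    lower = [item for item in first_sequence if item["score"] < average]
    return upper, lower


# Intervals come from the whole query result sequence; they are then applied per label group.
intervals = build_score_intervals([10, 8, 6, 2])
group = [{"id": "A", "score": 10}, {"id": "C", "score": 6}, {"id": "D", "score": 2}]
upper, lower = split_by_intervals(group, intervals)
```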

In another embodiment of the present invention, the relevance classification of the first classification result sequence by the data classification module 101 includes: sorting all resource data in the first classification result sequence according to their relevance scores to obtain a standard first classification result sequence; and classifying the data in the standard first classification result sequence according to a preset sorting percentage to obtain the second classification result sequence. For example, if the preset sorting percentage is 50% and the standard first classification result sequence contains 10 resource data in total, the resource data ranked in the first 50% of the standard first classification result sequence are classified into one class and the remaining resource data into another class, to obtain the corresponding second classification result sequences.
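The percentage-based variant can be sketched in a few lines. The function name is hypothetical, the sort is assumed to be in descending order of relevance score, and rounding the cut position up for sequence lengths that do not divide evenly is also an assumption, since the text only covers the 10-item, 50% case.

```python
import math


def split_by_rank_percentage(first_sequence, percentage=0.5):
    """Sort by relevance score (descending) and cut at the preset sorting percentage."""
    ranked = sorted(first_sequence, key=lambda item: item["score"], reverse=True)
    cut = math.ceil(len(ranked) * percentage)  # assumption: round the cut position up
    return ranked[:cut], ranked[cut:]


# Ten hypothetical resource items with scores 1..10.
items = [{"id": n, "score": n} for n in range(1, 11)]
top_half, bottom_half = split_by_rank_percentage(items, 0.5)
```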

The data screening module 102 is configured to perform repeated data screening on the second classification result sequence, and perform exponential reduction calculation on the screened repeated data to obtain a third classification result sequence;

In the embodiment of the present invention, to prevent the displayed data from being limited to a narrow range of types because repeated or similar resource data are displayed bundled together, the data screening module 102 performs repeated data screening on the second classification result sequence, and performs exponential reduction calculation on the repeated data screened from the second classification result sequence to obtain a third classification result sequence, where the repeated data includes identical or similar data.

In detail, in the embodiment of the present invention, the repeated data screening of the second classification result sequence and the exponential reduction calculation on the screened repeated data, performed by the data screening module 102 to obtain the third classification result sequence, include: coding each resource data in the second classification result sequence by using a preset algorithm to obtain a data code corresponding to each resource data; calculating the text distance between any two data codes among all the data codes corresponding to the second classification result sequence; determining a text distance smaller than a preset threshold value as a similar text distance; performing association classification on all resource data corresponding to the similar text distances in the second classification result sequence to obtain a repeated data list; and performing exponential reduction calculation on the resource data in the repeated data list corresponding to the second classification result sequence to obtain the third classification result sequence.
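The pairwise distance screening can be sketched as below. The text names neither the coding algorithm nor the distance metric, so this sketch makes two labeled assumptions: the raw text of each resource is used as its "data code", and Levenshtein edit distance is used as the text distance. The identifiers and sample strings are illustrative only.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance, used here as an assumed stand-in for the text distance."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def similar_pairs(codes, threshold):
    """Return every pair of resource ids whose codes are closer than the preset threshold."""
    return [
        (x, y)
        for x, y in combinations(codes, 2)
        if edit_distance(codes[x], codes[y]) < threshold
    ]


codes = {"A": "gold card offer", "B": "gold card offers", "C": "car insurance"}
pairs = similar_pairs(codes, threshold=3)
```

Comparing every pair is quadratic in the number of resource data, which is acceptable for a single second classification result sequence but would need an index or hashing scheme for large sequences.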

In detail, in the embodiment of the present invention, the association classification performed by the data screening module 102 on all resource data corresponding to the similar text distances in the second classification result sequence to obtain a repeated data list includes: performing tree classification by taking the resource data corresponding to all similar text distances in the second classification result sequence as nodes to obtain a classification tree; and sorting all the resource data corresponding to the classification tree according to their relevance scores to obtain the corresponding repeated data list. For example, if A and B are at a similar text distance, A and C are at a similar text distance, and B and E are at a similar text distance, then A is taken as the first-layer node of the classification tree, B and C as second-layer nodes, and E as the third-layer node, to construct the corresponding classification tree.
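The tree construction in the example can be sketched as a breadth-first traversal over the similar pairs. The text does not state how the first-layer node is chosen, so the root is passed in explicitly here as an assumption; the function names and scores are hypothetical.

```python
from collections import defaultdict, deque


def build_tree_layers(pairs, root):
    """Assign each resource a layer in the classification tree, starting from the root at layer 1."""
    neighbours = defaultdict(list)
    for x, y in pairs:
        neighbours[x].append(y)
        neighbours[y].append(x)
    layer = {root: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in layer:  # first visit fixes the layer
                layer[nxt] = layer[node] + 1
                queue.append(nxt)
    return layer


def repeated_data_list(layer, scores):
    """Sort all resources in the tree by relevance score, highest first."""
    return sorted(layer, key=lambda name: scores[name], reverse=True)


# Pairs from the example: A~B, A~C, B~E; the scores are made up for illustration.
layers = build_tree_layers([("A", "B"), ("A", "C"), ("B", "E")], root="A")
dup_list = repeated_data_list(layers, {"A": 9, "B": 7, "C": 8, "E": 5})
```

Note that E ends up in the same repeated data list as A even though the two are not directly similar; the tree links them through B.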

Further, to avoid displaying the repeated data bundled together, in an embodiment of the present invention, the exponential reduction calculation performed by the data screening module 102 on the resource data in the repeated data list includes: performing exponential calculation on the relevance scores of the resource data at a preset sorting position in the repeated data list corresponding to the second classification result sequence and of all resource data after that position, to obtain the corresponding updated relevance scores.

Further, the data screening module 102 of the embodiment of the present invention replaces the corresponding relevance score with the updated relevance score, to obtain the third classification result sequence.

Optionally, the preset sorting position is the second position.

Optionally, in the embodiment of the present invention, the data screening module 102 performs the exponential calculation by using the following formula:

$$N = a^{\lg i} \cdot C_i$$

where a is a preset sorting parameter, preferably a = 0.5; C_i is the relevance score corresponding to the i-th resource data in the repeated data list; i is the sorting number of the resource data in the repeated data list; and N is the updated relevance score of the i-th resource data in the repeated data list.
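A small sketch makes the effect of the formula concrete. It assumes that lg denotes the base-10 logarithm and that sorting numbers start at 1; the function name and the `start` parameter (the preset sorting position, second by default) are illustrative.

```python
import math


def reduce_scores(scores, a=0.5, start=2):
    """Apply N = a ** lg(i) * C_i to the item at position `start` and every item after it."""
    updated = []
    for i, score in enumerate(scores, start=1):  # i is the 1-based sorting number
        if i >= start:
            updated.append(a ** math.log10(i) * score)  # assumption: lg is log base 10
        else:
            updated.append(score)
    return updated


# Relevance scores of a repeated data list, already sorted from high to low.
new_scores = reduce_scores([10.0, 10.0, 10.0])
```

Because lg 1 = 0, the first item would keep its score even if the formula were applied to it, and with a = 0.5 the score is halved each time the sorting number grows tenfold. The penalty therefore grows slowly, demoting later duplicates without removing them.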

The data sorting module 103 is configured to sort all the resource data in the third classification result sequence according to the corresponding relevance scores of each resource data, so as to obtain a target query result sequence; and sending the target query result sequence to the terminal equipment corresponding to the query request.

In the embodiment of the present invention, as can be seen from the above, there are a plurality of second classification result sequences and therefore a plurality of third classification result sequences. Further, the data sorting module 103 sorts all resource data in the third classification result sequences according to the relevance score of each resource data, to obtain a target query result sequence. For example, suppose there are two third classification result sequences: one contains resource data A and resource data B, where the relevance score of A is 10 and the relevance score of B is 8; the other contains resource data C and resource data D, where the relevance score of C is 9 and the relevance score of D is 7. All the resource data in the third classification result sequences are then A, B, C and D, and the target query result sequence obtained by sorting them according to relevance score is [A, C, B, D].
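The final merge-and-sort step can be sketched as follows, using the scores from the example with the fourth item, D, taken to have a relevance score of 7. Representing each resource as a `(name, score)` tuple is an assumption made for brevity.

```python
def merge_and_sort(third_sequences):
    """Flatten all third classification result sequences and sort by relevance score, descending."""
    merged = [item for sequence in third_sequences for item in sequence]
    return sorted(merged, key=lambda item: item[1], reverse=True)


third_sequences = [
    [("A", 10), ("B", 8)],
    [("C", 9), ("D", 7)],
]
target = [name for name, _ in merge_and_sort(third_sequences)]
```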

Fig. 3 is a schematic structural diagram of an electronic device implementing the data sorting method based on repeated data screening according to the present invention.

The electronic device may include a processor 10, a memory 11, a communication bus 12, and a communication interface 13, and may further include a computer program, such as a data sorting program based on repeated data screening, stored in the memory 11 and executable on the processor 10.

The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory 11 may also be an external storage device of the electronic device in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used to store not only application software installed in the electronic device and various types of data, such as codes of a data sorting program based on data duplication filtering, but also temporarily store data that has been output or is to be output.

The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device by running or executing programs or modules (e.g., data sorting programs based on data filtering, etc.) stored in the memory 11 and calling data stored in the memory 11.

The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The bus may be divided into an address bus, a data bus, a control bus, etc. The communication bus 12 is arranged to enable connection and communication between the memory 11 and the at least one processor 10, among other components. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.

Fig. 3 shows only an electronic device having components, and those skilled in the art will appreciate that the structure shown in fig. 3 does not constitute a limitation of the electronic device, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.

For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management and the like are realized through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.

Optionally, the communication interface 13 may include a wired interface and/or a wireless interface (e.g., a Wi-Fi interface, a Bluetooth interface, etc.), which is generally used to establish a communication connection between the electronic device and other electronic devices.

Optionally, the communication interface 13 may further include a user interface, which may be a display or an input unit (such as a keyboard), and optionally may be a standard wired interface or a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the electronic device and for displaying a visualized user interface.

It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.

The data sorting program based on repeated data filtering stored in the memory 11 of the electronic device is a combination of a plurality of computer programs, and when running in the processor 10, can realize:

Specifically, for the specific implementation of the computer program by the processor 10, reference may be made to the description of the relevant steps in the embodiment corresponding to Fig. 1, which is not repeated here.

Further, if the modules/units integrated in the electronic device are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer-readable storage medium. The computer-readable medium may be non-volatile or volatile. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).

Embodiments of the present invention may also provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor of an electronic device, the computer program may implement:

Further, the computer usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.

The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A data sorting method based on repeated data screening is characterized by comprising the following steps:

2. The data sorting method based on repeated data screening according to claim 1, wherein the performing correlation screening sorting on a preset resource data set according to the query request to obtain a query result sequence comprises:

3. The data sorting method based on repeated data screening as claimed in claim 1, wherein the classifying the correlation degree of the first classification result sequence to obtain a second classification result sequence comprises:

constructing a score interval according to the query result sequence;

4. The data sorting method based on repeated data screening as claimed in claim 3, wherein the constructing score intervals according to the query result sequences comprises:

5. The data sorting method based on the repeated data screening as claimed in any one of claims 1 to 4, wherein the performing the repeated data screening on the second classification result sequence and performing the exponential reduction calculation on the screened repeated data to obtain a third classification result sequence comprises:

6. The data sorting method based on repeated data screening according to claim 5, wherein the step of performing the associative classification on all the resource data corresponding to the similar text distance in the second classification result sequence to obtain the repeated data list comprises:

7. The data sorting method based on repeated data screening as claimed in claim 5, wherein the performing exponential reduction calculation on the resource data in the repeated data list corresponding to the second classification result sequence to obtain the third classification result sequence comprises:

8. A data sorting device based on repeated data screening, characterized by comprising:

9. An electronic device, characterized in that the electronic device comprises:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of data sorting based on duplicate data screening of any of claims 1 to 7.

10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements a data sorting method based on duplicate data screening according to any of claims 1 to 7.