CN109190089B - Probability comprehensive ordering method - Google Patents

Info

Publication number
CN109190089B
Authority
CN
China
Prior art keywords
data set
line
sorting
comparison
lines
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811035247.1A
Other languages
Chinese (zh)
Other versions
CN109190089A (en
Inventor
李园白
杨阳
刘方舟
王静
王琳
张一颖
李萌
杜昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute Of Information On Traditional Chinese Medicine Cacms
Original Assignee
Institute Of Information On Traditional Chinese Medicine Cacms
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute Of Information On Traditional Chinese Medicine Cacms filed Critical Institute Of Information On Traditional Chinese Medicine Cacms
Priority to CN201811035247.1A priority Critical patent/CN109190089B/en
Publication of CN109190089A publication Critical patent/CN109190089A/en
Application granted granted Critical
Publication of CN109190089B publication Critical patent/CN109190089B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/06 Arrangements for sorting, selecting, merging, or comparing data on individual record carriers
    • G06F7/08 Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of computers and provides a probability comprehensive ordering method. The method mainly comprises the following steps: decomposing previous experimental results into sorting lines containing only two comparison elements; counting the repetition frequency of each sorting line; and selecting a starting sorting line and iteratively calculating the position accuracy of each comparison element until an optimal sorting line is obtained in which every comparison element has the highest position accuracy. The invention is the first to combine multiple sorting lines by means of correct probability, where the sorting lines need not contain all elements and different sorting lines may order some elements inconsistently. Instead of exhausting all possible sorting lines and then comparing their correct probabilities, the method screens by highest correct probability so that the sorting line with the highest correct probability emerges step by step, and finally rechecks the relevant elements of that line, thereby greatly reducing the amount of calculation while giving a more accurate result.

Description

Probability comprehensive ordering method
Technical Field
The invention relates to the technical field of computers, in particular to a probability comprehensive sequencing method.
Background
In scientific research, some experiments produce ranking results. If the ranking results of multiple experiments can be pooled into one comprehensive ranking, that ranking has real scientific value.
The conventional approach pools the measured values of the elements from all experiments and sorts them. However, because experimental conditions, measuring instruments and methods differ, experiments on the same object yield different results. The values obtained for the same object can differ widely, in some cases by an order of magnitude, so it is hard to decide which to keep, and a comprehensive ranking based on measured values alone is extremely inaccurate. For example, consider experiments that determine the content of active ingredient A of traditional Chinese medicine X in several provinces, with the aim of finding the province where the content is highest and the medicinal material is therefore of the best quality. Different experiments cover different sets of provinces: some measure the first, second and third provinces, while others measure the second, third and fourth.
When a researcher wants to compare the content of active ingredient A in medicine X across all provinces, for example to find which province ranks highest or second highest, sorting directly by the measured values reported by the experiments is extremely inaccurate because the experimental conditions differ. For instance, one experiment may report 0.6 mg/ml, 0.5 mg/ml and 0.4 mg/ml for A, B and C, while another reports 10 mg/ml, 9 mg/ml and 8 mg/ml for A, B and C. Sorting the pooled values directly cannot give an accurate comprehensive ranking.
Within a single experiment the conditions are consistent, so the ranking it gives for the provinces it covers is accurate; for example, one experiment obtains A, B and C, and another obtains D, B and A. To rank active ingredient A of medicine X across all provinces, a method is needed that integrates the rankings of all experiments, is not affected by the differing conditions of the individual studies, and can resolve conflicting rankings between experiments.
Disclosure of Invention
The invention aims to provide a comprehensive probability sequencing method, which can comprehensively utilize different scientific research experiment sequencing results and provide a sequencing result with higher accuracy.
In order to solve the technical problem, the invention provides a probability comprehensive sequencing method, which comprises the following steps:
S1: defining a data set formed by the sorting results of previous experiments as data set P, decomposing each sorting line in data set P into sorting lines containing only two comparison elements, and defining the data set formed by all sorting lines containing only two comparison elements as data set Q;
S2: counting the repetition frequency of each sorting line in data set Q;
S3: taking the sorting line with the highest occurrence frequency in data set Q as a starting sorting line; taking the comparison elements q_1 and q_2 in the starting sorting line as a basis, adding one by one each newly appearing comparison element q_n from the subsequent sorting lines; combining the comparison element q_n with comparison elements q_1 to q_{n-1}, and listing n sorting lines containing comparison elements q_1 to q_n, which constitute a data set M, wherein the position of comparison element q_n is different in each sorting line of data set M, n is a positive integer, and its maximum value is the number of comparison elements in data set Q;
S4: decomposing each sorting line in data set M, according to the method in step S1, into n-1 sorting lines each containing only the comparison element q_n and one of the comparison elements q_1 to q_{n-1}, wherein the n groups of n-1 sorting lines obtained from data set M constitute a data set R;
S5: searching data set Q for each sorting line in data set R, comparing the sorting relation between the sorting lines in data set Q and the sorting lines in data set R, and marking the correct frequency or the error frequency of the sorting lines in data set R according to the comparison result;
S6: separately calculating the position accuracy of the comparison element q_n in each sorting line of data set M by the formula: position accuracy = (sum of the correct frequencies of the group of sorting lines in data set R) / (sum of the correct frequencies and error frequencies of that group of sorting lines in data set R) × 100%;
S7: selecting, among the sorting lines of data set M, the sorting line in which the comparison element q_n has the highest position accuracy as the starting sorting line for calculating the position accuracy of comparison element q_{n+1}; returning to step S3 and repeating steps S3 to S6 to obtain the sorting line in which comparison element q_{n+1} has the highest position accuracy; and executing this cycle until the optimal sorting line, in which all comparison elements have the highest position accuracy, is obtained.
Further, step S1 further includes: combining any two comparison elements of the same sorting line p_n in data set P and ordering them according to their sorting relation in p_n, to obtain a sorting line containing only those two comparison elements.
Further, the step S5 further includes searching the data set Q for the sorting lines in the data set R respectively, and if the first sorting line in the data set R is the same as the first sorting line in the data set Q, marking the occurrence frequency of the first sorting line in the data set Q as the correct frequency of the first sorting line in the data set R; if a second sort line in the data set R is opposite to a second sort line in the data set Q, labeling a frequency of occurrence of the second sort line in the data set Q as a frequency of errors of the second sort line in the data set R; if a third sort line in the data set R has neither the same sort line nor the opposite sort line in the data set Q, then the correct frequency for the third sort line in the data set R is noted as 0.
Further, in step S6, if the position accuracy of comparison element q_n is the same in 2 or more sorting lines of data set M and is the highest position accuracy, all of those sorting lines are taken as starting sorting lines, and the calculation continues with the next newly appearing comparison element q_{n+1}.
Further, after step S7, the method further includes step S8: calculating the average position accuracy of all comparison elements in the optimal sorting line by summing the position accuracy of each comparison element in the optimal sorting line and dividing by the number of comparison elements.
Further, after step S8, the method further includes step S9: rechecking the position accuracy of the comparison elements in the optimal sorting line by searching data set Q for all sorting lines containing the comparison element q_n, comparing the sorting relation of each sorting line containing q_n with the optimal sorting line, and marking the correct frequency or error frequency of comparison element q_n according to the comparison result.
Further, step S9 further includes: if the comparison shows the same sorting relation, marking the occurrence frequency in data set Q of the sorting line containing q_n as the position-correct frequency of q_n in the optimal sorting line; if the comparison shows the opposite sorting relation, marking that occurrence frequency as the position-error frequency of q_n in the optimal sorting line; if the sorting relation of a sorting line in data set Q containing q_n does not appear in the optimal sorting line, marking the position accuracy of q_n in the optimal sorting line as 0.
Further, step S10 is included after step S9: calculating the average position accuracy of all comparison elements in the optimal sorting line after rechecking.
The technical solution of the invention has the following advantages. The invention provides a solution to the problem of integrating multiple sorting lines and is the first to combine multiple sorting lines by means of correct probability, where the sorting lines need not contain all elements and different sorting lines may order some elements inconsistently. In implementing the invention, the approach of exhausting all possible sorting lines and then comparing their correct probabilities is abandoned. Instead, screening by highest correct probability makes the sorting line with the highest correct probability emerge step by step, and the relevant elements of that line are finally rechecked. This greatly reduces the amount of calculation, while the accuracy of the sorting result is consistent with that of exhausting all sorting lines.
Drawings
FIG. 1 is a flow chart diagram of a probability synthesis ranking method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
FIG. 1 is a flow chart diagram of a probability synthesis ranking method of the present invention. As shown in fig. 1, the method for comprehensively ranking probabilities of the present invention includes the following steps:
S1: Define a data set consisting of the sorting results of previous experiments as data set P, decompose each sorting line in data set P into sorting lines containing only two comparison elements, and define the data set consisting of all such two-element sorting lines as data set Q.
In step S1, each sorting line in data set Q is obtained by taking any two comparison elements from the same sorting line p_n in data set P and ordering them according to their sorting relation in p_n, which gives a sorting line containing only those two comparison elements.
For example, data set P contains 6 sorting lines: p_1: A>C>B>D, p_2: A>B>D>C>Y, p_3: A>B>Y, p_4: Y>D, p_5: A>D>C>Y and p_6: A>B>D>C, and the goal is an ordering of the 5 elements A, C, B, D and Y. Select any two comparison elements of sorting line p_1: A>C>B>D, say A and C; ordering them by their relation in p_1 gives the sorting line A>C. Selecting A and B in the same way gives A>B. Continuing in this way, sorting line p_1: A>C>B>D decomposes into the 6 sorting lines shown in Table 1 (every pair of its 4 elements, including C>D, which Table 2 counts once).
TABLE 1
Serial number | 1 | 2 | 3 | 4 | 5 | 6
Sorting line | A>C | A>B | A>D | C>B | C>D | B>D
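The decomposition in step S1 can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `decompose` and the representation of a sorting line as a list ordered from highest to lowest are assumptions made here. A four-element line yields six two-element lines, including C>D, which Table 2 counts with frequency 1.

```python
from itertools import combinations


def decompose(line):
    """Split one sorting line into every two-element sorting line it implies.

    `line` lists elements from highest to lowest, so the pair ("A", "C")
    stands for A>C. combinations() preserves the left-to-right order of
    `line` inside each pair, which is exactly the sorting relation in p_n.
    """
    return list(combinations(line, 2))


p1 = ["A", "C", "B", "D"]  # sorting line p_1: A>C>B>D from the example
pairs = decompose(p1)
print([f"{a}>{b}" for a, b in pairs])
```

Using `itertools.combinations` avoids writing the nested loops by hand and guarantees that no pair is listed twice.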
S2: the repetition frequency of each sequencing line in data set Q is counted.
In step S2, the repeated sorting lines in the data set Q are summarized, and the repetition frequency of each sorting line is counted. For example, after all 6 sorting lines in the data set P are decomposed, 13 sorting lines are collected, and the repetition frequency of each sorting line is counted, as shown in table 2.
TABLE 2
Serial number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
Sorting line | A>B | A>D | A>C | D>C | B>D | A>Y | C>Y | B>C | B>Y | D>Y | C>B | C>D | Y>D
Frequency | 4 | 4 | 4 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 1
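The counting in step S2 can be reproduced with a short Python sketch. The names `P`, `build_q` and `Q` are illustrative assumptions, and sorting lines are written as strings such as "A>C>B>D" purely for convenience. Run on the six example lines, it gives the 13 distinct two-element lines and the frequencies of Table 2.

```python
from collections import Counter
from itertools import combinations

# The six sorting lines of data set P from the worked example.
P = [
    "A>C>B>D",
    "A>B>D>C>Y",
    "A>B>Y",
    "Y>D",
    "A>D>C>Y",
    "A>B>D>C",
]


def build_q(lines):
    """Return a Counter mapping each pair (a, b), meaning a>b, to its frequency."""
    q = Counter()
    for line in lines:
        # Every ordered pair inside one line counts once for that line.
        q.update(combinations(line.split(">"), 2))
    return q


Q = build_q(P)
for (a, b), freq in Q.most_common():
    print(f"{a}>{b}: {freq}")
```

A `Counter` keeps opposite lines such as D>Y and Y>D as separate keys, which is what the later correct/error comparison relies on.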
S3: Take the sorting line with the highest occurrence frequency in data set Q as the starting sorting line. Taking the comparison elements q_1 and q_2 of the starting sorting line as a basis, add one by one each newly appearing comparison element q_n from the subsequent sorting lines, combine q_n with comparison elements q_1 to q_{n-1}, and list the n sorting lines containing comparison elements q_1 to q_n. These n sorting lines make up data set M, in which the position of q_n is different in every sorting line; n is a positive integer whose maximum value is the number of comparison elements in data set Q.
In step S3, the ordering starts from the sorting line with the highest occurrence frequency in data set Q. For example, in Table 2 it starts from A>B, which occurs 4 times. With comparison elements A and B as the basis, the newly appearing comparison elements of the subsequent sorting lines are added one at a time, such as D, the new element in the second sorting line. D is combined with the earlier elements A and B, and the 3 sorting lines containing A, B and D are listed: ① A>B>D, ② A>D>B and ③ D>A>B. The position of D is different in each sorting line, i.e. all possible positions of D are listed. These 3 sorting lines make up data set M.
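Building data set M amounts to inserting the new element at every possible position of the current sorting line. The following Python sketch shows one way to do this; the function name `candidates` and the list representation are assumptions for illustration only.

```python
def candidates(base, new):
    """Return every sorting line obtained by inserting `new` into `base`.

    `base` is the current sorting line (highest first). Positions are
    enumerated from the end towards the front so the result follows the
    order used in the example: new element last, then middle, then first.
    """
    return [base[:i] + [new] + base[i:] for i in range(len(base), -1, -1)]


M = candidates(["A", "B"], "D")
print([">".join(line) for line in M])  # ['A>B>D', 'A>D>B', 'D>A>B']
```

For a base line of n-1 elements this always produces n candidates, which is why the search grows linearly per added element instead of factorially.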
S4: Decompose each sorting line in data set M, by the method of step S1, into n-1 sorting lines that each contain only the comparison element q_n and one of the comparison elements q_1 to q_{n-1}. The n groups of n-1 sorting lines obtained from data set M make up data set R.
In step S4, each sorting line in data set M is decomposed in this way, giving n groups of n-1 sorting lines. For example, the 3 sorting lines ① A>B>D, ② A>D>B and ③ D>A>B containing comparison elements A, B and D are decomposed by the method of step S1 into 3 groups of sorting lines. Each group contains 2 sorting lines, and each sorting line contains only comparison element D and one of A and B, as shown in Table 3. The 3 × 2 sorting lines in Table 3 make up data set R.
TABLE 3
Figure BDA0001790696370000061
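The grouping described for data set R can be sketched in Python as below. The helper name `pairs_with` and the dictionary layout of `R` are assumptions made for this illustration; the output is derived from the three lines of data set M and is not copied from the table image.

```python
def pairs_with(line, new):
    """Two-element lines of `line` that involve `new`, in the order given by `line`."""
    i = line.index(new)
    above = [(x, new) for x in line[:i]]      # elements ranked above `new`
    below = [(new, x) for x in line[i + 1:]]  # elements ranked below `new`
    return above + below


# Data set M for the new element D, as listed in the example.
M = [["A", "B", "D"], ["A", "D", "B"], ["D", "A", "B"]]

# Data set R: one group of n-1 two-element lines per line of M.
R = {">".join(line): pairs_with(line, "D") for line in M}
for line, group in R.items():
    print(line, [f"{a}>{b}" for a, b in group])
```

Only pairs that contain the new element are generated, since the relations among the earlier elements were already fixed in previous rounds.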
S5: Search data set Q for each sorting line in data set R, compare the sorting relation of the sorting lines in data set Q with that of the sorting lines in data set R, and mark the correct frequency or the error frequency of the sorting lines in data set R according to the comparison result.
In step S5, each sorting line in data set R is searched for in data set Q. If a sorting line in data set R is identical to a sorting line in data set Q, the occurrence frequency of that line in data set Q is marked as the correct frequency of the line in data set R. For example, searching data set Q (Table 2) for the sorting line A>D of Table 3 finds the same sorting line with an occurrence frequency of 4, so the correct frequency of A>D in data set R is marked as 4.
If a sorting line in data set R is the opposite of a sorting line in data set Q, the occurrence frequency of that line in data set Q is marked as the error frequency of the line in data set R. For example, searching data set Q (Table 2) for the sorting line D>B of data set R (Table 3) finds the opposite sorting line B>D with an occurrence frequency of 3, so the error frequency of D>B in data set R is marked as 3.
If a sorting line in data set R has neither an identical nor an opposite sorting line in data set Q, its correct frequency is marked as 0. Such a sorting line does not occur in data set Q, which means no experiment has produced this result, so its correct frequency can be marked as 0.
After all the sorting lines in the data set R are searched, the results shown in table 4 can be obtained, and the correct and error frequencies of 3 × 2 sorting lines are listed in table 4.
TABLE 4
Figure BDA0001790696370000071
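The marking rule of step S5 can be sketched as a lookup in both directions. The function name `mark` is an assumption, and `Q` is typed in from Table 2. One further assumption is made: when both a line and its opposite occur in data set Q (for example B>C and C>B), both a correct and an error frequency are recorded, which is the reading that reproduces the 81.82% reported later for element C.

```python
# Data set Q with frequencies, transcribed from Table 2; (a, b) means a>b.
Q = {
    ("A", "B"): 4, ("A", "D"): 4, ("A", "C"): 4, ("D", "C"): 3,
    ("B", "D"): 3, ("A", "Y"): 3, ("C", "Y"): 2, ("B", "C"): 2,
    ("B", "Y"): 2, ("D", "Y"): 2, ("C", "B"): 1, ("C", "D"): 1,
    ("Y", "D"): 1,
}


def mark(pair, q):
    """Return (correct frequency, error frequency) for one line of data set R."""
    a, b = pair
    correct = q.get((a, b), 0)  # same sorting line found in Q
    error = q.get((b, a), 0)    # opposite sorting line found in Q
    return correct, error


for pair in [("A", "D"), ("B", "D"), ("D", "B"), ("D", "A")]:
    print(f"{pair[0]}>{pair[1]}", mark(pair, Q))
```

A pair that appears in neither direction simply yields (0, 0), matching the rule that an unobserved relation gets a correct frequency of 0.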
S6: Calculate the position accuracy of comparison element q_n in each sorting line of data set M by the formula: position accuracy = (sum of the correct frequencies of the group of sorting lines in data set R) / (sum of the correct and error frequencies of that group of sorting lines in data set R) × 100%.
In step S6, the position accuracy of comparison element q_n is calculated separately for each sorting line of data set M using this formula. For example, in Table 4 the position of comparison element D differs among the 3 sorting lines ① A>B>D, ② A>D>B and ③ D>A>B of data set M. By the formula, the position accuracy of D in sorting line ① A>B>D is (4 correct + 3 correct) / (4 correct + 3 correct) × 100% = 100%; in sorting line ② A>D>B it is 4 correct / (4 correct + 3 errors) × 100% = 57.14%; and in sorting line ③ D>A>B it is 0 correct / (4 errors + 3 errors) × 100% = 0%. The calculation results are shown in Table 5.
TABLE 5
Figure BDA0001790696370000072
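The position accuracy calculation can be checked with a few lines of Python. The function name `position_accuracy` and the `groups` dictionary are illustrative assumptions; the (correct, error) frequencies are those derived for element D from Table 2. Returning 0 when no pair of a group occurs in data set Q at all is also an assumption of this sketch.

```python
def position_accuracy(marks):
    """Position accuracy in percent for one group of data set R.

    `marks` is a list of (correct, error) frequency tuples, one per
    two-element sorting line in the group.
    """
    correct = sum(c for c, _ in marks)
    total = sum(c + e for c, e in marks)
    # Assumption: a group with no matching lines in Q scores 0.
    return 100.0 * correct / total if total else 0.0


groups = {
    "A>B>D": [(4, 0), (3, 0)],  # A>D correct 4, B>D correct 3
    "A>D>B": [(4, 0), (0, 3)],  # A>D correct 4, D>B error 3
    "D>A>B": [(0, 4), (0, 3)],  # D>A error 4, D>B error 3
}
for line, marks in groups.items():
    print(line, f"{position_accuracy(marks):.2f}%")
# A>B>D 100.00%
# A>D>B 57.14%
# D>A>B 0.00%
```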
S7: Among the sorting lines of data set M, select the one in which comparison element q_n has the highest position accuracy as the starting sorting line for calculating the position accuracy of comparison element q_{n+1}. Return to step S3 and repeat steps S3 to S6 to obtain the sorting line in which q_{n+1} has the highest position accuracy. Execute this cycle until the optimal sorting line, in which all comparison elements have the highest position accuracy, is obtained.
In step S7, the sorting line of data set M in which comparison element q_n has the highest position accuracy is selected. For example, among the 3 sorting lines ① A>B>D, ② A>D>B and ③ D>A>B of data set M, comparison element D has its highest position accuracy, 100%, in sorting line ① A>B>D, so ① A>B>D is selected as the starting sorting line for the next new comparison element C. The method then returns to step S3 for C and repeats steps S3 to S6 to obtain the sorting line in which C has the highest position accuracy, and so on until the optimal sorting line, in which all comparison elements have the highest position accuracy, is obtained.
Continuing the example, the position accuracies of comparison elements C and Y are calculated next. Starting from A>B>D, the sorting line in which D has the highest position accuracy, comparison element C is combined with A>B>D to form 4 sorting lines: ① A>B>D>C, ② A>B>C>D, ③ A>C>B>D and ④ C>A>B>D. The sorting line in which C has the highest position accuracy is found by the method above, and the position accuracy of Y is calculated in the same way. The optimal sorting line of the 5 comparison elements is finally A>B>D>C>Y, with the position accuracy of each comparison element shown in Table 6.
TABLE 6
Comparison element | A | B | D | C | Y
Position accuracy | 100.00% | 100.00% | 100.00% | 81.82% | 90.00%
In this embodiment, the two-element sorting line with the highest frequency is taken as the starting sorting line, and the line is extended by calculating the position accuracy of each newly added element until all relevant elements have been placed, forming the final sorting line with the highest accuracy. The implementation does not exhaust all possible sorting lines and then compare their correct probabilities; instead it screens by highest correct probability, so that the sorting line with the highest correct probability emerges step by step.
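The whole loop of steps S1 to S7 can be put together in a compact Python sketch. This is an illustration under stated assumptions, not the patented implementation: the names `build_q`, `accuracy` and `greedy_order` are invented here, the starting line A>B and the insertion order D, C, Y are passed in explicitly as in the worked example, and `max` keeps only the first best candidate when accuracies tie.

```python
from collections import Counter
from itertools import combinations

P = ["A>C>B>D", "A>B>D>C>Y", "A>B>Y", "Y>D", "A>D>C>Y", "A>B>D>C"]


def build_q(lines):
    """Steps S1 and S2: pairwise decomposition with frequency counts."""
    q = Counter()
    for line in lines:
        q.update(combinations(line.split(">"), 2))
    return q


def accuracy(line, new, q):
    """Steps S4 to S6: position accuracy (percent) of `new` inside `line`."""
    i = line.index(new)
    pairs = [(x, new) for x in line[:i]] + [(new, x) for x in line[i + 1:]]
    correct = sum(q.get(p, 0) for p in pairs)
    error = sum(q.get((b, a), 0) for a, b in pairs)
    return 100.0 * correct / (correct + error) if correct + error else 0.0


def greedy_order(q, start, new_elements):
    """Steps S3 and S7: grow the line one element at a time."""
    line, scores = list(start), {}
    for new in new_elements:
        cands = [line[:i] + [new] + line[i:] for i in range(len(line) + 1)]
        # Simplification: on ties only the first best candidate is kept.
        line = max(cands, key=lambda c: accuracy(c, new, q))
        scores[new] = accuracy(line, new, q)
    return line, scores


Q = build_q(P)
best, scores = greedy_order(Q, ["A", "B"], ["D", "C", "Y"])
print(">".join(best), {k: round(v, 2) for k, v in scores.items()})
# A>B>D>C>Y {'D': 100.0, 'C': 81.82, 'Y': 90.0}
```

Each round evaluates only n candidate lines, so placing all elements costs on the order of the square of the element count in candidate evaluations, compared with the factorial number of lines an exhaustive search would score.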
In step S6, if the position accuracy of comparison element q_n is the same in 2 or more sorting lines of data set M and is the highest position accuracy, all of those sorting lines are taken as starting sorting lines and the calculation continues with the next newly appearing comparison element q_{n+1}. For example, if among the 3 sorting lines ① A>B>D, ② A>D>B and ③ D>A>B of data set M, comparison element D had the same, highest position accuracy in ① A>B>D and ② A>D>B, both lines would be kept, and the next newly appearing comparison element C would be calculated from each of them as a starting sorting line. Because there is more than 1 starting sorting line, more than 1 optimal sorting line is finally obtained, and if starting sorting lines are added several times while the comparison elements are calculated, the number of optimal sorting lines grows accordingly.
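Keeping every tied sorting line can be sketched as a small selection helper. The function name `keep_ties` is an assumption, and the accuracies in the usage example are hypothetical values chosen only to produce a tie; they are not results from the worked example.

```python
from math import isclose


def keep_ties(scored):
    """Return every sorting line that shares the highest position accuracy.

    `scored` is a list of (line, accuracy) tuples. isclose() is used so
    that two accuracies equal up to floating-point rounding still tie.
    """
    top = max(acc for _, acc in scored)
    return [line for line, acc in scored if isclose(acc, top)]


# Hypothetical accuracies, for illustration of a tie only.
scored = [("A>B>D", 80.0), ("A>D>B", 80.0), ("D>A>B", 20.0)]
starts = keep_ties(scored)
print(starts)  # ['A>B>D', 'A>D>B']
```

Each returned line would then serve as its own starting sorting line for the next element, so the set of partial results can branch whenever a tie occurs.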
Step S8 is also included after step S7: calculate the average position accuracy of all comparison elements in the optimal sorting line.
In step S8, the position accuracies of all comparison elements in the optimal sorting line are summed and divided by the number of comparison elements to obtain the average position accuracy of the optimal sorting line. When there are several optimal sorting lines, the average position accuracy of each is calculated and the lines are ranked by that value. For example, the average position accuracy for Table 6 above is (100.00% + 100.00% + 100.00% + 81.82% + 90.00%) / 5 = 94.36%.
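The average in step S8 is a plain arithmetic mean, which the following Python sketch verifies against the Table 6 values. The names `average_accuracy` and `table6` are illustrative assumptions.

```python
def average_accuracy(accuracies):
    """Mean position accuracy (percent) over all comparison elements."""
    return sum(accuracies) / len(accuracies)


# Position accuracies from Table 6, in percent.
table6 = {"A": 100.00, "B": 100.00, "D": 100.00, "C": 81.82, "Y": 90.00}
print(f"{average_accuracy(list(table6.values())):.2f}%")  # 94.36%
```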
More than one starting sorting line may be selected when a comparison element is calculated. For example, when the initial starting sorting line is chosen, 3 sorting lines share the highest frequency in data set Q, namely A>B, A>D and A>C, and all 3 can be used as starting sorting lines for the subsequent calculation. In addition, whenever position accuracies tie during the calculation, further starting sorting lines are added for the next comparison element. Several optimal sorting lines can therefore result from the several starting sorting lines. Calculating the average position accuracy of all comparison elements in each optimal sorting line makes it possible to rank the optimal sorting lines against one another.
The position accuracy of each comparison element is calculated starting from the highest-frequency sorting line, and the sorting line is extended step by step from 2 elements to many. Only the comparison elements placed before each newly added element are considered in the calculation, so the method can produce the relevant sorting lines without a large amount of calculation. From a more comprehensive point of view, however, all elements should take part in calculating the position accuracy of an element: not only the elements placed before it but also those added afterwards.
Therefore, step S9 is also included after step S8: recheck the position accuracy of the comparison elements in the optimal sorting line. Search data set Q for all sorting lines containing the comparison element q_n, and compare each of them with the optimal sorting line. If the ordering relation is the same, the frequency with which that sorting line occurs in data set Q is marked as a position-correct frequency of q_n in the optimal sorting line. If the ordering relation is opposite, that frequency is marked as a position-error frequency of q_n in the optimal sorting line. If the ordering relation of a sorting line in data set Q containing q_n does not appear in the optimal sorting line, the position accuracy of q_n in the optimal sorting line is marked as 0.
This step mainly rechecks the position accuracy of each element and the average position accuracy of all comparison elements. For example, let the optimal sorting line be A > B > D > C > Y. Referring to the occurrence frequencies of the sorting lines of data set Q in Table 2, search for all sorting lines containing comparison element A; the result is shown in Table 7. The rechecked position accuracy of comparison element A is therefore correct frequency 15 / total frequency 15 × 100% = 100%.
TABLE 7
| Two-item sorting line containing A | Frequency | Compared with the optimal sorting line |
| --- | --- | --- |
| A > B | 4 | Correct |
| A > C | 4 | Correct |
| A > D | 4 | Correct |
| A > Y | 3 | Correct |
| Total | 15 | Correct total: 15 |
Similarly, the position accuracy of comparison element B is rechecked, with the result shown in Table 8: the rechecked position accuracy of B is correct frequency 11 / total frequency 12 × 100% = 91.67%.
TABLE 8
| Two-item sorting line containing B | Frequency | Compared with the optimal sorting line |
| --- | --- | --- |
| A > B | 4 | Correct |
| B > D | 3 | Correct |
| B > C | 2 | Correct |
| B > Y | 2 | Correct |
| C > B | 1 | Incorrect |
| Total | 12 | Correct total: 11 |
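The recheck in step S9 can be reproduced for Tables 7 and 8 with a short Python sketch. The function name `recheck_accuracy` is hypothetical, and `freq` contains only the rows shown in Tables 7 and 8 rather than the whole of Table 2. One further assumption is labeled in the code: a pair with an element missing from the optimal sorting line is simply skipped, which is one possible reading of the "marked as 0" rule.

```python
def recheck_accuracy(element, optimal_line, freq):
    """Step S9 recheck: accuracy (%) of `element` against every pair in Q that contains it."""
    rank = {e: i for i, e in enumerate(optimal_line)}
    correct = total = 0
    for (higher, lower), count in freq.items():
        if element not in (higher, lower):
            continue
        if higher not in rank or lower not in rank:
            continue  # assumption: pairs not covered by the optimal line are skipped
        total += count
        if rank[higher] < rank[lower]:  # same ordering relation as the optimal line
            correct += count
    return 100.0 * correct / total if total else 0.0


optimal = ["A", "B", "D", "C", "Y"]
# Only the rows listed in Tables 7 and 8: (higher, lower) -> frequency.
freq = {
    ("A", "B"): 4, ("A", "C"): 4, ("A", "D"): 4, ("A", "Y"): 3,
    ("B", "D"): 3, ("B", "C"): 2, ("B", "Y"): 2, ("C", "B"): 1,
}
print(f"A: {recheck_accuracy('A', optimal, freq):.2f}%")  # A: 100.00%
print(f"B: {recheck_accuracy('B', optimal, freq):.2f}%")  # B: 91.67%
```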
Table 9 lists the rechecked position accuracy of every comparison element; the average rechecked position accuracy of all comparison elements can be calculated from these values. Finally, all sorting lines are ranked by average accuracy, and the sorting line with the highest average accuracy is provided to the user.
TABLE 9
Figure BDA0001790696370000101
Figure BDA0001790696370000111
The sorting starts from a sorting line of two elements, and the line grows as further elements are added, so the number of elements taking part in the sorting increases step by step. As a result, elements placed early cannot be compared with elements that have not yet joined the sorting. The recheck step of the invention compares the placement of each element against its ordering relative to all other elements, which makes the calculation more comprehensive without making the amount of calculation excessive.
The method of the present invention will be described below by taking an experiment for measuring the effective components of genuine herbs as an example.
Genuine herbs, also called authentic medicinal materials, are medicinal materials selected through long-term clinical use in traditional Chinese medicine and produced in specific regions by specific production processes. Compared with the same kind of herb produced in other regions, genuine herbs have better quality, better curative effect, and a higher reputation. In this experiment on determining the effective components of genuine medicinal materials, the method of the invention is used to compare the tanshinone I content across provinces and to determine which province has the highest content, and hence the best quality.
The provincial rankings of tanshinone I content reported by each experimental study were collected, and sorting lines of two comparison elements were formed from them; a partial example is shown in Table 10. The data in Table 10 were taken from published scientific journals and list, in ranked order, some of the measurement results for tanshinone I in different provinces.
TABLE 10
Figure BDA0001790696370000112
Figure BDA0001790696370000121
The data in table 11 is a decomposition of the ranking line for each experiment into ranking lines containing only two comparison elements.
TABLE 11
| Experiment No. | Two-item sorting line |
| --- | --- |
| 3 | Henan province > Anhui province |
| 3 | Henan province > Shaanxi province |
| 3 | Henan province > Gansu province |
| 3 | Henan province > Sichuan province |
| 3 | Henan province > Shandong province |
| 3 | Henan province > Hebei province |
| 3 | Anhui province > Shaanxi province |
| 3 | Anhui province > Gansu province |
| 3 | Anhui province > Sichuan province |
| 3 | Anhui province > Shandong province |
| 3 | Anhui province > Hebei province |
| 3 | Shaanxi province > Gansu province |
| …… | …… |
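The decomposition shown in Table 11 (step S1) amounts to listing every ordered pair of a sorting line. A minimal Python sketch follows; the function name `decompose` is hypothetical, and the three-province `fragment` is only an illustrative excerpt consistent with the first rows of Table 11, not the full sorting line of experiment 3.

```python
from itertools import combinations


def decompose(sorting_line):
    """Split one sorting line into all two-element sorting lines, preserving its order."""
    return list(combinations(sorting_line, 2))


# Illustrative fragment only: the complete line for experiment 3 is not reproduced here.
fragment = ["Henan", "Anhui", "Shaanxi"]
for higher, lower in decompose(fragment):
    print(f"{higher} > {lower}")
# Henan > Anhui
# Henan > Shaanxi
# Anhui > Shaanxi
```

A sorting line of k elements yields k(k-1)/2 two-element sorting lines, so a 7-province line produces 21 rows.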
The data in table 12 are the frequency of the ranking lines for the two comparison elements for all experiments.
TABLE 12
| Two-item sorting line | Frequency |
| --- | --- |
| Shandong province > Henan province | 10 |
| Henan province > Anhui province | 8 |
| Shandong province > Hebei province | 8 |
| Shandong province > Sichuan province | 8 |
| Shandong province > Anhui province | 7 |
| Henan province > Hebei province | 6 |
| Hebei province > Anhui province | 6 |
| Henan province > Sichuan province | 6 |
| Shandong province > Jiangsu province | 5 |
| Sichuan province > Henan province | 4 |
| Jiangsu province > Anhui province | 4 |
| …… | …… |
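A frequency table like Table 12 (step S2) can be produced by counting the two-element sorting lines across all experiments. The sketch below uses Python's `collections.Counter`; the function name `pair_frequencies` and the three `experiments` are made-up illustrative inputs, not data from the cited studies.

```python
from collections import Counter
from itertools import combinations


def pair_frequencies(data_set_p):
    """Count how often each two-element sorting line occurs across all sorting lines in P."""
    counts = Counter()
    for line in data_set_p:
        counts.update(combinations(line, 2))  # S1 decomposition, then S2 counting
    return counts


# Hypothetical data set P: each inner list is one experiment's sorting line, highest first.
experiments = [
    ["Shandong", "Henan", "Anhui"],
    ["Shandong", "Henan"],
    ["Henan", "Shandong", "Anhui"],
]
for (higher, lower), n in pair_frequencies(experiments).most_common():
    print(f"{higher} > {lower}: {n}")
```

Opposite orderings such as Shandong > Henan and Henan > Shandong are counted as separate entries, which is what the later correct-frequency and error-frequency comparison relies on.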
Table 13 gives the accuracy of the optimal sorting lines obtained by the method of this embodiment; multiple optimal sorting lines with high accuracy are formed through repeated cycles of the accuracy calculation.
TABLE 13
Figure BDA0001790696370000131
Figure BDA0001790696370000141
Table 14 gives the result of rechecking the accuracy of each optimal sorting line and recalculating the average accuracy; the sorting lines are listed in order of average accuracy for researchers' reference.
TABLE 14
Figure BDA0001790696370000142
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A probability comprehensive ordering method, characterized by comprising the following steps:
S1: defining a data set formed by the sorting results of previous experiments as a data set P, decomposing each sorting line in the data set P into sorting lines containing only two comparison elements, and defining the data set formed by all the sorting lines containing only two comparison elements as a data set Q;
S2: counting the repetition frequency of each sorting line in the data set Q;
S3: taking the sorting line with the highest occurrence frequency in the data set Q as a starting sorting line; on the basis of the comparison element q_1 and the comparison element q_2 in the starting sorting line, adding one by one each newly appearing comparison element q_n from the subsequent sorting lines; combining the comparison element q_n with the comparison elements q_1 ~ q_(n-1), and listing the n sorting lines comprising them, which constitute a data set M, wherein the position of the comparison element q_n is different in each sorting line of the data set M, n is a positive integer, and its maximum value is the number of comparison elements in the data set Q;
S4: decomposing each sorting line in the data set M, according to the method in the step S1, into n-1 sorting lines each containing only the comparison element q_n and any one of the comparison elements q_1 ~ q_(n-1), wherein the n groups of n-1 sorting lines decomposed from the data set M form a data set R;
S5: searching the data set Q for each of the sorting lines in the data set R, comparing the ordering relation between the sorting lines in the data set Q and the sorting lines in the data set R, and marking the correct frequency or the error frequency of the sorting lines in the data set R according to the comparison result;
S6: separately calculating the position accuracy of the comparison element q_n in each sorting line of the data set M by the following formula: position accuracy = (sum of the correct frequencies of each group of sorting lines in the data set R) / (sum of the correct frequencies and error frequencies of each group of sorting lines in the data set R) × 100%;
S7: selecting, among the sorting lines of the data set M, the sorting line in which the comparison element q_n has the highest position accuracy as the starting sorting line for calculating the position accuracy of the comparison element q_(n+1), returning to the step S3, repeating the steps S3 to S6 to obtain the sorting line in which the comparison element q_(n+1) has the highest position accuracy, and executing this loop until the optimal sorting line, in which all comparison elements have the highest position accuracy, is obtained;
wherein the step S1 further comprises: combining any two comparison elements of a sorting line p_n in the data set P and ordering them according to their ordering relation in the sorting line p_n, to obtain a sorting line containing only those two comparison elements.
2. The probabilistic comprehensive ranking method according to claim 1, wherein the step S5 further comprises searching the data set Q for the ranking lines in the data set R respectively, and if the first ranking line in the data set R is the same as the first ranking line in the data set Q, labeling the occurrence frequency of the first ranking line in the data set Q as the correct frequency of the first ranking line in the data set R; if a second sort line in the data set R is opposite to a second sort line in the data set Q, labeling a frequency of occurrence of the second sort line in the data set Q as a frequency of errors of the second sort line in the data set R; if a third sorted line in the data set R has neither the same sorted line nor the opposite sorted line in the data set Q, then the correct frequency for the third sorted line in the data set R is noted to be 0.
3. The method of claim 1, wherein in step S6, if the position accuracies of the comparison element q_n in more than 2 sorting lines of the data set M are the same and are the highest position accuracy, all of those sorting lines are taken as starting sorting lines, and the calculation continues with the next newly appearing comparison element q_(n+1).
4. The method for comprehensive ranking of probability according to claim 1, further comprising step S8 after the step S7: and calculating the average position accuracy of all comparison elements in the optimal sorting line, summing the position accuracy of each comparison element in the optimal sorting line, and dividing by the number of the comparison elements to obtain the average position accuracy of the optimal sorting line.
5. The method for comprehensive ranking of probability according to claim 4, further comprising step S9 after the step S8: rechecking the position accuracy of the comparison elements in the optimal sorting line; searching the data set Q for all sorting lines containing the comparison element q_n; comparing each sorting line containing the comparison element q_n with the optimal sorting line; and marking the correct frequency or the error frequency of the comparison element q_n according to the comparison result of the ordering relation.
6. The method of comprehensive ranking of probabilities according to claim 5, wherein the step S9 further comprises: if the comparison result is that the ordering relations are the same, marking the frequency of occurrence in the data set Q of the sorting line containing the comparison element q_n as the position-correct frequency of the comparison element q_n in the optimal sorting line; if the comparison result is that the ordering relations are opposite, marking the frequency of occurrence in the data set Q of the sorting line containing the comparison element q_n as the position-error frequency of the comparison element q_n in the optimal sorting line; and if the ordering relation of a sorting line in the data set Q containing the comparison element q_n does not appear in the optimal sorting line, marking the position accuracy of the comparison element q_n in the optimal sorting line as 0.
7. The method for comprehensive ranking of probability according to claim 6, further comprising step S10 after step S9: and calculating the average position accuracy after rechecking of all comparison elements in the optimal sorting line.
CN201811035247.1A 2018-09-06 2018-09-06 Probability comprehensive ordering method Active CN109190089B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811035247.1A CN109190089B (en) 2018-09-06 2018-09-06 Probability comprehensive ordering method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811035247.1A CN109190089B (en) 2018-09-06 2018-09-06 Probability comprehensive ordering method

Publications (2)

Publication Number Publication Date
CN109190089A CN109190089A (en) 2019-01-11
CN109190089B true CN109190089B (en) 2023-01-03

Family

ID=64914751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811035247.1A Active CN109190089B (en) 2018-09-06 2018-09-06 Probability comprehensive ordering method

Country Status (1)

Country Link
CN (1) CN109190089B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1530852A (en) * 2003-03-10 2004-09-22 磊 杨 Computer sequencing technology based on probability distribution
CN101807925A (en) * 2010-02-08 2010-08-18 南京朗坤软件有限公司 Historical data compression method based on numerical ordering and linear fitting
CN104751254A (en) * 2015-04-23 2015-07-01 国家电网公司 Line loss rate prediction method based on non-isometric weighted grey model and fuzzy clustering sorting

Also Published As

Publication number Publication date
CN109190089A (en) 2019-01-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant