CN109190089A - Probabilistic Synthesis sort method - Google Patents


Info

Publication number
CN109190089A
Authority
CN
China
Prior art keywords
line
data set
sequence
sequence line
comparison element
Prior art date
Legal status
Granted
Application number
CN201811035247.1A
Other languages
Chinese (zh)
Other versions
CN109190089B (en)
Inventor
李园白
杨阳
刘方舟
王静
王琳
张颖
张一颖
李萌
杜昱
Current Assignee
Institute Of Information On Traditional Chinese Medicine Cacms
Original Assignee
Institute Of Information On Traditional Chinese Medicine Cacms
Priority date
Filing date
Publication date
Application filed by Institute Of Information On Traditional Chinese Medicine Cacms
Priority to CN201811035247.1A
Publication of CN109190089A
Application granted
Publication of CN109190089B
Active (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/06 Arrangements for sorting, selecting, merging, or comparing data on individual record carriers
    • G06F7/08 Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of computer technology and provides a probabilistic synthesis sort method. The method mainly comprises the following steps: decomposing past experimental ranking results into sort lines that each contain only two comparison elements; counting the repetition frequency of each such sort line; and, starting from a selected starting sort line, iteratively calculating the position accuracy of each comparison element until the optimal sort line, in which all comparison elements have the highest position accuracy, is obtained. The invention is the first to merge multiple sort lines by means of correct probability; a sort line need not contain all elements, and different sort lines may order some elements inconsistently. Instead of exhaustively enumerating every possible sort line and then comparing correct probabilities, the method screens for the highest correct probability, progressively building the sort line with the highest correct probability, and finally reviews the related elements of that high-probability sort line. This greatly reduces the amount of computation while giving a more accurate result.

Description

Probabilistic Synthesis sort method
Technical field
The present invention relates to the field of computer technology, and in particular to a probabilistic synthesis sort method.
Background art
Some scientific experiments produce ranking results. If the ranking results of several experiments could be pooled and used together to form one comprehensive ranking, this would have definite scientific value.
Previous sorting methods put the measured values of the elements from all experiments together and sort them. However, experimental conditions, measuring instruments and experimental methods differ between studies, so even for the same experimental subject, different studies obtain different results. The magnitudes that different experiments measure for the same subject can differ greatly, sometimes by an order of magnitude, which makes them hard to reconcile; an overall ranking based only on measured values is therefore highly inaccurate. For example, consider a composition-measurement experiment that compares the content of active ingredient A in Chinese medicinal material X from several provinces, with the aim of finding which province's material has the highest content of A and hence the best quality. One experiment measured provinces 1, 2 and 3; another measured provinces 2, 3 and 4; a third measured provinces 1, 2 and 4. Each obtained specific content values of ingredient A in material X from different provinces. When researchers want to compare the content of A across all provinces, for instance to find the province with the highest or second-highest content, sorting the measured values reported by the experiments directly by size (under differing experimental conditions) gives a highly inaccurate ranking. In one experiment, the measured contents of A for provinces 1, 2 and 3 are 0.6 mg/ml, 0.5 mg/ml and 0.4 mg/ml, while in another experiment the measured values for provinces 4, 2 and 1 are 10 mg/ml, 9 mg/ml and 8 mg/ml; sorting directly by measured value cannot yield an accurate overall ranking.
Because conditions are consistent within a single experiment, the ranking of the provinces covered by that experiment is accurate: for example, one experiment yields province 1 > province 2 > province 3, and another yields province 4 > province 2 > province 1. To rank all provinces by the content of active ingredient A in material X, a sorting method is needed that integrates the ranking results of the individual experiments. Such a method must not be limited by the differing experimental conditions of each study, and must also resolve inconsistent rankings of the same provinces across experiments.
Summary of the invention
The technical problem to be solved by the present invention is to provide a probabilistic synthesis sort method that can make combined use of the ranking results of different research experiments and give a more accurate ranking result.
To solve the above technical problem, the present invention provides a probabilistic synthesis sort method comprising the following steps:
S1: define the data set formed by past experimental ranking results as data set P; decompose each sort line in data set P into sort lines containing only two comparison elements, and define the data set formed by all such two-element sort lines as data set Q;
S2: count the repetition frequency of each sort line in data set Q;
S3: take the sort line with the highest frequency of occurrence in data set Q as the starting sort line; on the basis of comparison elements q_1 and q_2 in the starting sort line, add, one at a time, each comparison element q_n that newly appears in subsequent sort lines; combine comparison element q_n with comparison elements q_1 to q_{n-1} to list the n sort lines containing comparison elements q_1 to q_n, which form data set M, where comparison element q_n occupies a different position in each sort line of data set M, and n is a positive integer whose maximum value is the number of comparison elements in data set Q;
S4: following the method of step S1, decompose each sort line in data set M into n-1 sort lines, each containing comparison element q_n and one of comparison elements q_1 to q_{n-1}; the n groups of n-1 sort lines decomposed from data set M form data set R;
S5: search data set Q for each sort line in data set R, compare the ordering relation of the sort lines in data set Q with that of the sort lines in data set R, and, according to the comparison result, mark the correct frequency or the error frequency of each sort line in data set R;
S6: calculate the position accuracy of comparison element q_n in each sort line of data set M using the formula: position accuracy = (sum of the correct frequencies of the corresponding group of sort lines in data set R) / (sum of the correct and error frequencies of that group) × 100%;
S7: select the sort line of data set M in which comparison element q_n has the highest position accuracy as the starting sort line for calculating the position accuracy of comparison element q_{n+1}; return to step S3 and repeat steps S3 to S6 to obtain the sort line in which comparison element q_{n+1} has the highest position accuracy; repeat this loop until the optimal sort line, in which all comparison elements have the highest position accuracy, is obtained.
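Written as a formula, the position accuracy of step S6 for a candidate sort line m in data set M can be expressed as below. The symbols R_m, c(r) and e(r) are our own shorthand, not the patent's: R_m is the group of n-1 sort lines in data set R derived from m, and c(r) and e(r) are the correct and error frequencies marked for sort line r in step S5.

```latex
\mathrm{Acc}(q_n, m) \;=\; \frac{\sum_{r \in R_m} c(r)}{\sum_{r \in R_m} \bigl( c(r) + e(r) \bigr)} \times 100\%
```

For m = A>B>D in the embodiment, R_m = {A>D, B>D} with c = 4 and 3 and e = 0 for both, giving 7/7 = 100%.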
Further, step S1 also includes: combining any two comparison elements of sort line p_n in data set P and ordering them according to their ordering relation in sort line p_n, to obtain a sort line containing only those two comparison elements.
Further, step S5 also includes: searching data set Q for each sort line in data set R; if a first sort line in data set R is identical to a first sort line in data set Q, marking the frequency of occurrence of that sort line in data set Q as the correct frequency of the first sort line in data set R; if a second sort line in data set R is the reverse of a second sort line in data set Q, marking the frequency of occurrence of that sort line in data set Q as the error frequency of the second sort line in data set R; if a third sort line in data set R has neither an identical nor a reversed sort line in data set Q, marking the correct frequency of the third sort line in data set R as 0.
Further, in step S6, if comparison element q_n has the same position accuracy in two or more sort lines of data set M and that accuracy is the highest, all of these sort lines are used as starting sort lines, and calculation continues with the next newly appearing comparison element q_{n+1}.
Further, step S8 follows step S7: calculate the mean position accuracy of all comparison elements in the optimal sort line by summing the position accuracies of the comparison elements in the optimal sort line and dividing by the number of comparison elements.
Further, step S9 follows step S8: review the position accuracy of the comparison elements in the optimal sort line by searching data set Q for all sort lines containing comparison element q_n, comparing each such sort line with the optimal sort line, and marking the correct frequency or error frequency of comparison element q_n according to the result of comparing their ordering relations.
Further, step S9 also includes: if the comparison shows the same ordering relation, marking the frequency of occurrence in data set Q of the sort line containing comparison element q_n as a correct frequency for the position of q_n in the optimal sort line; if the ordering relation is reversed, marking that frequency as an error frequency for the position of q_n in the optimal sort line; if the ordering relation of a sort line in data set Q containing comparison element q_n does not appear in the optimal sort line, marking the position accuracy of q_n in the optimal sort line as 0.
Further, step S10 follows step S9: calculate the mean position accuracy of all comparison elements in the optimal sort line after review.
The above technical solution of the invention has the following advantages. The invention proposes a solution to the problem of synthesizing multiple sort lines and is the first to merge multiple sort lines by means of correct probability, where a sort line need not contain all elements and different sort lines may order some elements inconsistently. In implementation, the invention abandons exhaustively enumerating every possible sort line and then comparing correct probabilities; it screens for the highest correct probability instead, progressively building the sort line with the highest correct probability, and finally reviews the related elements of that sort line. This greatly reduces the amount of computation, and the accuracy of the ranking result is consistent with that of exhaustively enumerating all sort lines.
Brief description of the drawings
Fig. 1 is a flow diagram of the probabilistic synthesis sort method of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawing. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is the flow diagram of Probabilistic Synthesis sort method of the present invention.As shown in Figure 1, Probabilistic Synthesis of the invention sorts Method the following steps are included:
S1: define the data set formed by past experimental ranking results as data set P; decompose each sort line in data set P into sort lines containing only two comparison elements, and define the data set formed by all such two-element sort lines as data set Q.
In step S1, the data set formed by past experimental ranking results is defined as data set P; each sort line in data set P is decomposed into sort lines containing only two comparison elements, and the data set formed by all these two-element sort lines is defined as data set Q. The sort lines in data set Q are thus obtained by decomposing the sort lines in data set P: any two comparison elements of the same sort line p_n in data set P are combined and ordered according to their ordering relation in p_n, giving one sort line that contains only those two comparison elements.
For example, data set P contains 6 sort lines: p_1: A>C>B>D, p_2: A>B>D>C>Y, p_3: A>B>Y, p_4: Y>D, p_5: A>D>C>Y and p_6: A>B>D>C; the goal is to obtain a ranking of the 5 elements A, B, C, D and Y. Choosing any two comparison elements of sort line p_1: A>C>B>D, say A and C, their ordering relation in p_1 forms the sort line A>C. Similarly, choosing A and B forms the sort line A>B, and so on. Decomposing p_1: A>C>B>D in this way yields the 6 sort lines shown in Table 1, one for each of the 4×3/2 = 6 pairs of its elements.
Table 1
Serial number 1 2 3 4 5 6
Sort line A>C A>B A>D C>B C>D B>D
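Step S1 amounts to taking every ordered pair of positions in a sort line. A minimal Python sketch, assuming a sort line is stored as a sequence ordered from highest to lowest rank (the function name `decompose` is our own, not the patent's):

```python
from itertools import combinations


def decompose(sort_line):
    """Split one sort line (highest rank first) into all two-element sort lines.

    itertools.combinations preserves input order, so each pair (x, y)
    means "x ranks above y", exactly as in the original sort line.
    """
    return list(combinations(sort_line, 2))


p1 = ["A", "C", "B", "D"]  # p1: A>C>B>D
print([f"{x}>{y}" for x, y in decompose(p1)])
# ['A>C', 'A>B', 'A>D', 'C>B', 'C>D', 'B>D']
```

A sort line of k elements always yields k(k-1)/2 two-element sort lines, so p_1 gives 6 and p_2 (5 elements) gives 10.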
S2: count the repetition frequency of each sort line in data set Q.
In step S2, the repeated sort lines in data set Q are aggregated and the repetition frequency of each sort line is counted. For example, after all 6 sort lines in data set P above are decomposed, 13 distinct sort lines are obtained; their repetition frequencies are shown in Table 2.
Table 2
Serial number 1 2 3 4 5 6 7 8 9 10 11 12 13
Sort line A>B A>D A>C D>C B>D A>Y C>Y B>C B>Y D>Y C>B C>D Y>D
Frequency 4 4 4 3 3 3 2 2 2 2 1 1 1
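Steps S1 and S2 together can be sketched with a `collections.Counter` keyed by ordered pairs. This is an illustrative sketch that assumes single-letter element names, so each sort line of the example can be written as a string; `build_q` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# Data set P from the example (each string lists elements from highest to lowest).
P = ["ACBD", "ABDCY", "ABY", "YD", "ADCY", "ABDC"]


def build_q(p_lines):
    """Decompose every sort line into ordered pairs (step S1) and count
    how often each pair occurs across all experiments (step S2)."""
    q = Counter()
    for line in p_lines:
        q.update(combinations(line, 2))
    return q


Q = build_q(P)
for (x, y), freq in Q.most_common():
    print(f"{x}>{y}: {freq}")
```

For the example this reproduces Table 2: 13 distinct sort lines whose frequencies sum to 32. Pairs with equal frequency may print in a different order from Table 2, since the table does not state a tie-breaking rule.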
S3: take the sort line with the highest frequency of occurrence in data set Q as the starting sort line; on the basis of comparison elements q_1 and q_2 in the starting sort line, add, one at a time, each comparison element q_n that newly appears in subsequent sort lines; combine comparison element q_n with comparison elements q_1 to q_{n-1} to list the n sort lines containing comparison elements q_1 to q_n, which form data set M, where comparison element q_n occupies a different position in each sort line of data set M, and n is a positive integer whose maximum value is the number of comparison elements in data set Q.
In step S3, the sort line with the highest frequency of occurrence in data set Q is used as the starting sort line. For example, in Table 2, A>B, with a frequency of 4, is taken as the starting sort line, and on the basis of comparison elements A and B the comparison elements that newly appear in subsequent sort lines are added one at a time. Sort line 2 (A>D) introduces the new comparison element D. Combining D with comparison elements A and B lists 3 sort lines containing A, B and D: ① A>B>D, ② A>D>B and ③ D>A>B. D occupies a different position in each, so every position D could take is listed. These 3 sort lines form data set M. Here n is a positive integer whose maximum value is the number of comparison elements in data set Q.
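Building data set M in step S3 means inserting the new element into every slot of the current starting sort line. A sketch with the hypothetical helper `candidate_lines`; the slots are visited from the end so that the output follows the order ① A>B>D, ② A>D>B, ③ D>A>B used above.

```python
def candidate_lines(start_line, new_element):
    """Return data set M: one copy of start_line per insertion slot,
    with new_element placed in that slot (rightmost slot first)."""
    return [
        start_line[:i] + [new_element] + start_line[i:]
        for i in range(len(start_line), -1, -1)
    ]


M = candidate_lines(["A", "B"], "D")
print([">".join(line) for line in M])
# ['A>B>D', 'A>D>B', 'D>A>B']
```

A starting sort line of n-1 elements has n slots, which matches the n sort lines of data set M.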
S4: following the method of step S1, decompose each sort line in data set M into n-1 sort lines, each containing comparison element q_n and one of comparison elements q_1 to q_{n-1}; the n groups of n-1 sort lines decomposed from data set M form data set R.
In step S4, each sort line in data set M is decomposed, following the method of step S1, into n-1 sort lines that each contain q_n and one of q_1 to q_{n-1}, giving n groups of n-1 sort lines. For example, the 3 sort lines ① A>B>D, ② A>D>B and ③ D>A>B formed from comparison elements A, B and D are decomposed by the method of step S1 into 3 groups; each group contains 2 sort lines, and each sort line contains only comparison element D and one of A and B, as shown in Table 3. The 3×2 sort lines in Table 3 form data set R.
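Table 3
Group Sort line in data set M Sort lines in data set R
1 ① A>B>D A>D, B>D
2 ② A>D>B A>D, D>B
3 ③ D>A>B D>A, D>B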
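Step S4 keeps, for each line of data set M, only the n-1 ordered pairs that involve the new element. A sketch under the same list representation; `r_groups` is our own name.

```python
def r_groups(m_lines, new_element):
    """Return data set R as one group per line of M; each group holds the
    n-1 ordered pairs between new_element and every other element."""
    groups = []
    for line in m_lines:
        pos = line.index(new_element)
        group = []
        for i, other in enumerate(line):
            if i < pos:
                group.append((other, new_element))  # other ranks above the new element
            elif i > pos:
                group.append((new_element, other))  # new element ranks above other
        groups.append(group)
    return groups


M = [["A", "B", "D"], ["A", "D", "B"], ["D", "A", "B"]]
for group in r_groups(M, "D"):
    print([f"{x}>{y}" for x, y in group])
# ['A>D', 'B>D']
# ['A>D', 'D>B']
# ['D>A', 'D>B']
```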
S5: search data set Q for each sort line in data set R, compare the ordering relation of the sort lines in data set Q with that of the sort lines in data set R, and, according to the comparison result, mark the correct frequency or the error frequency of each sort line in data set R.
In step S5, each sort line in data set R is searched for in data set Q. If a first sort line in data set R is identical to a sort line in data set Q, the frequency of occurrence of that sort line in data set Q is marked as the correct frequency of the first sort line in data set R. For example, searching Table 2 (data set Q) for the sort line A>D from Table 3 (data set R) finds an identical sort line with a frequency of 4, so the correct frequency of sort line A>D in data set R is marked as 4.
If a second sort line in data set R is the reverse of a sort line in data set Q, the frequency of occurrence of that sort line in data set Q is marked as the error frequency of the second sort line in data set R. For example, searching Table 2 (data set Q) for the sort line D>B from Table 3 (data set R) finds the reversed sort line B>D with a frequency of 3, so the error frequency of sort line D>B in data set R is marked as 3.
If a third sort line in data set R has neither an identical nor a reversed sort line in data set Q, its correct frequency in data set R is marked as 0. In other words, the sort line from data set R does not occur in data set Q, meaning that no experiment ever produced this result, so its correct frequency is recorded as 0.
After all sort lines in data set R have been searched, the results shown in Table 4 are obtained, listing the correct and error frequencies of the 3×2 sort lines.
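Table 4
Group Sort line in data set R Correct frequency Error frequency
① A>D 4 0
① B>D 3 0
② A>D 4 0
② D>B 0 3
③ D>A 0 4
③ D>B 0 3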
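Step S5 is a lookup of each pair, and of its reverse, in the frequency table. In the sketch below (hypothetical helper `mark`), both directions are looked up for every pair: the identical pair supplies the correct frequency and the reversed pair the error frequency. When a pair occurs in both directions in data set Q (for example B>C twice and C>B once), both counts are recorded; this is the reading under which the 81.82% for C in Table 6 comes out, so it is assumed here.

```python
from collections import Counter
from itertools import combinations

# Data set Q: pair frequencies built from the example data set P (Table 2).
P = ["ACBD", "ABDCY", "ABY", "YD", "ADCY", "ABDC"]
Q = Counter(pair for line in P for pair in combinations(line, 2))


def mark(pair, q):
    """Return (correct frequency, error frequency) for one sort line of R.

    Correct = occurrences of the same pair in Q; error = occurrences of
    the reversed pair. A pair absent in both directions gives (0, 0).
    """
    x, y = pair
    return q.get((x, y), 0), q.get((y, x), 0)


for pair in [("A", "D"), ("D", "B"), ("B", "C"), ("D", "A")]:
    print(">".join(pair), mark(pair, Q))
```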
S6: calculate the position accuracy of comparison element q_n in each sort line of data set M using the formula: position accuracy = (sum of the correct frequencies of the corresponding group of sort lines in data set R) / (sum of the correct and error frequencies of that group) × 100%.
In step S6, the position accuracy of comparison element q_n in each sort line of data set M is calculated using the formula: position accuracy = (sum of the correct frequencies of the corresponding group in data set R) / (sum of the correct and error frequencies of that group) × 100%. For example, in Table 4, comparison element D occupies a different position in each of the 3 sort lines ① A>B>D, ② A>D>B and ③ D>A>B of data set M. By the formula, the position accuracy of D in sort line ① A>B>D is (4 correct + 3 correct)/(4 correct + 3 correct) × 100% = 100%; in sort line ② A>D>B it is 4 correct/(4 correct + 3 errors) × 100% = 57.14%; and in sort line ③ D>A>B it is 0 correct/(4 errors + 3 errors) × 100% = 0. The results are shown in Table 5.
Table 5
Sort line in data set M Position accuracy of D
① A>B>D 100.00%
② A>D>B 57.14%
③ D>A>B 0.00%
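The S6 formula for one group of data set R can be sketched as follows. Exact `fractions.Fraction` values are an implementation choice of ours (they make equal accuracies compare exactly), and scoring a group with no evidence in either direction as 0 is our assumption; the patent does not cover that case.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

P = ["ACBD", "ABDCY", "ABY", "YD", "ADCY", "ABDC"]
Q = Counter(pair for line in P for pair in combinations(line, 2))


def position_accuracy(group, q):
    """Step S6: sum of correct frequencies / sum of correct + error
    frequencies over one group of data set R, as an exact fraction."""
    correct = sum(q.get((x, y), 0) for x, y in group)
    error = sum(q.get((y, x), 0) for x, y in group)
    total = correct + error
    # Assumption: a group with no evidence at all scores 0.
    return Fraction(correct, total) if total else Fraction(0)


R = {
    "A>B>D": [("A", "D"), ("B", "D")],
    "A>D>B": [("A", "D"), ("D", "B")],
    "D>A>B": [("D", "A"), ("D", "B")],
}
for line, group in R.items():
    print(line, f"{float(position_accuracy(group, Q)):.2%}")
# A>B>D 100.00%
# A>D>B 57.14%
# D>A>B 0.00%
```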
S7: select the sort line of data set M in which comparison element q_n has the highest position accuracy as the starting sort line for calculating the position accuracy of comparison element q_{n+1}; return to step S3 and repeat steps S3 to S6 to obtain the sort line in which comparison element q_{n+1} has the highest position accuracy; repeat this loop until the optimal sort line, in which all comparison elements have the highest position accuracy, is obtained.
In step S7, the sort line of data set M in which comparison element q_n has the highest position accuracy is selected. For example, among the 3 sort lines ① A>B>D, ② A>D>B and ③ D>A>B of data set M, comparison element D has its highest position accuracy, 100%, in sort line ① A>B>D, so ① A>B>D is chosen as the starting sort line for calculating the next new comparison element, C. The method returns to step S3 for the next newly appearing comparison element C and repeats steps S3 to S6 to obtain the sort line in which the position of C has the highest accuracy, continuing until the optimal sort line in which all comparison elements have the highest position accuracy is obtained.
For example, the position accuracies of comparison elements C and Y can then be calculated. Taking A>B>D, the sort line in which D has the highest position accuracy in the example above, as the starting sort line, comparison element C and A>B>D can be combined into 4 sort lines: ① A>B>D>C, ② A>B>C>D, ③ A>C>B>D and ④ C>A>B>D. The method above gives the sort line in which the position of C has the highest accuracy, and the position accuracy of Y is calculated in the same way. The final optimal sort line of the 5 comparison elements is A>B>D>C>Y, and the position accuracy of each comparison element in it is shown in Table 6.
Table 6
Comparison element A B D C Y
Position accuracy 100.00% 100.00% 100.00% 81.82% 90.00%
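Steps S3 to S7 combined can be sketched as a greedy insertion loop. The start pair A>B and the insertion order D, C, Y are taken from the worked example (the order in which elements first appear in Table 2); the patent gives no tie-breaking rule for choosing among A>B, A>D and A>C, so they are passed in explicitly. This sketch keeps only the first best candidate on ties, whereas the patent keeps all of them. `accuracy` and `greedy_sort` are hypothetical names.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

P = ["ACBD", "ABDCY", "ABY", "YD", "ADCY", "ABDC"]
Q = Counter(pair for line in P for pair in combinations(line, 2))


def accuracy(line, elem, q):
    """Position accuracy of elem in line against every other element (S4-S6)."""
    pos = line.index(elem)
    correct = error = 0
    for i, other in enumerate(line):
        if i != pos:
            above, below = (other, elem) if i < pos else (elem, other)
            correct += q.get((above, below), 0)
            error += q.get((below, above), 0)
    total = correct + error
    return Fraction(correct, total) if total else Fraction(0)


def greedy_sort(q, start_pair, new_elements):
    """Insert each new element at its most accurate position (first best on ties)."""
    line = list(start_pair)
    acc = {e: accuracy(line, e, q) for e in line}
    for elem in new_elements:
        candidates = [line[:i] + [elem] + line[i:] for i in range(len(line), -1, -1)]
        line = max(candidates, key=lambda c: accuracy(c, elem, q))
        acc[elem] = accuracy(line, elem, q)  # earlier elements keep their scores
    return line, acc


line, acc = greedy_sort(Q, ("A", "B"), ["D", "C", "Y"])
print(">".join(line))  # A>B>D>C>Y
for e in line:
    print(e, f"{float(acc[e]):.2%}")
```

With these inputs the sketch reproduces Table 6: C scores 9/11 (81.82%) in A>B>D>C and Y scores 9/10 (90.00%).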
This embodiment uses the two-element sort line with the highest frequency as the starting sort line and extends the sort line according to the position accuracy of each newly added element, until all related elements have been included, forming the final sort line with the highest accuracy. In implementation, it abandons exhaustively enumerating all possible sort lines and then comparing their correct probabilities; it screens for the highest correct probability instead, progressively building the sort line with the highest correct probability.
In step S6, if comparison element q_n has the same position accuracy in two or more sort lines of data set M and that accuracy is the highest, all of these sort lines are used as starting sort lines, and calculation continues with the next newly appearing comparison element q_{n+1}. For example, suppose that among the 3 sort lines ① A>B>D, ② A>D>B and ③ D>A>B of data set M, the position accuracies of D in ① A>B>D and ② A>D>B were equal and highest; then both would be retained and each used as a starting sort line for calculating the next newly appearing comparison element C. Because there can be more than one starting sort line, more than one optimal sort line may be obtained in the end; if starting sort lines are added repeatedly while the comparison elements are calculated, the number of optimal sort lines can become large.
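The tie rule can be sketched by scoring every candidate from every starting sort line and keeping all candidates that share the best score. Exact fractions are assumed so that "identical accuracy" can be tested with `==`; the frequencies in `q` are invented for illustration and are not from the patent. `extend_keep_ties` is a hypothetical name.

```python
from fractions import Fraction


def accuracy(line, elem, q):
    """Position accuracy of elem in line against every other element."""
    pos = line.index(elem)
    correct = error = 0
    for i, other in enumerate(line):
        if i != pos:
            above, below = (other, elem) if i < pos else (elem, other)
            correct += q.get((above, below), 0)
            error += q.get((below, above), 0)
    total = correct + error
    return Fraction(correct, total) if total else Fraction(0)


def extend_keep_ties(start_lines, elem, q):
    """Insert elem into every starting sort line and keep every candidate
    whose position accuracy equals the highest one."""
    scored = []
    for line in start_lines:
        for i in range(len(line), -1, -1):
            cand = line[:i] + [elem] + line[i:]
            scored.append((accuracy(cand, elem, q), cand))
    best = max(score for score, _ in scored)
    return [cand for score, cand in scored if score == best]


# Hypothetical pair frequencies chosen so that D ties in two positions.
q = {("A", "B"): 1, ("A", "D"): 1, ("D", "B"): 1, ("B", "D"): 1}
print(extend_keep_ties([["A", "B"]], "D", q))
# [['A', 'B', 'D'], ['A', 'D', 'B']]
```

Feeding the returned list back in as `start_lines` for the next element shows how the number of optimal sort lines can grow.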
Step S8 follows step S7: calculate the mean position accuracy of all comparison elements in the optimal sort line.
In step S8, the mean position accuracy of all comparison elements in the optimal sort line is calculated by summing the position accuracies of the comparison elements in the optimal sort line and dividing by the number of comparison elements. When there are several optimal sort lines, the mean position accuracy of each is calculated and the lines are ranked by this value. For example, the mean position accuracy for Table 6 above is (100.00% + 100.00% + 100.00% + 81.82% + 90.00%)/5 = 94.36%.
Several starting sort lines may be chosen when calculating each comparison element. For example, at the very first choice of starting sort line, 3 sort lines in data set Q share the highest frequency of occurrence: A>B, A>D and A>C, and each can serve as a starting sort line for the subsequent calculation. In addition, equal position accuracies arising during the calculation add starting sort lines when the next comparison element is calculated. Calculations from several starting sort lines can therefore yield several optimal sort lines. Calculating the mean position accuracy of all comparison elements in each optimal sort line allows these optimal sort lines to be ranked against one another.
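Step S8 and the ranking of several optimal sort lines can be sketched as below, using the position accuracies of Table 6 written as fractions of 1. `mean_accuracy` and `rank_optimal_lines` are hypothetical names; representing each result as a (sort line, accuracies) pair is our assumption.

```python
def mean_accuracy(acc):
    """Step S8: mean of the position accuracies of all elements of one sort line."""
    return sum(acc.values()) / len(acc)


def rank_optimal_lines(results):
    """Rank several (sort line, accuracies) results by mean accuracy, best first."""
    return sorted(results, key=lambda r: mean_accuracy(r[1]), reverse=True)


# Position accuracies of A>B>D>C>Y from Table 6, as fractions of 1.
table6 = {"A": 1.0, "B": 1.0, "D": 1.0, "C": 0.8182, "Y": 0.90}
print(f"{mean_accuracy(table6):.2%}")  # 94.36%
```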
The position accuracies calculated so far start from the two-element sort line with the highest frequency and repeatedly extend the sort line from 2 elements to many; for each newly added comparison element, only the comparison elements added before it are considered. This lets the relevant sort lines be listed without an especially large amount of computation. From a more comprehensive perspective, however, all elements should take part in calculating the position accuracy of a given element: not only the elements ordered before it, but also those added after it.
Therefore, step S9 follows step S8: review the position accuracy of the comparison elements in the optimal sort line. All sort lines in data set Q containing comparison element q_n are searched, and each is compared with the optimal sort line. If the comparison shows the same ordering relation, the frequency of occurrence of that sort line in data set Q is marked as a correct frequency for the position of q_n in this optimal sort line; if the ordering relation is reversed, that frequency is marked as an error frequency for the position of q_n in this optimal sort line; if the ordering relation of a sort line in data set Q containing q_n does not appear in this optimal sort line, the position accuracy of q_n in this optimal sort line is marked as 0.
This step checks the position accuracy of each element and the mean position accuracy of all comparison elements. For example, suppose the optimal sequencing line is A > B > D > C > Y and the frequencies of the sequence lines in data set Q are those given in Table 2. All sequence lines involving comparison element A are retrieved first, giving the results shown in Table 7. The reviewed position accuracy of comparison element A is therefore A = correct frequency 15 / total frequency 15 × 100% = 100%.
Table 7
Two-element sequence line containing A    Frequency    Comparison with optimal sequencing line
A > B    4    Correct
A > C    4    Correct
A > D    4    Correct
A > Y    3    Correct
Total    15    Correct total: 15
Similarly, the position accuracy of comparison element B is reviewed, with the results shown in Table 8. The reviewed position accuracy of comparison element B is B = correct frequency 11 / total frequency 12 × 100% = 91.67%.
Table 8
Two-element sequence line containing B    Frequency    Comparison with optimal sequencing line
A > B    4    Correct
B > D    3    Correct
B > C    2    Correct
B > Y    2    Correct
C > B    1    Wrong
Total    12    Correct total: 11
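The review of step S9 can be checked against Tables 7 and 8 with a short sketch. It assumes data set Q is a dict mapping a pair (x, y), read as "x > y", to its frequency, and that the optimal sequencing line contains every element appearing in Q. Step S7 builds the line from all comparison elements, so every pair is then either consistent with the line or reversed. The function name `reviewed_accuracy` and the variable `q_subset` are hypothetical. `q_subset` holds only the pairs listed in Tables 7 and 8, which are all of the lines containing A or B.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (x, y) means "x > y"


def reviewed_accuracy(optimal: List[str], q: Dict[Pair, int], element: str) -> float:
    """Step S9: percentage of Q's frequency, over all lines containing `element`,
    whose ordering agrees with the optimal sequencing line."""
    rank = {e: i for i, e in enumerate(optimal)}
    correct = total = 0
    for (x, y), freq in q.items():
        if element not in (x, y):
            continue
        if x not in rank or y not in rank:
            # Assumption of this sketch: the optimal line covers every element of Q.
            raise ValueError(f"{x!r} or {y!r} is missing from the optimal line")
        total += freq
        if rank[x] < rank[y]:  # same ordering relation as the optimal line
            correct += freq
    return 100.0 * correct / total if total else 0.0


optimal = ["A", "B", "D", "C", "Y"]
q_subset = {
    ("A", "B"): 4, ("A", "C"): 4, ("A", "D"): 4, ("A", "Y"): 3,
    ("B", "D"): 3, ("B", "C"): 2, ("B", "Y"): 2, ("C", "B"): 1,
}
print(reviewed_accuracy(optimal, q_subset, "A"))            # 100.0
print(round(reviewed_accuracy(optimal, q_subset, "B"), 2))  # 91.67
```

Unlike steps S4 to S6, the lines looked up here may place the reviewed element either before or after its partner, which is what lets later elements influence the accuracy of earlier ones.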
Table 9 lists the review results for the position accuracy of all comparison elements. From the reviewed position accuracies, the mean position accuracy of all comparison elements after review can also be calculated. Finally, the sequence lines are sorted by mean accuracy from high to low, and the user can adopt the sequence line with the highest mean accuracy.
Table 9
This ranking procedure first chooses the ordering of two elements as the starting-point sequence line and then lengthens the sequence line as each further element is added, so the number of elements taking part grows as elements are added. Elements that took part earlier therefore cannot be compared with elements that had not yet been added. The review step set up in the present invention compares the ordering of each element against all other elements for accuracy, which makes the calculation more comprehensive without making the amount of computation too large.
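Steps S8 and S10, followed by the final sorting of several optimal sequencing lines by mean accuracy, reduce to an average and a sort. The sketch below assumes each candidate line comes with a dict of per-element position accuracies in percent. The function names and the accuracy values are hypothetical; they are not the contents of Table 9.

```python
from typing import Dict, List, Tuple

Line = Tuple[str, ...]


def mean_accuracy(acc: Dict[str, float]) -> float:
    """Steps S8/S10: sum of the element accuracies divided by the number of elements."""
    return sum(acc.values()) / len(acc)


def rank_lines(candidates: Dict[Line, Dict[str, float]]) -> List[Tuple[Line, float]]:
    """Sort optimal sequencing lines by mean (reviewed) accuracy, highest first."""
    scored = [(line, mean_accuracy(acc)) for line, acc in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical accuracies (%) for two candidate lines; not data from the source.
candidates = {
    ("X", "Y", "Z"): {"X": 100.0, "Y": 80.0, "Z": 90.0},
    ("X", "Z", "Y"): {"X": 100.0, "Z": 70.0, "Y": 70.0},
}
for line, score in rank_lines(candidates):
    print(" > ".join(line), round(score, 2))
```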
The method of the invention is illustrated below with an experiment measuring the active constituents of authentic (daodi) medicinal materials.
Authentic medicinal materials, also known as daodi herbs, are medicinal materials selected through long-term clinical use in traditional Chinese medicine and produced in a specific region by specific production processes. Their quality and efficacy are better than those of the same species produced in other regions, and they enjoy greater renown. Radix Salviae Miltiorrhizae (danshen) is one such authentic medicinal material. In experiments determining its active constituents, many studies assess the quality of the danshen from each province by measuring the active constituents of the danshen produced there. Tanshinone I is a commonly measured active constituent of danshen, and to find which province has the highest Tanshinone I content, the provinces are ranked using the method of the invention.
The provinces sampled in each study are ranked by Tanshinone I content to form sequence lines of comparison elements; a partial example is shown in Table 10. The data in Table 10 were screened from published scientific and technical journals: partial results of Tanshinone I determinations for different provinces were extracted, and the ranking was carried out on them.
Table 10
Table 11 shows the sequence line of each experiment decomposed into sequence lines containing only two comparison elements.
Table 11
Experiment No.    Two-element sequence line
3    Henan Province > Anhui Province
3    Henan Province > Shaanxi Province
3    Henan Province > Gansu Province
3    Henan Province > Sichuan Province
3    Henan Province > Shandong Province
3    Henan Province > Hebei Province
3    Anhui Province > Shaanxi Province
3    Anhui Province > Gansu Province
3    Anhui Province > Sichuan Province
3    Anhui Province > Shandong Province
3    Anhui Province > Hebei Province
3    Shaanxi Province > Gansu Province
……    ……
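The decomposition of step S1 (and claim 2) takes every pair of elements in a ranking and keeps their relative order. In the sketch below, the full ranking of experiment 3 is an inference: it is the order Henan > Anhui > Shaanxi > Gansu > Sichuan > Shandong > Hebei implied by the rows of Table 11, not a ranking quoted from Table 10. The function name `decompose` is hypothetical. Python's `itertools.combinations` preserves input order, so each emitted pair (x, y) already reads "x > y".

```python
from itertools import combinations
from typing import List, Tuple


def decompose(ranking: List[str]) -> List[Tuple[str, str]]:
    """Step S1: split one ranking (highest first) into all two-element lines (x, y), x > y."""
    return list(combinations(ranking, 2))


# Experiment 3 as implied by the rows of Table 11 (an inference, not a quoted table).
experiment_3 = ["Henan Province", "Anhui Province", "Shaanxi Province", "Gansu Province",
                "Sichuan Province", "Shandong Province", "Hebei Province"]
pairs = decompose(experiment_3)
for x, y in pairs[:12]:  # the twelve rows shown in Table 11
    print(f"3    {x} > {y}")
print(len(pairs))  # 21 two-element lines for 7 provinces
```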
Table 12 gives the frequency of each two-element sequence line, counted over all experiments.
Table 12
Two-element sequence line    Frequency
Shandong Province > Henan Province    10
Henan Province > Anhui Province    8
Shandong Province > Hebei Province    8
Shandong Province > Sichuan Province    8
Shandong Province > Anhui Province    7
Henan Province > Hebei Province    6
Hebei Province > Anhui Province    6
Henan Province > Sichuan Province    6
Shandong Province > Jiangsu Province    5
Sichuan Province > Henan Province    4
Jiangsu Province > Anhui Province    4
……    ……
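Counting the frequencies of step S2 amounts to decomposing every ranking and tallying each ordered pair. A pair and its reverse (for example Shandong > Henan and Henan > Shandong) are counted separately, as Table 12 does for Henan and Sichuan. The rankings below are hypothetical and are not the study's data; the function name `pair_frequencies` is also hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def pair_frequencies(rankings: Iterable[List[str]]) -> Counter:
    """Steps S1-S2: decompose every ranking and count each two-element line (x, y), x > y."""
    counts: Counter = Counter()
    for ranking in rankings:
        counts.update(combinations(ranking, 2))
    return counts


# Hypothetical rankings from three experiments; not the study's data.
rankings = [
    ["Shandong", "Henan", "Anhui"],
    ["Shandong", "Henan"],
    ["Henan", "Shandong", "Anhui"],
]
for (x, y), freq in pair_frequencies(rankings).most_common():
    print(f"{x} > {y}    {freq}")
```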
Table 13 gives the accuracies of the optimal sequencing lines obtained with the method of this embodiment. Repeated cyclic calculation of the correct probability yields several optimal sequencing lines, listed by accuracy from high to low.
Table 13
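The cyclic calculation of steps S3 to S7 can be sketched as follows. This is a minimal single-branch sketch under stated assumptions. Data set Q is a dict mapping a pair (x, y), read as "x > y", to its frequency. New comparison elements are taken in order of first appearance when Q is scanned by descending frequency. When candidates tie, only the first is kept, whereas claim 4 keeps all tied lines as further starting-point sequence lines. The function names and the toy data set `q_toy` are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (x, y) means "x > y", a two-element sequence line in data set Q


def element_order(q: Dict[Pair, int]) -> List[str]:
    """Comparison elements in order of first appearance when Q is scanned by descending frequency."""
    order: List[str] = []
    for (x, y), _freq in sorted(q.items(), key=lambda kv: -kv[1]):
        for e in (x, y):
            if e not in order:
                order.append(e)
    return order


def position_accuracy(line: List[str], new: str, q: Dict[Pair, int]) -> float:
    """Steps S4-S6: accuracy (%) of the position of `new` in one candidate line of data set M."""
    i = line.index(new)
    correct = wrong = 0
    for j, other in enumerate(line):
        if j == i:
            continue
        # Two-element line of data set R implied by this candidate (earlier element ranks higher).
        pair = (new, other) if i < j else (other, new)
        correct += q.get(pair, 0)              # identical line in Q -> correct frequency
        wrong += q.get((pair[1], pair[0]), 0)  # reversed line in Q -> wrong frequency
    total = correct + wrong
    return 100.0 * correct / total if total else 0.0  # no match in Q -> 0


def optimal_line(q: Dict[Pair, int]) -> List[str]:
    """Steps S3-S7 on a single branch: ties are broken by keeping the first candidate."""
    order = element_order(q)
    line = order[:2]  # S3: the highest-frequency pair q_1 > q_2 is the starting-point line
    for new in order[2:]:
        # Data set M: insert q_n at each of the len(line) + 1 possible positions.
        candidates = [line[:k] + [new] + line[k:] for k in range(len(line) + 1)]
        line = max(candidates, key=lambda c: position_accuracy(c, new, q))  # S7
    return line


q_toy = {("A", "B"): 5, ("B", "C"): 4, ("A", "C"): 3,
         ("C", "B"): 1, ("C", "D"): 2, ("B", "D"): 2}
print(" > ".join(optimal_line(q_toy)))  # A > B > C > D
```

In the toy run, C scores 12.5%, 50% and 87.5% in its three possible positions, so it is placed last before D is inserted.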
Table 14 shows the review of accuracy carried out on each optimal sequencing line. The mean correct probability is recalculated, and the sequence lines are listed in order of mean correct probability, from high to low, for use by researchers.
Table 14
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A probabilistic comprehensive sorting method, characterized by comprising the following steps:
S1: defining the data set composed of past experimental ranking results as data set P, decomposing each sequence line in data set P into sequence lines containing only two comparison elements, and defining the data set composed of all such two-element sequence lines as data set Q;
S2: counting the repetition frequency of each sequence line in data set Q;
S3: taking the sequence line with the highest frequency of occurrence in data set Q as the starting-point sequence line; on the basis of comparison elements q_1 and q_2 in the starting-point sequence line, adding one by one each comparison element q_n that newly appears in subsequent sequence lines; combining comparison element q_n with comparison elements q_1 to q_{n-1} and listing n sequence lines containing comparison elements q_1 to q_n, the n sequence lines forming data set M, wherein the position of comparison element q_n differs in each sequence line of data set M, and n is a positive integer whose maximum value is the number of comparison elements in data set Q;
S4: decomposing each sequence line in data set M, by the method of step S1, into n-1 sequence lines each containing comparison element q_n and one of comparison elements q_1 to q_{n-1}, the n groups of n-1 sequence lines decomposed from data set M forming data set R;
S5: searching data set Q for each sequence line of data set R, comparing the ordering relation of the sequence lines in data set Q with that of the sequence lines in data set R, and marking the correct frequency or wrong frequency of each sequence line in data set R according to the comparison result;
S6: calculating the position accuracy of comparison element q_n in each sequence line of data set M as: (sum of the correct frequencies of the corresponding group of sequence lines in data set R) / (sum of the correct frequencies and wrong frequencies of that group of sequence lines in data set R) × 100%;
S7: selecting the sequence line of data set M in which the position accuracy of comparison element q_n is highest as the starting-point sequence line for calculating the position accuracy of comparison element q_{n+1}, returning to step S3 and repeating steps S3 to S6 to obtain the sequence line in which the position accuracy of comparison element q_{n+1} is highest, and repeating this cycle until the optimal sequencing line, in which the positions of all comparison elements have the highest accuracy, is obtained.
2. The probabilistic comprehensive sorting method according to claim 1, characterized in that step S1 further comprises: combining any two comparison elements in sequence line p_n of data set P and ordering them according to their ordering relation in sequence line p_n, to obtain a sequence line containing only those two comparison elements.
3. The probabilistic comprehensive sorting method according to claim 1, characterized in that step S5 further comprises: searching data set Q for each sequence line of data set R; if a first sequence line in data set R is identical to a first sequence line in data set Q, marking the frequency of occurrence of that first sequence line in data set Q as the correct frequency of the first sequence line in data set R; if a second sequence line in data set R is the reverse of a second sequence line in data set Q, marking the frequency of occurrence of that second sequence line in data set Q as the wrong frequency of the second sequence line in data set R; and if data set Q contains neither an identical nor a reversed sequence line for a third sequence line in data set R, marking the correct frequency of the third sequence line in data set R as 0.
4. The probabilistic comprehensive sorting method according to claim 1, characterized in that, in step S6, if the position accuracy of comparison element q_n is the same in two or more sequence lines of data set M and is the highest position accuracy, all of those two or more sequence lines are taken as starting-point sequence lines, and the calculation continues with the next newly appearing comparison element q_{n+1}.
5. The probabilistic comprehensive sorting method according to claim 1, characterized by further comprising, after step S7, a step S8: calculating the mean position accuracy of all comparison elements in the optimal sequencing line by summing the position accuracies of the comparison elements in the optimal sequencing line and dividing by the number of comparison elements.
6. The probabilistic comprehensive sorting method according to any one of claims 1 to 5, characterized by further comprising, after step S8, a step S9: reviewing the position accuracy of the comparison elements in the optimal sequencing line by searching data set Q for all sequence lines containing comparison element q_n, comparing each sequence line containing comparison element q_n with the optimal sequencing line, and marking the correct frequency or wrong frequency of comparison element q_n according to the comparison result for the ordering relation.
7. The probabilistic comprehensive sorting method according to claim 6, characterized in that step S9 further comprises: if the comparison result is that the ordering relation is the same, marking the frequency of occurrence in data set Q of the sequence line containing comparison element q_n as a correct-position frequency of comparison element q_n in the optimal sequencing line; if the comparison result is that the ordering relation is reversed, marking the frequency of occurrence in data set Q of the sequence line containing comparison element q_n as a wrong-position frequency of comparison element q_n in the optimal sequencing line; and if the ordering relation of a sequence line in data set Q containing comparison element q_n does not appear in the optimal sequencing line, marking the position accuracy of comparison element q_n in the optimal sequencing line as 0.
8. The probabilistic comprehensive sorting method according to claim 7, characterized by further comprising, after step S9, a step S10: calculating the mean position accuracy of all comparison elements in the optimal sequencing line after review.
CN201811035247.1A 2018-09-06 2018-09-06 Probability comprehensive ordering method Active CN109190089B (en)

Publications (2)

Publication Number Publication Date
CN109190089A true CN109190089A (en) 2019-01-11
CN109190089B CN109190089B (en) 2023-01-03

Family

ID=64914751

Country Status (1)

Country Link
CN (1) CN109190089B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant