CN105468603B - Data selecting method and device - Google Patents


Info

Publication number
CN105468603B
Authority
CN
China
Prior art keywords
data
pending
group
total relevance
pending data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410419106.5A
Other languages
Chinese (zh)
Other versions
CN105468603A (en)
Inventor
李岩
牛志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201410419106.5A
Publication of CN105468603A
Application granted
Publication of CN105468603B


Abstract

The invention discloses a data selecting method comprising the steps of: A, selecting M+N data items from pending data as a pending data group; B, for each data item in the pending data group, summing its correlations with every other data item in the group to obtain the total relevance of each data item; C, determining N data items from the pending data group, the total relevance of each of the determined N data items being greater than the total relevance of every other data item in the pending data group; D, selecting from the pending data N data items other than those already selected, and replacing the determined N data items with them to form a new pending data group; E, continuing to execute steps B, C and D until all data items have been selected; F, deleting the N data items determined last to obtain target data, the number of target data items being M. The invention also discloses a data selecting device. The invention improves the efficiency and accuracy of data selection.

Description

Data selecting method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a data selecting method and device.
Background art
Binary data consists of a byte stream of the form 0101.... Unlike real-valued data (in which every dimension is a value of type float or double), each dimension of binary data is 0 or 1 and occupies 1 bit.
Binary features are widely used in fields such as large-scale content-based image retrieval, music recognition, street-view recognition, face retrieval and face verification. Because binary data is fast to compute with and occupies little storage, its advantage is obvious in today's era of massive data. In the mobile Internet era in particular, embedded devices such as mobile phones, smart watches and smart glasses are increasingly common. On such computing devices, where both computing power and storage capacity are limited and features must still be extracted in real time, binary data is increasingly the preferred feature.
In massive information retrieval, the inverted index is a common data structure. An inverted index maps items that share the same key feature into the same list. It is currently widely used in the Chinese search services of search engines such as Baidu and Google. How to build an efficient index from the binary features of multimedia data remains an open and widely discussed problem.
When an inverted index is built, the choice of keywords is critical to the final result. If the keywords are unevenly distributed, the time consumed by different keywords during retrieval differs greatly, and the response time of the system cannot be controlled well. When the number of bits in the binary data is large, the imbalance of the distribution grows exponentially, so that the final retrieval time becomes very hard to control.
The mainstream methods for selecting keywords from binary data are currently: 1) randomly picking several bits from the binary data as the keyword; 2) selecting the first several bits of the binary data as the keyword; 3) selecting as the keyword several bits whose mean is close to 0.5.
All three methods have serious defects. Random selection and selection of the first several bits both easily lead to an unbalanced data distribution. Selecting several bits whose mean is close to 0.5 considers only the distribution of each single bit and ignores the correlation between bits, so an unbalanced data distribution also readily occurs. Moreover, keyword selection has to be computed exhaustively, so the process is complex and insufficiently accurate, which reduces the efficiency and accuracy of keyword selection.
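As an illustration of the third prior-art method, the following minimal Python sketch picks the bit positions whose mean over a set of binary codes is closest to 0.5. The function name bits_near_half and the list-of-lists input format are assumptions made for this example only; they do not come from the patent. Each bit is scored independently, which is exactly the limitation described above.

```python
def bits_near_half(codes, k):
    """Return the indices of the k bit positions whose mean is closest to 0.5.

    codes: list of equal-length lists of 0/1 values (hypothetical input format).
    """
    n = len(codes)
    # Mean of each bit position over all codes.
    means = [sum(column) / n for column in zip(*codes)]
    # Rank positions by distance of their mean from 0.5; ties keep index order.
    ranked = sorted(range(len(means)), key=lambda b: abs(means[b] - 0.5))
    return ranked[:k]
```

Two bits that are always equal to each other can both have a mean of 0.5 and both be selected, even though the second one adds no discriminating power.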
Summary of the invention
Embodiments of the present invention provide a data selecting method and device, intended to reduce the computation required for data selection and thereby effectively improve the efficiency and accuracy of data selection.
An embodiment of the present invention proposes a data selecting method, comprising the steps of:
A, selecting M+N data items from pending data as a pending data group according to a preset rule;
B, for each data item in the pending data group, summing its correlations with every other data item in the group to obtain the total relevance of each data item;
C, determining N data items from the pending data group, the total relevance of each of the determined N data items being greater than the total relevance of every other data item in the pending data group;
D, selecting from the pending data N data items other than those already selected, and replacing the determined N data items with them to form a new pending data group;
E, continuing to execute steps B, C and D until all data items in the pending data have been selected;
F, deleting the N data items determined last from the pending data group to obtain target data, the number of target data items being M.
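Steps A to F can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name select_data, the precomputed symmetric matrix corr (where corr[i][j] is the correlation between data items i and j), and the use of random shuffling as the preset rule are all choices made for this example. The sketch also assumes that the number of data items left after the first selection is a multiple of N.

```python
import random

def select_data(corr, m, n, seed=None):
    """Return the indices of M data items chosen by steps A to F."""
    x = len(corr)
    if x < m + n or (x - m - n) % n != 0:
        raise ValueError("sketch assumes X >= M+N and (X - M - N) divisible by N")
    order = list(range(x))
    random.Random(seed).shuffle(order)            # assumed preset rule: random order
    group, rest = order[:m + n], order[m + n:]    # step A
    while True:
        # Step B: total relevance of each item against the other items in the group.
        total = {i: sum(corr[i][j] for j in group if j != i) for i in group}
        # Step C: the N items with the largest total relevance.
        worst = set(sorted(group, key=total.get, reverse=True)[:n])
        kept = [i for i in group if i not in worst]
        if not rest:
            return kept                           # step F: drop the last N, keep M
        # Step D: replace the N determined items with N not-yet-selected items.
        group, rest = kept + rest[:n], rest[n:]   # step E: loop back to B
```

Each round touches only the M+N items of the current group, so no exhaustive search over all M-item subsets is needed.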
An embodiment of the present invention also proposes a data selecting device, comprising:
a selecting module, configured to select M+N data items from pending data as a pending data group according to a preset rule;
a computing module, configured to sum, for each data item in the pending data group, its correlations with every other data item to obtain the total relevance of each data item;
a processing module, configured to determine N data items from the pending data group, the total relevance of each of the determined N data items being greater than the total relevance of every other data item in the pending data group;
a replacement module, configured to select from the pending data N data items other than those already selected, and to replace the determined N data items with them to form a new pending data group;
the processing module being further configured to delete, when all data items in the pending data have been selected, the N data items determined last from the pending data group to obtain target data, the number of target data items being M.
In the embodiments of the present invention, M+N data items are first selected from the pending data as a pending data group; N data items are determined from the pending data group; N data items other than those already selected are selected from the pending data to replace the determined N data items; and this operation is repeated until all data items in the pending data have been selected. The N data items determined last are then deleted from the pending data group to obtain the target data, the number of target data items being M. Data selection is thus accomplished without exhaustive search, and data items with low mutual correlation are selected from the pending data. This avoids the high complexity that exhaustive selection would entail, reduces the computation required for data selection, and thereby effectively improves the efficiency and accuracy of data selection.
Brief description of the drawings
Fig. 1 is the flow chart of the first embodiment of data selecting method of the present invention;
Fig. 2 is the refined flow chart of step S10 in Fig. 1;
Fig. 3 is a refined flow chart of step S20 in Fig. 1;
Fig. 4 is another refined flow chart of step S20 in Fig. 1;
Fig. 5 is the flow chart of the second embodiment of data selecting method of the present invention;
Fig. 6 is the flow chart of the 3rd embodiment of data selecting method of the present invention;
Fig. 7 is the flow chart of the fourth embodiment of data selecting method of the present invention;
Fig. 8 is a functional block diagram of a preferred embodiment of the data selecting device of the present invention;
Fig. 9 is a hardware structure diagram of the terminal in which the data selecting device of the present invention is located.
The realization of the object, the functional features and the advantages of the present invention are further described below with reference to the accompanying drawings and the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.
The present invention proposes a kind of data selecting method.
Fig. 1 is a flow chart of the first embodiment of the data selecting method of the present invention. The data selecting method of this embodiment comprises the following steps:
Step S10, selecting M+N data items from pending data as a pending data group according to a preset rule;
When data to be processed is input, or when a data selection instruction is received, the pending data is obtained. The pending data may be binary data, octal data, decimal data, hexadecimal data or the like. M+N data items are selected from the pending data as the pending data group according to a preset rule. The pending data comprises X data items, with X >> M and M >> N; for example, X=200, M=50, N=3, or X=300, M=100, N=4. Selecting the M+N data items according to the preset rule may consist of randomly selecting M+N data items from the pending data, for example by selecting one data item at every preset interval, where the preset interval may be 1, 2, 3 and so on. Alternatively, referring to Fig. 2, which is a refined flow chart of step S10 in Fig. 1: step S101, shuffling the pending data; step S102, selecting the first M+N data items in order from the shuffled pending data. For example, if the data items in the pending data are ordered 0, 1, 2, 3, 4, 5, the order after shuffling may be 2, 3, 1, 5, 0, 4. The shuffling may use the shuffle function in C++ or any other available shuffling function. To reduce the use of computing resources, N is preferably less than 3; it will be understood that N may also be greater than 3.
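Steps S101 and S102 can be sketched as follows. The text mentions a C++ shuffle function; this example uses Python's standard random.shuffle instead, purely for illustration. The function name initial_group and the seed parameter are assumptions made for this example.

```python
import random

def initial_group(pending, m, n, seed=None):
    """Shuffle the pending data (S101) and take the first M+N items (S102).

    Returns the pending data group and the items not yet selected.
    """
    shuffled = list(pending)                 # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # S101: shuffle
    return shuffled[:m + n], shuffled[m + n:]  # S102: first M+N items, and the rest
```

Keeping the not-yet-selected remainder alongside the group makes it straightforward to draw the next N items in later replacement rounds.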
Step S20, for each data item in the pending data group, summing its correlations with every other data item to obtain the total relevance of each data item;
After M+N data items are selected from the pending data as the pending data group, the correlation between each data item in the pending data group and every other data item is calculated, and for each data item these correlations are summed to obtain its total relevance. For example, suppose one data item in the pending data group is a and the other data items are b, c and d. The correlations of a and b, a and c, a and d, b and c, b and d, and c and d are calculated first. The correlations of a and b, a and c, and a and d are then summed to obtain the total relevance a1 of a; those of a and b, b and c, and b and d are summed to obtain the total relevance b1 of b; those of a and c, b and c, and c and d are summed to obtain the total relevance c1 of c; and those of a and d, b and d, and c and d are summed to obtain the total relevance d1 of d. The correlation between data items may be expressed by the correlation coefficient between each data item in the pending data group and the other data items. The formula for the correlation coefficient is: [formula not preserved in this text], in which an expectation is taken and xi and yi each denote data items in the pending data. This yields the correlation coefficient Cij between each data item and the other data items in the pending data group, where i denotes one data item in the pending data group and j denotes another.
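The summation in step S20 can be sketched as follows. The correlation formula itself is not preserved in this text, so the sketch assumes the standard Pearson correlation coefficient written in terms of expectations; that choice, and the names pearson and total_relevance, are assumptions made for this example rather than the formula of the patent. Each data item is represented here as a list of sample values (for a binary feature, the value of one bit across many codes), and no item may be constant.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed formula)."""
    n = len(x)
    ex, ey = sum(x) / n, sum(y) / n                       # E[x], E[y]
    cov = sum(a * b for a, b in zip(x, y)) / n - ex * ey  # E[xy] - E[x]E[y]
    var_x = sum(a * a for a in x) / n - ex * ex           # E[x^2] - E[x]^2
    var_y = sum(b * b for b in y) / n - ey * ey
    return cov / sqrt(var_x * var_y)

def total_relevance(columns):
    """Map each item name to the sum of its correlations with all other items."""
    names = list(columns)
    return {
        i: sum(pearson(columns[i], columns[j]) for j in names if j != i)
        for i in names
    }
```

For items a, b, c and d this produces exactly the four sums a1, b1, c1 and d1 described above, each over the three pairs that involve the item in question.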
Step S30, determining N data items from the pending data group, the total relevance of each of the determined N data items being greater than the total relevance of every other data item in the pending data group;
After the total relevance of each data item in the pending data group has been obtained by summation, N data items are determined from the pending data group such that the total relevance of each of them is greater than that of every other data item in the group. For example, the pending data group contains four data items a, b, c and d, with M=3 and N=1; the total relevances of a, b, c and d are a1, b1, c1 and d1 respectively. If a1=0.5, b1=0.8, c1=0.4 and d1=0.9, then because d1 is greater than a1, b1 and c1, data item d is determined from the pending data group.
The N data items may be determined from the pending data group as follows. Referring to Fig. 3, which is a refined flow chart of step S20 in Fig. 1: step S201, sorting the total relevances obtained by summation in descending order; step S202, selecting from the pending data group the first N data items in the sorted order. Alternatively, referring to Fig. 4, which is another refined flow chart of step S20 in Fig. 1: step S203, sorting the total relevances obtained by summation in ascending order; step S204, selecting from the pending data group the last N data items in the sorted order.
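The two sorting variants can be sketched in Python as follows, using the example values from the text. The function names top_n_descending and top_n_ascending are assumptions made for this example.

```python
def top_n_descending(total, n):
    """Sort by total relevance, largest first, and take the first N items."""
    return sorted(total, key=total.get, reverse=True)[:n]

def top_n_ascending(total, n):
    """Sort by total relevance, smallest first, and take the last N items (n >= 1)."""
    return sorted(total, key=total.get)[-n:]

totals = {"a": 0.5, "b": 0.8, "c": 0.4, "d": 0.9}
print(top_n_descending(totals, 1))  # ['d']
print(top_n_ascending(totals, 1))   # ['d']
```

Both variants identify the same N data items; only the order in which they are returned can differ when N is greater than 1.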
Step S40, selecting from the pending data N data items other than those already selected, and replacing the determined N data items with them to form a new pending data group;
In this embodiment, after the N data items have been determined, N data items other than those already selected are selected from the pending data to replace the determined N data items, forming a new pending data group. For example, the pending data comprises L data items; M+N data items are selected from the pending data the first time, and N data items other than those M+N are selected from the pending data the second time. Steps S20, S30 and S40 continue to be executed until all data items in the pending data have been selected.
Step S50, deleting the N data items determined last from the pending data group to obtain target data, the number of target data items being M.
When all data items in the pending data have been selected, the N data items determined last are deleted from the pending data group to obtain the target data, the number of target data items being M.
In this embodiment of the present invention, M+N data items are first selected from the pending data as a pending data group; N data items are determined from the pending data group; N data items other than those already selected are selected from the pending data to replace the determined N data items; and this operation is repeated until all data items in the pending data have been selected. The N data items determined last are then deleted from the pending data group to obtain the target data, the number of target data items being M. Data selection is thus accomplished without exhaustive search, and data items with low mutual correlation are selected from the pending data. This avoids the high complexity that exhaustive selection would entail, reduces the computation required for data selection, and thereby effectively improves the efficiency and accuracy of data selection.
Further, a second embodiment of the data selecting method of the present invention is proposed on the basis of the first embodiment. As shown in Fig. 5, when the number of data items in the pending data is not a multiple of N, the data selecting method further comprises the steps of:
Step S60, judging whether the number of unselected data items in the pending data is less than N. When the number of unselected data items is less than N, the following step S70 is executed; when it is greater than or equal to N, step S40 is executed.
Step S70, selecting the unselected data items from the pending data and replacing the determined N data items with them to form a new pending data group;
Step S80, determining S data items from the new pending data group, the total relevance of each of the S data items being greater than the total relevance of every other data item in the new pending data group;
Step S90, deleting the S data items from the new pending data group to obtain target data, the number of target data items being M.
When the number of data items in the pending data is a multiple of N, the target data is selected by the procedure of the first embodiment. When it is not a multiple of N, the pending data is first processed according to steps S10, S20, S30 and S40 of the first embodiment; that is, as long as the number of unselected data items in the pending data is greater than or equal to N, step S40 is executed. When, at the last replacement, the number S of unselected data items is less than N, the target data is selected according to steps S70, S80 and S90 of this embodiment. Whether S is less than N may be determined by obtaining the number S of unselected data items in the pending data and comparing S with N. In this embodiment, when the number of data items in the pending data is not a multiple of N, the unselected data items remaining in the pending data replace the determined N data items at the last replacement, so that data selection is carried out efficiently.
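Steps S70 to S90 can be sketched as follows. The function name finish_with_remainder and its parameters are assumptions made for this example: group is the current pending data group of M+N items, worst holds the N items just determined, leftovers holds the S < N items not yet selected, and corr is a symmetric correlation matrix indexed by item.

```python
def finish_with_remainder(group, worst, leftovers, corr):
    """Handle the final round when only S < N unselected items remain."""
    # S70: the S leftover items take the place of the N determined items,
    # so the new group holds M + S items.
    new_group = [g for g in group if g not in worst] + list(leftovers)
    # Recompute total relevance inside the new group.
    total = {i: sum(corr[i][j] for j in new_group if j != i) for i in new_group}
    # S80: the S items with the largest total relevance.
    s = len(leftovers)
    drop = set(sorted(new_group, key=total.get, reverse=True)[:s])
    # S90: delete them, leaving M target items.
    return [g for g in new_group if g not in drop]
```

Because only S items enter the group and S items are deleted, the result again contains exactly M items.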
Further, a third embodiment of the data selecting method of the present invention is proposed on the basis of the first embodiment. As shown in Fig. 6, the data selecting method further comprises the steps of:
Step S100, calculating, for each set of target data, the sum of the total relevances of the data items in that set;
Steps S10, S20, S30, S40, S50 and S60 are repeated P times to obtain P sets of target data. Because the M+N data items selected differ each time, the target data obtained by each repetition differs, and repeating P times yields P sets of target data. The number of repetitions P is set in advance according to the user's expectations for data selection and the performance of the system; the number of repetitions may differ for each pending data, and P defaults to 0. When the user has not set a number of repetitions, no repetition is performed and a single set of target data is obtained, which is the final target data.
After the P sets of target data are obtained, the total relevance of each data item in each set is calculated, and the total relevances of the data items in each set are summed.
Step S110, determining, from the P sets of target data, the set with the smallest sum of total relevances, and taking it as the final target data.
For example, P is 3 and the sets of target data are M1, M2 and M3, whose sums of total relevances are m1, m2 and m3 respectively. If m1=2.3, m2=2.7 and m3=2.1, the sum for M3 is the smallest, so M3 is taken as the final target data; that is, M3 is the data the user expects to obtain, and the selection from the pending data is complete. In this embodiment, the pending data is processed several times to obtain several sets of target data, and the set with the smallest sum of total relevances is chosen as the final target data, which makes the selection of target data more accurate.
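Steps S100 and S110 can be sketched as follows. The function names relevance_sum and best_candidate are assumptions made for this example, as is the representation of each set of target data as a list of indices into a symmetric correlation matrix corr.

```python
def relevance_sum(items, corr):
    """S100: sum of the total relevances of all items within one set."""
    return sum(corr[i][j] for i in items for j in items if i != j)

def best_candidate(candidates, corr):
    """S110: the set of target data with the smallest sum of total relevances."""
    return min(candidates, key=lambda items: relevance_sum(items, corr))
```

Each pairwise correlation is counted twice in relevance_sum (once for each item of the pair), which matches summing per-item total relevances and does not change which set is smallest.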
Further, a fourth embodiment of the data selecting method of the present invention is proposed on the basis of the third embodiment. As shown in Fig. 7, before step S20 the method further comprises:
Step S120, calculating in advance and saving the correlation between each data item in the pending data and every other data item.
In this embodiment, when data needs to be processed or a data selection instruction is received, the correlation between each data item in the pending data and every other data item is calculated in advance once the pending data has been obtained. A correlation table corresponding to the calculated correlations may be generated. After these correlations have been calculated, the summation that yields the total relevance of each data item in the pending data group may proceed as follows, referring to Fig. 7. Step S20 comprises: step S205, determining, from the precalculated correlations, the correlation between each data item in the pending data group and every other data item; step S206, summing, according to the determined correlations, the correlations of each data item in the pending data group with every other data item to obtain the total relevance of each data item. That is, the correlation between each data item in the pending data and every other data item is first calculated and saved; after M+N data items are selected, the correlations of the corresponding data items are looked up among the saved correlations and summed to obtain the total relevance of each data item in the pending data group. In this embodiment, because the correlations are calculated in advance, whenever the correlation between one data item and another is needed it is taken from the saved correlations and summed, without recalculating correlations each time. This shortens the data selection process, improves the efficiency of data selection, saves the computing resources of the system and improves system performance.
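The precalculation of step S120 and the lookups of steps S205 and S206 can be sketched as follows. The names build_table and total_relevance, the dictionary keyed by index pairs, and the pluggable corr_fn are assumptions made for this example; any pairwise correlation function could be supplied.

```python
from itertools import combinations

def build_table(items, corr_fn):
    """S120: compute every pairwise correlation once and save it."""
    table = {}
    for i, j in combinations(range(len(items)), 2):
        # Store both orderings so lookups do not need to sort the pair.
        table[(i, j)] = table[(j, i)] = corr_fn(items[i], items[j])
    return table

def total_relevance(group, table):
    """S205/S206: look up saved correlations for the group and sum them."""
    return {i: sum(table[(i, j)] for j in group if j != i) for i in group}
```

With X pending data items the table holds X(X-1)/2 distinct correlations, and every later round, as well as every one of the P repetitions, reuses them instead of recomputing.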
The data selecting method of the first to fourth embodiments above may be executed by a terminal. Further, the method may be implemented by a client (such as data selection software) installed on the terminal, where the terminal may include, but is not limited to, electronic devices such as a laptop, a mobile phone, a tablet computer or a PDA (Personal Digital Assistant).
Further, a preferred embodiment of the data selecting device of the present invention is proposed. As shown in Fig. 8, the data selecting device comprises a selecting module 10, a computing module 20, a processing module 30 and a replacement module 40, wherein the selecting module 10 comprises a shuffling unit 11 and a selecting unit 12.
The selecting module 10 is configured to select M+N data items from pending data as a pending data group according to a preset rule;
When data to be processed is input, or when a data selection instruction is received, the pending data is obtained. M+N data items are selected from the pending data as the pending data group according to a preset rule. The pending data comprises X data items, with X >> M and M >> N; for example, X=200, M=50, N=3, or X=300, M=100, N=4. Selecting the M+N data items according to the preset rule may consist of randomly selecting M+N data items from the pending data, for example by selecting one data item at every preset interval, where the preset interval may be 1, 2, 3 and so on.
The shuffling unit 11 is configured to shuffle the pending data;
The selecting unit 12 is configured to select the first M+N data items in order from the shuffled pending data. For example, if the data items in the pending data are ordered 0, 1, 2, 3, 4, 5, the order after shuffling may be 2, 3, 1, 5, 0, 4. The shuffling may use the shuffle function in C++ or any other available shuffling function. To reduce the use of computing resources, N is preferably less than 3; it will be understood that N may also be greater than 3.
The computing module 20 is configured to sum, for each data item in the pending data group, its correlations with every other data item to obtain the total relevance of each data item;
After M+N data items are selected from the pending data as the pending data group, the correlation between each data item in the pending data group and every other data item is calculated, and for each data item these correlations are summed to obtain its total relevance. For example, suppose one data item in the pending data group is a and the other data items are b, c and d. The correlations of a and b, a and c, a and d, b and c, b and d, and c and d are calculated first. The correlations of a and b, a and c, and a and d are then summed to obtain the total relevance a1 of a; those of a and b, b and c, and b and d are summed to obtain the total relevance b1 of b; those of a and c, b and c, and c and d are summed to obtain the total relevance c1 of c; and those of a and d, b and d, and c and d are summed to obtain the total relevance d1 of d. The correlation between data items may be expressed by the correlation coefficient between each data item in the pending data group and the other data items. The formula for the correlation coefficient is: [formula not preserved in this text], in which an expectation is taken and xi and yi each denote data items in the pending data. This yields the correlation coefficient Cij between each data item and the other data items in the pending data group, where i denotes one data item in the pending data group and j denotes another.
The processing module 30 is configured to determine N data items from the pending data group, the total relevance of each of the N data items being greater than the total relevance of every other data item in the pending data group;
After the total relevance of each data item in the pending data group has been obtained by summation, N data items are determined from the pending data group such that the total relevance of each of them is greater than that of every other data item in the group. For example, the pending data group contains four data items a, b, c and d, with M=3 and N=1; the total relevances of a, b, c and d are a1, b1, c1 and d1 respectively. If a1=0.5, b1=0.8, c1=0.4 and d1=0.9, then because d1 is greater than a1, b1 and c1, data item d is determined from the pending data group.
The N data items may be determined from the pending data group by sorting the total relevances obtained by summation in descending order and selecting from the pending data group the first N data items in the sorted order. Alternatively, the total relevances may be sorted in ascending order and the last N data items in the sorted order selected from the pending data group.
The replacement module 40 is configured to select from the pending data N data items other than those already selected, and to replace the N data items with them to form a new pending data group;
The processing module 30 is further configured to delete, when all data items in the pending data have been selected, the N data items determined last from the pending data group to obtain target data, the number of target data items being M.
In this embodiment, after the N data items have been determined, N data items other than those already selected are selected from the pending data to replace the determined N data items, forming a new pending data group. For example, the pending data comprises L data items; M+N data items are selected from the pending data the first time, N data items other than those M+N are selected from the pending data the second time, and so on, until all data items in the pending data have been selected. The N data items determined last are then deleted from the pending data group to obtain the target data, the number of target data items being M.
In this embodiment of the present invention, M+N data items are first selected from the pending data as a pending data group; N data items are determined from the pending data group; N data items other than those already selected are selected from the pending data to replace the determined N data items; and this operation is repeated until all data items in the pending data have been selected. The N data items determined last are then deleted from the pending data group to obtain the target data, the number of target data items being M. Data selection is thus accomplished without exhaustive search, and data items with low mutual correlation are selected from the pending data. This avoids the high complexity that exhaustive selection would entail, reduces the computation required for data selection, and thereby effectively improves the efficiency and accuracy of data selection.
Further, when the number of pending data is not a multiple of N, the processing module 30 is further configured to judge whether the number of unselected data in the pending data is less than N.
The replacement module 40 is further configured to, when the number S of unselected data in the pending data is less than N, select the unselected data from the pending data to replace the N determined data, forming a new pending data group;
The processing module 30 is further configured to determine S data from the new pending data group, the total relevance of each of the S data being greater than the total relevance of every other data in the pending data group, and to delete the S data from the new pending data group to obtain the target data, the number of target data being M.
When the number of pending data is a multiple of N, the computing module 20 sums the degrees of correlation between each data in the pending data group and each of the other data to obtain the total relevance of each data; the processing module 30 determines N data from the pending data group, the total relevance of each of the N determined data being greater than the total relevance of every other data in the pending data group; the replacement module 40 selects, from the pending data, N data other than the data already selected to replace the N determined data, forming a new pending data group; and when all data in the pending data have been selected, the processing module 30 deletes the N data determined last from the pending data group to obtain the target data, the number of target data being M.
When the number of pending data is not a multiple of N, and while the number of unselected data in the pending data is greater than N, the computing module 20 sums the degrees of correlation between each data in the pending data group and each of the other data to obtain the total relevance of each data; the processing module 30 determines N data from the pending data group, the total relevance of each of the N determined data being greater than the total relevance of every other data in the pending data group; and the replacement module 40 selects, from the pending data, N data other than the data already selected to replace the N determined data, forming a new pending data group. When the number S of data still unselected at the last replacement is less than N, the replacement module 40 selects those unselected data from the pending data to replace the N data with the largest total relevance, forming a new pending data group; the processing module 30 determines S data from the new pending data group, the total relevance of each of the S data being greater than the total relevance of every other data in the pending data group, and deletes the S data from the new pending data group to obtain the target data.
The processing module 30 may determine that the number S of unselected data in the pending data is less than N as follows: obtain the number S of unselected data in the pending data and compare S with N to determine whether S is less than N. In the embodiment of the present invention, when the number of pending data is not a multiple of N, the unselected data in the pending data replace the N determined data in the last replacement, so that data selection is carried out efficiently.
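The final round for the case S < N can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `top_by_total_relevance` and `finish_with_remainder`, the index-based representation of data, and the example matrix are all hypothetical and not taken from the patent.

```python
from typing import List, Sequence


def top_by_total_relevance(group: Sequence[int],
                           corr: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the k items of the group with the largest total relevance."""
    scores = {i: sum(corr[i][j] for j in group if j != i) for i in group}
    return sorted(group, key=lambda i: scores[i], reverse=True)[:k]


def finish_with_remainder(group: Sequence[int], leftovers: Sequence[int],
                          corr: Sequence[Sequence[float]], n: int) -> List[int]:
    """Handle the last replacement when only S < N unselected data remain."""
    s = len(leftovers)
    if not 0 < s < n:
        raise ValueError("this helper only covers 0 < S < N")
    worst_n = top_by_total_relevance(group, corr, n)
    # The S leftovers replace the N determined data: the new group has M+S items.
    new_group = [i for i in group if i not in worst_n] + list(leftovers)
    # Determine and delete the S items with the largest total relevance: M remain.
    worst_s = top_by_total_relevance(new_group, corr, s)
    return [i for i in new_group if i not in worst_s]


# Made-up matrix for L = 5, M = 2, N = 2: after the first group of four,
# only S = 1 unselected data (index 4) remains.
corr = [
    [0.0, 0.9, 0.1, 0.4, 0.6],
    [0.9, 0.0, 0.2, 0.7, 0.5],
    [0.1, 0.2, 0.0, 0.5, 0.2],
    [0.4, 0.7, 0.5, 0.0, 0.4],
    [0.6, 0.5, 0.2, 0.4, 0.0],
]
result = finish_with_remainder([0, 1, 2, 3], [4], corr, n=2)
print(result)  # [0, 2]
```

Here data 1 and 3 are determined first, data 4 takes their place, and data 4 is then deleted because it has the largest total relevance in the group {0, 2, 4}, so exactly M = 2 data remain.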
Further, when P sets of target data are obtained through the selecting module 10, the computing module 20, the processing module 30 and the replacement module 40, the computing module 20 is further configured to calculate, for each set of target data, the sum of the total relevance of the data in that set;
P sets of target data are obtained through the selecting module 10, the computing module 20, the processing module 30 and the replacement module 40. Because the M+N data selected each time are different, the target data obtained each time are different. The value of P is set in advance, according to the user's expectations for the data selection and the system performance. The number of sets of target data obtained may differ from one batch of pending data to another, and the default value of P is 0. If the user does not set a value for P, a single set of target data is obtained and used as the final target data.
After the P sets of target data are obtained, the total relevance of each data in each set of target data is calculated, and the total relevance values of the data in each set are summed, yielding the sum of total relevance for each set of target data.
The processing module 30 is further configured to determine, from the P sets of target data, the set with the smallest sum of total relevance, and to take that set as the final target data.
The set of target data with the smallest sum of total relevance is determined from the P sets of target data. For example, P is 3 and the sets are M1, M2 and M3; the sum of total relevance of target data M1 is m1, that of M2 is m2, and that of M3 is m3. If m1=2.3, m2=2.7 and m3=2.1, the sum of total relevance of target data M3 is the smallest, so M3 is taken as the final target data. That is, M3 is the data the user expects to obtain, and the selection of the pending data is complete. In the embodiment of the present invention, the selection process is performed on the pending data several times to obtain multiple sets of target data, and the set with the smallest sum of total relevance is chosen as the final target data, which makes the selection of target data more accurate.
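The comparison among the P candidate sets can be sketched as follows. The function names and the example correlation matrix are assumptions made for illustration; only the values m1=2.3, m2=2.7 and m3=2.1 come from the text.

```python
from typing import List, Sequence


def sum_of_total_relevance(target: Sequence[int],
                           corr: Sequence[Sequence[float]]) -> float:
    """Sum, over every item in the set, of its correlation with the other items."""
    return sum(corr[i][j] for i in target for j in target if i != j)


def pick_final_target(candidates: Sequence[Sequence[int]],
                      corr: Sequence[Sequence[float]]) -> List[int]:
    """Return the candidate set whose sum of total relevance is smallest."""
    return list(min(candidates, key=lambda t: sum_of_total_relevance(t, corr)))


# The worked example from the text: M3 has the smallest sum.
sums = {"M1": 2.3, "M2": 2.7, "M3": 2.1}
print(min(sums, key=sums.get))  # M3

# The same idea computed from a made-up correlation matrix.
corr = [
    [0.0, 0.9, 0.1, 0.3],
    [0.9, 0.0, 0.2, 0.7],
    [0.1, 0.2, 0.0, 0.2],
    [0.3, 0.7, 0.2, 0.0],
]
best = pick_final_target([[0, 1], [0, 2], [1, 3]], corr)
print(best)  # [0, 2]
```

With a symmetric matrix, each pair is counted twice in the sum, which does not change which set is smallest.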
Further, the computing module 20 is further configured to calculate in advance, and save, the degree of correlation between each data in the pending data and each of the other data.
In this embodiment, when data need to be processed or a data selection instruction is received, the degree of correlation between each data in the pending data and each of the other data is calculated in advance after the pending data are obtained. A degree-of-correlation table corresponding to the calculated degrees of correlation may be generated. After these degrees of correlation have been calculated, the process of summing the degrees of correlation between each data in the pending data group and each of the other data to obtain the total relevance of each data may be: according to the calculated degrees of correlation, determine the degree of correlation between each data in the pending data group and each of the other data; then, according to the determined degrees of correlation, sum them to obtain the total relevance of each data. That is, the degrees of correlation between each data in the pending data and each of the other data are first calculated and saved; after M+N data are selected, the degrees of correlation of the corresponding data are looked up among the saved values and summed to obtain the total relevance of each data in the pending data group. In the embodiment of the present invention, because these degrees of correlation are calculated in advance, whenever the degree of correlation between one data and another is needed, the corresponding value is taken from the saved degrees of correlation for summation, so the degrees of correlation need not be recalculated each time. This shortens the data selection process, improves the efficiency of data selection, saves the computing resources of the system, and improves system performance.
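A degree-of-correlation table of this kind can be sketched as a dictionary keyed by index pairs. The text does not say how the degree of correlation itself is computed, so the Jaccard similarity used below is purely a stand-in, and the names `build_correlation_table`, `lookup` and `total_relevance` are hypothetical. The sketch also assumes the measure is symmetric, so each pair is stored once.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Table = Dict[Tuple[int, int], float]


def jaccard(a: set, b: set) -> float:
    """Stand-in similarity measure (an assumption, not from the patent)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_correlation_table(items: Sequence,
                            similarity: Callable[[object, object], float]) -> Table:
    """Compute every pairwise degree of correlation once and save it."""
    return {(i, j): similarity(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)}


def lookup(table: Table, i: int, j: int) -> float:
    """Read a saved value; pairs are stored with the smaller index first."""
    return table[(i, j) if i < j else (j, i)]


def total_relevance(group: Sequence[int], table: Table) -> Dict[int, float]:
    """Sum saved values only; nothing is recomputed per round."""
    return {i: sum(lookup(table, i, j) for j in group if j != i) for i in group}


items = [{"a", "b"}, {"a", "b"}, {"c"}]
table = build_correlation_table(items, jaccard)
print(total_relevance([0, 1, 2], table))  # {0: 1.0, 1: 1.0, 2: 0.0}
```

For L data the table holds L(L−1)/2 entries, so the trade-off is memory for the saved values against repeated similarity computations in every round.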
As shown in Fig. 9, Fig. 9 is a bus diagram of the terminal where the data selection device in the embodiment of the present invention is located. The terminal may include: at least one processor 301 (for example, a CPU), at least one network interface 304, a user interface 303, a memory 305, and at least one communication bus 302. The communication bus 302 is used to implement connection and communication among these components. The user interface 303 may include a display and a keyboard, and may also include standard wired and wireless interfaces. The network interface 304 may include standard wired and wireless interfaces (such as a wireless network interface). The memory 305 may be a high-speed RAM or a non-volatile memory, for example at least one disk memory. The memory 305 may also be at least one storage device located remotely from the aforementioned processor 301. As a computer storage medium, the memory 305 may include an operating system, a network communication module, a user interface module, and an audio playback control program.
In the terminal where the data selection device shown in Fig. 9 is located, the network interface 304 is mainly used to connect to a server and perform data communication with the server; the user interface 303 is mainly used to receive user instructions and interact with the user; and the processor 301 may be used to call the data selection program stored in the memory 305 and perform the following operations:
A data selection request is detected through the user interface 303. When the user interface 303 detects a data selection request:
A. select M+N data from the pending data as a pending data group according to a preset rule;
B. sum the degrees of correlation between each data in the pending data group and each of the other data to obtain the total relevance of each data;
C. determine N data from the pending data group, the total relevance of each of the N determined data being greater than the total relevance of every other data in the pending data group;
D. select, from the pending data, N data other than the data already selected to replace the N determined data, forming a new pending data group;
E. continue to perform steps B, C and D until all data in the pending data have been selected;
F. delete the N data determined last from the pending data group to obtain the target data, the number of target data being M.
In one embodiment, when the number of pending data is not a multiple of N, the processor 301 calls the data selection program stored in the memory 305 and may further perform the following operations:
When the number S of unselected data in the pending data is less than N, select the unselected data from the pending data to replace the N data with the largest total relevance, forming a new pending data group;
Determine S data from the new pending data group, the total relevance of each of the S data being greater than the total relevance of every other data in the pending data group;
Delete the S data from the new pending data group to obtain the target data.
In one embodiment, the processor 301 calls the data selection program stored in the memory 305 and may further perform the following operations:
Repeat steps A, B, C, D, E and F P times to obtain P sets of target data;
Calculate, for each set of target data, the sum of the total relevance of the data in that set;
Determine, from the P sets of target data, the set with the smallest sum of total relevance, and take that set as the final target data.
In one embodiment, the processor 301 calls the data selection program stored in the memory 305 and may further perform the following operations:
Shuffle the pending data;
Select the first M+N data, in order, from the shuffled pending data.
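The shuffle-then-take-first-M+N rule can be sketched as follows. The function name `initial_group` and the optional `seed` parameter are assumptions added here so the example is reproducible; the text only specifies shuffling and taking the first M+N data.

```python
import random
from typing import List, Optional, Sequence, Tuple


def initial_group(pending: Sequence[int], m: int, n: int,
                  seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Shuffle the pending data and split off the first M+N as the group."""
    if len(pending) < m + n:
        raise ValueError("need at least M+N pending data")
    rng = random.Random(seed)   # private generator, leaves global state alone
    order = list(pending)       # copy so the caller's data is not reordered
    rng.shuffle(order)
    return order[:m + n], order[m + n:]


group, unselected = initial_group(range(10), m=3, n=2, seed=0)
print(len(group), len(unselected))  # 5 5
```

Because each shuffle yields a different ordering, repeating the whole selection with different shuffles is one way the M+N data selected each time can differ between runs.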
In one embodiment, the processor 301 calls the data selection program stored in the memory 305 and may further perform the following operations:
Calculate in advance, and save, the degree of correlation between each data in the pending data and each of the other data.
In one embodiment, the processor 301 calls the data selection program stored in the memory 305 and may further perform the following operations:
Sort the total relevance values obtained by summation in descending order;
Select from the pending data group the N data that rank first in total relevance.
In one embodiment, the processor 301 calls the data selection program stored in the memory 305 and may further perform the following operations:
Sort the total relevance values obtained by summation in ascending order;
Select from the pending data group the N data that rank last in total relevance.
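Both orderings pick out the same N data, as the following sketch illustrates. The function names and the example scores are made up for illustration; `heapq.nlargest` is shown only as a standard-library alternative that avoids a full sort, not as something the patent prescribes.

```python
import heapq
from typing import Dict, List


def worst_n_descending(scores: Dict[str, float], n: int) -> List[str]:
    """Sort from large to small and take the first N."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


def worst_n_ascending(scores: Dict[str, float], n: int) -> List[str]:
    """Sort from small to large and take the last N (n must be positive)."""
    ranked = sorted(scores, key=scores.get)
    return ranked[-n:]


# Made-up total relevance values for a group of four data.
scores = {"a": 1.1, "b": 0.3, "c": 0.9, "d": 0.5}
print(worst_n_descending(scores, 2))               # ['a', 'c']
print(worst_n_ascending(scores, 2))                # ['c', 'a']
print(heapq.nlargest(2, scores, key=scores.get))   # ['a', 'c']
```

The two sorted variants return the same data in opposite order; when ties occur, which of the tied data is chosen depends on the sort's tie-breaking and may differ between the variants.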
The terminal where the data selection device described with reference to Fig. 9 in this embodiment is located first selects M+N data from the pending data as a pending data group, determines N data from the pending data group, and selects, from the pending data, N data other than the data already selected to replace those N data. This operation is repeated until all data in the pending data have been selected, and the N data determined last in the pending data group are then deleted to obtain the target data. Data selection is thus completed without exhaustive selection. This effectively avoids the technical problem that exhaustive selection makes data selection highly complex, reduces the computation required for data selection, and thereby effectively improves the efficiency of data selection.
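To give a rough sense of the saving, the following sketch compares operation counts under two assumptions that are not stated in the text: that "exhaustive selection" means scoring every possible subset of M data out of L, and that one round of the method costs one lookup per ordered pair in the group. The function names are hypothetical.

```python
import math


def pairwise_lookups_incremental(total: int, m: int, n: int) -> int:
    """Correlation lookups summed over all rounds of the replace-and-delete loop."""
    rounds = (total - m) // n            # the first group plus one per replacement
    per_round = (m + n) * (m + n - 1)    # ordered pairs summed in one round
    return rounds * per_round


def subsets_exhaustive(total: int, m: int) -> int:
    """Number of M-subsets an exhaustive search would have to score."""
    return math.comb(total, m)


# Example sizes chosen for illustration: L = 100, M = 10, N = 5.
print(pairwise_lookups_incremental(100, 10, 5))  # 3780
print(subsets_exhaustive(100, 10))               # 17310309456440
```

The counts are not directly comparable units (lookups versus subsets), but they indicate why avoiding exhaustive selection reduces the computation; the result of the loop is a heuristic selection rather than a guaranteed optimum.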
It should be noted that, in this document, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied as a software product that is stored in a storage medium (such as a ROM/RAM, magnetic disk or optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, network device or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims (16)

1. A data selecting method, characterized in that the data selecting method comprises the steps of:
A. selecting M+N data from pending data as a pending data group according to a preset rule;
B. summing the degrees of correlation between each data in the pending data group and each of the other data to obtain the total relevance of each data;
C. determining N data from the pending data group, the total relevance of each of the N determined data being greater than the total relevance of every other data in the pending data group;
D. selecting, from the pending data, N data other than the data already selected to replace the N determined data, forming a new pending data group;
E. continuing to perform steps B, C and D until all data in the pending data have been selected;
F. deleting the N data determined last from the pending data group to obtain target data, the number of target data being M.
2. The data selecting method as claimed in claim 1, characterized in that, when the number of pending data is not a multiple of N, the data selecting method further comprises the steps of:
when the number S of unselected data in the pending data is less than N, selecting the unselected data from the pending data to replace the N determined data, forming a new pending data group;
determining S data from the new pending data group, the total relevance of each of the S data being greater than the total relevance of every other data in the pending data group;
deleting the S data from the new pending data group to obtain the target data, the number of target data being M.
3. The data selecting method as claimed in claim 1, characterized in that the data selecting method further comprises the steps of:
repeating steps A, B, C, D, E and F P times to obtain P sets of target data;
calculating, for each set of target data, the sum of the total relevance of the data in that set;
determining, from the P sets of target data, the set with the smallest sum of total relevance, and taking that set as the final target data.
4. The data selecting method as claimed in any one of claims 1 to 3, characterized in that the step of selecting M+N data from the pending data as the pending data group according to the preset rule comprises:
shuffling the pending data;
selecting the first M+N data, in order, from the shuffled pending data.
5. The data selecting method as claimed in any one of claims 1 to 3, characterized in that, before the step of summing the degrees of correlation between each data in the pending data group and each of the other data to obtain the total relevance of each data, the method further comprises:
calculating in advance, and saving, the degree of correlation between each data in the pending data and each of the other data.
6. The data selecting method as claimed in claim 5, characterized in that the step of summing the degrees of correlation between each data in the pending data group and each of the other data to obtain the total relevance of each data comprises:
determining, according to the calculated degrees of correlation, the degree of correlation between each data in the pending data group and each of the other data;
summing, according to the determined degrees of correlation, the degrees of correlation between each data in the pending data group and each of the other data to obtain the total relevance of each data.
7. The data selecting method as claimed in any one of claims 1 to 3, characterized in that the step of determining N data from the pending data group comprises:
sorting the total relevance values obtained by summation in descending order;
selecting from the pending data group the N data that rank first in total relevance.
8. The data selecting method as claimed in any one of claims 1 to 3, characterized in that the step of determining N data from the pending data group comprises:
sorting the total relevance values obtained by summation in ascending order;
selecting from the pending data group the N data that rank last in total relevance.
9. A data selection device, characterized in that the data selection device comprises:
a selecting module, configured to select M+N data from pending data as a pending data group according to a preset rule;
a computing module, configured to sum the degrees of correlation between each data in the pending data group and each of the other data to obtain the total relevance of each data;
a processing module, configured to determine N data from the pending data group, the total relevance of each of the N determined data being greater than the total relevance of every other data in the pending data group;
a replacement module, configured to select, from the pending data, N data other than the data already selected to replace the N determined data, forming a new pending data group;
wherein the processing module is further configured to cause the following to continue: the computing module sums the degrees of correlation between each data in the pending data group and each of the other data to obtain the total relevance of each data; the processing module determines N data from the pending data group, the total relevance of each of the N determined data being greater than the total relevance of every other data in the pending data group; and the replacement module selects, from the pending data, N data other than the data already selected to replace the N determined data, forming a new pending data group; and, when all data in the pending data have been selected, to delete the N data determined last from the pending data group to obtain target data, the number of target data being M.
10. The data selection device as claimed in claim 9, characterized in that, when the number of pending data is not a multiple of N, the replacement module is further configured to, when the number S of unselected data in the pending data is less than N, select the unselected data from the pending data to replace the N data with the largest total relevance, forming a new pending data group;
the processing module is further configured to determine S data from the new pending data group, the total relevance of each of the S data being greater than the total relevance of every other data in the pending data group, and to delete the S data from the new pending data group to obtain the target data.
11. The data selection device as claimed in claim 9, characterized in that, when P sets of target data are obtained, the computing module is further configured to calculate, for each set of target data, the sum of the total relevance of the data in that set;
the processing module is further configured to determine, from the P sets of target data, the set with the smallest sum of total relevance, and to take that set as the final target data.
12. The data selection device as claimed in any one of claims 9 to 11, characterized in that the selecting module comprises a shuffling unit and an acquiring unit,
the shuffling unit being configured to shuffle the pending data;
the acquiring unit being configured to select the first M+N data, in order, from the shuffled pending data.
13. The data selection device as claimed in any one of claims 9 to 11, characterized in that the computing module is further configured to calculate in advance, and save, the degree of correlation between each data in the pending data and each of the other data.
14. The data selection device as claimed in claim 13, characterized in that the computing module is further configured to determine, according to the calculated degrees of correlation, the degree of correlation between each data in the pending data group and each of the other data, and to sum, according to the determined degrees of correlation, the degrees of correlation between each data in the pending data group and each of the other data to obtain the total relevance of each data.
15. The data selection device as claimed in any one of claims 9 to 11, characterized in that the processing module is further configured to sort the total relevance values obtained by summation in descending order, and to select from the pending data group the N data that rank first in total relevance.
16. The data selection device as claimed in any one of claims 9 to 11, characterized in that the processing module is further configured to sort the total relevance values obtained by summation in ascending order, and to select from the pending data group the N data that rank last in total relevance.
CN201410419106.5A 2014-08-22 2014-08-22 Data selecting method and device Active CN105468603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410419106.5A CN105468603B (en) 2014-08-22 2014-08-22 Data selecting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410419106.5A CN105468603B (en) 2014-08-22 2014-08-22 Data selecting method and device

Publications (2)

Publication Number Publication Date
CN105468603A CN105468603A (en) 2016-04-06
CN105468603B true CN105468603B (en) 2019-04-02

Family

ID=55606315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410419106.5A Active CN105468603B (en) 2014-08-22 2014-08-22 Data selecting method and device

Country Status (1)

Country Link
CN (1) CN105468603B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529450A (en) * 2016-11-03 2017-03-22 珠海格力电器股份有限公司 Emoticon picture generating method and device
CN109542927B (en) * 2018-10-24 2021-09-28 南京邮电大学 Effective data screening method, readable storage medium and terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09223058A (en) * 1996-02-14 1997-08-26 Nec Corp System for managing file space for bit map setting index part and data part as independent files
CN101884043A (en) * 2007-12-05 2010-11-10 新叶股份有限公司 Bit string merge sort device, method, and program
CN101911060A (en) * 2007-12-28 2010-12-08 新叶股份有限公司 Database index key update method and program
CN102763105A (en) * 2010-02-23 2012-10-31 诺基亚公司 Method and apparatus for segmenting and summarizing media content
CN104050242A (en) * 2014-05-27 2014-09-17 哈尔滨理工大学 Feature selection and classification method based on maximum information coefficient and feature selection and classification device based on maximum information coefficient

Also Published As

Publication number Publication date
CN105468603A (en) 2016-04-06

Similar Documents

Publication Publication Date Title
US8171228B2 (en) Garbage collection in a cache with reduced complexity
CN108108384B (en) Data storage method and device
CN103970870A (en) Database query method and server
CN101556678A (en) Processing method of batch processing services, system and service processing control equipment
US11531831B2 (en) Managing machine learning features
US20160337442A1 (en) Scheduled network communication for efficient re-partitioning of data
US8161485B2 (en) Scheduling jobs in a plurality of queues and dividing jobs into high and normal priority and calculating a queue selection reference value
JP2019511773A (en) Service parameter selection method and related device
CN109614510B (en) Image retrieval method, image retrieval device, image processor and storage medium
CN107784195A (en) Data processing method and device
CN105468603B (en) Data selecting method and device
CN109542612A (en) A kind of hot spot keyword acquisition methods, device and server
CN112445776B (en) Presto-based dynamic barrel dividing method, system, equipment and readable storage medium
CN107871055A (en) A kind of data analysing method and device
CN113010315A (en) Resource allocation method, resource allocation device and computer-readable storage medium
CN110222046B (en) List data processing method, device, server and storage medium
CN110909072B (en) Data table establishment method, device and equipment
CN108763381A (en) Divide table method and apparatus based on consistency hash algorithm
CN112764935B (en) Big data processing method and device, electronic equipment and storage medium
CN104636474A (en) Method and equipment for establishment of audio fingerprint database and method and equipment for retrieval of audio fingerprints
CN110688223B (en) Data processing method and related product
CN111782688A (en) Request processing method, device and equipment based on big data analysis and storage medium
US8015207B2 (en) Method and apparatus for unstructured data mining and distributed processing
CN103731500B (en) Data batch insertion method based on Bigtable storage system
CN112631752B (en) List operation method and device based on operation priority

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20190802

Address after: Shenzhen Futian District City, Guangdong province 518044 Zhenxing Road, SEG Science Park 2 East Room 403

Co-patentee after: Tencent cloud computing (Beijing) limited liability company

Patentee after: Tencent Technology (Shenzhen) Co., Ltd.

Address before: Shenzhen Futian District City, Guangdong province 518000 Zhenxing Road, SEG Science Park 2 East Room 403

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.