CN107766486B

CN107766486B - Method, device, readable medium and storage controller for randomly extracting sample data

Info

Publication number: CN107766486B
Application number: CN201710959595.7A
Authority: CN
Inventors: 邵辉; 曹雪韬; 王宏达; 崔冲冲
Original assignee: Inspur General Software Co Ltd
Current assignee: Inspur General Software Co Ltd
Priority date: 2017-10-16
Filing date: 2017-10-16
Publication date: 2021-04-20
Anticipated expiration: 2037-10-16
Also published as: CN107766486A

Abstract

The invention provides a method, a device, a readable medium and a storage controller for randomly extracting sample data. The method comprises the following steps: A0: arranging all sample data of a sample data set into a sequence queue, and determining an extraction number; A1: generating a random number corresponding to the current sample data at the head of the sequence queue; A2: detecting whether the random number is less than the extraction number, and if so, executing A3; otherwise, executing A4; A3: taking out the current sample data at the head of the queue as reference sample data, and executing A5; A4: placing the current sample data at the head of the queue at the tail of the sequence queue, and executing A1; A5: detecting the current number of reference sample data taken out, and executing A1 when the current number is less than the extraction number. With this technical solution, a corresponding number of sample data can be randomly extracted from the sample data set more accurately.

Description

Method, device, readable medium and storage controller for randomly extracting sample data

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for randomly extracting sample data, a readable medium, and a storage controller.

Background

Random extraction of sample data has a wide range of applications. In particular, when a sample data set is large, a small amount of sample data can be randomly extracted from it and analyzed in order to implement a corresponding service.

At present, to randomly extract m sample data from the n sample data of a sample data set, the n sample data may be arranged into a sequence queue, m positive integers smaller than n are generated according to actual requirements, and the m sample data whose positions in the sequence queue match those integers are taken out. In this way, m sample data are randomly extracted from the n sample data of the sample data set.
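The position-based approach described above can be sketched as follows. This is only an illustration of the background technique: the function name `extract_by_positions`, the use of 0-based positions, and the choice of distinct positions via `random.sample` are assumptions, since the text does not say how the integers are generated or whether repeats are allowed.

```python
import random


def extract_by_positions(samples, m, rng=None):
    """Pick m items by drawing m queue positions and reading those items."""
    rng = rng or random.Random()
    n = len(samples)
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and the number of samples")
    # Assumption: positions are 0-based and distinct.
    positions = rng.sample(range(n), m)
    return [samples[p] for p in positions]
```

Note that the range of the generated positions depends directly on n, which is the dependence on data set size that the following paragraphs object to.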

In the above technical solution, when the amount of sample data in the sample data set is too large or too small, random extraction from the sample data set may be inaccurate.

Disclosure of Invention

The embodiment of the invention provides a method, a device, a readable medium and a storage controller for randomly extracting sample data, which can more accurately randomly extract a corresponding amount of sample data from a sample data set.

In a first aspect, the present invention provides a method for randomly extracting sample data, including:

A0: arranging all sample data of the sample data set into a sequence queue, and determining the extraction number;

A1: generating a random number corresponding to the current sample data at the head of the sequence queue;

A2: detecting whether the random number is less than the extraction number, and if so, executing A3; otherwise, executing A4;

A3: taking out the current sample data at the head of the queue as reference sample data, and executing A5;

A4: placing the current sample data at the head of the queue at the tail of the sequence queue, and executing A1;

A5: detecting the current number of reference sample data taken out, and executing A1 when the current number is less than the extraction number.
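Steps A0 to A5 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name `extract_reference_samples`, the parameter `random_upper_bound`, and the choice to draw the random number uniformly from `[0, random_upper_bound)` are assumptions, since the text does not specify how the random number is generated or what range it covers.

```python
import random
from collections import deque


def extract_reference_samples(samples, extraction_number, random_upper_bound, rng=None):
    """Queue-based extraction following steps A0 to A5.

    Returns (reference_samples, remaining_queue_as_list).
    """
    rng = rng or random.Random()
    if not 0 <= extraction_number <= len(samples):
        raise ValueError("extraction_number must not exceed the number of samples")
    if random_upper_bound < 1:
        raise ValueError("random_upper_bound must be at least 1")

    queue = deque(samples)                          # A0: build the sequence queue
    reference = []
    while len(reference) < extraction_number:       # A5: stop once enough are taken
        r = rng.randrange(random_upper_bound)       # A1: random number for the head
        if r < extraction_number:                   # A2: compare with extraction number
            reference.append(queue.popleft())       # A3: take the head out
        else:
            queue.rotate(-1)                        # A4: move the head to the tail
    return reference, list(queue)
```

Under this assumption, a `random_upper_bound` no larger than the extraction number makes every draw pass the A2 test, so the first items in the queue are taken in order; a larger bound makes each head item pass with probability `extraction_number / random_upper_bound`.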

Preferably, the method further comprises: presetting at least two weight tables, wherein each weight table corresponds to a weight coefficient and at least one piece of feature information;

after A3, the method further comprises:

analyzing the reference sample data to determine current characteristic information carried in the reference sample data;

and storing the reference sample data to a target weight table in the at least two weight tables, wherein at least one piece of target characteristic information corresponding to the target weight table comprises the current characteristic information.

Preferably, the method further comprises:

when the current number is not less than the extraction number, determining the screening number corresponding to each weight table according to the storage number of reference sample data stored in each weight table and the weight coefficient corresponding to each weight table;

and, for each weight table, extracting the screening number of target sample data corresponding to that weight table from the reference sample data stored in the weight table.

In a second aspect, an embodiment of the present invention provides an apparatus for randomly extracting sample data, including:

the device comprises a preprocessing module, a random number management module, an extraction management module, a queue management module and a detection module; wherein,

the preprocessing module is used for arranging all sample data of the sample data set into a sequence queue and determining the number of the samples to be extracted;

the random number management module is used for generating a random number corresponding to the current sample data at the head of the queue in the sequence queue, detecting whether the random number is smaller than the extraction quantity, and if so, triggering the extraction management module; otherwise, triggering the queue management module;

the extraction management module is used for taking out the current sample data at the head of the queue as reference sample data under the triggering of the random number management module and triggering the detection module;

the queue management module is used for placing the current sample data at the head of the queue at the tail of the sequence queue under the triggering of the random number management module and triggering the random number management module;

the detection module is configured to detect the current number of each of the reference sample data taken out under the trigger of the extraction management module, and trigger the random number management module when the current number is smaller than the extraction number.

Preferably, the device further comprises: a setting module, an analysis module and a storage processing module; wherein,

the setting module is used for presetting at least two weight tables, and each weight table corresponds to a weight coefficient and at least one piece of characteristic information respectively;

the analysis module is used for analyzing the reference sample data to determine current characteristic information carried in the reference sample data;

the storage processing module is configured to store the reference sample data to a target weight table of the at least two weight tables, where at least one piece of target feature information corresponding to the target weight table includes the current feature information.

Preferably, the device further comprises: a quantity determining module and a screening and extracting module; wherein,

the quantity determining module is configured to determine, when the current number is not less than the extraction number, the screening number corresponding to each weight table according to the storage number of reference sample data stored in each weight table and the weight coefficient corresponding to each weight table;

and the screening and extracting module is configured to extract, for each weight table, the screening number of target sample data corresponding to that weight table from the reference sample data stored in the weight table.

In a third aspect, an embodiment of the present invention provides a readable medium, which is characterized by including an execution instruction, and when a processor of a storage controller executes the execution instruction, the storage controller executes the method according to any one of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a storage controller, including: a processor, a memory, and a bus;

the processor and the memory are connected through the bus;

the memory is configured to store execution instructions, and when the storage controller is running, the processor executes the execution instructions stored in the memory to cause the storage controller to perform the method of any one of the first aspect.

The embodiments of the present invention provide a method, a device, a readable medium and a storage controller for randomly extracting sample data. In the method, the sample data of a sample data set are arranged into a sequence queue, and the extraction number of sample data to be extracted is determined. A random number corresponding to the current sample data at the head of the sequence queue is then generated, and whether the random number is less than the extraction number is detected. When the random number is not less than the extraction number, the current sample data at the head of the queue does not meet the extraction condition in the current extraction process and is placed at the tail of the sequence queue. Otherwise, when the random number is less than the extraction number, the current sample data at the head of the queue is taken out as reference sample data. This extraction process is executed cyclically for the current sample data at the head of the sequence queue until the number of reference sample data taken out reaches the determined extraction number, so that the corresponding number of sample data is extracted from the sample data set. In summary, when a corresponding number of sample data is randomly extracted from the sample data set, the condition that decides whether a sample data item is taken out as reference sample data is not directly related to the amount of sample data in the set, so the corresponding number of sample data can be randomly extracted more accurately.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a flowchart of a method for randomly extracting sample data according to an embodiment of the present invention;

Fig. 2 is a flowchart of another method for randomly extracting sample data according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a device for randomly extracting sample data according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of another device for randomly extracting sample data according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention, and based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative efforts belong to the scope of the present invention.

As shown in Fig. 1, an embodiment of the present invention provides a method for randomly extracting sample data, including:

In the above embodiment of the present invention, the sample data of the sample data set are arranged into a sequence queue and the extraction number is determined. A random number corresponding to the current sample data at the head of the sequence queue is then generated and compared with the extraction number. If the random number is not less than the extraction number, the current sample data at the head of the queue does not meet the extraction condition in the current extraction process and is placed at the tail of the sequence queue. Otherwise, if the random number is less than the extraction number, the current sample data at the head of the queue is taken out as reference sample data. The extraction process is performed cyclically on the current sample data at the head of the sequence queue until the number of reference sample data taken out reaches the determined extraction number, so that the corresponding number of sample data is extracted from the sample data set. In summary, the condition that decides whether a sample data item is taken out as reference sample data is not directly related to the amount of sample data in the sample data set, so the corresponding number of sample data can be randomly extracted more accurately.
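The independence from the data set size can be made concrete under one assumption that the text does not state: suppose the random number r generated in A1 is an integer drawn uniformly from {0, ..., R-1}, where R is a hypothetical upper bound chosen with R > m, and m is the extraction number. Then each test of the queue head satisfies:

```latex
P(\text{take out}) = P(r < m) = \frac{m}{R}, \qquad
E[\text{tests per extraction}] = \frac{R}{m}, \qquad
E[\text{total tests}] = m \cdot \frac{R}{m} = R .
```

None of these quantities contains the data set size n. Under this assumption, n enters only through the requirement that m not exceed n.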

In one embodiment of the present invention, the method further comprises: presetting at least two weight tables, wherein each weight table corresponds to a weight coefficient and at least one piece of characteristic information;

after A3, the method further comprises:

In the above embodiment of the present invention, at least two weight tables are preset, and each weight table corresponds to one weight coefficient and at least one piece of feature information. After reference sample data is taken out, it can be analyzed to determine the current feature information it carries, and it is then stored in a target weight table of the at least two weight tables, where the at least one piece of target feature information corresponding to the target weight table includes the current feature information. In the subsequent process, the user can therefore conveniently extract a corresponding number of sample data from the different weight tables according to the different feature information and weight coefficients, in combination with actual service requirements.

Specifically, in an embodiment of the present invention, the method further includes: when the current number is not less than the extraction number, determining the screening number respectively corresponding to each weight table according to the storage number of the reference sample data respectively stored in each weight table and the weight coefficient respectively corresponding to each weight table; and extracting target sample data with the target screening quantity corresponding to the weight table from each reference sample data stored in the weight table aiming at each weight table.

To illustrate the technical solution and advantages of the present invention more clearly, the following example randomly extracts a set number of reference sample data from a sample data set, stores each reference sample data in the corresponding weight table according to the feature information it carries, and then extracts a corresponding number of target sample data from each weight table. As shown in Fig. 2, the example may include the following steps:

Step 201, presetting at least two weight tables, where each weight table corresponds to a weight coefficient and at least one piece of feature information.

Step 202, arranging each sample data of the sample data set into a sequence queue, and determining the extraction quantity of the reference sample data.

In steps 201 to 202, the user may set the number of weight tables, the weight coefficient corresponding to each weight table, and at least one piece of feature information according to the actual service requirement.

For example, suppose each sample data in the sample data set is the business information of an enterprise registered in the East City District or the West City District of Beijing, and the business information of enterprises in the East City District needs to be extracted with emphasis. Two weight tables A and B may then be set, where the at least one piece of feature information corresponding to weight table A includes "East City District" and its weight coefficient is 0.6, and the at least one piece of feature information corresponding to weight table B includes "West City District" and its weight coefficient is 0.4.

It should be understood that a corresponding blacklist may also be configured to store sample data that does not comply with the corresponding rule. For example, when extracted reference sample data carries neither the feature information "East City District" nor "West City District", the reference sample data may be stored in the blacklist.

Step 203, generating a random number corresponding to the current sample data at the head of the sequence queue.

Step 204, detecting whether the generated random number is smaller than the extraction number; if so, executing step 206; otherwise, executing step 205.

Step 205, placing the current sample data at the head of the queue at the tail of the sequence queue.

Here, after step 205 is performed, step 203 may be performed again.

Step 206, taking out the current sample data at the head of the queue as reference sample data.

Step 207, analyzing the extracted reference sample data to determine the current feature information carried in the reference sample data.

Step 208, storing the reference sample data to a target weight table of the at least two weight tables.

Here, the at least one piece of target feature information corresponding to the target weight table includes the current feature information.

For example, when the analysis in step 207 determines that the current feature information carried by the reference sample data is "East City District", the reference sample data may be stored in weight table A. When the analysis determines that the current feature information is "West City District", the reference sample data may be stored in weight table B. When the current feature information is neither "East City District" nor "West City District", the reference sample data may be stored in the blacklist.
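The routing in steps 207 and 208 can be sketched as below. The dictionary layout of a weight table, the function name `route_reference_sample`, and the `feature_of` callback that reads the feature information from a sample are all hypothetical; the text does not describe how weight tables or samples are represented.

```python
def route_reference_sample(sample, weight_tables, blacklist, feature_of):
    """Store one reference sample in the first weight table whose features match."""
    current_feature = feature_of(sample)              # step 207: analyze the sample
    for table in weight_tables:
        if current_feature in table["features"]:
            table["samples"].append(sample)           # step 208: store in the target table
            return table["name"]
    blacklist.append(sample)                          # no table matched: use the blacklist
    return None


# Hypothetical setup mirroring the two-table example in the text.
weight_tables = [
    {"name": "A", "coefficient": 0.6, "features": {"East City District"}, "samples": []},
    {"name": "B", "coefficient": 0.4, "features": {"West City District"}, "samples": []},
]
blacklist = []
```

Returning the table name (or `None` for the blacklist) is a convenience for the caller and is not part of the described method.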

Step 209, detecting whether the current number of each extracted reference sample data is less than the extraction number, if so, executing step 203; otherwise, step 210 is performed.

Step 210, determining the screening number corresponding to each weight table according to the storage number of the reference sample data stored in each weight table and the weight coefficient corresponding to each weight table.

For example, when the extraction number is 400, the number of reference sample data stored in weight table A is 200, and the number stored in weight table B is 200, the screening number corresponding to weight table A is determined to be 120, that is, the product of the storage number 200 and the corresponding weight coefficient 0.6. Similarly, the screening number corresponding to weight table B is determined to be 80, the product of the storage number 200 and the corresponding weight coefficient 0.4.

It is understood that, when the product of the storage quantity of the reference sample data stored in a certain weight table and the corresponding weight coefficient is not an integer, the product can be rounded by a rounding method, and the rounded numerical value is used as the screening quantity corresponding to the weight table.
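The calculation in step 210, including the rounding rule, can be sketched as follows. The function name `screening_number` is hypothetical, and "rounding" is assumed here to mean conventional round-half-up; Python's built-in `round` uses round-half-to-even, so the sketch uses the `decimal` module instead.

```python
from decimal import Decimal, ROUND_HALF_UP


def screening_number(storage_number, weight_coefficient):
    """Screening number = storage number x weight coefficient, rounded half up."""
    # Converting via str() avoids binary floating-point artifacts such as 0.6 != 3/5.
    product = Decimal(storage_number) * Decimal(str(weight_coefficient))
    return int(product.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(screening_number(200, 0.6))  # 120, as in the weight table A example
print(screening_number(200, 0.4))  # 80, as in the weight table B example
```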

Step 211, for each weight table, extracting the screening number of target sample data corresponding to that weight table from the reference sample data stored in the weight table.

For example, 120 target sample data can be randomly extracted from the 200 reference sample data stored in weight table A by a method similar to steps 202 to 206, and 80 target sample data can be randomly extracted from the 200 reference sample data stored in weight table B in the same way.
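Step 211 can be sketched by reapplying the queue procedure of steps 202 to 206 to each weight table in turn. The names `queue_extract` and `extract_targets` are hypothetical, weight tables are represented as a plain dictionary of lists, and the random number is assumed to be uniform over `[0, len(stored))` for each table; the text specifies none of these details.

```python
import random
from collections import deque


def queue_extract(items, count, upper_bound, rng):
    """Take `count` items using the head-test / move-to-tail loop."""
    queue, taken = deque(items), []
    while len(taken) < count:
        if rng.randrange(upper_bound) < count:   # head passes the test: take it out
            taken.append(queue.popleft())
        else:
            queue.rotate(-1)                     # head fails: move it to the tail
    return taken


def extract_targets(weight_tables, screening_numbers, rng=None):
    """For each weight table, extract its screening number of target samples."""
    rng = rng or random.Random()
    targets = {}
    for name, stored in weight_tables.items():
        count = screening_numbers[name]
        if count > len(stored):
            raise ValueError(f"screening number for table {name} exceeds its contents")
        # Assumption: the random number for this table ranges over [0, len(stored)).
        targets[name] = queue_extract(stored, count, max(len(stored), 1), rng)
    return targets
```

With two tables of 200 reference samples each and screening numbers of 120 and 80, `extract_targets` returns 120 distinct items from the first table and 80 from the second.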

As shown in Fig. 3, an embodiment of the present invention provides a device for randomly extracting sample data, including:

a preprocessing module 301, a random number management module 302, an extraction management module 303, a queue management module 304 and a detection module 305; wherein,

the preprocessing module 301 is configured to arrange sample data of the sample data set into a sequence queue and determine an extraction number;

the random number management module 302 is configured to generate a random number corresponding to current sample data at the head of the queue in the sequential queue, detect whether the random number is smaller than the extraction number, and trigger the extraction management module 303 if the random number is smaller than the extraction number; otherwise, triggering the queue management module 304;

the extraction management module 303 is configured to take out the current sample data at the head of the queue as reference sample data under the triggering of the random number management module 302, and trigger the detection module 305;

the queue management module 304 is configured to place the current sample data at the head of the queue at the tail of the sequence queue under the trigger of the random number management module 302, and trigger the random number management module 302;

the detection module 305 is configured to detect the current number of reference sample data taken out under the trigger of the extraction management module 303, and trigger the random number management module 302 when the current number is smaller than the extraction number.

As shown in Fig. 4, in an embodiment of the present invention, the device further includes: a setting module 401, an analysis module 402 and a storage processing module 403; wherein,

the setting module 401 is configured to preset at least two weight tables, where each weight table corresponds to a weight coefficient and at least one piece of feature information;

the analysis module 402 is configured to analyze the reference sample data to determine current feature information carried in the reference sample data;

the storage processing module 403 is configured to store the reference sample data in a target weight table of the at least two weight tables, where at least one piece of target feature information corresponding to the target weight table includes the current feature information.

Based on the embodiment shown in Fig. 4, in an embodiment of the present invention, the device further includes: a quantity determining module (not shown in the drawings) and a screening and extracting module (not shown in the drawings); wherein,

Because the information interaction, execution process, and other contents between the units in the device are based on the same concept as the method embodiment of the present invention, specific contents may refer to the description in the method embodiment of the present invention, and are not described herein again.

An embodiment of the present invention provides a readable medium, which includes an execution instruction, and when a processor of a storage controller executes the execution instruction, the storage controller executes the method for randomly extracting sample data provided in any embodiment of the present invention.

An embodiment of the present invention provides a storage controller, including: a processor, a memory, and a bus;

the processor and the memory are connected through the bus;

the memory is configured to store execution instructions, and when the storage controller is running, the processor executes the execution instructions stored in the memory to cause the storage controller to execute the method for randomly extracting sample data provided in any one embodiment of the present invention.

In summary, the embodiments of the present invention have at least the following advantages:

1. In an embodiment of the present invention, the sample data of the sample data set are arranged into a sequence queue and the extraction number is determined. A random number corresponding to the current sample data at the head of the sequence queue is then generated and compared with the extraction number. If the random number is not less than the extraction number, the current sample data at the head of the queue does not meet the extraction condition in the current extraction process and is placed at the tail of the sequence queue. Otherwise, the current sample data at the head of the queue is taken out as reference sample data. The extraction process is performed cyclically until the number of reference sample data taken out reaches the determined extraction number, so that the corresponding number of sample data is extracted from the sample data set. Because the condition that decides whether a sample data item is taken out as reference sample data is not directly related to the amount of sample data in the sample data set, the corresponding number of sample data can be randomly extracted more accurately.

2. In an embodiment of the present invention, at least two weight tables are preset, and each weight table corresponds to a weight coefficient and at least one piece of feature information. After reference sample data is taken out, it can be analyzed to determine the current feature information it carries, and it is then stored in a target weight table of the at least two weight tables, where the at least one piece of target feature information corresponding to the target weight table includes the current feature information. In the subsequent process, the user can therefore conveniently extract a corresponding number of sample data from the different weight tables according to the different feature information and weight coefficients, in combination with actual service requirements.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.

Finally, it is to be noted that: the above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for randomly extracting sample data, comprising:

a5: detecting the current number of each extracted reference sample data, and executing A1 when the current number is smaller than the extraction number;

after the a3, further comprising:

storing the reference sample data to a target weight table of the at least two weight tables, wherein at least one feature information corresponding to the target weight table comprises the current feature information;

further comprising:

and extracting target sample data with the screening quantity corresponding to the weight tables from each reference sample data stored in the weight tables for each weight table.

2. An apparatus for randomly extracting sample data, comprising:

the detection module is used for detecting the current number of each taken reference sample data under the triggering of the extraction management module, and triggering the random number management module when the current number is smaller than the extraction number;

the storage processing module is configured to store the reference sample data to a target weight table of the at least two weight tables, where at least one piece of feature information corresponding to the target weight table includes the current feature information;

and the screening extraction module is used for extracting target sample data corresponding to the screening quantity of the weight tables from each reference sample data stored in the weight tables aiming at each weight table.

3. A readable medium comprising executable instructions that, when executed by a processor of a storage controller, cause the storage controller to perform the method of claim 1.

4. A storage controller, comprising: a processor, a memory, and a bus;

the processor and the memory are connected through the bus;

the memory is configured to store execution instructions, and when the storage controller is running, the processor executes the execution instructions stored in the memory to cause the storage controller to perform the method of claim 1.