CN113392902A - Data set processing method and device, storage medium and electronic equipment - Google Patents

Data set processing method and device, storage medium and electronic equipment

Info

Publication number
CN113392902A
Authority
CN
China
Prior art keywords
data set
sample
sampling
sample data
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110661196.9A
Other languages
Chinese (zh)
Inventor
高宗
陈彦宇
马雅奇
谭龙田
周慧子
陈高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai, Zhuhai Lianyun Technology Co Ltd
Priority to CN202110661196.9A
Publication of CN113392902A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application relates to the technical field of data processing, in particular to a data set processing method, a data set processing device and electronic equipment, wherein the method comprises the following steps: acquiring a data set to be processed, wherein the data set to be processed comprises first type sample data; randomly selecting first sample data of at least three different action types from a data set to be processed, and splicing the first sample data of at least three different action types into splicing sample data; sampling the spliced sample data to obtain a sampling sample data set; expanding a data set to be processed according to the sampling sample data set to obtain a target data set; the target data set includes all samples in the sample data set and the data set to be processed. According to the method, the spliced sample data is sampled, so that the data set to be processed can be rapidly expanded by using fewer original samples, the sample acquisition cost is reduced, and the diversity of the data in the target data set can be ensured.

Description

Data set processing method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data set processing method and apparatus, a storage medium, and an electronic device.
Background
With the continuous optimization of deep convolutional network models, such models have shown great potential in the field of motion recognition. In video monitoring in particular, real-time perception and early warning can provide strong support for system decisions. However, the development of deep convolutional neural networks in the field of motion recognition is still subject to certain limitations, mainly because the acquisition cost of data sets is too high. To train a network model capable of motion recognition, single-motion videos are required as data samples, with the motion category as the training label. In specific scenarios, however, it is difficult to acquire and label a large amount of motion sample data. In addition, in actual detection, a sample to be detected usually includes a plurality of actions, and one-hot label encoding cannot correctly reflect the real action state of such a sample.
Disclosure of Invention
In view of the foregoing problems, the present application provides a data set processing method, an apparatus, a storage medium, and an electronic device.
In a first aspect, the present application provides a data set processing method, including:
acquiring a data set to be processed, wherein the data set to be processed comprises first type sample data, and the first type sample is the sample data comprising a single action;
randomly selecting first sample data of at least three different action categories from the data set to be processed, and splicing the first sample data of the at least three different action categories into splicing sample data;
sampling the spliced sample data to obtain a sampled sample data set;
expanding the data set to be processed according to the sampling sample data set to obtain a target data set; the target data set comprises all samples in the sample data set and the data set to be processed.
In the above embodiment, after the sample data of at least three different motion categories is randomly selected from the data set to be processed, the sample data is spliced into one spliced sample data. After the spliced sample data is sampled, the original data set can be rapidly expanded, and a target data set is obtained. According to the method, the spliced sample data is sampled, so that the data set to be processed can be rapidly expanded by using fewer original samples, the sample acquisition cost is reduced, and the diversity of the data in the target data set can be ensured.
According to an embodiment of the present application, optionally, in the data set processing method, acquiring a data set to be processed includes:
acquiring an original data set, and dividing second type sample data in the original data set into a plurality of first type samples to obtain a divided original data set; the second type of sample data is sample data comprising a plurality of actions;
coding the first type samples according to the action types of the actions included in each first type sample to obtain a label corresponding to each first type sample;
and obtaining the data set to be processed according to the first type of samples and the labels corresponding to the first type of samples.
According to an embodiment of the present application, optionally, in the data set processing method, acquiring an original data set, and dividing a second type of sample data in the original data set into a plurality of first type samples to obtain a divided original data set, the method includes:
determining a starting frame and an ending frame of each action in the second type of sample data;
and dividing the second type of sample data into a plurality of first type samples according to the starting frame and the ending frame.
According to an embodiment of the present application, optionally, in the data set processing method, sampling the splicing sample data to obtain a sample data set of samples includes:
sampling the spliced sample data to obtain a plurality of sampling samples;
encoding each sample to obtain a label for the sample;
and acquiring a sampling sample data set according to the plurality of sampling samples and the codes.
According to an embodiment of the present application, optionally, in the data set processing method, sampling the splicing sample data includes:
acquiring a sampling length;
determining a starting sampling point and an ending sampling point of the splicing sample data;
and performing sliding sampling on the splicing sample data according to the sampling length, the initial sampling point and the end sampling point.
According to an embodiment of the present application, optionally, in the data set processing method, obtaining the sampling length includes:
determining a head segment, a middle segment and a tail segment according to first type sample data of three different action types in the splicing sample data, wherein the head segment, the middle segment and the tail segment respectively correspond to different action types;
and acquiring the sampling length according to the length of the middle segment.
According to an embodiment of the present application, optionally, in the above data set processing method, encoding each sample to obtain a label for the sample includes:
determining a relative offset between a center position of the sampled sample and a center position of the intermediate segment;
calculating a first confidence level of a behavior type included by the sampling sample belonging to the intermediate segment according to the relative offset;
determining a second confidence coefficient that the sampling sample belongs to other behavior types according to the first confidence coefficient;
and coding the sampling sample according to the first confidence coefficient and the second confidence coefficient to obtain a label corresponding to the sampling sample.
In a second aspect, the present application provides a data set processing apparatus, the apparatus comprising:
a to-be-processed data set acquisition module, configured to acquire a data set to be processed, wherein the data set to be processed comprises first type sample data, and the first type sample is the sample data comprising a single action;
the sample splicing module is used for randomly selecting first sample data of at least three different action categories from the data set to be processed and splicing the first sample data of the at least three different action categories into splicing sample data;
the sampling module is used for sampling the splicing sample data to obtain a sampling sample data set;
the data expansion module is used for expanding the data set to be processed according to the sampling sample data set to obtain a target data set; the target data set comprises all samples in the sample data set and the data set to be processed.
According to an embodiment of the present application, optionally, in the data set processing apparatus, the to-be-processed data set obtaining module includes:
the segmentation unit is used for acquiring an original data set, and segmenting second type sample data in the original data set into a plurality of first type samples to obtain a segmented original data set; the second type of sample data is sample data comprising a plurality of actions;
the encoding unit is used for encoding the first type samples according to the action types of the actions included in each first type sample to obtain a label corresponding to each first type sample;
and the preprocessing unit is used for obtaining the data set to be processed according to the first type of samples and the labels corresponding to the first type of samples.
According to an embodiment of the present application, optionally, in the data set processing apparatus, the dividing unit includes:
the frame determining unit is used for determining a starting frame and an ending frame of each action in the second type of sample data;
and the sample dividing unit is used for dividing the second type of sample data into a plurality of first type of samples according to the starting frame and the ending frame.
According to an embodiment of the present application, optionally, in the data set processing apparatus, the sampling module includes:
a sampling sample obtaining unit, configured to sample the splicing sample data to obtain multiple sampling samples;
the sampling coding unit is used for coding each sampling sample to obtain a label of the sampling sample;
and the sampling sample data set acquisition unit is used for acquiring the sampling sample data set according to the plurality of sampling samples and the codes.
According to an embodiment of the present application, optionally, in the data set processing apparatus, the sampling sample acquiring unit includes:
a sampling length obtaining unit for obtaining a sampling length;
the sampling point determining unit is used for determining a starting sampling point and an ending sampling point of the splicing sample data;
and the sampling unit is used for sampling the splicing sample data according to the sampling length, the initial sampling point and the end sampling point.
According to an embodiment of the present application, optionally, in the data set processing apparatus, the sampling length obtaining unit includes:
a segment determining subunit, configured to determine a head segment, a middle segment, and a tail segment according to first type of sample data of three different motion types in the splicing sample data, where the head segment, the middle segment, and the tail segment respectively correspond to different motion types;
and the sliding window determining subunit is used for acquiring the sampling length according to the length of the intermediate segment.
According to an embodiment of the present application, optionally, in the data set processing apparatus, the sampling encoding unit includes:
a relative offset determination unit for determining a relative offset between the centre position of the sample and the centre position of the intermediate segment;
a first confidence determining unit, configured to calculate a first confidence that the sampled sample belongs to the behavior type included in the intermediate segment according to the relative offset;
the second confidence degree determining unit is used for determining a second confidence degree that the sampling sample belongs to other behavior types according to the first confidence degree;
and the label obtaining unit is used for coding the sampling sample according to the first confidence coefficient and the second confidence coefficient to obtain a label corresponding to the sampling sample.
In a third aspect, the present application provides a storage medium storing a computer program executable by one or more processors and operable to implement a data set processing method as described above.
In a fourth aspect, the present application provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the computer program is executed by the processor to perform the data set processing method.
Compared with the prior art, one or more embodiments in the above scheme can have the following advantages or beneficial effects:
the application provides a data set processing method, a data set processing device, a storage medium and an electronic device, wherein the method comprises the following steps: acquiring a data set to be processed, wherein the data set to be processed comprises first type sample data, and the first type sample is the sample data comprising a single action; randomly selecting first sample data of at least three different action categories from the data set to be processed, and splicing the first sample data of the at least three different action categories into splicing sample data; sampling the spliced sample data to obtain a sampled sample data set; expanding the data set to be processed according to the sampling sample data set to obtain a target data set; the target data set comprises all samples in the sample data set and the data set to be processed. And randomly selecting sample data of at least three different action categories from the data set to be processed, and splicing the sample data into spliced sample data. After the spliced sample data is sampled, the original data set can be rapidly expanded, and a target data set is obtained. According to the method, the spliced sample data is sampled, so that the data set to be processed can be rapidly expanded by using fewer original samples, the sample acquisition cost is reduced, and the diversity of the data in the target data set can be ensured.
Drawings
The present application will be described in more detail below on the basis of embodiments and with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a data set processing method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of sliding sampling according to a third embodiment of the present application.
Fig. 3 is a block diagram of a data set processing apparatus according to an embodiment of the present application.
Fig. 4 is a connection block diagram of an electronic device according to a second embodiment of the present application.
In the drawings, like parts are designated with like reference numerals, and the drawings are not drawn to scale.
Detailed Description
The following detailed description will be provided with reference to the accompanying drawings and embodiments, so that how to apply the technical means to solve the technical problems and achieve the corresponding technical effects can be fully understood and implemented. The embodiments and various features in the embodiments of the present application can be combined with each other without conflict, and the formed technical solutions are all within the scope of protection of the present application.
Example one
This embodiment provides a data set processing method. Referring to Fig. 1, the method includes the following steps:
Step S110: acquiring a data set to be processed, wherein the data set to be processed comprises first type sample data.
The first type of sample is sample data comprising a single action. When the data set to be processed is acquired, the original data set may be obtained first and then preprocessed to obtain the data set to be processed. The original data set can be preprocessed in one of several ways or in a combination of them. For example, invalid data in the original data set may be screened out and filtered to obtain an original data set that includes only valid data. Invalid data may refer to defective data in the data set, or to data in a format that the method does not support, and the specific filtering criteria can be chosen according to the actual application. In addition, when the original data set is preprocessed, the data in the original data set may be converted into a unified format.
Step S120: randomly selecting first type sample data of at least three different action types from the data set to be processed, and splicing the first type sample data of the at least three different action types into splicing sample data.
The sample data of at least three different action categories randomly selected from the data set to be processed are samples whose action categories differ from one another. For example, if three samples S1, S2, and S3 are randomly selected from the data set to be processed, the action category included in S1 is A, that in S2 is B, and that in S3 is C. Because the three samples are selected at random, they may be spliced in random order or spliced sequentially in the order in which they were selected. It can be understood that the sample data in the data set to be processed may also be divided into a plurality of classes by category, each class containing a plurality of samples of the same category; the classes are then sorted in a certain order, three consecutive classes are selected in turn, one sample is randomly selected from each class, and finally the selected samples are randomly spliced. That is, when selecting sample data from the data set to be processed, it is sufficient to ensure that the action categories of the selected samples are diverse.
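The selection and splicing in step S120 can be illustrated with a minimal Python sketch. The function name `splice_samples`, the representation of a sample as a `(frames, category)` pair, and the returned boundary list are assumptions made for illustration and do not come from the application.

```python
import random

def splice_samples(dataset, num_segments=3, rng=None):
    """Pick one sample from each of `num_segments` distinct action
    categories and concatenate them into one spliced sample.

    `dataset` is assumed to be a list of (frames, category) pairs,
    where `frames` is a list and `category` is a sortable label.
    """
    rng = rng or random.Random()
    by_category = {}
    for frames, category in dataset:
        by_category.setdefault(category, []).append(frames)
    if len(by_category) < num_segments:
        raise ValueError("not enough distinct action categories")

    # Randomly choose distinct categories, then one sample per category.
    chosen = rng.sample(sorted(by_category), num_segments)
    segments = [(rng.choice(by_category[c]), c) for c in chosen]

    # Concatenate the frames and record (start, end, category) for each
    # segment, with `end` exclusive, so later steps can locate the middle one.
    spliced, boundaries, pos = [], [], 0
    for frames, category in segments:
        spliced.extend(frames)
        boundaries.append((pos, pos + len(frames), category))
        pos += len(frames)
    return spliced, boundaries
```

Keeping the segment boundaries next to the spliced frames is a design choice of this sketch: the later sampling and labelling steps need the position and length of the middle segment.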
Step S130: sampling the spliced sample data to obtain a sampling sample data set.
When splicing sample data is sampled, a sliding sampling mode or a random sampling mode can be adopted, and the specific sampling mode can be determined according to actual sampling requirements.
If the sliding window length of the sliding sampling is n and the sliding amount is 0.05n, the number of sample data that can theoretically be obtained is $n / (0.05n) = 20$.
If sampling is performed in a random sampling mode, a sample center can be randomly generated on the spliced sample, a sampling range is determined according to the sample center, and random sampling is performed according to the sampling range.
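The random sampling mode can be sketched as follows. The function name `random_window` is hypothetical, and restricting the center so that the window stays entirely inside the spliced sample is an assumption of this sketch; the application only states that a sample center is generated randomly and the sampling range is derived from it.

```python
import random

def random_window(total_length, window_length, rng=None):
    """Return (start, end) of a window around a random sample center."""
    if window_length > total_length:
        raise ValueError("window is longer than the spliced sample")
    rng = rng or random.Random()
    half = window_length / 2
    # Assumption: keep the whole window inside the spliced sample.
    center = rng.uniform(half, total_length - half)
    return center - half, center + half
```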
Step S140: expanding the data set to be processed according to the sampling sample data set to obtain a target data set; the target data set comprises all samples in the sample data set and the data set to be processed.
The spliced sample data is subjected to sliding sampling to obtain a sample data set, and the original data set can be rapidly expanded according to the sample data set.
To sum up, the present application provides a data set processing method, comprising: acquiring a data set to be processed, wherein the data set to be processed comprises first type sample data, and the first type sample is the sample data comprising a single action; randomly selecting first sample data of at least three different action categories from the data set to be processed, and splicing the first sample data of the at least three different action categories into splicing sample data; sampling the spliced sample data to obtain a sampled sample data set; expanding the data set to be processed according to the sampling sample data set to obtain a target data set; the target data set comprises all samples in the sample data set and the data set to be processed. And randomly selecting sample data of at least three different action categories from the data set to be processed, and splicing the sample data into spliced sample data. After the spliced sample data is sampled, the original data set can be rapidly expanded, and a target data set is obtained. According to the method, the spliced sample data is sampled, so that the data set to be processed can be rapidly expanded by using fewer original samples, the sample acquisition cost is reduced, and the diversity of the data in the target data set can be ensured.
Example two
On the basis of the first embodiment, the present embodiment explains the method in the first embodiment through a specific implementation case.
In the data set processing method, in the step of obtaining an original data set and preprocessing the original data set to obtain a data set to be processed, the original data set may be obtained first, and a second type of sample data in the original data set is divided into a plurality of first type samples to obtain a divided original data set; and the second type of sample data is sample data comprising a plurality of actions. Then, the first type samples are coded according to the action types of the actions included in the first type samples, and the labels corresponding to the first type samples are obtained. And finally, obtaining the data set to be processed according to the first type of sample and the corresponding label.
For example, the original data set includes video data, the video data includes actions in a process flow, the second type of sample data includes multiple different actions and tags in the process flow, the first type of sample includes a single action in the process flow, and the first type of sample data includes the first type of sample and its corresponding tag. The actions included in the sample may be manually set actions, e.g., a single action may be defined at each part mounting step during the device assembly process. If the mounting step comprises two steps of taking and assembling the fan blade and screwing the nut, a single action can be defined as the action of taking and assembling the fan blade, and another single action is defined as the action of screwing the nut. It will be appreciated that the sample data included in the original data set may also be data with time series attributes, i.e. data that dynamically changes over time with data tags. For example, audio data that needs to be subjected to voice recognition, including audio data of a continuous multi-person conversation, may also be used as an original data set in the method to be subjected to expansion processing, sample data included in the original data set may be determined according to an actual application scenario, where the sample data of the first type may be specifically defined according to the application scenario.
When an original data set is obtained, a second type of sample data in the original data set is divided into a plurality of first type samples, and when the divided original data set is obtained, a start frame and an end frame of each action in the second type of sample data can be determined, and then the second type of sample data is divided into a plurality of first type samples according to the start frame and the end frame.
When the start frame and the end frame of each preset behavior in the second type of sample data are determined, the start frame and the end frame may be determined in a manual labeling manner, may also be determined in an image recognition manner, and may also be checked in a manual labeling manner after the start frame and the end frame are determined in the image recognition manner.
After the second type of sample data comprising a plurality of actions is divided into first type samples each comprising a single action, each first type sample is encoded according to the category of the action it includes, so as to obtain the label corresponding to that sample. Specifically, the label of a second type sample includes the start frame, the end frame, and the category of each action, so the sample can be divided into first type samples according to the start and end frames, and each first type sample is then encoded according to its action category. If the labels of the second type of sample data are { [0, 1, A], [2, 4, B], [5, 6, C], [7, 8, D] }, the action of category A can be divided out as the first first type sample according to start frame 0 and end frame 1, and then encoded according to action category A. The action of category B is then split out as the second first type sample according to start frame 2 and end frame 4, and encoded according to action category B. The remaining segmentation proceeds by analogy and is not described in detail. It should be understood that the encoding here may use one-hot label encoding or other encoding methods. Taking one-hot encoding as an example, if there are four action categories [A, B, C, D] in total, an action of category A is encoded as [1, 0, 0, 0], an action of category B as [0, 1, 0, 0], an action of category C as [0, 0, 1, 0], and an action of category D as [0, 0, 0, 1].
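The segmentation and one-hot encoding described above can be sketched in Python using the label set from the example. The helper names `split_by_labels` and `one_hot` are hypothetical, frames are stood in for by integers, and the end frame is assumed to be inclusive, which matches the contiguous ranges [0, 1], [2, 4], [5, 6], [7, 8] in the example.

```python
def split_by_labels(frames, labels):
    """Split a multi-action sample into single-action samples.

    Each label is [start_frame, end_frame, category]; the end frame is
    treated as inclusive (an assumption based on the example ranges).
    """
    return [(frames[start:end + 1], category) for start, end, category in labels]

def one_hot(category, categories):
    """Encode `category` as a one-hot vector over `categories`."""
    return [1 if c == category else 0 for c in categories]

# Stand-in data: a 9-frame video whose frames are just their indices.
frames = list(range(9))
labels = [[0, 1, "A"], [2, 4, "B"], [5, 6, "C"], [7, 8, "D"]]
categories = ["A", "B", "C", "D"]

# Data set to be processed: (single-action clip, one-hot label) pairs.
dataset = [(clip, one_hot(cat, categories))
           for clip, cat in split_by_labels(frames, labels)]
```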
Example three
On the basis of the first embodiment, the present embodiment explains the method in the first embodiment through a specific implementation case.
When sampling the splicing sample data to obtain a sampling sample data set, the splicing sample data may be sampled first to obtain a sampling sample. And then coding the sampling sample according to the splicing sample data to obtain a label of the sampling sample, and finally obtaining the sampling sample data according to the sampling sample and the code.
When the splicing sample data is sampled, the sampling length can be obtained firstly, and then the initial sampling point and the ending sampling point of the splicing sample data are determined; and sampling the splicing sample data according to the sampling length, the initial sampling point and the end sampling point.
The sample length may be obtained according to the following manner: determining a head segment, a middle segment and a tail segment according to preprocessed sample data in the splicing sample data, wherein the head segment, the middle segment and the tail segment respectively correspond to different action categories; and acquiring the sampling length according to the length of the middle segment. It will be appreciated that the sample length may also be set directly.
After a sampling sample is obtained, it needs to be encoded according to the splicing sample data to obtain its label. Specifically, a relative offset between the center position of the sampling sample and the center position of the intermediate segment is determined; a first confidence that the sampling sample belongs to the behavior type included in the intermediate segment is calculated according to the relative offset; a second confidence that the sampling sample belongs to each of the other behavior types is determined according to the first confidence; and the sampling sample is encoded according to the first confidence and the second confidence to obtain the label corresponding to the sampling sample.
Assume that sliding sampling is used when sampling the spliced sample data to obtain the sampling sample data set. As shown in Fig. 2, the three randomly selected segments included in the spliced sample S are Si, Sj, and Sk, and the action categories of the actions they include are A, B, and C, respectively. The spliced sample S is subjected to sliding sampling with a window whose length equals that of the middle segment Sj. Assuming that the length of segment Si is m, the length of segment Sj is n, and the length of segment Sk is o, the length of S is m + n + o. The starting point of the sliding sampling center is m, the window length is n, and the sliding amount is 0.05n, so the start frame and end frame of the k-th sampled video segment are [0.05n(k-1) + m - 0.5n, 0.05n(k-1) + m + 0.5n]. Each video segment is cut out as a sample according to its start frame and end frame and stored, until the sampling center reaches the end frame of segment Sj, whereby a new group of sampling samples is obtained. The starting sampling point is the starting point m of the sliding sampling center, i.e., the start frame of segment Sj, and the ending sampling point is the end frame of segment Sj.
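The window positions given by the formula above can be enumerated with a short Python sketch. The function name `sliding_windows` is hypothetical, and keeping the window at the final position (sampling center on the end frame of Sj) is an assumption of this sketch; the text can also be read as stopping just before that position.

```python
def sliding_windows(m, n, step_ratio=0.05):
    """Return [start, end] frame positions of the sliding windows.

    m: length of the head segment Si (the sampling center starts at m)
    n: length of the middle segment Sj (also the window length)
    step_ratio: sliding amount as a fraction of n (0.05 in the text)
    """
    step = step_ratio * n
    # Number of slides needed for the center to travel from m to m + n.
    num_slides = int(n / step + 1e-9)
    windows = []
    for k in range(1, num_slides + 2):  # k = 1 is the starting position
        center = m + step * (k - 1)
        windows.append((center - 0.5 * n, center + 0.5 * n))
    return windows
```

With a sliding amount of 0.05n, 20 slides cover the length n, so this sketch returns 21 windows when both the starting and ending positions are kept. In practice the positions would be rounded to integer frame indices, and the first and last windows stay inside S only if m and o are each at least 0.5n.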
In theory, the closer the center of a sample is to the center of Sj, the higher the confidence that the sample belongs to action class B, and the confidence should decrease as the sample center shifts toward either side. The Gaussian distribution is a mathematical model that fits this scenario: a larger output value indicates that the sample is closer to the center. The labels can therefore be processed with a Gaussian smoothing method, so that samples closer to the action center obtain higher-confidence labels. Here, the standard normal distribution curve is used to smooth x into a value in (0, 1). Let the new sample be T and let x be the relative offset between the center of T and the center of Sj; then:
$$x = \frac{|T_{center} - Sj_{center}|}{n}$$
$$conf = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^{2}}{2}}$$
$$conf_{other} = \frac{1 - conf}{N - 1}$$
where T_center is the position of the center of sample T in video S, Sj_center is the position of the center of segment Sj in video S, conf is the confidence label that sample T belongs to action B, conf_other is the confidence that sample T belongs to each of the other action categories, and N is the total number of action categories. For example, the confidence that the sample at the sliding start position (x = 0.5) belongs to class B is:
$$conf = \frac{1}{\sqrt{2\pi}} e^{-\frac{0.5^{2}}{2}} \approx 0.35$$
The confidence that this sample belongs to each of the other actions (assuming 4 action classes in total) is:
$$conf_{other} = \frac{1 - 0.35}{4 - 1} \approx 0.22$$
Encoding the label gives the vector [0.22, 0.34, 0.22, 0.22]. The elements of the encoded vector must sum to 1, because they represent the confidences that the sample belongs to each action, and the probabilities of all possible events sum to 1. Because the confidence of each other action is rounded to 0.22 and 0.22 × 3 = 0.66, the confidence 0.35 of class B is adjusted to 0.34 so that the sum is 1.
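The label encoding can be illustrated with a short Python sketch. It rests on assumptions that should be checked against the original formulas: the smoothing curve is taken to be the standard normal density, the offset x is taken to be normalized by the window length n, and the function names `gaussian_label` and `round_label` are hypothetical. Under these assumptions the sketch reproduces the values 0.35, 0.22, and 0.34 given in the text.

```python
import math


def gaussian_label(t_center, sj_center, n, num_classes, target_index):
    """Soft label for a sample centred at t_center (assumed formulas)."""
    # Relative offset from the centre of Sj, in units of n (assumption).
    x = abs(t_center - sj_center) / n
    # First confidence: standard normal density evaluated at x.
    conf = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    # Second confidence: the remainder shared equally by the other classes.
    conf_other = (1 - conf) / (num_classes - 1)
    label = [conf_other] * num_classes
    label[target_index] = conf
    return label


def round_label(label, target_index, digits=2):
    """Round the label and adjust the target entry so the sum stays 1."""
    rounded = [round(v, digits) for v in label]
    others = sum(v for i, v in enumerate(rounded) if i != target_index)
    rounded[target_index] = round(1 - others, digits)
    return rounded


# Sliding start position: window centre at m = 50, Sj centre at 100, n = 100.
label = gaussian_label(50, 100, 100, num_classes=4, target_index=1)
print(round_label(label, target_index=1))  # [0.22, 0.34, 0.22, 0.22]
```

Adjusting only the target entry after rounding mirrors the step in the text where 0.35 is lowered to 0.34.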
If a random sampling method is used when the spliced sample data is sampled to obtain the sampling sample data set, a sample center T_center is randomly generated on segment Sj, and the start frame and end frame of the sample are [T_center - 0.5n, T_center + 0.5n]. The sample label is obtained in the same way as in the sliding sampling embodiment, which is not repeated here.
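A minimal Python sketch of the random sampling variant is given below. The function name `random_window` is hypothetical, and drawing the center uniformly over Sj is an assumption, since the text only says the center is randomly generated on segment Sj.

```python
import random


def random_window(m, n, rng=random):
    """Draw a window of length n whose centre lies on Sj, i.e. in [m, m + n]."""
    t_center = rng.uniform(m, m + n)  # uniform draw is an assumption
    return t_center, (t_center - 0.5 * n, t_center + 0.5 * n)


# Hypothetical lengths: Si has 50 frames, Sj has 100 frames.
center, (start, end) = random_window(50, 100, random.Random(0))
```

Passing a seeded `random.Random` instance makes the sampled windows reproducible.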
Example four
Referring to fig. 3, the present application provides a data set processing apparatus 300 comprising:
a to-be-processed data set obtaining module 310, configured to obtain a to-be-processed data set, where the to-be-processed data set includes a first type of sample data, where the first type of sample is sample data including a single action;
the sample splicing module 320 is configured to randomly select first type sample data of at least three different action categories from the data set to be processed, and splice the first type sample data of the at least three different action categories into splicing sample data;
a sampling module 330, configured to sample the splicing sample data to obtain a sample data set;
a data expansion module 340, configured to expand the to-be-processed data set according to the sample data set to obtain a target data set; the target data set comprises all samples in the sample data set and the data set to be processed.
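The roles of the sample splicing module 320 and the data expansion module 340 can be sketched in Python as follows. This is a simplified illustration rather than the claimed implementation: samples are assumed to be `(frames, action)` pairs with string action names, and the function names `splice_samples` and `expand_dataset` are hypothetical.

```python
import random


def splice_samples(dataset, rng=random, count=3):
    """Pick `count` single-action samples of distinct categories and splice them.

    dataset: list of (frames, action) pairs, each holding a single action.
    Raises ValueError if fewer than `count` categories are available.
    """
    by_action = {}
    for frames, action in dataset:
        by_action.setdefault(action, []).append(frames)
    actions = rng.sample(sorted(by_action), count)  # distinct categories
    segments = [(rng.choice(by_action[a]), a) for a in actions]
    spliced = [frame for frames, _ in segments for frame in frames]
    return spliced, segments


def expand_dataset(dataset, sampled):
    """Target data set: every original sample plus every sampled sample."""
    return list(dataset) + list(sampled)
```

The segments are returned alongside the spliced frames so that a later sampling step can recover the segment lengths m, n, and o.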
According to an embodiment of the present application, optionally, in the data set processing apparatus, the to-be-processed data set obtaining module 310 includes:
a segmentation unit, configured to acquire an original data set and segment second type sample data in the original data set into a plurality of first type samples to obtain a segmented original data set, where the second type of sample data is sample data including a plurality of actions;
the encoding unit is used for encoding the first type samples according to the action types of the actions included in each first type sample to obtain a label corresponding to each first type sample;
and the preprocessing unit is used for obtaining the data set to be processed according to the first type of samples and the labels corresponding to the first type of samples.
According to an embodiment of the present application, optionally, in the data set processing apparatus, the dividing unit includes:
the frame determining unit is used for determining a starting frame and an ending frame of each action in the second type of sample data;
and the sample dividing unit is used for dividing the second type of sample data into a plurality of first type of samples according to the starting frame and the ending frame.
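The division performed by the frame determining unit and the sample dividing unit can be sketched in Python as follows. The function name `split_by_actions` and the annotation format are hypothetical, and treating the ending frame as inclusive is an assumption.

```python
def split_by_actions(frames, annotations):
    """Split a multi-action sample into single-action samples.

    frames: sequence of frames of the second type sample.
    annotations: list of (start_frame, end_frame, action), end inclusive.
    """
    samples = []
    for start, end, action in annotations:
        if not 0 <= start <= end < len(frames):
            raise ValueError(f"invalid frame range: {start}-{end}")
        samples.append((list(frames[start:end + 1]), action))
    return samples


# A hypothetical 10-frame clip containing two actions.
clips = split_by_actions(list(range(10)), [(0, 3, "A"), (4, 9, "B")])
```

Each returned pair carries its action category, which can then be encoded into the label of the corresponding first type sample.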
According to an embodiment of the present application, optionally, in the data set processing apparatus, the sampling module 330 includes:
a sampling sample obtaining unit, configured to sample the splicing sample data to obtain multiple sampling samples;
the sampling coding unit is used for coding each sampling sample to obtain a label of the sampling sample;
and the sampling sample data set acquisition unit is used for acquiring the sampling sample data set according to the plurality of sampling samples and their labels.
According to an embodiment of the present application, optionally, in the data set processing apparatus, the sampling sample acquiring unit includes:
a sampling length obtaining unit for obtaining a sampling length;
the sampling point determining unit is used for determining a starting sampling point and an ending sampling point of the splicing sample data;
and the sampling unit is used for sampling the splicing sample data according to the sampling length, the initial sampling point and the end sampling point.
According to an embodiment of the present application, optionally, in the data set processing apparatus, the sampling length obtaining unit includes:
a segment determining subunit, configured to determine a head segment, a middle segment, and a tail segment according to first type of sample data of three different motion types in the splicing sample data, where the head segment, the middle segment, and the tail segment respectively correspond to different motion types;
and the sliding window determining subunit is used for acquiring the sampling length according to the length of the intermediate segment.
According to an embodiment of the present application, optionally, in the data set processing apparatus, the sampling encoding unit includes:
a relative offset determination unit for determining a relative offset between the centre position of the sample and the centre position of the intermediate segment;
a first confidence determining unit, configured to calculate a first confidence that the sampled sample belongs to the behavior type included in the intermediate segment according to the relative offset;
the second confidence degree determining unit is used for determining a second confidence degree that the sampling sample belongs to other behavior types according to the first confidence degree;
and the label obtaining unit is used for coding the sampling sample according to the first confidence coefficient and the second confidence coefficient to obtain a label corresponding to the sampling sample.
In summary, the present application provides a data set processing apparatus comprising: a to-be-processed data set obtaining module 310, configured to obtain a to-be-processed data set, where the to-be-processed data set includes a first type of sample data, and the first type of sample data is sample data including a single action; a sample splicing module 320, configured to randomly select first type sample data of at least three different action categories from the data set to be processed, and splice the first type sample data of the at least three different action categories into splicing sample data; a sampling module 330, configured to sample the splicing sample data to obtain a sample data set; and a data expansion module 340, configured to expand the to-be-processed data set according to the sample data set to obtain a target data set, where the target data set includes all samples in the sample data set and the data set to be processed. Sample data of at least three different action categories is randomly selected from the data set to be processed and spliced into spliced sample data. Sampling the spliced sample data allows the original data set to be expanded rapidly into the target data set. Because the spliced sample data is sampled, the data set to be processed can be expanded rapidly from a small number of original samples, which reduces the sample acquisition cost while ensuring the diversity of the data in the target data set.
Example five
This embodiment also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an application store, etc., having stored thereon a computer program that, when executed by a processor, performs the method steps of the above-described embodiments.
For the specific implementation of the above method steps, refer to Example one; the details are not repeated in this embodiment.
Example six
The embodiment of the present application provides an electronic device, which may be a mobile phone, a computer, or a tablet computer, and the like, and includes a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, implements the data set processing method as described in the first embodiment. It is understood that, as shown in fig. 4, the electronic device 400 may further include: a processor 401, a memory 402, a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.
The processor 401 is configured to execute all or part of the steps in the data set processing method according to the first embodiment. The memory 402 is used to store various types of data, which may include, for example, instructions for any application or method in the electronic device, as well as application-related data.
The Processor 401 may be implemented by an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and is configured to execute the data set Processing method in the first embodiment.
The Memory 402 may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk or optical disk.
The multimedia component 403 may include a screen, which may be a touch screen, and an audio component for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in a memory or transmitted through a communication component. The audio assembly also includes at least one speaker for outputting audio signals.
The I/O interface 404 provides an interface between the processor 401 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons.
The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In summary, the present application provides a data set processing method, an apparatus, a storage medium, and an electronic device, where the method includes: obtaining a data set to be processed, where the data set to be processed includes a first type of sample data, and the first type of sample data is sample data including a single action; randomly selecting first type sample data of at least three different action categories from the data set to be processed, and splicing the first type sample data of the at least three different action categories into splicing sample data; sampling the spliced sample data to obtain a sampled sample data set; and expanding the data set to be processed according to the sampling sample data set to obtain a target data set, where the target data set includes all samples in the sample data set and the data set to be processed. Sample data of at least three different action categories is randomly selected from the data set to be processed and spliced into spliced sample data. Sampling the spliced sample data allows the original data set to be expanded rapidly into the target data set. Because the spliced sample data is sampled, the data set to be processed can be expanded rapidly from a small number of original samples, which reduces the sample acquisition cost while ensuring the diversity of the data in the target data set.
In the several embodiments provided in the embodiments of the present application, it should be understood that the disclosed system and method may be implemented in other ways. The system and method embodiments described above are merely illustrative.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Although the embodiments disclosed in the present application are described above, the descriptions are only for the convenience of understanding the present application, and are not intended to limit the present application. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims.

Claims (10)

1. A method of processing a data set, the method comprising:
acquiring a data set to be processed, wherein the data set to be processed comprises first type sample data, and the first type sample is the sample data comprising a single action;
randomly selecting first type sample data of at least three different action categories from the data set to be processed, and splicing the first type sample data of the at least three different action categories into splicing sample data;
sampling the spliced sample data to obtain a sampled sample data set;
expanding the data set to be processed according to the sampling sample data set to obtain a target data set; the target data set comprises all samples in the sample data set and the data set to be processed.
2. The method of claim 1, wherein obtaining a dataset to be processed comprises:
acquiring an original data set, and dividing second type sample data in the original data set into a plurality of first type samples to obtain a divided original data set; the second type of sample data is sample data comprising a plurality of actions;
coding the first type samples according to the action types of the actions included in each first type sample to obtain a label corresponding to each first type sample;
and obtaining the data set to be processed according to the first type of samples and the labels corresponding to the first type of samples.
3. The method of claim 2, wherein obtaining an original data set and segmenting a second type of sample data in the original data set into a plurality of first type samples to obtain a segmented original data set, comprises:
determining a starting frame and an ending frame of each action in the second type of sample data;
and dividing the second type of sample data into a plurality of first type samples according to the starting frame and the ending frame.
4. The method of claim 1, wherein sampling the concatenation sample data to obtain a sample data set, comprises:
sampling the spliced sample data to obtain a plurality of sampling samples;
encoding each sample to obtain a label for the sample;
and acquiring a sampling sample data set according to the plurality of sampling samples and their labels.
5. The method of claim 4, wherein sampling the concatenation sample data comprises:
acquiring a sampling length;
determining a starting sampling point and an ending sampling point of the splicing sample data;
and sampling the splicing sample data according to the sampling length, the initial sampling point and the end sampling point.
6. The method of claim 5, wherein obtaining a sample length comprises:
determining a head segment, a middle segment and a tail segment according to first type sample data of three different action types in the splicing sample data, wherein the head segment, the middle segment and the tail segment respectively correspond to different action types;
and acquiring the sampling length according to the length of the middle segment.
7. The method of claim 6, wherein encoding each sample to obtain a label for the sample comprises:
determining a relative offset between a center position of the sampled sample and a center position of the intermediate segment;
calculating, according to the relative offset, a first confidence that the sampling sample belongs to the behavior type included in the intermediate segment;
determining a second confidence coefficient that the sampling sample belongs to other behavior types according to the first confidence coefficient;
and coding the sampling sample according to the first confidence coefficient and the second confidence coefficient to obtain a label corresponding to the sampling sample.
8. A data set processing apparatus, characterized in that the apparatus comprises:
a to-be-processed data set acquisition module, configured to acquire a data set to be processed, wherein the data set to be processed comprises first type sample data, and the first type sample data is sample data comprising a single action;
the sample splicing module is used for randomly selecting first type sample data of at least three different action categories from the data set to be processed and splicing the first type sample data of the at least three different action categories into splicing sample data;
the sampling module is used for sampling the splicing sample data to obtain a sampling sample data set;
the data expansion module is used for expanding the data set to be processed according to the sampling sample data set to obtain a target data set; the target data set comprises all samples in the sample data set and the data set to be processed.
9. A storage medium storing a computer program which, when executed by one or more processors, is adapted to implement the data set processing method of any one of claims 1 to 7.
10. An electronic device, comprising a memory and a processor, the memory having stored thereon a computer program which, when executed by the processor, performs a data set processing method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110661196.9A CN113392902A (en) 2021-06-15 2021-06-15 Data set processing method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113392902A (en) 2021-09-14

Family

ID=77621249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110661196.9A Pending CN113392902A (en) 2021-06-15 2021-06-15 Data set processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113392902A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination