CN108717547A - Sample data generation method and device, and model training method and device - Google Patents
- Publication number
- CN108717547A (application number CN201810289135.2A)
- Authority
- CN
- China
- Prior art keywords
- sample data
- slice
- proportion
- sample pictures
- analyzed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Abstract
This application provides a sample data generation method and device, and a model training method and device. The sample data generation method includes: obtaining sample pictures that contain multiple target categories; determining, in the sample pictures, a first target category whose distribution proportion is lower than a first preset proportion, and/or a second target category whose distribution proportion is higher than a second preset proportion; traversing the sample pictures with a preset window size to generate slices to be analyzed; and determining sample data from the slices to be analyzed according to preset screening conditions, so that the resulting sample data satisfies the following conditions: when the first target category is the screening basis, the proportion of sample data containing the first target category increases; when the second target category is the screening basis, the proportion of sample data containing the second target category decreases. The application avoids the problem of data imbalance and improves the accuracy of model training through the balanced data it constructs.
Description
Technical field
This application relates to the technical field of data processing, and in particular to a sample data generation method and device, and a model training method and device.
Background
In machine learning, and deep learning in particular, most algorithms rely on large amounts of sample data. The richness and accuracy of the sample data are therefore of great importance to machine learning.
For example, semantic segmentation based on deep learning requires training a neural network model on a large amount of sample data before the trained model can produce good segmentation results. Such sample data may include a large number of sample pictures together with pictures in which the objects of each sample picture have been precisely semantically segmented by object category.
Although the volume of such sample pictures is very large, the amount of sample data for some categories is far smaller than for others, and this kind of imbalance is often hard to avoid in research work. The imbalance is closely tied to how the data is acquired. In the related art, different application scenarios yield different original image sets, and the original images in these sets are usually very large and do not match the input size of the neural network model. The original images are therefore generally traversed with a preset window size, and the resulting slices form the sample data corresponding to the original image set.
However, because the related art cuts pictures indiscriminately when acquiring data, the data imbalance is severe. Small-category information (which is likely to be useful) is overwhelmed by large-category information in both sample composition and feature dimensions, so the subsequent semantic segmentation often struggles to learn the small categories and the accuracy of model training is poor.
Summary of the invention
In view of this, the embodiments of this application aim to provide a sample data generation method and device and a model training method and device that avoid the data imbalance problem to a certain extent and improve the accuracy of model training by constructing balanced data.
In a first aspect, an embodiment of this application provides a sample data generation method, the method including:
obtaining sample pictures, the sample pictures containing multiple target categories;
determining a first target category whose distribution proportion in the sample pictures is lower than a first preset proportion, and/or a second target category whose distribution proportion is higher than a second preset proportion;
traversing the sample pictures with a preset window size to generate slices to be analyzed;
determining sample data from the slices to be analyzed according to preset screening conditions, so that the resulting sample data satisfies the following condition:
when the first target category is the screening basis, the proportion of sample data containing the first target category increases; when the second target category is the screening basis, the proportion of sample data containing the second target category decreases.
With reference to the first aspect, an embodiment of this application provides a first possible implementation of the first aspect, in which, when the first target category is the screening basis, determining sample data from the slices to be analyzed according to the preset screening conditions includes:
determining the distribution proportion, within a slice to be analyzed, of the first target category contained in that slice;
if the determined distribution proportion is greater than a first preset slice proportion, determining the slice to be analyzed as sample data.
With reference to the first aspect, an embodiment of this application provides a second possible implementation of the first aspect, in which, when the second target category is the screening basis, determining sample data from the slices to be analyzed according to the preset screening conditions includes:
determining the distribution proportion, within a slice to be analyzed, of the second target category contained in that slice;
if the determined distribution proportion is less than a second preset slice proportion, determining the slice to be analyzed as sample data.
With reference to the first aspect, or to either the first or the second possible implementation of the first aspect, an embodiment of this application provides a third possible implementation of the first aspect, which further includes:
labeling the pixels in the sample pictures to obtain label maps, where pixels belonging to the same target category have the same label value;
determining the distribution proportion of each target category in the sample pictures according to the different label values corresponding to the different target categories in the label maps.
With reference to the third possible implementation of the first aspect, an embodiment of this application provides a fourth possible implementation of the first aspect, in which, when there is more than one sample picture, determining the distribution proportion of each target category in the sample pictures according to the different label values corresponding to the different target categories in the label maps includes:
for each target category, determining the ratio of the total number of pixels carrying that category's label value across the label maps of all sample pictures to the total number of pixels in those label maps as the distribution proportion of the target category in the sample pictures.
With reference to the first aspect, an embodiment of this application provides a fifth possible implementation of the first aspect, in which determining sample data from the slices to be analyzed according to the preset screening conditions includes:
for the slices to be analyzed that are generated by traversing the sample pictures with specified preset window sizes, determining sample data from those slices according to the preset screening conditions.
With reference to the fifth possible implementation of the first aspect, an embodiment of this application provides a sixth possible implementation of the first aspect, in which the larger the difference between the distribution proportion of the first target category in the sample pictures and that of the second target category, the more preset window sizes are specified.
In a second aspect, an embodiment of this application further provides a method of training a model with the sample data generated by the first aspect or by any one of the first to sixth possible implementations of the first aspect, the method including:
determining the labeled sample data corresponding to the sample data;
inputting the sample data and the labeled sample data into a semantic segmentation model to train the semantic segmentation model;
where the semantic segmentation model is used to perform semantic segmentation.
In a third aspect, an embodiment of this application further provides a sample data generating device, the device including:
a sample picture acquisition module, configured to obtain sample pictures, the sample pictures containing multiple target categories;
a target category determining module, configured to determine a first target category whose distribution proportion in the sample pictures is lower than a first preset proportion, and/or a second target category whose distribution proportion is higher than a second preset proportion;
a slice acquisition module, configured to traverse the sample pictures with a preset window size to generate slices to be analyzed;
a sample data generation module, configured to determine sample data from the slices to be analyzed according to preset screening conditions, so that the resulting sample data satisfies the following condition:
when the first target category is the screening basis, the proportion of sample data containing the first target category increases; when the second target category is the screening basis, the proportion of sample data containing the second target category decreases.
In a fourth aspect, an embodiment of this application further provides a device for training a model with the sample data generated by the third aspect, the device including:
a labeled data determining module, configured to determine the labeled sample data corresponding to the sample data;
a semantic segmentation model training module, configured to input the sample data and the labeled sample data into a semantic segmentation model to train the semantic segmentation model, where the semantic segmentation model is used to perform semantic segmentation.
In the embodiments of this application, the first target category and/or the second target category are determined from the obtained sample pictures, and the sample pictures are traversed with a preset window size to generate slices to be analyzed. Sample data is then determined from these slices under screening conditions chosen so that, when the first target category is the screening basis, the proportion of sample data containing the first target category increases, and, when the second target category is the screening basis, the proportion of sample data containing the second target category decreases. In other words, the embodiments raise the share of minority-class target objects (objects of the first target category) among the slices to be analyzed while lowering the share of majority-class target objects (objects of the second target category). This avoids the data imbalance problem to a certain extent: the constructed sample data balances majority-class and minority-class target objects, so a model trained on it achieves higher accuracy.
To make the above objectives, features, and advantages of this application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of this application and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art may derive other related drawings from them without creative effort.
Fig. 1 shows a flowchart of a sample data generation method provided by an embodiment of this application;
Fig. 2 shows a flowchart of another sample data generation method provided by an embodiment of this application;
Fig. 3 shows a flowchart of another sample data generation method provided by an embodiment of this application;
Fig. 4 shows a flowchart of another sample data generation method provided by an embodiment of this application;
Fig. 5 shows a flowchart of a model training method provided by an embodiment of this application;
Fig. 6 shows a schematic structural diagram of a sample data generating device provided by an embodiment of this application;
Fig. 7 shows a schematic structural diagram of a computer device provided by an embodiment of this application;
Fig. 8 shows a schematic diagram of a model training device provided by an embodiment of this application;
Fig. 9 shows a schematic structural diagram of another computer device provided by an embodiment of this application.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. The components of the embodiments generally described and illustrated in the drawings may be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the claimed scope of this application, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of this application.
In the related art, pictures are cut indiscriminately when acquiring data, so data imbalance is severe: small-category information is overwhelmed by large-category information in both sample composition and feature dimensions, and the subsequent semantic segmentation often struggles to learn the small categories, which leads to poor training accuracy. On this basis, one embodiment of this application proposes a sample data generation method that avoids data imbalance to a certain extent and improves the accuracy of model training by constructing balanced data, as described in the following embodiments.
Referring to Fig. 1, which is a flowchart of the sample data generation method provided by an embodiment of this application, the method is applied to a computer device and includes the following steps:
S101: Obtain sample pictures, which contain multiple target categories.
Here, the sample pictures may be pictures captured by an image capture device (such as a camera or video camera) or pictures scanned by a remote sensor. Besides acquiring them directly from such devices and sensors, the embodiments of this application may also obtain sample pictures through a data interface or a web crawler. With a data interface, the sample pictures can be obtained precisely from the open data interface of an internet site (such as the website of the China Centre for Resources Satellite Data and Application). With a web crawler, the embodiments may use crawler technology, for example implementing the crawler in Python (an interpreted, object-oriented programming language), to fetch the desired sample pictures from the source to the local computer device.
S102: Determine a first target category whose distribution proportion in the sample pictures is lower than the first preset proportion, and/or a second target category whose distribution proportion is higher than the second preset proportion.
Here, from the multiple target categories contained in the obtained sample pictures, the method may determine only the first target category (small distribution proportion), only the second target category (large distribution proportion), or both. There may be one or more first target categories and, likewise, one or more second target categories; the embodiments of this application do not specifically limit this.
In addition, the embodiments may determine the distribution proportion of a first target category from its label value in the label map, and likewise determine the distribution proportion of a second target category from its label value in the label map. The label map is obtained by labeling the pixels in the sample pictures.
For example, suppose the label map shows that two target categories have distribution proportions of 10% and 90%. If the first preset proportion is 15% and the second preset proportion is 85%, the category with a 10% proportion is determined to be a first target category and the category with a 90% proportion is determined to be a second target category.
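The S102 rule can be sketched in Python as a pair of threshold comparisons. This is a minimal illustration, not the patent's implementation; the function name `split_categories` and the use of integer label values as dictionary keys are assumptions made for the sketch.

```python
from typing import Dict, Set, Tuple


def split_categories(
    proportions: Dict[int, float],
    first_preset: float,
    second_preset: float,
) -> Tuple[Set[int], Set[int]]:
    """Hypothetical helper: return (first target categories, second target categories).

    Assumption: `proportions` maps each label value to its distribution
    proportion in the sample pictures (a fraction between 0 and 1).
    """
    # First target categories: proportion strictly below the first preset.
    minority = {c for c, p in proportions.items() if p < first_preset}
    # Second target categories: proportion strictly above the second preset.
    majority = {c for c, p in proportions.items() if p > second_preset}
    return minority, majority


# Worked example from the text: proportions of 10% and 90%, presets of 15% and 85%.
minority, majority = split_categories({1: 0.10, 2: 0.90}, 0.15, 0.85)
print(minority, majority)  # {1} {2}
```

Either set may be empty, which matches the "and/or" in S102: the method may end up screening on only one of the two kinds of category.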
S103: Traverse the sample pictures with a preset window size to generate slices to be analyzed.
Here, the traversal may be performed multiple times, and each traversal may use a different preset window size, so each traversal generates slices to be analyzed that correspond to its preset window size.
The number of specified preset window sizes depends on the difference between the distribution proportion of the first target category and that of the second target category in the sample pictures: the larger the difference, the more preset window sizes are specified; the smaller the difference, the fewer.
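A single traversal in S103 can be sketched as a sliding window over the label map; the same window positions would be applied to the sample picture itself, and screening later needs the label values. This is a sketch under stated assumptions, since the source does not specify them: the stride defaults to the window size (non-overlapping tiles), windows that would run past the picture edge are skipped, and the name `traverse` is hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

LabelMap = List[List[int]]


def traverse(
    label_map: LabelMap, window: int, stride: Optional[int] = None
) -> Iterator[Tuple[int, int, LabelMap]]:
    """Hypothetical helper: yield (top, left, slice) for each full window x window tile.

    Assumptions: stride defaults to the window size (non-overlapping tiles),
    and partial tiles at the right and bottom edges are skipped.
    """
    step = stride if stride is not None else window
    height, width = len(label_map), len(label_map[0])
    for top in range(0, height - window + 1, step):
        for left in range(0, width - window + 1, step):
            tile = [row[left:left + window] for row in label_map[top:top + window]]
            yield top, left, tile


# Toy 4x4 label map: category 1 occupies the top-left 2x2 block.
label_map = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
slices = list(traverse(label_map, window=2))
print(len(slices))  # 4
```

Running the generator once per preset window size gives the multiple traversals described above.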
S104: Determine sample data from the slices to be analyzed according to preset screening conditions, so that the resulting sample data satisfies the following condition:
when the first target category is the screening basis, the proportion of sample data containing the first target category increases; when the second target category is the screening basis, the proportion of sample data containing the second target category decreases.
Here, the sample data is screened from the slices to be analyzed that the traversal produces. To bring the first target category (small distribution proportion) and the second target category (large distribution proportion) into balance, the embodiments of this application may screen sample data in the following two situations.
First situation: the first target category is the screening basis. Because the first target category has a small distribution proportion, it is a disadvantaged (minority) category. To balance the data, the embodiments increase the proportion of sample data containing the first target category among the sample data determined from the slices to be analyzed.
Second situation: the second target category is the screening basis. Because the second target category has a large distribution proportion, it is an advantaged (majority) category. To balance the data, the embodiments decrease the proportion of sample data containing the second target category among the sample data determined from the slices to be analyzed.
The first and second target categories may also serve as screening bases at the same time, so that the proportion of sample data containing the first target category increases while the proportion of sample data containing the second target category decreases.
For the first situation, referring to Fig. 2, the sample data generation process of S104 is implemented by the following steps:
S201: Determine the distribution proportion, within a slice to be analyzed, of the first target category contained in that slice;
S202: If the determined distribution proportion is greater than the first preset slice proportion, determine the slice to be analyzed as sample data.
Here, the embodiments first determine the distribution proportion of the first target category contained in the slice to be analyzed. If it is greater than the first preset slice proportion, the slice is determined as sample data; if it is less than or equal to the first preset slice proportion, the slice is discarded. That is, a slice is kept only when the disadvantaged category (the first target category) occupies a sufficiently large share of it (greater than the first preset slice proportion), which raises the proportion of sample data corresponding to the disadvantaged category.
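S201 and S202 amount to a per-slice threshold test. A minimal Python sketch, assuming each slice is represented by its label map as a list of rows of integer label values; `category_share` and `keep_for_first_category` are hypothetical names, and the threshold in the example is arbitrary.

```python
from typing import List

LabelMap = List[List[int]]


def category_share(tile: LabelMap, category: int) -> float:
    """Hypothetical helper: fraction of pixels in the slice labeled `category`."""
    total = sum(len(row) for row in tile)
    return sum(row.count(category) for row in tile) / total


def keep_for_first_category(
    tile: LabelMap, first_category: int, first_slice_preset: float
) -> bool:
    # S201: share of the first (minority) target category inside this slice.
    # S202: keep the slice only if that share is strictly greater than the
    # first preset slice proportion; a share equal to the preset is discarded.
    return category_share(tile, first_category) > first_slice_preset


# Shares of category 1 in these slices: 0.75, 0.25 and 0.0.
tiles = [[[1, 1], [1, 0]], [[1, 0], [0, 0]], [[0, 0], [0, 0]]]
kept = [t for t in tiles if keep_for_first_category(t, 1, 0.25)]
print(len(kept))  # 1
```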
The first preset slice proportion may be determined based on the first preset proportion, and the second preset slice proportion may be determined based on the second preset proportion.
For the second situation, referring to Fig. 3, the sample data generation process of S104 is implemented by the following steps:
S301: Determine the distribution proportion, within a slice to be analyzed, of the second target category contained in that slice;
S302: If the determined distribution proportion is less than the second preset slice proportion, determine the slice to be analyzed as sample data.
Here, the embodiments first determine the distribution proportion of the second target category contained in the slice to be analyzed. If it is less than the second preset slice proportion, the slice is determined as sample data; if it is greater than or equal to the second preset slice proportion, the slice is discarded. That is, a slice is kept only when the advantaged category (the second target category) occupies a sufficiently small share of it (less than the second preset slice proportion), which lowers the proportion of sample data corresponding to the advantaged category.
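S301 and S302 mirror the previous test with the comparison reversed. A minimal Python sketch under the same assumptions (slices as label maps given as lists of rows of integer label values); `keep_for_second_category` is a hypothetical name and the threshold in the example is arbitrary.

```python
from typing import List

LabelMap = List[List[int]]


def keep_for_second_category(
    tile: LabelMap, second_category: int, second_slice_preset: float
) -> bool:
    # S301: share of the second (majority) target category inside this slice.
    total = sum(len(row) for row in tile)
    share = sum(row.count(second_category) for row in tile) / total
    # S302: keep the slice only if that share is strictly below the second
    # preset slice proportion; a share equal to the preset is discarded.
    return share < second_slice_preset


# Shares of category 0 in these slices: 1.0, 0.75 and 0.25.
tiles = [[[0, 0], [0, 0]], [[0, 0], [0, 1]], [[0, 1], [1, 1]]]
kept = [t for t in tiles if keep_for_second_category(t, 0, 0.75)]
print(len(kept))  # 1
```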
Further, combining the methods shown in Fig. 2 and Fig. 3, the sample data generation process of S104 may also be implemented by the following steps:
Step 1: Determine the distribution proportion, within a slice to be analyzed, of the first target category contained in that slice;
Step 2: Determine the distribution proportion, within the slice to be analyzed, of the second target category contained in that slice;
Steps 1 and 2 may be executed in either order.
Step 3: If the distribution proportion of the first target category is greater than the first preset slice proportion and the distribution proportion of the second target category is less than the second preset slice proportion, determine the slice to be analyzed as sample data.
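The combined screening of Steps 1 to 3 keeps a slice only when both tests pass. A minimal Python sketch, assuming slices are label maps given as lists of rows of integer label values; `share` and `screen_slices` are hypothetical names and the thresholds in the example are arbitrary.

```python
from typing import List

LabelMap = List[List[int]]


def share(tile: LabelMap, category: int) -> float:
    """Hypothetical helper: fraction of pixels in the slice labeled `category`."""
    total = sum(len(row) for row in tile)
    return sum(row.count(category) for row in tile) / total


def screen_slices(
    tiles: List[LabelMap],
    first_category: int,
    second_category: int,
    first_slice_preset: float,
    second_slice_preset: float,
) -> List[LabelMap]:
    kept = []
    for tile in tiles:
        first_share = share(tile, first_category)    # Step 1
        second_share = share(tile, second_category)  # Step 2 (order does not matter)
        # Step 3: both conditions must hold for the slice to become sample data.
        if first_share > first_slice_preset and second_share < second_slice_preset:
            kept.append(tile)
    return kept


tiles = [
    [[1, 0], [0, 0]],  # first share 0.25, second share 0.75
    [[1, 1], [0, 0]],  # first share 0.50, second share 0.50
    [[1, 1], [1, 1]],  # first share 1.00, second share 0.00
    [[2, 2], [0, 0]],  # first share 0.00, second share 0.50 (category 2 is neither)
]
kept = screen_slices(tiles, first_category=1, second_category=0,
                     first_slice_preset=0.3, second_slice_preset=0.6)
print(len(kept))  # 2
```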
As shown in Fig. 4, the sample data generation method provided by the embodiments of this application determines the distribution proportion of each target category in the sample pictures as follows:
S401: Label the pixels in the sample pictures to obtain label maps, where pixels belonging to the same target category have the same label value;
S402: Determine the distribution proportion of each target category in the sample pictures according to the different label values corresponding to the different target categories in the label maps.
Here, the embodiments label the sample pictures at the pixel level to obtain the corresponding label maps, and then determine the distribution proportion of each target category in the sample pictures from the label values of the different target categories. Determining the distribution proportions of the first and second target categories is described in detail below.
For the first target category, its distribution proportion in the sample pictures is the ratio of the number of pixels carrying its label value in the label map to the total number of pixels in the label map. When there is more than one sample picture, the distribution proportion of the first target category is the ratio of the total number of pixels carrying its label value across the label maps of all sample pictures to the total number of pixels in those label maps.
For the second target category, its distribution proportion in the sample pictures is the ratio of the number of pixels carrying its label value in the label map to the total number of pixels in the label map. When there is more than one sample picture, the distribution proportion of the second target category is the ratio of the total number of pixels carrying its label value across the label maps of all sample pictures to the total number of pixels in those label maps.
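The pooled pixel counting of S402 can be sketched in Python. This is an illustration under assumptions, not the patent's code: label maps are lists of rows of integer label values, and `category_proportions` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List

LabelMap = List[List[int]]


def category_proportions(label_maps: List[LabelMap]) -> Dict[int, float]:
    """Hypothetical helper: pooled share of each label value across all label maps.

    Pixel counts are summed over every label map before dividing by the
    total pixel count, as described for the multi-picture case.
    """
    counts: Counter = Counter()
    total_pixels = 0
    for label_map in label_maps:
        for row in label_map:
            counts.update(row)
            total_pixels += len(row)
    return {category: n / total_pixels for category, n in counts.items()}


# Two label maps of different sizes: 4 pixels of category 0, then 2 of category 1.
maps = [[[0, 0], [0, 0]], [[1, 1]]]
proportions = category_proportions(maps)
print(proportions)
```

Here category 1 receives 2/6, roughly 0.333. Averaging the per-picture shares (0 and 1) would instead give 0.5, which is why the pooled ratio and the average of ratios should not be confused.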
The sample data generation method provided by the embodiments of this application may, for the slices to be analyzed generated by traversing the sample pictures with specified preset window sizes, determine sample data from those slices according to the preset screening conditions.
Specifically, in the embodiments of this application, the traversal may be performed multiple times with multiple preset window sizes. In the first traversal, all resulting slices to be analyzed are kept as sample data. Discarding part of the slices may start from the second traversal, from the third, or only from the fourth, fifth, or a later traversal. If the difference between the distribution proportions of the first and second target categories in the sample pictures is large, that is, the gap between the advantaged and disadvantaged categories is large, the data imbalance is correspondingly more severe; discarding may then start as early as the second traversal, and more preset window sizes are specified. If the difference is small, that is, the gap between the advantaged and disadvantaged categories is slight, discarding may start only at the fifth traversal, and fewer preset window sizes are specified.
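The multi-pass scheme can be sketched as a loop over window sizes in which early passes keep every slice and later passes apply a screening predicate, such as the S201/S202 threshold test. Everything named here is an assumption for illustration: `generate_sample_data`, the `start_filter_pass` parameter (which should be at least 2, since the first traversal keeps all slices), the `keep` predicate, and the non-overlapping tiling. The source does not prescribe how window sizes or the starting pass are chosen.

```python
from typing import Callable, Iterator, List, Tuple

LabelMap = List[List[int]]


def traverse(label_map: LabelMap, window: int) -> Iterator[Tuple[int, int, LabelMap]]:
    # Assumption: non-overlapping tiles; partial tiles at the edges are skipped.
    height, width = len(label_map), len(label_map[0])
    for top in range(0, height - window + 1, window):
        for left in range(0, width - window + 1, window):
            yield top, left, [row[left:left + window] for row in label_map[top:top + window]]


def generate_sample_data(
    label_map: LabelMap,
    window_sizes: List[int],
    start_filter_pass: int,
    keep: Callable[[LabelMap], bool],
) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: return (window, top, left) for each slice kept as sample data.

    Passes numbered below `start_filter_pass` keep every slice; from that
    pass on, a slice is kept only if `keep(slice)` returns True.
    """
    samples = []
    for pass_index, window in enumerate(window_sizes, start=1):
        for top, left, tile in traverse(label_map, window):
            if pass_index < start_filter_pass or keep(tile):
                samples.append((window, top, left))
    return samples


label_map = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
# With a predicate that rejects everything, only the first pass contributes.
print(len(generate_sample_data(label_map, [2, 4], 2, lambda tile: False)))  # 4
```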
Based on the sample data generated in the above embodiments, an embodiment of the present application further provides a model training method. As shown in FIG. 5, which is the flowchart of the model training method provided by an embodiment of the present application, the method is applied to a computer device and includes the following steps:
S501: determining labeled sample data for the sample data;
S502: inputting the sample data and the labeled sample data into a semantic segmentation model to train the semantic segmentation model, where the semantic segmentation model is used to perform semantic segmentation.
Here, in the model training stage, the sample data is used as the input features of the semantic segmentation model to be trained, and the labeled sample data corresponding to the sample data is used as the output. Training yields the parameters of the semantic segmentation model, that is, a trained semantic segmentation model. In the embodiments of the present application, a neural network model may be used as the semantic segmentation model, in which case the model training stage is the process of learning the unknown parameters of the neural network model. A semantic segmentation service can then be provided based on this model: target data only needs to be input into the trained semantic segmentation model to obtain the corresponding labeled target data, from which the semantically segmented result image can be reconstructed.
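Steps S501 and S502 can be illustrated with the sketch below. Beyond the text, it assumes that the labeled sample data for a slice is the label map cropped with the same window. It also replaces the neural network with a deliberately trivial stand-in, a nearest class-mean pixel classifier on grayscale intensity, so that the example stays self-contained. `make_training_pairs` and `NearestMeanSegmenter` are hypothetical names, not part of the application.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]
Slice = Tuple[int, int, int]  # hypothetical slice descriptor: (top, left, size)


def crop(grid, s: Slice):
    top, left, size = s
    return [row[left:left + size] for row in grid[top:top + size]]


def make_training_pairs(image: Grid, label_map: List[List[int]], slices: List[Slice]):
    """S501 (sketch): pair each image slice with the label map cropped by the same window."""
    return [(crop(image, s), crop(label_map, s)) for s in slices]


class NearestMeanSegmenter:
    """Toy stand-in for a neural segmentation model (S502 sketch).

    Training stores the mean intensity of each label; prediction assigns every
    pixel the label whose mean intensity is closest.
    """

    def __init__(self) -> None:
        self.means: Dict[int, float] = {}

    def fit(self, pairs) -> "NearestMeanSegmenter":
        sums: Dict[int, float] = {}
        counts: Dict[int, int] = {}
        for img, lab in pairs:
            for img_row, lab_row in zip(img, lab):
                for value, label in zip(img_row, lab_row):
                    sums[label] = sums.get(label, 0.0) + value
                    counts[label] = counts.get(label, 0) + 1
        self.means = {k: sums[k] / counts[k] for k in sums}
        return self

    def predict(self, img: Grid) -> List[List[int]]:
        return [[min(self.means, key=lambda k: abs(self.means[k] - v)) for v in row]
                for row in img]


image = [[0.1, 0.2, 0.9, 0.8],
         [0.0, 0.1, 1.0, 0.9]]
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
pairs = make_training_pairs(image, labels, [(0, 0, 2), (0, 2, 2)])
model = NearestMeanSegmenter().fit(pairs)
print(model.predict([[0.05, 0.95, 0.4, 0.6]]))  # [[0, 1, 0, 1]]
```

A real implementation would substitute a neural network and a loss over the label slices; the pairing of sample data with labeled sample data is the part this sketch is meant to show.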
Based on the same inventive concept, an embodiment of the present application further provides a sample data generating device corresponding to the sample data generation method. Because the device in the embodiments of the present application solves the problem on a principle similar to that of the above sample data generation method, the implementation of the device may refer to the implementation of the method, and repeated details are not described again. As shown in FIG. 6, which is a schematic structural diagram of the sample data generating device provided by an embodiment of the present application, the device includes:
a sample image acquisition module 61, configured to acquire sample images, where the sample images contain multiple target categories;
a target category determining module 62, configured to determine a first target category whose distribution proportion in the sample images is less than a first preset proportion, and/or a second target category whose distribution proportion is greater than a second preset proportion;
a to-be-analyzed slice acquisition module 63, configured to traverse the sample images according to preset window sizes to generate slices to be analyzed; and
a sample data generation module 64, configured to determine sample data from the slices to be analyzed according to preset screening conditions, such that the obtained sample data satisfies the following conditions:
when the first target category is the screening basis, the proportion of sample data containing the first target category increases; when the second target category is the screening basis, the proportion of sample data containing the second target category decreases.
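The selection performed by the target category determining module 62 amounts to two threshold comparisons. A minimal sketch follows, assuming the distribution proportions are already available as a mapping from label value to proportion. The function name and the threshold values in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def split_categories(proportions: Dict[int, float], first_preset: float,
                     second_preset: float) -> Tuple[List[int], List[int]]:
    """Return (first target categories, second target categories).

    First target categories: proportion below `first_preset` (under-represented).
    Second target categories: proportion above `second_preset` (over-represented).
    Categories in between belong to neither group.
    """
    first = sorted(c for c, p in proportions.items() if p < first_preset)
    second = sorted(c for c, p in proportions.items() if p > second_preset)
    return first, second


print(split_categories({0: 0.625, 1: 0.125, 2: 0.25}, first_preset=0.2, second_preset=0.5))
# ([1], [0])
```

The "and/or" in the text corresponds to either list being allowed to come back empty.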
In one possible implementation, the sample data generation module 64 is specifically configured to determine the distribution proportion, within a slice to be analyzed, of the first target category contained in that slice, and, if the determined distribution proportion is greater than a first preset slice proportion, to determine that slice to be analyzed as sample data.
In another possible implementation, the sample data generation module 64 is specifically configured to determine the distribution proportion, within a slice to be analyzed, of the second target category contained in that slice, and, if the determined distribution proportion is less than a second preset slice proportion, to determine that slice to be analyzed as sample data.
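The two screening conditions can be sketched as predicates over the label values of one slice: greater than the first preset slice proportion when the first target category is the screening basis, and less than the second preset slice proportion when the second target category is the screening basis. This is a minimal illustration that assumes each slice to be analyzed is accompanied by its cropped label values. All names and threshold values are hypothetical.

```python
from typing import Callable, List, Sequence

LabelSlice = Sequence[Sequence[int]]  # label values of one slice to be analyzed


def slice_proportion(label_slice: LabelSlice, category: int) -> float:
    """Share of the slice's pixels whose label value equals `category`."""
    total = sum(len(row) for row in label_slice)
    hits = sum(1 for row in label_slice for value in row if value == category)
    return hits / total if total else 0.0


def keep_for_first(label_slice: LabelSlice, first_category: int, first_slice_preset: float) -> bool:
    """First target category as screening basis: keep slices dense in the rare category."""
    return slice_proportion(label_slice, first_category) > first_slice_preset


def keep_for_second(label_slice: LabelSlice, second_category: int, second_slice_preset: float) -> bool:
    """Second target category as screening basis: keep slices sparse in the common category."""
    return slice_proportion(label_slice, second_category) < second_slice_preset


def screen(label_slices: List[LabelSlice], predicate: Callable[[LabelSlice], bool]) -> List[int]:
    """Indices of the slices accepted as sample data."""
    return [i for i, s in enumerate(label_slices) if predicate(s)]


slices = [[[1, 1], [0, 0]],   # label 1: 0.50, label 0: 0.50
          [[0, 0], [0, 0]],   # label 0: 1.00
          [[1, 0], [0, 0]]]   # label 1: 0.25, label 0: 0.75
print(screen(slices, lambda s: keep_for_first(s, 1, 0.3)))   # [0]
print(screen(slices, lambda s: keep_for_second(s, 0, 0.8)))  # [0, 2]
```

Either rule raises the share of the first target category among the retained slices: the first by keeping rare-category-rich slices, the second by dropping slices dominated by the common category.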
The sample data generating device further includes:
a pixel labeling module 65, configured to label the pixels in the sample images to obtain label maps, where pixels constituting the same target category have the same label value; and
a distribution proportion determining module 66, configured to determine the distribution proportion of each target category in the sample images according to the different label values corresponding to different target categories in the label maps.
In a specific implementation, when there is more than one sample image, the distribution proportion determining module 66 is specifically configured to, for each target category, determine the ratio of the total number of occurrences of that category's label value in the label maps corresponding to all the sample images to the total number of pixels in those label maps as the distribution proportion of that target category in the sample images.
In another possible implementation, the sample data generation module 64 is specifically configured to determine sample data, according to the preset screening conditions, from the slices to be analyzed that are generated by traversing the sample images according to specified preset window sizes.
In a specific implementation, the greater the difference between the distribution proportion of the first target category in the sample images and that of the second target category in the sample images, the more preset window sizes are specified.
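The application states only that a larger gap between the two distribution proportions calls for more preset window sizes and an earlier start of discarding (as early as the second traversal), while a smaller gap calls for fewer sizes and a later start (for example the fifth traversal). It gives no formula. The sketch below uses a purely hypothetical linear mapping to show that monotonic relation; the function name, the range of 5 to 9 window sizes, and the rounding rule are all assumptions.

```python
from typing import Tuple


def plan_traversals(first_prop: float, second_prop: float,
                    min_windows: int = 5, max_windows: int = 9) -> Tuple[int, int]:
    """Hypothetical linear mapping from the imbalance gap to a traversal plan.

    Returns (number of preset window sizes, 1-based traversal at which
    discarding starts). A larger gap gives more window sizes and an earlier
    start (down to the 2nd traversal); a smaller gap gives fewer window sizes
    and a later start (up to the 5th traversal).
    """
    gap = min(max(abs(second_prop - first_prop), 0.0), 1.0)  # clamp to [0, 1]
    num_windows = min_windows + round(gap * (max_windows - min_windows))
    discard_from = 5 - round(gap * 3)
    return num_windows, discard_from


print(plan_traversals(0.3, 0.3))  # (5, 5)
print(plan_traversals(0.0, 1.0))  # (9, 2)
```

Any other monotonic mapping would satisfy the stated relation equally well; the linear form was chosen only for readability.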
As shown in FIG. 7, which is a schematic structural diagram of a computer device provided by an embodiment of the present application, the computer device includes a processor 71, a memory 72, and a bus 73. The memory 72 stores execution instructions; when the device runs, the processor 71 communicates with the memory 72 through the bus 73, and the processor 71 executes the following instructions stored in the memory 72:
acquiring sample images, where the sample images contain multiple target categories;
determining a first target category whose distribution proportion in the sample images is less than a first preset proportion, and/or a second target category whose distribution proportion is greater than a second preset proportion;
traversing the sample images according to preset window sizes to generate slices to be analyzed; and
determining sample data from the slices to be analyzed according to preset screening conditions, such that the obtained sample data satisfies the following conditions:
when the first target category is the screening basis, the proportion of sample data containing the first target category increases; when the second target category is the screening basis, the proportion of sample data containing the second target category decreases.
In one possible implementation, the processor 71 is specifically configured to determine the distribution proportion, within a slice to be analyzed, of the first target category contained in that slice, and, if the determined distribution proportion is greater than a first preset slice proportion, to determine that slice to be analyzed as sample data.
In another possible implementation, the processor 71 is specifically configured to determine the distribution proportion, within a slice to be analyzed, of the second target category contained in that slice, and, if the determined distribution proportion is less than a second preset slice proportion, to determine that slice to be analyzed as sample data.
The processor 71 is further configured to label the pixels in the sample images to obtain label maps, where pixels constituting the same target category have the same label value, and to determine the distribution proportion of each target category in the sample images according to the different label values corresponding to different target categories in the label maps.
In a specific implementation, when there is more than one sample image, the processor 71 is specifically configured to, for each target category, determine the ratio of the total number of occurrences of that category's label value in the label maps corresponding to all the sample images to the total number of pixels in those label maps as the distribution proportion of that target category in the sample images.
In another possible implementation, the processor 71 is specifically configured to determine sample data, according to the preset screening conditions, from the slices to be analyzed that are generated by traversing the sample images according to specified preset window sizes.
In a specific implementation, the greater the difference between the distribution proportion of the first target category in the sample images and that of the second target category in the sample images, the more preset window sizes the processor 71 specifies.
Based on the same inventive concept, an embodiment of the present application further provides a model training device corresponding to the model training method. Because the device in the embodiments of the present application solves the problem on a principle similar to that of the above model training method, the implementation of the device may refer to the implementation of the method, and repeated details are not described again. As shown in FIG. 8, which is a schematic diagram of the model training device provided by an embodiment of the present application, the model training device includes:
a labeled data determining module 81, configured to determine labeled sample data for the sample data; and
a semantic segmentation model training module 82, configured to input the sample data and the labeled sample data into a semantic segmentation model to train the semantic segmentation model, where the semantic segmentation model is used to perform semantic segmentation.
As shown in FIG. 9, which is a schematic structural diagram of a computer device provided by an embodiment of the present application, the computer device includes a processor 91, a memory 92, and a bus 93. The memory 92 stores execution instructions; when the device runs, the processor 91 communicates with the memory 92 through the bus 93, and the processor 91 executes the following instructions stored in the memory 92:
determining labeled sample data for the sample data; and
inputting the sample data and the labeled sample data into a semantic segmentation model to train the semantic segmentation model, where the semantic segmentation model is used to perform semantic segmentation.
The computer program product for the sample data generation method and the model training method provided by the embodiments of the present application includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, and details are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system and device described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of the present application. The foregoing storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A sample data generation method, characterized in that the method comprises:
acquiring sample images, the sample images containing multiple target categories;
determining a first target category whose distribution proportion in the sample images is less than a first preset proportion, and/or a second target category whose distribution proportion is greater than a second preset proportion;
traversing the sample images according to a preset window size to generate slices to be analyzed; and
determining sample data from the slices to be analyzed according to preset screening conditions, such that the obtained sample data satisfies the following conditions:
when the first target category is the screening basis, the proportion of sample data containing the first target category increases; when the second target category is the screening basis, the proportion of sample data containing the second target category decreases.
2. The method according to claim 1, characterized in that, when the first target category is the screening basis, determining sample data from the slices to be analyzed according to preset screening conditions comprises:
determining the distribution proportion, within a slice to be analyzed, of the first target category contained in that slice; and
if the determined distribution proportion is greater than a first preset slice proportion, determining that slice to be analyzed as sample data.
3. The method according to claim 1, characterized in that, when the second target category is the screening basis, determining sample data from the slices to be analyzed according to preset screening conditions comprises:
determining the distribution proportion, within a slice to be analyzed, of the second target category contained in that slice; and
if the determined distribution proportion is less than a second preset slice proportion, determining that slice to be analyzed as sample data.
4. The method according to any one of claims 1 to 3, characterized by further comprising:
labeling the pixels in the sample images to obtain label maps, wherein pixels constituting the same target category have the same label value; and
determining the distribution proportion of each target category in the sample images according to the different label values corresponding to different target categories in the label maps.
5. The method according to claim 4, characterized in that, when there is more than one sample image, determining the distribution proportion of each target category in the sample images according to the different label values corresponding to different target categories in the label maps comprises:
for each target category, determining the ratio of the total number of occurrences of that category's label value in the label maps corresponding to all the sample images to the total number of pixels in those label maps as the distribution proportion of that target category in the sample images.
6. The method according to claim 1, characterized in that determining sample data from the slices to be analyzed according to preset screening conditions comprises:
determining sample data, according to the preset screening conditions, from the slices to be analyzed generated by traversing the sample images according to specified preset window sizes.
7. The method according to claim 6, characterized in that the greater the difference between the distribution proportion of the first target category in the sample images and the distribution proportion of the second target category in the sample images, the more preset window sizes are specified.
8. A method for training a model with sample data generated according to any one of claims 1 to 7, characterized in that the method comprises:
determining labeled sample data for the sample data; and
inputting the sample data and the labeled sample data into a semantic segmentation model to train the semantic segmentation model, wherein the semantic segmentation model is used to perform semantic segmentation.
9. A sample data generating device, characterized in that the device comprises:
a sample image acquisition module, configured to acquire sample images, the sample images containing multiple target categories;
a target category determining module, configured to determine a first target category whose distribution proportion in the sample images is less than a first preset proportion, and/or a second target category whose distribution proportion is greater than a second preset proportion;
a to-be-analyzed slice acquisition module, configured to traverse the sample images according to a preset window size to generate slices to be analyzed; and
a sample data generation module, configured to determine sample data from the slices to be analyzed according to preset screening conditions, such that the obtained sample data satisfies the following conditions:
when the first target category is the screening basis, the proportion of sample data containing the first target category increases; when the second target category is the screening basis, the proportion of sample data containing the second target category decreases.
10. A device for training a model with sample data generated by the device according to claim 9, characterized in that the device comprises:
a labeled data determining module, configured to determine labeled sample data for the sample data; and
a semantic segmentation model training module, configured to input the sample data and the labeled sample data into a semantic segmentation model to train the semantic segmentation model, wherein the semantic segmentation model is used to perform semantic segmentation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810289135.2A CN108717547B (en) | 2018-03-30 | 2018-03-30 | Sample data generation method and device and model training method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108717547A true CN108717547A (en) | 2018-10-30 |
CN108717547B CN108717547B (en) | 2020-12-22 |
Family ID: 63898986
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810289135.2A Active CN108717547B (en) | 2018-03-30 | 2018-03-30 | Sample data generation method and device and model training method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108717547B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105354595A (en) * | 2015-10-30 | 2016-02-24 | 苏州大学 | Robust visual image classification method and system |
CN105740894A (en) * | 2016-01-28 | 2016-07-06 | 北京航空航天大学 | Semantic annotation method for hyperspectral remote sensing image |
CN105844287A (en) * | 2016-03-15 | 2016-08-10 | 民政部国家减灾中心 | Domain self-adaptive method and system for remote sensing image classification |
CN106530305A (en) * | 2016-09-23 | 2017-03-22 | 北京市商汤科技开发有限公司 | Semantic segmentation model training and image segmentation method and device, and calculating equipment |
Non-Patent Citations (5)
Title |
---|
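Hajizadeh et al.: "Semi-supervised rail defect detection from imbalanced image data", Proceedings of the 14th IFAC Symposium on Control in Transportation Systems * |
Liu Xuying (刘胥影) et al.: "A classification method for class-imbalanced data based on a cascade model", Journal of Nanjing University (Natural Science) * |
Jiang Feng (姜枫) et al.: "A survey of content-based image segmentation methods", Journal of Software * |
Li Yong (李勇) et al.: "A survey of ensemble classification algorithms for imbalanced data", Application Research of Computers * |
Qian Hongbo (钱洪波) et al.: "An overview of classification of imbalanced-class data", Computer Engineering & Science * |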
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109409432A (en) * | 2018-10-31 | 2019-03-01 | 腾讯科技(深圳)有限公司 | A kind of image processing method, device and storage medium |
CN109409432B (en) * | 2018-10-31 | 2019-11-26 | 腾讯科技(深圳)有限公司 | A kind of image processing method, device and storage medium |
WO2020093718A1 (en) * | 2018-11-08 | 2020-05-14 | 北京字节跳动网络技术有限公司 | Training data re-sampling method and apparatus, and storage medium and electronic device |
CN111382569A (en) * | 2018-12-27 | 2020-07-07 | 深圳市优必选科技有限公司 | Method and device for recognizing entities in dialogue corpus and computer equipment |
CN111382569B (en) * | 2018-12-27 | 2024-05-03 | 深圳市优必选科技有限公司 | Method and device for identifying entity in dialogue corpus and computer equipment |
CN110162649A (en) * | 2019-05-24 | 2019-08-23 | 北京百度网讯科技有限公司 | Sample data acquisition methods obtain system, server and computer-readable medium |
CN110162649B (en) * | 2019-05-24 | 2021-06-18 | 北京百度网讯科技有限公司 | Sample data acquisition method, acquisition system, server and computer readable medium |
CN110889462A (en) * | 2019-12-09 | 2020-03-17 | 秒针信息技术有限公司 | Data processing method, device, equipment and storage medium |
CN110889462B (en) * | 2019-12-09 | 2023-05-02 | 秒针信息技术有限公司 | Data processing method, device, equipment and storage medium |
CN113743431A (en) * | 2020-05-29 | 2021-12-03 | 阿里巴巴集团控股有限公司 | Data selection method and device |
CN113743431B (en) * | 2020-05-29 | 2024-04-02 | 阿里巴巴集团控股有限公司 | Data selection method and device |
Also Published As
Publication number | Publication date |
---|---|
CN108717547B (en) | 2020-12-22 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 101-8, 1st floor, building 31, area 1, 188 South Fourth Ring Road West, Fengtai District, Beijing; Applicant after: Guoxin Youyi Data Co., Ltd. Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing; Applicant before: SIC YOUE DATA Co., Ltd. |
GR01 | Patent grant | |