CN108717547A - Sample data generation method and device, and method and device for training a model - Google Patents

Info

Publication number
CN108717547A
Authority
CN
China
Prior art keywords
sample data
slice
proportion
sample pictures
analyzed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810289135.2A
Other languages
Chinese (zh)
Other versions
CN108717547B (en)
Inventor
刘萌
夏珺峥
李长升
孙源良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guoxin Youe Data Co Ltd
Original Assignee
Guoxin Youe Data Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guoxin Youe Data Co Ltd
Priority to CN201810289135.2A
Publication of CN108717547A
Application granted
Publication of CN108717547B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Abstract

This application provides a sample data generation method and device, and a method and device for training a model. The sample data generation method includes: obtaining a sample picture that contains multiple target categories; determining a first target category whose distribution proportion in the sample picture is less than a first preset proportion, and/or a second target category whose distribution proportion is greater than a second preset proportion; traversing the sample picture according to a preset window size to generate slices to be analyzed; and determining sample data from the slices to be analyzed according to preset screening conditions, such that the resulting sample data satisfies the following conditions: where screening is based on the first target category, the proportion of sample data containing the first target category increases; where screening is based on the second target category, the proportion of sample data containing the second target category decreases. The application avoids the data imbalance problem and improves the precision of model training by means of the balanced data it constructs.

Description

Sample data generation method and device, and method and device for training a model
Technical field
This application relates to the technical field of data processing, and in particular to a sample data generation method and device, and a method and device for training a model.
Background
For machine learning, and deep learning in particular, most algorithms depend on a large amount of sample data. The richness and accuracy of the sample data are therefore of great importance to machine learning.
For example, semantic segmentation based on deep learning requires a large amount of sample data to train a neural network model before the trained model can produce good segmentation results. Such sample data may include a large number of sample pictures, together with the pictures obtained by precisely segmenting the objects in those sample pictures according to object category.
Although the volume of sample pictures is very large, the amount of sample data for some classes is far smaller than for other classes, and such imbalanced data is often hard to avoid in research work. This imbalance is closely tied to how the data is acquired. In the related art, different application scenarios yield different original image sets, and the original images in a set are usually too large to match the input size of the neural network model. The original images are therefore generally traversed according to a preset window size, and the sample data corresponding to the original image set is obtained as slices.
However, because the related art cuts pictures indiscriminately to acquire data, the data imbalance problem is severe. As a result, small-class information (which may well be useful information) is swamped by large-class information at both the sample-composition level and the feature-dimension level, so that subsequent semantic segmentation has difficulty learning the small-class information and the precision of model training is poor.
Summary of the invention
In view of this, the purpose of the embodiments of the present application is to provide a sample data generation method and device, and a method and device for training a model, which avoid the data imbalance problem to a certain extent and improve the precision of model training by constructing balanced data.
In a first aspect, an embodiment of the present application provides a sample data generation method, the method including:
Obtaining a sample picture, the sample picture containing multiple target categories;
Determining a first target category whose distribution proportion in the sample picture is less than a first preset proportion, and/or a second target category whose distribution proportion is greater than a second preset proportion;
Traversing the sample picture according to a preset window size to generate slices to be analyzed;
Determining sample data from the slices to be analyzed according to preset screening conditions, such that the resulting sample data satisfies the following conditions:
Where screening is based on the first target category, the proportion of sample data containing the first target category increases; where screening is based on the second target category, the proportion of sample data containing the second target category decreases.
With reference to the first aspect, an embodiment of the present application provides a first possible implementation of the first aspect, in which, where screening is based on the first target category, determining sample data from the slices to be analyzed according to preset screening conditions includes:
Determining the distribution proportion, within a slice to be analyzed, of the first target category contained in that slice;
If the determined distribution proportion is greater than a first preset slice proportion, determining the slice to be analyzed as sample data.
With reference to the first aspect, an embodiment of the present application provides a second possible implementation of the first aspect, in which, where screening is based on the second target category, determining sample data from the slices to be analyzed according to preset screening conditions includes:
Determining the distribution proportion, within a slice to be analyzed, of the second target category contained in that slice;
If the determined distribution proportion is less than a second preset slice proportion, determining the slice to be analyzed as sample data.
With reference to the first aspect, or to either of the first and second possible implementations of the first aspect, an embodiment of the present application provides a third possible implementation of the first aspect, which further includes:
Annotating the pixels in the sample picture to obtain an annotation map, in which the pixels making up the same target category have the same label value;
Determining the distribution proportion of each target category in the sample picture according to the different label values that correspond to the different target categories in the annotation map.
With reference to the third possible implementation of the first aspect, an embodiment of the present application provides a fourth possible implementation of the first aspect, in which, where there is more than one sample picture, determining the distribution proportion of each target category in the sample pictures according to the different label values that correspond to the different target categories in the annotation maps includes:
For each target category, taking the ratio of the total number of pixels carrying that category's label value across the annotation maps of all sample pictures to the total number of pixels across those annotation maps, and determining this ratio as the distribution proportion of the target category in the sample pictures.
With reference to the first aspect, an embodiment of the present application provides a fifth possible implementation of the first aspect, in which determining sample data from the slices to be analyzed according to preset screening conditions includes:
For the slices to be analyzed that are generated by traversing the sample picture according to a specified preset window size, determining sample data from those slices according to the preset screening conditions.
With reference to the fifth possible implementation of the first aspect, an embodiment of the present application provides a sixth possible implementation of the first aspect, in which the larger the difference between the distribution proportion of the first target category in the sample picture and the distribution proportion of the second target category in the sample picture, the larger the number of specified preset window sizes.
In a second aspect, an embodiment of the present application further provides a method for training a model based on the sample data generated by the first aspect or by any one of the first to sixth possible implementations of the first aspect, the method including:
Determining the annotated sample data of the sample data;
Inputting the sample data and the annotated sample data into a semantic segmentation model to train the semantic segmentation model, where the semantic segmentation model is used to perform semantic segmentation.
In a third aspect, an embodiment of the present application further provides a sample data generation device, the device including:
A sample picture acquisition module, configured to obtain a sample picture, the sample picture containing multiple target categories;
A target category determination module, configured to determine a first target category whose distribution proportion in the sample picture is less than a first preset proportion, and/or a second target category whose distribution proportion is greater than a second preset proportion;
A slice acquisition module, configured to traverse the sample picture according to a preset window size to generate slices to be analyzed;
A sample data generation module, configured to determine sample data from the slices to be analyzed according to preset screening conditions, such that the resulting sample data satisfies the following conditions:
Where screening is based on the first target category, the proportion of sample data containing the first target category increases; where screening is based on the second target category, the proportion of sample data containing the second target category decreases.
In a fourth aspect, an embodiment of the present application further provides a device for training a model based on the sample data generated by the third aspect, the device including:
An annotated data determination module, configured to determine the annotated sample data of the sample data;
A semantic segmentation model training module, configured to input the sample data and the annotated sample data into a semantic segmentation model to train the semantic segmentation model, where the semantic segmentation model is used to perform semantic segmentation.
In the embodiments of the present application, the first target category and/or the second target category are determined from the obtained sample picture, and slices to be analyzed are generated by traversing the sample picture according to a preset window size. Sample data is then determined from these slices according to screening conditions under which, when screening is based on the first target category and on the second target category respectively, the proportion of sample data containing the first target category increases and the proportion of sample data containing the second target category decreases. In other words, the embodiments raise the proportion of minority-class target objects (that is, the first target category) among the slices to be analyzed while lowering the proportion of majority-class target objects (that is, the second target category). This avoids the data imbalance problem to a certain extent: the constructed sample data balances majority-class and minority-class target objects, so that a model trained on the constructed sample data achieves higher precision.
To make the above objects, features and advantages of the present application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 shows a flow chart of a sample data generation method provided by an embodiment of the present application;
Fig. 2 shows a flow chart of another sample data generation method provided by an embodiment of the present application;
Fig. 3 shows a flow chart of another sample data generation method provided by an embodiment of the present application;
Fig. 4 shows a flow chart of another sample data generation method provided by an embodiment of the present application;
Fig. 5 shows a flow chart of a method for training a model provided by an embodiment of the present application;
Fig. 6 shows a structural schematic diagram of a sample data generation device provided by an embodiment of the present application;
Fig. 7 shows a structural schematic diagram of a computer device provided by an embodiment of the present application;
Fig. 8 shows a schematic diagram of a device for training a model provided by an embodiment of the present application;
Fig. 9 shows a structural schematic diagram of another computer device provided by an embodiment of the present application.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. The components of the embodiments, as generally described and illustrated in the drawings, can be arranged and designed in a variety of different configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the claimed scope of the present application, but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.
In the related art, pictures are cut indiscriminately to acquire data, so the data imbalance problem is severe. Small-class information is consequently swamped by large-class information at both the sample-composition level and the feature-dimension level, so that subsequent semantic segmentation has difficulty learning the small-class information and the precision of model training is poor. On this basis, one embodiment of the present application proposes a sample data generation method that avoids the data imbalance problem to a certain extent and improves the precision of model training by constructing balanced data, as described in the following embodiments.
Fig. 1 is a flow chart of the sample data generation method provided by an embodiment of the present application. The method is applied to a computer device and includes the following steps:
S101: obtain a sample picture, the sample picture containing multiple target categories.
Here, the sample picture may be a picture captured by an image acquisition device (such as a camera or video camera), or a picture scanned by a remote sensor. Besides being taken directly from such an image acquisition device or remote sensor, the sample picture may also be obtained through a data interface or a web crawler. With a data interface, the sample picture can be obtained from a data interface opened by an internet site (such as the China Resource Satellite Application Center website). With a web crawler, the embodiments of the present application may use web crawling technology, for example a crawler implemented in Python (an object-oriented, interpreted programming language), to fetch the desired sample pictures from page source code to the local computer device.
S102: determine a first target category whose distribution proportion in the sample picture is less than a first preset proportion, and/or a second target category whose distribution proportion is greater than a second preset proportion.
Here, from the multiple target categories contained in the obtained sample picture, it is possible to determine only the first target category, whose distribution proportion is small; only the second target category, whose distribution proportion is large; or both the first and the second target category. There may be one or more first target categories with a small distribution proportion and, similarly, one or more second target categories with a large distribution proportion; the embodiments of the present application place no specific limit on this.
In addition, the embodiments of the present application may determine the distribution proportion of the first target category from the label value corresponding to the first target category in the annotation map, and likewise determine the distribution proportion of the second target category from the label value corresponding to the second target category in the annotation map. The annotation map is obtained by annotating the pixels in the sample picture.
For example, suppose the annotation map shows that the distribution proportions of two target categories are 10% and 90% respectively. If the first preset proportion is 15% and the second preset proportion is 85%, the target category with a distribution proportion of 10% is determined to be the first target category and the target category with a distribution proportion of 90% is determined to be the second target category.
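The example above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `class_shares` and `split_categories`, the nested-list representation of the annotation map, and the label values 0 and 1 are assumptions made here, not part of the application.

```python
from collections import Counter

def class_shares(label_map):
    """Return {label value: fraction of pixels} for one annotation map (a list of rows)."""
    counts = Counter(value for row in label_map for value in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def split_categories(shares, first_preset, second_preset):
    """Labels below first_preset are first (minority) categories,
    labels above second_preset are second (majority) categories."""
    first = sorted(l for l, s in shares.items() if s < first_preset)
    second = sorted(l for l, s in shares.items() if s > second_preset)
    return first, second

# Hypothetical 10 x 10 annotation map: label 1 covers 10 pixels, label 0 covers 90.
label_map = [[1] * 10] + [[0] * 10 for _ in range(9)]
shares = class_shares(label_map)
first, second = split_categories(shares, first_preset=0.15, second_preset=0.85)
print(first, second)  # label 1 is the first category, label 0 the second
```

Because both lists can hold several labels, the same sketch also covers the case of more than one first or second target category.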
S103: traverse the sample picture according to a preset window size to generate slices to be analyzed.
Here, the sample picture may be traversed multiple times, and each traversal may use a different preset window size, so that each traversal generates the slices to be analyzed that correspond to its preset window size.
The number of specified preset window sizes is related to the difference between the distribution proportion of the first target category in the sample picture and the distribution proportion of the second target category in the sample picture. The larger the difference, the more preset window sizes are specified; the smaller the difference, the fewer preset window sizes are specified.
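A minimal sketch of the traversal in S103 follows. The application does not specify a stride or how partial tiles at the picture edge are handled; non-overlapping windows and dropping partial tiles are assumptions made here, as are the names `iter_slices` and `slices_by_window`.

```python
def iter_slices(picture, window, stride=None):
    """Yield (top, left, tile) for every full window x window tile of a 2-D picture.

    The picture is a list of rows. The stride defaults to the window size
    (non-overlapping tiles); partial tiles at the edges are skipped.
    """
    stride = stride or window
    height, width = len(picture), len(picture[0])
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            tile = [row[left:left + window] for row in picture[top:top + window]]
            yield top, left, tile

# Hypothetical 8 x 8 picture whose pixel value encodes its position.
picture = [[r * 8 + c for c in range(8)] for r in range(8)]

# One traversal per preset window size, each producing its own set of slices.
slices_by_window = {w: list(iter_slices(picture, w)) for w in (4, 2)}
print({w: len(s) for w, s in slices_by_window.items()})
```

Passing a stride smaller than the window would give overlapping slices, which is one possible way to obtain more candidates from the same picture.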
S104: determine sample data from the slices to be analyzed according to preset screening conditions, such that the resulting sample data satisfies the following conditions:
Where screening is based on the first target category, the proportion of sample data containing the first target category increases; where screening is based on the second target category, the proportion of sample data containing the second target category decreases.
Here, the sample data is screened out of the slices to be analyzed obtained by traversal. To ensure that the first target category, whose distribution proportion is small, and the second target category, whose distribution proportion is large, reach a data balance, the embodiments of the present application may screen the sample data in the following two cases.
First case: screening is based on the first target category. Because the first target category is a target category with a small distribution proportion, it corresponds to the minority class. To achieve data balance, the embodiments of the present application may increase the proportion of sample data containing the first target category among the sample data determined from the slices to be analyzed.
Second case: screening is based on the second target category. Because the second target category is a target category with a large distribution proportion, it corresponds to the majority class. To achieve data balance, the embodiments of the present application may decrease the proportion of sample data containing the second target category among the sample data determined from the slices to be analyzed.
In addition, the first target category and the second target category may both be used as screening bases at the same time, so that the proportion of sample data containing the first target category increases while the proportion of sample data containing the second target category decreases.
For the first case above, referring to Fig. 2, the sample data generation process corresponding to S104 is implemented by the following steps:
S201: determine the distribution proportion, within a slice to be analyzed, of the first target category contained in that slice;
S202: if the determined distribution proportion is greater than a first preset slice proportion, determine the slice to be analyzed as sample data.
Here, the embodiments of the present application first determine the distribution proportion of the first target category contained in the slice to be analyzed. If that distribution proportion is greater than the first preset slice proportion, the slice is determined as sample data; if it is less than or equal to the first preset slice proportion, the slice is discarded. That is, a slice to be analyzed is retained only when the distribution proportion of the minority class (the first target category) within it is large enough (greater than the first preset slice proportion), which raises the proportion of sample data corresponding to the minority class.
The first preset slice proportion may be determined based on the first preset proportion, and the second preset slice proportion may be determined based on the second preset proportion.
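Steps S201 and S202 amount to a per-slice threshold test on the annotation of the slice. The sketch below assumes each slice is represented by its annotation tile (a list of rows of label values); the names `category_share` and `keep_for_first_category` and the threshold 0.25 are hypothetical.

```python
def category_share(label_slice, category):
    """Fraction of pixels in the slice that carry the given label value (S201)."""
    total = sum(len(row) for row in label_slice)
    hits = sum(row.count(category) for row in label_slice)
    return hits / total

def keep_for_first_category(label_slice, first_category, first_slice_preset):
    """Keep the slice only if the minority share is strictly greater than the preset (S202)."""
    return category_share(label_slice, first_category) > first_slice_preset

slice_a = [[1, 1], [0, 0]]  # minority label 1 covers 50% of the slice
slice_b = [[0, 0], [0, 1]]  # 25%: equal to the preset, so it is discarded
slice_c = [[0, 0], [0, 0]]  # 0%: discarded

kept = [s for s in (slice_a, slice_b, slice_c)
        if keep_for_first_category(s, first_category=1, first_slice_preset=0.25)]
print(kept)
```

The screening for the second target category in S301 and S302 has the same shape with the comparison reversed: a slice is kept only if the majority share is strictly less than the second preset slice proportion.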
For the second case above, referring to Fig. 3, the sample data generation process corresponding to S104 is implemented by the following steps:
S301: determine the distribution proportion, within a slice to be analyzed, of the second target category contained in that slice;
S302: if the determined distribution proportion is less than a second preset slice proportion, determine the slice to be analyzed as sample data.
Here, the embodiments of the present application first determine the distribution proportion of the second target category contained in the slice to be analyzed. If that distribution proportion is less than the second preset slice proportion, the slice is determined as sample data; if it is greater than or equal to the second preset slice proportion, the slice is discarded. That is, a slice to be analyzed is retained only when the distribution proportion of the majority class (the second target category) within it is small enough (less than the second preset slice proportion), which lowers the proportion of sample data corresponding to the majority class.
Further, combining the methods shown in Fig. 2 and Fig. 3, the sample data generation process corresponding to S104 may also be implemented by the following steps:
Step 1: determine the distribution proportion, within a slice to be analyzed, of the first target category contained in that slice;
Step 2: determine the distribution proportion, within the slice to be analyzed, of the second target category contained in that slice;
Step 1 and Step 2 need not be executed in any strict order.
Step 3: if the distribution proportion determined for the first target category is greater than the first preset slice proportion, and the distribution proportion determined for the second target category is less than the second preset slice proportion, determine the slice to be analyzed as sample data.
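The combined screening of Steps 1 to 3 can be sketched as a single pass over the annotation tiles of the slices. The function names, the label values (0 as majority, 1 as minority, 2 as some other category) and the thresholds 0.2 and 0.8 are assumptions chosen only for illustration.

```python
def share(label_slice, category):
    """Fraction of pixels in the slice that carry the given label value."""
    pixels = [value for row in label_slice for value in row]
    return pixels.count(category) / len(pixels)

def select_samples(label_slices, first_category, second_category,
                   first_slice_preset, second_slice_preset):
    """Return the indices of slices that pass both screening conditions."""
    selected = []
    for index, label_slice in enumerate(label_slices):
        first_share = share(label_slice, first_category)     # Step 1
        second_share = share(label_slice, second_category)   # Step 2
        # Step 3: enough of the minority class and not too much of the majority class.
        if first_share > first_slice_preset and second_share < second_slice_preset:
            selected.append(index)
    return selected

label_slices = [
    [[0, 0], [0, 0]],  # minority 0%,  majority 100%: rejected
    [[1, 0], [0, 0]],  # minority 25%, majority 75%:  kept
    [[1, 1], [2, 0]],  # minority 50%, majority 25%:  kept
    [[2, 2], [2, 0]],  # minority 0%,  majority 25%:  rejected
]
selected = select_samples(label_slices, first_category=1, second_category=0,
                          first_slice_preset=0.2, second_slice_preset=0.8)
print(selected)
```

Since the two shares are computed independently, their order can be swapped without changing the result, which matches the remark that Step 1 and Step 2 have no strict order.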
As shown in Fig. 4, the sample data generation method provided by the embodiments of the present application determines the distribution proportion of each target category in the sample picture as follows:
S401: annotate the pixels in the sample picture to obtain an annotation map, in which the pixels making up the same target category have the same label value;
S402: determine the distribution proportion of each target category in the sample picture according to the different label values that correspond to the different target categories in the annotation map.
Here, the embodiments of the present application annotate the sample picture at pixel level to obtain the corresponding annotation map, and then determine the distribution proportion of each target category in the sample picture from the different label values that correspond to the different target categories in the annotation map. The process of determining the distribution proportions of the first and second target categories in the sample picture is described in detail next.
For the first target category, its distribution proportion in the sample picture is determined as the ratio of the number of pixels in the annotation map that carry the label value corresponding to the first target category to the total number of pixels in the annotation map. When there is more than one sample picture, the distribution proportion of the first target category in the sample pictures may be determined as the ratio of the total number of pixels carrying the first target category's label value across the annotation maps of all sample pictures to the total number of pixels across those annotation maps.
For the second target category, its distribution proportion in the sample picture is determined as the ratio of the number of pixels in the annotation map that carry the label value corresponding to the second target category to the total number of pixels in the annotation map. When there is more than one sample picture, the distribution proportion of the second target category in the sample pictures may be determined as the ratio of the total number of pixels carrying the second target category's label value across the annotation maps of all sample pictures to the total number of pixels across those annotation maps.
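For more than one sample picture, the proportion described above is a ratio of pooled pixel counts. The sketch below assumes annotation maps stored as lists of rows; the function name and the two small maps are hypothetical.

```python
def category_share_over_pictures(label_maps, category):
    """Pixels carrying the label across all annotation maps, divided by all pixels across those maps."""
    hits = sum(row.count(category) for label_map in label_maps for row in label_map)
    total = sum(len(row) for label_map in label_maps for row in label_map)
    return hits / total

map_a = [[1, 0],
         [0, 0]]            # 2 x 2 map, one pixel with label 1
map_b = [[1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]      # 3 x 4 map, two pixels with label 1

pooled = category_share_over_pictures([map_a, map_b], category=1)
print(pooled)  # 3 labelled pixels out of 16 in total
```

Note that pooling the counts weights each picture by its size, so the result (3/16) differs from the plain average of the per-picture proportions (1/4 and 1/6).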
The sample data generation method provided by the embodiments of the present application may, for the slices to be analyzed that are generated by traversing the sample picture according to a specified preset window size, determine sample data from those slices according to the preset screening conditions.
Specifically, in the embodiments of the present application, multiple traversals may be performed according to multiple preset window sizes. In the first traversal, all of the resulting slices to be analyzed are retained as sample data. Discarding some of the slices to be analyzed may begin at the second traversal, at the third traversal, or at the fourth, fifth or a later traversal. If the difference between the distribution proportion of the first target category in the sample picture and that of the second target category is larger, that is, the gap between the majority class and the minority class is larger, the data imbalance is more severe; discarding may then begin as early as the second traversal, and the number of specified preset window sizes is correspondingly larger. If the difference is smaller, that is, the gap between the majority class and the minority class is small, discarding may not begin until the fifth traversal, and the number of specified preset window sizes is correspondingly smaller.
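The application states only that a larger gap between the two distribution proportions calls for more preset window sizes; it gives no formula. The sketch below uses a linear mapping purely as a hypothetical illustration of such a monotone rule. The function name `window_size_count` and the bounds `min_count` and `max_count` are assumptions, not values from the application.

```python
import math

def window_size_count(first_share, second_share, min_count=1, max_count=5):
    """Hypothetical rule: map the gap between the two proportions (0..1)
    linearly onto a number of window sizes between min_count and max_count."""
    gap = abs(second_share - first_share)
    return min_count + math.floor(gap * (max_count - min_count))

severe = window_size_count(first_share=0.10, second_share=0.90)  # large gap
mild = window_size_count(first_share=0.40, second_share=0.60)    # small gap
print(severe, mild)
```

Any other non-decreasing function of the gap would serve equally well for this illustration.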
Based on the sample data generated in the above embodiments, the embodiments of the present application also provide a method of training a model. Figure 5 is a flowchart of the method of training a model provided by the embodiments of the present application. The method is applied to a computer device and includes the following steps:
S501: determine the annotated sample data of the sample data;
S502: input the sample data and the annotated sample data into a semantic segmentation model to train the semantic segmentation model, where the semantic segmentation model is used to perform semantic segmentation.
Here, in the model training stage, the sample data serve as the input features of the semantic segmentation model to be trained, and the annotated sample data corresponding to the sample data serve as the output. Training yields the parameter information of the semantic segmentation model, that is, the trained semantic segmentation model. In the embodiments of the present application a neural network model may be used as the semantic segmentation model, in which case the model training stage is the process of learning the unknown parameter information of the neural network model. The trained semantic segmentation model can then be used to provide a semantic segmentation service: the target data only need to be input into the trained semantic segmentation model to obtain the corresponding annotated target data, and the semantically segmented image can be reconstructed from the annotated target data.
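Steps S501 and S502 and the later prediction step can be illustrated with a deliberately tiny stand-in. The text allows a neural network as the semantic segmentation model; the sketch below swaps in a per-pixel nearest-mean classifier purely to keep the example short and dependency-free. The functions `train` and `segment`, the grayscale image representation, and the example values are assumptions, not part of the application.

```python
from typing import Dict, List, Sequence, Tuple

Image = Sequence[Sequence[float]]   # assumed: grayscale intensity per pixel
LabelMap = Sequence[Sequence[int]]  # annotation value per pixel


def train(pairs: List[Tuple[Image, LabelMap]]) -> Dict[int, float]:
    """'Training' for the stand-in: learn one mean intensity per annotation value."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for image, labels in pairs:
        for img_row, lab_row in zip(image, labels):
            for pixel, label in zip(img_row, lab_row):
                sums[label] = sums.get(label, 0.0) + pixel
                counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def segment(model: Dict[int, float], image: Image) -> List[List[int]]:
    """Assign each pixel the annotation value whose learned mean is closest."""
    return [[min(model, key=lambda lab: abs(model[lab] - pixel)) for pixel in row]
            for row in image]


# S501: sample data paired with its annotated sample data.
pairs = [([[0.1, 0.9],
           [0.2, 0.8]],
          [[0, 1],
           [0, 1]])]
# S502: fit the model parameters from the pairs.
model = train(pairs)
# Service stage: target data in, annotated target data out.
predicted = segment(model, [[0.0, 1.0, 0.6]])
```

The learned dictionary plays the role of the "parameter information" in the text, and `predicted` corresponds to the annotated target data from which a segmented image could be rendered.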
Based on the same inventive concept, the embodiments of the present application also provide a sample data generating device corresponding to the sample data generation method. Because the principle by which the device solves the problem is similar to that of the sample data generation method described above, the implementation of the device may refer to the implementation of the method, and repeated description is omitted. Figure 6 is a structural schematic diagram of the sample data generating device provided by the embodiments of the present application. The device includes:
a sample picture acquisition module 61, configured to obtain a sample picture, the sample picture including multiple target categories;
a target category determining module 62, configured to determine, in the sample picture, a first target category whose distribution proportion is less than a first preset proportion, and/or a second target category whose distribution proportion is greater than a second preset proportion;
a to-be-analyzed slice acquisition module 63, configured to traverse the sample picture according to a preset window size and generate slices to be analyzed;
a sample data generation module 64, configured to determine sample data from the slices to be analyzed according to a preset screening condition, so that the resulting sample data satisfy the following condition:
where the first target category is the screening basis, the proportion of sample data that include the first target category is increased; where the second target category is the screening basis, the proportion of sample data that include the second target category is reduced.
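The role of the target category determining module can be sketched as a pair of threshold comparisons. This is only an illustration: the function name `select_target_categories` and the threshold values 0.10 and 0.60 are hypothetical, since the text does not give concrete values for the first and second preset proportions.

```python
from typing import Dict, List, Tuple


def select_target_categories(proportions: Dict[int, float],
                             first_preset: float,
                             second_preset: float) -> Tuple[List[int], List[int]]:
    """Split categories by their distribution proportion in the sample picture.

    Returns (first targets, second targets): categories below the first
    preset proportion and categories above the second preset proportion.
    """
    first = sorted(c for c, p in proportions.items() if p < first_preset)
    second = sorted(c for c, p in proportions.items() if p > second_preset)
    return first, second


# Assumed example: category 0 dominates, category 2 is rare.
proportions = {0: 0.70, 1: 0.25, 2: 0.05}
first_targets, second_targets = select_target_categories(
    proportions, first_preset=0.10, second_preset=0.60)
```

A category that falls between the two thresholds, such as category 1 here, is neither a first nor a second target category and does not drive the screening.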
In one possible embodiment, the sample data generation module 64 is specifically configured to determine the distribution proportion, within a slice to be analyzed, of the first target category included in that slice; if the determined distribution proportion is greater than a first preset slice proportion, the slice to be analyzed is determined as sample data.
In another possible embodiment, the sample data generation module 64 is specifically configured to determine the distribution proportion, within a slice to be analyzed, of the second target category included in that slice; if the determined distribution proportion is less than a second preset slice proportion, the slice to be analyzed is determined as sample data.
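The two screening rules above can be written as simple predicates on a slice. The sketch below assumes a slice is represented by its annotation values; the function names and the threshold values used in the example are hypothetical.

```python
from typing import List

Slice = List[List[int]]  # assumed: annotation value per pixel of the slice


def category_share(crop: Slice, category: int) -> float:
    """Distribution proportion of `category` within one slice."""
    total = sum(len(row) for row in crop)
    hits = sum(row.count(category) for row in crop)
    return hits / total


def keep_for_first_target(crop: Slice, category: int,
                          first_preset_slice_proportion: float) -> bool:
    """Keep the slice if the first target category is well represented in it."""
    return category_share(crop, category) > first_preset_slice_proportion


def keep_for_second_target(crop: Slice, category: int,
                           second_preset_slice_proportion: float) -> bool:
    """Keep the slice if the second target category does not dominate it."""
    return category_share(crop, category) < second_preset_slice_proportion


crop = [[0, 0],
        [0, 1]]
rare_ok = keep_for_first_target(crop, 1, 0.2)           # share of 1 is 0.25
dominant_ok = keep_for_second_target(crop, 0, 0.8)      # share of 0 is 0.75
dominant_rejected = keep_for_second_target(crop, 0, 0.5)
```

Both rules use strict inequalities, following the "greater than" and "less than" wording of the embodiments.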
The sample data generating device further includes:
a pixel annotation module 65, configured to annotate the pixels in the sample picture to obtain an annotation map, where pixels constituting the same target category have the same annotation value;
a distribution proportion determining module 66, configured to determine the distribution proportion of each target category in the sample picture according to the different annotation values corresponding to the different target categories in the annotation map.
In a specific implementation, where there is more than one sample picture, the distribution proportion determining module 66 is specifically configured to, for each target category, take the ratio of the total number of pixels carrying the annotation value of that target category in the annotation maps of all sample pictures to the total number of pixels in the annotation maps of all sample pictures as the distribution proportion of that target category in the sample pictures.
In another possible embodiment, the sample data generation module 64 is specifically configured to determine sample data, according to the preset screening condition, from the slices to be analyzed that are generated by traversing the sample picture with a specified preset window size.
In a specific implementation, the larger the difference between the distribution proportion of the first target category in the sample picture and the distribution proportion of the second target category in the sample picture, the greater the number of specified preset window sizes.
Figure 7 is a structural schematic diagram of a computer device provided by the embodiments of the present application. The computer device includes a processor 71, a memory 72 and a bus 73. The memory 72 stores execution instructions; when the device runs, the processor 71 and the memory 72 communicate through the bus 73, and the processor 71 executes the following execution instructions stored in the memory 72:
obtain a sample picture, the sample picture including multiple target categories;
determine, in the sample picture, a first target category whose distribution proportion is less than a first preset proportion, and/or a second target category whose distribution proportion is greater than a second preset proportion;
traverse the sample picture according to a preset window size and generate slices to be analyzed;
determine sample data from the slices to be analyzed according to a preset screening condition, so that the resulting sample data satisfy the following condition:
where the first target category is the screening basis, the proportion of sample data that include the first target category is increased; where the second target category is the screening basis, the proportion of sample data that include the second target category is reduced.
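One way to check the condition on the resulting sample data is to compare, before and after screening, the fraction of slices that contain the category in question. The sketch below covers only the first target category and reads "proportion of sample data that include the category" as the fraction of slices in which the category appears at least once; that reading, the function name `containing_ratio` and the example slices are assumptions.

```python
from typing import List

Slice = List[List[int]]  # assumed: annotation value per pixel of the slice


def containing_ratio(samples: List[Slice], category: int) -> float:
    """Fraction of samples in which `category` appears at least once."""
    if not samples:
        raise ValueError("no samples")
    hits = sum(1 for crop in samples if any(category in row for row in crop))
    return hits / len(samples)


# Four candidate slices; only one contains the rare category 1.
candidates = [
    [[0, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
]
# Hypothetical screening: keep slices that contain category 1.
selected = [crop for crop in candidates if any(1 in row for row in crop)]

before = containing_ratio(candidates, 1)
after = containing_ratio(selected, 1)
```

Here the ratio rises from 0.25 over all candidate slices to 1.0 over the selected slices, which is the direction the condition requires for the first target category.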
In one possible embodiment, the processor 71 is specifically configured to determine the distribution proportion, within a slice to be analyzed, of the first target category included in that slice; if the determined distribution proportion is greater than a first preset slice proportion, the slice to be analyzed is determined as sample data.
In another possible embodiment, the processor 71 is specifically configured to determine the distribution proportion, within a slice to be analyzed, of the second target category included in that slice; if the determined distribution proportion is less than a second preset slice proportion, the slice to be analyzed is determined as sample data.
The processor 71 is further configured to annotate the pixels in the sample picture to obtain an annotation map, where pixels constituting the same target category have the same annotation value, and to determine the distribution proportion of each target category in the sample picture according to the different annotation values corresponding to the different target categories in the annotation map.
In a specific implementation, where there is more than one sample picture, the processor 71 is specifically configured to, for each target category, take the ratio of the total number of pixels carrying the annotation value of that target category in the annotation maps of all sample pictures to the total number of pixels in the annotation maps of all sample pictures as the distribution proportion of that target category in the sample pictures.
In another possible embodiment, the processor 71 is specifically configured to determine sample data, according to the preset screening condition, from the slices to be analyzed that are generated by traversing the sample picture with a specified preset window size.
In a specific implementation, the larger the difference between the distribution proportion of the first target category in the sample picture and the distribution proportion of the second target category in the sample picture, the more preset window sizes the processor 71 specifies.
Based on the same inventive concept, the embodiments of the present application also provide a device for training a model corresponding to the method of training a model. Because the principle by which the device solves the problem is similar to that of the method of training a model described above, the implementation of the device may refer to the implementation of the method, and repeated description is omitted. Figure 8 is a schematic diagram of the device for training a model provided by the embodiments of the present application. The device includes:
an annotated data determining module 81, configured to determine the annotated sample data of the sample data;
a semantic segmentation model training module 82, configured to input the sample data and the annotated sample data into a semantic segmentation model to train the semantic segmentation model, where the semantic segmentation model is used to perform semantic segmentation.
Figure 9 is a structural schematic diagram of a computer device provided by the embodiments of the present application. The computer device includes a processor 91, a memory 92 and a bus 93. The memory 92 stores execution instructions; when the device runs, the processor 91 and the memory 92 communicate through the bus 93, and the processor 91 executes the following execution instructions stored in the memory 92:
determine the annotated sample data of the sample data;
input the sample data and the annotated sample data into a semantic segmentation model to train the semantic segmentation model, where the semantic segmentation model is used to perform semantic segmentation.
The computer program product of the sample data generation method and of the method of training a model provided by the embodiments of the present application includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems and devices described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A sample data generation method, characterized in that the method comprises:
obtaining a sample picture, the sample picture including multiple target categories;
determining, in the sample picture, a first target category whose distribution proportion is less than a first preset proportion, and/or a second target category whose distribution proportion is greater than a second preset proportion;
traversing the sample picture according to a preset window size to generate slices to be analyzed;
determining sample data from the slices to be analyzed according to a preset screening condition, so that the resulting sample data satisfy the following condition:
where the first target category is the screening basis, the proportion of sample data that include the first target category is increased; where the second target category is the screening basis, the proportion of sample data that include the second target category is reduced.
2. The method according to claim 1, characterized in that, where the first target category is the screening basis, determining sample data from the slices to be analyzed according to the preset screening condition comprises:
determining the distribution proportion, within a slice to be analyzed, of the first target category included in that slice;
if the determined distribution proportion is greater than a first preset slice proportion, determining the slice to be analyzed as sample data.
3. The method according to claim 1, characterized in that, where the second target category is the screening basis, determining sample data from the slices to be analyzed according to the preset screening condition comprises:
determining the distribution proportion, within a slice to be analyzed, of the second target category included in that slice;
if the determined distribution proportion is less than a second preset slice proportion, determining the slice to be analyzed as sample data.
4. The method according to any one of claims 1 to 3, characterized by further comprising:
annotating the pixels in the sample picture to obtain an annotation map, wherein pixels constituting the same target category have the same annotation value;
determining the distribution proportion of each target category in the sample picture according to the different annotation values corresponding to the different target categories in the annotation map.
5. The method according to claim 4, characterized in that, where there is more than one sample picture, determining the distribution proportion of each target category in the sample picture according to the different annotation values corresponding to the different target categories in the annotation map comprises:
for each target category, taking the ratio of the total number of pixels carrying the annotation value of that target category in the annotation maps of all sample pictures to the total number of pixels in the annotation maps of all sample pictures as the distribution proportion of that target category in the sample pictures.
6. The method according to claim 1, characterized in that determining sample data from the slices to be analyzed according to the preset screening condition comprises:
for the slices to be analyzed generated by traversing the sample picture with a specified preset window size, determining sample data from the slices to be analyzed according to the preset screening condition.
7. The method according to claim 6, characterized in that the larger the difference between the distribution proportion of the first target category in the sample picture and the distribution proportion of the second target category in the sample picture, the greater the number of specified preset window sizes.
8. A method of training a model based on sample data generated by the method of any one of claims 1 to 7, characterized in that the method comprises:
determining the annotated sample data of the sample data;
inputting the sample data and the annotated sample data into a semantic segmentation model to train the semantic segmentation model, wherein the semantic segmentation model is used to perform semantic segmentation.
9. A sample data generating device, characterized in that the device comprises:
a sample picture acquisition module, configured to obtain a sample picture, the sample picture including multiple target categories;
a target category determining module, configured to determine, in the sample picture, a first target category whose distribution proportion is less than a first preset proportion, and/or a second target category whose distribution proportion is greater than a second preset proportion;
a to-be-analyzed slice acquisition module, configured to traverse the sample picture according to a preset window size and generate slices to be analyzed;
a sample data generation module, configured to determine sample data from the slices to be analyzed according to a preset screening condition, so that the resulting sample data satisfy the following condition:
where the first target category is the screening basis, the proportion of sample data that include the first target category is increased; where the second target category is the screening basis, the proportion of sample data that include the second target category is reduced.
10. A device for training a model based on sample data generated by the device of claim 9, characterized in that the device comprises:
an annotated data determining module, configured to determine the annotated sample data of the sample data;
a semantic segmentation model training module, configured to input the sample data and the annotated sample data into a semantic segmentation model to train the semantic segmentation model, wherein the semantic segmentation model is used to perform semantic segmentation.
CN201810289135.2A 2018-03-30 2018-03-30 Sample data generation method and device and model training method and device Active CN108717547B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810289135.2A CN108717547B (en) 2018-03-30 2018-03-30 Sample data generation method and device and model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810289135.2A CN108717547B (en) 2018-03-30 2018-03-30 Sample data generation method and device and model training method and device

Publications (2)

Publication Number Publication Date
CN108717547A true CN108717547A (en) 2018-10-30
CN108717547B CN108717547B (en) 2020-12-22

Family

ID=63898986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810289135.2A Active CN108717547B (en) 2018-03-30 2018-03-30 Sample data generation method and device and model training method and device

Country Status (1)

Country Link
CN (1) CN108717547B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354595A (en) * 2015-10-30 2016-02-24 苏州大学 Robust visual image classification method and system
CN105740894A (en) * 2016-01-28 2016-07-06 北京航空航天大学 Semantic annotation method for hyperspectral remote sensing image
CN105844287A (en) * 2016-03-15 2016-08-10 民政部国家减灾中心 Domain self-adaptive method and system for remote sensing image classification
CN106530305A (en) * 2016-09-23 2017-03-22 北京市商汤科技开发有限公司 Semantic segmentation model training and image segmentation method and device, and calculating equipment

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HAJIZADEH ET AL.: ""Semi-supervised rail defect detection from imbalanced image data"", 《PROCEEDINGS OF THE 14TH IFAC SYMPOSIUM ON CONTROL IN TRANSPORTATION SYSTEMS》 *
刘胥影 等: ""一种基于级联模型的类别不平衡数据分类方法"", 《南京大学学报(自然科学)》 *
姜枫 等: ""基于内容的图像分割方法综述"", 《软件学报》 *
李勇 等: ""不平衡数据的集成分类算法综述"", 《计算机应用研究》 *
钱洪波 等: ""非平衡类数据分类概述"", 《计算机工程与科学》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409432A (en) * 2018-10-31 2019-03-01 腾讯科技(深圳)有限公司 A kind of image processing method, device and storage medium
CN109409432B (en) * 2018-10-31 2019-11-26 腾讯科技(深圳)有限公司 A kind of image processing method, device and storage medium
WO2020093718A1 (en) * 2018-11-08 2020-05-14 北京字节跳动网络技术有限公司 Training data re-sampling method and apparatus, and storage medium and electronic device
CN111382569A (en) * 2018-12-27 2020-07-07 深圳市优必选科技有限公司 Method and device for recognizing entities in dialogue corpus and computer equipment
CN111382569B (en) * 2018-12-27 2024-05-03 深圳市优必选科技有限公司 Method and device for identifying entity in dialogue corpus and computer equipment
CN110162649A (en) * 2019-05-24 2019-08-23 北京百度网讯科技有限公司 Sample data acquisition methods obtain system, server and computer-readable medium
CN110162649B (en) * 2019-05-24 2021-06-18 北京百度网讯科技有限公司 Sample data acquisition method, acquisition system, server and computer readable medium
CN110889462A (en) * 2019-12-09 2020-03-17 秒针信息技术有限公司 Data processing method, device, equipment and storage medium
CN110889462B (en) * 2019-12-09 2023-05-02 秒针信息技术有限公司 Data processing method, device, equipment and storage medium
CN113743431A (en) * 2020-05-29 2021-12-03 阿里巴巴集团控股有限公司 Data selection method and device
CN113743431B (en) * 2020-05-29 2024-04-02 阿里巴巴集团控股有限公司 Data selection method and device

Also Published As

Publication number Publication date
CN108717547B (en) 2020-12-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 101-8, 1st floor, building 31, area 1, 188 South Fourth Ring Road West, Fengtai District, Beijing

Applicant after: Guoxin Youyi Data Co., Ltd

Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing

Applicant before: SIC YOUE DATA Co.,Ltd.

GR01 Patent grant