CN108197668A - Method and cloud system for establishing a model data set - Google Patents
Method and cloud system for establishing a model data set
- Publication number
- CN108197668A (CN201810096270.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- classification
- model
- classification model
- trained
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application provides a method and a cloud system for establishing a model data set. The method includes: clustering the data in a data set according to a selected data feature, and labeling the data in the data set with classes according to the clustering result; training an initialized classification model on the labeled data set to obtain a trained classification model; and testing the trained classification model and establishing the model data set according to the test result. The resulting model data set can be used for classification and recognition without the labor and time cost of manual annotation and its verification, so that the model data set is labeled automatically while the efficiency and accuracy of classification and recognition are improved.
Description
Technical field
This application relates to the field of deep learning, and in particular to a method and a cloud system for establishing a model data set.
Background
In recent years, classification techniques based on deep learning have achieved significant breakthroughs in classification performance compared with traditional classification techniques, with higher classification accuracy. As deep learning networks such as ResNet and DenseNet continue to be proposed, deep-learning-based classification is becoming the mainstream approach in classification applications.
Deep-learning-based classification relies mainly on a very large training set: the model parameters are trained repeatedly through forward propagation and backpropagation in the classification model to obtain a trained classification model that achieves the desired classification performance. That performance depends on how representative the classes in the training set are and on how accurate their labels are. To ensure label accuracy, training set labels are currently assigned by manual annotation, which determines the class of each sample. For more complex classification tasks, however, the training set typically contains hundreds of thousands or even millions of samples, so manual annotation consumes considerable labor and time. In the ImageNet image classification challenge, for example, the manual annotation of the training set labels was carried out through the MTurk crowdsourcing platform.
The deficiency of the prior art is that manual annotation is to some degree subjective. To ensure that the annotation results are objective and accurate, the annotation process usually has to be supervised or the annotation results screened, which raises the cost of manual annotation further. Classification models are therefore usually trained on a fixed training set and recognize only the classes that the training set contains. If a training set has to be built for a specific need in order to recognize particular classes, the labor and time spent on manual annotation and its verification are high. Dependence on manual annotation thus limits the broad adoption of deep-learning-based classification in practical applications.
Summary of the invention
In view of this, embodiments of the present invention aim to provide a method and a cloud system for establishing a model data set, to solve the technical problem that existing deep-learning-based classification techniques rely excessively on manual annotation, so that the labor and time cost of manual annotation and its verification is high.
In one aspect, an embodiment of the present application provides a method for establishing a model data set, including:
Clustering the data in a data set according to a selected data feature, and labeling the data in the data set with classes according to the clustering result;
Training an initialized classification model on the labeled data set to obtain a trained classification model;
Testing the trained classification model, and establishing the model data set according to the test result.
In another aspect, an embodiment of the present application provides a cloud system for establishing a model data set, including:
A clustering server, configured to cluster the data in a data set according to a selected data feature, and to label the data in the data set with classes according to the clustering result;
A training server, configured to train an initialized classification model on the labeled data set to obtain a trained classification model;
A test server, configured to test the trained classification model and to establish the model data set according to the test result.
In another aspect, an embodiment of the present application provides an electronic device, the electronic device including:
A transceiver, a memory, and one or more processors; and
One or more modules, the one or more modules being stored in the memory and configured to be executed by the one or more processors, the one or more modules including instructions for performing each step of the above method.
In another aspect, an embodiment of the present application provides a computer program product for use in combination with an electronic device. The computer program product includes a computer-readable storage medium and a computer program mechanism embedded therein, the computer program mechanism including instructions for performing each step of the above method.
To achieve the above objectives, the technical solution of the embodiments of the present invention is realized as follows:
In this embodiment, the data in a data set are clustered according to a selected data feature, the data in the data set are labeled with classes according to the clustering result, an initialized classification model is trained on the labeled data set to obtain a trained classification model, and the trained classification model is tested to determine the final model data set used for classification and recognition. This removes the labor and time cost of manual annotation and its verification, labels the model data set automatically, and improves the efficiency and accuracy of classification and recognition.
Description of the drawings
Specific embodiments of the application are described below with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the method for establishing a model data set in Embodiment 1 of the application;
Fig. 2 is a flow diagram of establishing a model data set in Embodiment 1 of the application;
Fig. 3 is an architecture diagram of the cloud system for establishing a model data set in Embodiment 2 of the application;
Fig. 4 is a structural diagram of the electronic device in Embodiment 3 of the application.
Detailed description
The essence of the technical solutions of the embodiments of the present invention is further clarified below through specific examples.
To make the technical solutions and advantages of the application clearer, exemplary embodiments of the application are described in more detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the application, not an exhaustive list of all embodiments. Where no conflict arises, the embodiments in this description and the features in them can be combined with one another.
In the course of the invention, the inventor noticed the following:
Building a training set by manual annotation usually requires supervising the annotation process or screening the annotation results, which makes manual annotation costly. When a training set has to be built for a specific need in order to recognize particular classes, the labor and time spent on manual annotation and its verification are high. Deep-learning-based classification therefore depends heavily on manual annotation.
In view of the above deficiency, the embodiments of the present application propose extracting features from the data in a data set and clustering them so as to build a labeled data set automatically, training an initialized classification model on the training set portion of the data set, and testing the classification accuracy of the trained classification model on the test set portion of the data set, so as to ensure the objectivity of the classes assigned to the data in the deep-learning-based model data set.
To facilitate implementation of the application, examples are described below.
Embodiment 1
Fig. 1 shows a schematic diagram of the method for establishing a model data set in Embodiment 1 of the application. As shown in Fig. 1, the method includes:
Step 101: Cluster the data in a data set according to a selected data feature, and label the data in the data set with classes according to the clustering result.
Step 102: Train an initialized classification model on the labeled data set to obtain a trained classification model.
Step 103: Test the trained classification model, and establish the model data set according to the test result.
In implementation, the above steps may be executed by a cloud server. The cloud server extracts features from the data in the data set according to a feature in a preset feature library, clusters the extracted data features with a clustering algorithm, automatically labels the data corresponding to the data features with classes according to the clustering result, trains a deep-learning-based classification model on the labeled data, and tests the trained classification model. If the test result satisfies the judgment condition, the classification of the data set has succeeded, and the labeled data set is used directly as the model data set for deep-learning-based classification models, so as to classify data accurately. If the test result does not satisfy the judgment condition, the classification of the data set has failed; a new feature is then obtained from the preset feature library and the whole process is repeated until the test result satisfies the judgment condition, at which point the model data set is established.
In implementation, the method can be applied to the automatic establishment of image data sets, and can also be used, as actual conditions require, for the automatic establishment of other types of data sets, for example text data sets. This embodiment does not specifically limit the type of data in the model data set.
In this embodiment, clustering the data in the data set according to the selected data feature includes:
Selecting, from a preset feature set, the data feature to be used as the clustering basis;
Extracting, according to the selected data feature, that feature from the data in the data set;
Clustering the extracted data features.
In this embodiment, the data features in the preset feature set include hand-crafted features that characterize one or more of image color, edges, and texture, and the output features of each layer of a classification model.
In implementation, the feature set is established as follows: hand-crafted features that characterize image color, edges, texture and the like, such as color histograms, HOG, and Haar features, are added to a feature library together with the output features of each layer of deep-learning-based classification models such as VGG16 and ResNet. The feature library is denoted $\{f_1, f_2, \ldots, f_k\}$, where $k$ is the number of data features the feature library contains.
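As an illustration of one hand-crafted entry in such a feature library, the following is a minimal sketch of a normalized color histogram, written with the Python standard library only. The function name `color_histogram`, the pixel representation (a list of RGB tuples), the bin counts, and the dictionary `feature_library` are assumptions made for this example and are not specified in the application.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized RGB color histogram.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    bins_per_channel: assumed to divide 256 evenly (e.g. 4 or 8).
    """
    width = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map the three channel bins to a single flat index.
        idx = ((r // width) * bins_per_channel + (g // width)) * bins_per_channel + (b // width)
        hist[idx] += 1.0
        count += 1
    # Normalize so images of different sizes are comparable.
    return [h / count for h in hist] if count else hist


# Hypothetical feature library: feature name -> extractor function.
feature_library = {
    "color_histogram_4": lambda px: color_histogram(px, 4),
    "color_histogram_8": lambda px: color_histogram(px, 8),
}
```

Layer outputs of a network such as VGG16 or ResNet would be further entries of the same kind; they are omitted here because they depend on a deep learning framework.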
In implementation, selecting the data feature to be used as the clustering basis from the preset feature set, and extracting that feature from the data in the data set, are carried out as follows:
1) Randomly select the clustering basis: a data feature $f_i$ is randomly selected from the feature library as the clustering basis for labeling the data, and the selected data feature $f_i$ is deleted from the feature library, so that the feature library becomes $\{f_1, f_2, \ldots, f_{i-1}, f_{i+1}, \ldots, f_k\}$.
2) Extract the data features in the data set: according to the randomly selected clustering basis, the feature $f_i$ is extracted from each item of data in the data set. If the randomly selected data feature $f_i$ is a hand-crafted feature such as the histogram of oriented gradients (HOG), it is extracted directly using the extraction method of that feature. If the randomly selected data feature $f_i$ is the output feature of a certain layer of a classification model, the data in the data set are fed as input into the deep-learning-based classification model, and the output of the corresponding layer is taken as the feature of the data.
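The two steps above can be sketched as follows. This is a minimal illustration, assuming the feature library is held as a dictionary that maps a feature name to an extractor function; the names `pick_clustering_feature` and `extract_features` are hypothetical and do not come from the application.

```python
import random


def pick_clustering_feature(feature_library, rng=random):
    """Randomly pick one feature f_i and delete it from the library.

    feature_library: dict mapping feature name -> extractor function
    (an assumed representation). The dict is modified in place, so the
    same feature is not chosen again in a later round.
    """
    name = rng.choice(sorted(feature_library))
    extractor = feature_library.pop(name)
    return name, extractor


def extract_features(dataset, extractor):
    """Apply the chosen extractor to every item of data in the data set."""
    return [extractor(item) for item in dataset]
```

A typical call would be `name, extractor = pick_clustering_feature(library)` followed by `features = extract_features(dataset, extractor)`.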
In implementation, clustering the extracted data features and labeling the data in the data set with classes according to the clustering result are carried out as follows:
1) Cluster the data features: the extracted data features are clustered with the K-Means clustering algorithm. The number of cluster centers can be set as actually required; it is set to m = 10 here, and this embodiment does not specifically limit the number of cluster centers.
2) Class labeling: each item of data x in the data set is automatically labeled according to the clustering result. If the feature f corresponding to data x is assigned to the n-th cluster, data x is labeled as class n.
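The clustering and labeling steps above can be sketched with a small, dependency-free K-Means. This is an illustrative implementation only: the function name `kmeans_labels`, the fixed iteration count, the random initialization from the data, and the seed are assumptions, and a production system would more likely use an established library implementation.

```python
import random


def kmeans_labels(features, m, iterations=20, seed=0):
    """Cluster feature vectors into m clusters and return one label per item.

    features: list of equal-length numeric vectors (one per data item).
    The label of item x is the index n of the cluster its feature falls in.
    """
    rng = random.Random(seed)
    # Initialize the m cluster centers from randomly chosen feature vectors.
    centers = [list(c) for c in rng.sample(features, m)]
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: each feature goes to its nearest center.
        for i, f in enumerate(features):
            labels[i] = min(
                range(m),
                key=lambda n: sum((a - b) ** 2 for a, b in zip(f, centers[n])),
            )
        # Update step: each center moves to the mean of its members.
        for n in range(m):
            members = [f for f, lab in zip(features, labels) if lab == n]
            if members:
                centers[n] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With m = 10 as in the text, `kmeans_labels(features, 10)` returns a class index between 0 and 9 for every item of data.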
In this embodiment, the labeled data set includes a training set, and training the initialized classification model on the labeled data set means training the initialized classification model on the training set.
In implementation, the automatically labeled data set is divided into a training set and a test set, for example by randomly selecting 90% of the data in the data set as the training set and using the remaining 10% as the test set. Using the automatically assigned class labels, the deep-learning-based initialized classification model is trained on the training set portion to obtain a trained classification model. The proportions of the training set and the test set can be set according to actual conditions; this embodiment does not specifically limit them.
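The 90%/10% random split described above could look like the following sketch. The function name `split_dataset`, the default fraction, and the seed are illustrative assumptions; the text leaves the proportions open.

```python
import random


def split_dataset(labeled_data, train_fraction=0.9, seed=0):
    """Randomly split automatically labeled data into training and test sets.

    labeled_data: sequence of (data, label) pairs.
    Returns (training_set, test_set); the input is not modified.
    """
    items = list(labeled_data)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]
```

Fixing the seed makes the split reproducible, which helps when the same labeled data set is evaluated more than once.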
In this embodiment, the labeled data set includes a test set, and testing the trained classification model and establishing the model data set according to the test result includes:
Testing the trained classification model on the test set to obtain the classification accuracy of the trained classification model;
Establishing the model data set according to the classification accuracy.
In this embodiment, testing the trained classification model on the test set to obtain the classification accuracy of the trained classification model includes:
Classifying the data in the test set with the trained classification model to obtain the classification results of the data;
Comparing the classification results with the class labels of the data in the test set to obtain the classification accuracy of the trained classification model.
In implementation, the classification model is tested as follows: the trained classification model classifies the data in the test set, and the test classification result of each item is compared with its automatically assigned class label. If the test classification result of data x is the same as its automatically assigned label, data x is considered correctly classified; otherwise, data x is considered misclassified.
Further, the classification accuracy b of the trained classification model over the entire test set is calculated from the test classification results and the automatically assigned labels of all data in the test set. The classification accuracy can be calculated as the ratio of the number of correctly classified data items in the test set to the total number of data items in the test set. The calculation method can also be defined according to actual conditions; this embodiment does not specifically limit how the classification accuracy is calculated.
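The ratio described above, correctly classified test items divided by all test items, can be written as a short function. The name `classification_accuracy` and the handling of an empty test set are assumptions made for this sketch.

```python
def classification_accuracy(predicted_labels, automatic_labels):
    """Return the accuracy b: the fraction of test items whose predicted
    class equals the automatically assigned class label."""
    if len(predicted_labels) != len(automatic_labels):
        raise ValueError("both label lists must have the same length")
    if not automatic_labels:
        # Assumption: an empty test set is reported as accuracy 0.0.
        return 0.0
    correct = sum(1 for p, a in zip(predicted_labels, automatic_labels) if p == a)
    return correct / len(automatic_labels)
```

Note that b measures agreement with the automatically assigned labels, not with any manually verified ground truth.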
In this embodiment, establishing the model data set according to the classification accuracy includes:
If the classification accuracy is greater than a set value, generating the model data set according to the class labels of the data in the test set;
If the classification accuracy is less than or equal to the set value, selecting again a data feature to be used as the clustering basis.
In implementation, the model data set is established according to the classification accuracy as follows: the calculated classification accuracy b is compared with a preset threshold a. If $b > a$, the model data set is generated according to the automatically assigned class labels. Otherwise, a data feature is selected again, as the clustering basis for labeling the data, from the feature library $\{f_1, f_2, \ldots, f_{i-1}, f_{i+1}, \ldots, f_k\}$ from which $f_i$ has been deleted, and the whole process is repeated until the test result satisfies $b > a$ and the model data set is generated.
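The retry logic above can be summarized as a loop over the remaining features. In this sketch the clustering step and the train-and-test step are passed in as callables (`label_fn`, `train_and_test_fn`), which are hypothetical placeholders; the function name `build_model_dataset` is also an assumption. The text does not say what happens when every feature in the library has been tried without reaching $b > a$, so returning `None` in that case is a choice made for this example.

```python
import random


def build_model_dataset(dataset, feature_library, label_fn, train_and_test_fn,
                        threshold_a, seed=0):
    """Try features in random order until the accuracy b exceeds threshold a.

    feature_library: dict of feature name -> extractor (assumed representation).
    label_fn: maps the list of extracted features to one class label per item.
    train_and_test_fn: trains and tests a classifier, returns the accuracy b.
    Returns (feature name, list of (data, label) pairs), or None if no
    feature reaches the threshold.
    """
    rng = random.Random(seed)
    remaining = dict(feature_library)  # work on a copy of the library
    while remaining:
        # Select f_i at random and delete it from the remaining library.
        name = rng.choice(sorted(remaining))
        extractor = remaining.pop(name)
        labels = label_fn([extractor(item) for item in dataset])
        accuracy_b = train_and_test_fn(dataset, labels)
        if accuracy_b > threshold_a:
            return name, list(zip(dataset, labels))
    return None
```

Because each tried feature is removed before the next draw, the loop terminates after at most k rounds for a library of k features.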
Taking the automatic establishment of an image data set as an example application scenario, Fig. 2 shows the flow diagram of establishing a model data set in Embodiment 1 of the application, and Embodiment 1 is described in detail with reference to Fig. 2.
The scope of application of the embodiments includes, but is not limited to, the automatic establishment of image data sets. Taking the automatic establishment of an image data set as an example, the specific flow is as follows:
Step 201: Establish the image feature library. Hand-crafted features and the output features of each layer of a classification model are added together to the image feature library, which is denoted $\{f_1, f_2, \ldots, f_k\}$, where $k$ is the number of image data features the image feature library contains.
Step 202: Randomly select the clustering basis and extract the features of the image data in the image data set. Specifically:
1) Randomly select an image data feature: an image data feature $f_i$ is randomly selected from the image feature library as the clustering basis for labeling the data, and the selected image data feature $f_i$ is deleted from the image feature library, so that the image feature library becomes $\{f_1, f_2, \ldots, f_{i-1}, f_{i+1}, \ldots, f_k\}$.
2) Extract the image data features in the image data set: according to the randomly selected clustering basis, the feature $f_i$ is extracted from each item of image data in the image data set.
Step 203: Cluster the extracted image data features, and label the image data with classes according to the clustering result. Specifically:
1) Feature clustering: the extracted image data features are clustered with the K-Means clustering algorithm.
2) Class labeling: the image data are automatically labeled according to the clustering result. If the image data feature f corresponding to image data x is assigned to the n-th cluster, image data x is labeled as class n.
Step 204: Train the image classification model. The automatically labeled image data set is divided into a training set and a test set, and, using the automatically assigned class labels, an initialized image classification model is trained on the training set portion to obtain a trained image classification model.
Step 205: Test the trained image classification model to obtain its classification accuracy, and determine the final model data set according to the classification accuracy. Specifically:
1) Test the image classification model: the trained image classification model classifies the image data in the test set, and the test classification results are compared with the automatically assigned class labels. If the test classification result of image data x is the same as its automatically assigned label, image data x is considered correctly classified; otherwise, image data x is considered misclassified. The classification accuracy b of the image classification model over the entire test set is then calculated.
2) Judge the classification accuracy of the image classification model and determine the final model data set: the calculated classification accuracy b is compared with a preset threshold a. If $b > a$, the model data set is generated according to the automatically assigned class labels. Otherwise, the flow returns to step 202, and an image data feature is selected again, as the clustering basis for labeling the data, from the feature library $\{f_1, f_2, \ldots, f_{i-1}, f_{i+1}, \ldots, f_k\}$ from which the image data feature $f_i$ has been deleted.
The above is only a preferred embodiment of the application and is not intended to limit the scope of protection of the application.
Embodiment 2
Based on the same inventive concept, an embodiment of the present application further provides a cloud system for establishing a model data set. Because the principle by which these devices solve the problem is similar to that of the method for establishing a model data set, the implementation of these devices may refer to the implementation of the method, and overlapping details are not repeated.
Fig. 3 shows the architecture diagram of the cloud system for establishing a model data set in Embodiment 2 of the application. As shown in Fig. 3, the cloud system 300 for establishing a model data set may include:
A clustering server 301, configured to cluster the data in a data set according to a selected data feature, and to label the data in the data set with classes according to the clustering result;
A training server 302, configured to train an initialized classification model on the labeled data set to obtain a trained classification model;
A test server 303, configured to test the trained classification model and to establish the model data set according to the test result.
In this embodiment, the clustering server 301 is specifically configured to:
Select, from a preset feature set, the data feature to be used as the clustering basis;
Extract, according to the selected data feature, that feature from the data in the data set;
Cluster the extracted data features.
In this embodiment, the data features in the preset feature set include hand-crafted features that characterize one or more of image color, edges, and texture, and the output features of each layer of a classification model.
In this embodiment, the labeled data set includes a training set, and the training server 302 is specifically configured to:
Train the initialized classification model on the training set.
In this embodiment, the labeled data set includes a test set, and the test server 303 is specifically configured to:
Test the trained classification model on the test set to obtain the classification accuracy of the trained classification model;
Establish the model data set according to the classification accuracy.
In this embodiment, testing the trained classification model on the test set to obtain the classification accuracy of the trained classification model includes:
Classifying the data in the test set with the trained classification model to obtain the classification results of the data;
Comparing the classification results with the class labels of the data in the test set to obtain the classification accuracy of the trained classification model.
In this embodiment, establishing the model data set according to the classification accuracy includes:
If the classification accuracy is greater than a set value, generating the model data set according to the class labels of the data in the test set;
If the classification accuracy is less than or equal to the set value, selecting again a data feature to be used as the clustering basis.
Embodiment 3
Based on the same inventive concept, an embodiment of the present application further provides an electronic device. Because its principle is similar to that of the method for establishing a model data set, its implementation may refer to the implementation of the method, and overlapping details are not repeated.
Fig. 4 shows the structural diagram of the electronic device in Embodiment 3 of the application. As shown in Fig. 4, the electronic device includes: a transceiver 401, a memory 402, and one or more processors 403; and one or more modules, the one or more modules being stored in the memory and configured to be executed by the one or more processors, the one or more modules including instructions for performing each step of any of the above methods.
Embodiment 4
Based on the same inventive concept, an embodiment of the present application further provides a computer program product for use in combination with an electronic device. Because its principle is similar to that of the method for establishing a model data set, its implementation may refer to the implementation of the method, and overlapping details are not repeated. The computer program product includes a computer-readable storage medium and a computer program mechanism embedded therein, the computer program mechanism including instructions for performing each step of any of the above methods.
For convenience of description, the parts of the apparatus described above are divided by function into various modules and described separately. Of course, when the application is implemented, the functions of the modules or units can be implemented in one or more pieces of software or hardware.
Those skilled in the art should understand that embodiments of the application may be provided as a method, a system, or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
The application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the application have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the application.
Claims (10)
1. A method for establishing a model data set, characterized by comprising:
clustering the data in a data set according to selected data features, and labeling the data in the data set by class according to the clustering result;
training an initial classification model on the labeled data set to obtain a trained classification model;
testing the trained classification model, and establishing the model data set according to the test result.
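The three steps of claim 1 can be illustrated with a minimal sketch. It is not the applicant's implementation: the k-means clustering, the nearest-centroid classifier, the alternating train/test split, the 0.9 threshold, and every function name below are assumptions chosen only to keep the example self-contained.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    """Farthest-point initialisation (deterministic; an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=20):
    """Step 1a: cluster the feature vectors; returns one cluster index per point."""
    centers = init_centers(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def train_nearest_centroid(samples):
    """Step 2: 'train' a stand-in classifier by storing one centroid per class label."""
    groups = {}
    for vec, lab in samples:
        groups.setdefault(lab, []).append(vec)
    return {lab: tuple(sum(axis) / len(vecs) for axis in zip(*vecs))
            for lab, vecs in groups.items()}

def predict(model, vec):
    """Return the label whose centroid is closest to vec."""
    return min(model, key=lambda lab: sq_dist(vec, model[lab]))

def build_model_dataset(points, k, threshold=0.9):
    """Cluster, label, train, test; keep the labeled data only if accuracy is high enough."""
    labels = kmeans(points, k)                  # step 1a: clustering
    labeled = list(zip(points, labels))         # step 1b: class labels from the clusters
    train, test = labeled[0::2], labeled[1::2]  # alternating split (assumption)
    model = train_nearest_centroid(train)       # step 2: training
    correct = sum(predict(model, vec) == lab for vec, lab in test)
    accuracy = correct / len(test)              # step 3: testing
    return (labeled if accuracy > threshold else None), accuracy
```

In this sketch the cluster indices double as class labels, so no manual annotation is needed; the test step only checks whether a classifier can reproduce the labeling that the clustering produced.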
2. The method of claim 1, characterized in that clustering the data in the data set according to the selected data features comprises:
selecting, from a preset feature set, the data features that serve as the basis for clustering;
extracting the selected data features from the data in the data set;
clustering the extracted data features.
3. The method of claim 2, characterized in that the data features in the preset feature set comprise hand-crafted features characterizing one or more of the color, edges, and texture of an image, and the output features of each layer of the classification model.
4. The method of claim 1, characterized in that the labeled data set comprises a training set, and training the initial classification model on the labeled data set is training the initial classification model on the training set.
5. The method of claim 1 or 4, characterized in that the labeled data set comprises a test set, and testing the trained classification model and establishing the model data set according to the test result comprises:
testing the trained classification model on the test set to obtain the classification accuracy of the trained classification model;
establishing the model data set according to the classification accuracy.
6. The method of claim 5, characterized in that testing the trained classification model on the test set to obtain the classification accuracy of the trained classification model comprises:
classifying the data in the test set with the trained classification model to obtain classification results for the data;
comparing the classification results with the class labels of the data in the test set to obtain the classification accuracy of the trained classification model.
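The comparison in claim 6 amounts to counting how many predicted classes match the class labels assigned during clustering. A minimal sketch follows; the function name and the error handling are assumptions.

```python
def classification_accuracy(predicted, expected):
    """Fraction of test items whose predicted class equals its class label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("the test set must not be empty")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)
```

For example, if three of four test items are classified in agreement with their labels, the function returns 0.75.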
7. The method of claim 5, characterized in that establishing the model data set according to the classification accuracy comprises:
if the classification accuracy is greater than a set value, generating the model data set according to the class labels of the data in the test set;
if the classification accuracy is less than or equal to the set value, reselecting the data features that serve as the basis for clustering.
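The branch in claim 7 can be read as a loop over candidate feature choices. In the sketch below, `evaluate` is a hypothetical callable that runs clustering, training, and testing for one feature choice and returns the accuracy together with the labeled test data. The claim does not say how the next feature choice is picked or when to stop, so trying a fixed list of candidates in order is an assumption.

```python
def build_with_reselection(candidate_features, evaluate, set_value):
    """Try feature choices in order until the accuracy exceeds set_value.

    Returns (features, model_data_set) for the first accepted choice,
    or (None, None) if no candidate passes.
    """
    for features in candidate_features:
        accuracy, labeled_test_data = evaluate(features)
        if accuracy > set_value:
            # Accept: the model data set is generated from the test-set labels.
            return features, labeled_test_data
        # accuracy <= set_value: fall through and reselect the clustering features.
    return None, None
```

Note that an accuracy exactly equal to the set value triggers reselection, matching the "less than or equal to" wording of the claim.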
8. A cloud system for establishing a model data set, characterized by comprising:
a clustering server, configured to cluster the data in a data set according to selected data features and to label the data in the data set by class according to the clustering result;
a training server, configured to train an initial classification model on the labeled data set to obtain a trained classification model;
a test server, configured to test the trained classification model and to establish the model data set according to the test result.
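The division of work among the three servers of claim 8 can be sketched as three cooperating components. This is an in-process illustration only: the claim does not specify the transport between servers, so no networking is shown, and the class names, the pluggable `cluster` and `fit` callables, and the alternating train/test split are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[float, ...]
Labeled = Tuple[Sample, int]
Model = Callable[[Sample], int]

@dataclass
class ClusteringServer:
    """Clusters the data and turns cluster indices into class labels."""
    cluster: Callable[[Sequence[Sample]], List[int]]

    def label(self, data: Sequence[Sample]) -> List[Labeled]:
        return list(zip(data, self.cluster(data)))

@dataclass
class TrainingServer:
    """Trains a classification model on the labeled training set."""
    fit: Callable[[Sequence[Labeled]], Model]

    def train(self, training_set: Sequence[Labeled]) -> Model:
        return self.fit(training_set)

@dataclass
class TestingServer:
    """Tests the trained model and establishes the model data set."""
    set_value: float

    def build(self, model: Model, test_set: Sequence[Labeled]) -> Optional[List[Labeled]]:
        correct = sum(model(x) == y for x, y in test_set)
        accuracy = correct / len(test_set)
        return list(test_set) if accuracy > self.set_value else None

def run_pipeline(data, clustering, training, testing):
    """Pass the data through the three servers in the order given in claim 8."""
    labeled = clustering.label(data)
    training_set, test_set = labeled[0::2], labeled[1::2]  # split is an assumption
    model = training.train(training_set)
    return testing.build(model, test_set)
```

A `None` result signals that the accuracy did not exceed the set value, in which case the clustering server would be asked to cluster again on a different feature choice.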
9. An electronic device, characterized in that the electronic device comprises:
a transceiver, a memory, and one or more processors; and
one or more modules, the one or more modules being stored in the memory and configured to be executed by the one or more processors, the one or more modules comprising instructions for performing each step of the method of any one of claims 1-7.
10. A computer program product for use in conjunction with an electronic device, the computer program product comprising a computer-readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising instructions for performing each step of the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810096270.5A CN108197668A (en) | 2018-01-31 | 2018-01-31 | The method for building up and cloud system of model data collection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108197668A true CN108197668A (en) | 2018-06-22 |
Family
ID=62591635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810096270.5A Pending CN108197668A (en) | 2018-01-31 | 2018-01-31 | The method for building up and cloud system of model data collection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108197668A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108985341A (en) * | 2018-06-26 | 2018-12-11 | 四川斐讯信息技术有限公司 | A kind of the training set appraisal procedure and system of neural network model |
CN108985344A (en) * | 2018-06-26 | 2018-12-11 | 四川斐讯信息技术有限公司 | A kind of the training set optimization method and system of neural network model |
CN109241997A (en) * | 2018-08-03 | 2019-01-18 | 硕橙(厦门)科技有限公司 | A kind of method and device generating training set |
CN109299271A (en) * | 2018-10-30 | 2019-02-01 | 腾讯科技(深圳)有限公司 | Training sample generation, text data, public sentiment event category method and relevant device |
CN109656795A (en) * | 2018-12-11 | 2019-04-19 | 北京安和瑞福信息技术有限公司 | Test method and device |
CN110288007A (en) * | 2019-06-05 | 2019-09-27 | 北京三快在线科技有限公司 | The method, apparatus and electronic equipment of data mark |
CN110443310A (en) * | 2019-08-07 | 2019-11-12 | 浙江大华技术股份有限公司 | Compare update method, server and the computer storage medium of analysis system |
CN110569856A (en) * | 2018-08-24 | 2019-12-13 | 阿里巴巴集团控股有限公司 | sample labeling method and device, and damage category identification method and device |
CN111027507A (en) * | 2019-12-20 | 2020-04-17 | 中国建设银行股份有限公司 | Training data set generation method and device based on video data identification |
CN111079653A (en) * | 2019-12-18 | 2020-04-28 | 中国工商银行股份有限公司 | Automatic database sorting method and device |
CN111598120A (en) * | 2020-03-31 | 2020-08-28 | 宁波吉利汽车研究开发有限公司 | Data labeling method, equipment and device |
CN112464966A (en) * | 2019-09-06 | 2021-03-09 | 富士通株式会社 | Robustness estimation method, data processing method, and information processing apparatus |
WO2023207184A1 (en) * | 2022-04-29 | 2023-11-02 | 上海概伦电子股份有限公司 | Data selection method, system and apparatus for extracting device model parameters of integrated circuit |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102682601A (en) * | 2012-05-04 | 2012-09-19 | 南京大学 | Expressway traffic incident detection method based on optimized support vector machine (SVM) |
CN103150454A (en) * | 2013-03-27 | 2013-06-12 | 山东大学 | Dynamic machine learning modeling method based on sample recommending and labeling |
CN103793444A (en) * | 2012-11-05 | 2014-05-14 | 江苏苏大大数据科技有限公司 | Method for acquiring user requirements |
CN107169001A (en) * | 2017-03-31 | 2017-09-15 | 华东师范大学 | A kind of textual classification model optimization method based on mass-rent feedback and Active Learning |
CN107480696A (en) * | 2017-07-12 | 2017-12-15 | 深圳信息职业技术学院 | A kind of disaggregated model construction method, device and terminal device |
Non-Patent Citations (2)
Title |
---|
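何俐珺: "Research on Weed Recognition Based on K-means Feature Learning", China Master's Theses Full-text Database, Information Science and Technology Series * |
张瑜 et al.: "Multimedia Technology and Application", 31 May 2015 * |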
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108985344A (en) * | 2018-06-26 | 2018-12-11 | 四川斐讯信息技术有限公司 | A kind of the training set optimization method and system of neural network model |
CN108985341A (en) * | 2018-06-26 | 2018-12-11 | 四川斐讯信息技术有限公司 | A kind of the training set appraisal procedure and system of neural network model |
CN109241997B (en) * | 2018-08-03 | 2022-03-22 | 硕橙(厦门)科技有限公司 | Method and device for generating training set |
CN109241997A (en) * | 2018-08-03 | 2019-01-18 | 硕橙(厦门)科技有限公司 | A kind of method and device generating training set |
CN110569856B (en) * | 2018-08-24 | 2020-07-21 | 阿里巴巴集团控股有限公司 | Sample labeling method and device, and damage category identification method and device |
CN110569856A (en) * | 2018-08-24 | 2019-12-13 | 阿里巴巴集团控股有限公司 | sample labeling method and device, and damage category identification method and device |
CN109299271A (en) * | 2018-10-30 | 2019-02-01 | 腾讯科技(深圳)有限公司 | Training sample generation, text data, public sentiment event category method and relevant device |
CN109299271B (en) * | 2018-10-30 | 2022-04-05 | 腾讯科技(深圳)有限公司 | Training sample generation method, text data method, public opinion event classification method and related equipment |
CN109656795B (en) * | 2018-12-11 | 2022-06-28 | 北京安和瑞福信息技术有限公司 | Test method and device |
CN109656795A (en) * | 2018-12-11 | 2019-04-19 | 北京安和瑞福信息技术有限公司 | Test method and device |
CN110288007A (en) * | 2019-06-05 | 2019-09-27 | 北京三快在线科技有限公司 | The method, apparatus and electronic equipment of data mark |
CN110443310B (en) * | 2019-08-07 | 2022-08-09 | 浙江大华技术股份有限公司 | Updating method of comparison analysis system, server and computer storage medium |
CN110443310A (en) * | 2019-08-07 | 2019-11-12 | 浙江大华技术股份有限公司 | Compare update method, server and the computer storage medium of analysis system |
CN112464966A (en) * | 2019-09-06 | 2021-03-09 | 富士通株式会社 | Robustness estimation method, data processing method, and information processing apparatus |
CN111079653A (en) * | 2019-12-18 | 2020-04-28 | 中国工商银行股份有限公司 | Automatic database sorting method and device |
CN111079653B (en) * | 2019-12-18 | 2024-03-22 | 中国工商银行股份有限公司 | Automatic database separation method and device |
CN111027507A (en) * | 2019-12-20 | 2020-04-17 | 中国建设银行股份有限公司 | Training data set generation method and device based on video data identification |
CN111598120A (en) * | 2020-03-31 | 2020-08-28 | 宁波吉利汽车研究开发有限公司 | Data labeling method, equipment and device |
WO2023207184A1 (en) * | 2022-04-29 | 2023-11-02 | 上海概伦电子股份有限公司 | Data selection method, system and apparatus for extracting device model parameters of integrated circuit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20180622 |