CN108197668A - The method for building up and cloud system of model data collection - Google Patents


Info

Publication number
CN108197668A
Authority
CN
China
Prior art keywords
data
classification
model
classification model
trained
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810096270.5A
Other languages
Chinese (zh)
Inventor
梁昊
南冰
南一冰
廉士国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
As Science And Technology (beijing) Co Ltd
Cloudminds Beijing Technologies Co Ltd
Original Assignee
As Science And Technology (beijing) Co Ltd
Application filed by As Science And Technology (beijing) Co Ltd
Priority to CN201810096270.5A
Publication of CN108197668A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application provides a method and a cloud system for establishing a model data set. The method includes: clustering the data in a data set according to a selected data feature, and assigning class labels to the data in the data set according to the clustering result; training an initialized classification model on the labeled data set to obtain a trained classification model; and testing the trained classification model and establishing the model data set according to the test result. The finally determined model data set can be used for classification and recognition. This eliminates the labor and time spent on manual labeling and its verification, achieves automatic labeling of the model data set, and effectively improves the efficiency and accuracy of classification and recognition.

Description

Method and cloud system for establishing a model data set
Technical field
This application relates to the field of deep learning, and in particular to a method and a cloud system for establishing a model data set.
Background
In recent years, deep-learning-based classification methods have achieved significant breakthroughs in classification performance over traditional classification methods, with higher classification accuracy. As deep networks such as ResNet and DenseNet continue to be proposed, deep-learning-based classification is becoming the mainstream approach for classification applications.
A deep-learning-based classification method trains the parameters of a classification model on a large training set through repeated forward propagation and backpropagation, yielding a trained classification model. Good classification performance depends on how representative the classes in the training set are and on the accuracy of their labels. To ensure label accuracy, training-set labels are currently assigned manually, with a person deciding which class each sample belongs to. For more complex classification tasks, however, a training set typically contains hundreds of thousands or even millions of samples, so manual labeling is expensive in labor and time. For example, the training-set labels for the ImageNet image classification challenge were produced manually through the Amazon Mechanical Turk (MTurk) crowdsourcing platform.
A drawback of the prior art is that manual labeling is somewhat subjective. To ensure that the labels are objective and accurate, the labeling process usually has to be supervised or the results screened, which makes manual labeling even more costly. Consequently, classification models are usually trained on fixed training sets and can recognize only the classes those training sets contain. If a training set must be built for a specific need in order to recognize particular classes, the labor and time spent on manual labeling and its verification are high. The dependence on manual labeling therefore limits the widespread adoption of deep-learning-based classification methods in practical applications.
Summary
In view of this, embodiments of the present invention provide a method and a cloud system for establishing a model data set, to solve the technical problem that existing deep-learning-based classification methods rely too heavily on manual labeling, which makes the labor and time spent on manual labeling and its verification high.
In one aspect, an embodiment of this application provides a method for establishing a model data set, including:
clustering the data in a data set according to a selected data feature, and assigning class labels to the data in the data set according to the clustering result;
training an initialized classification model on the labeled data set to obtain a trained classification model; and
testing the trained classification model, and establishing the model data set according to the test result.
In another aspect, an embodiment of this application provides a cloud system for establishing a model data set, including:
a clustering server, configured to cluster the data in a data set according to a selected data feature and to assign class labels to the data in the data set according to the clustering result;
a training server, configured to train an initialized classification model on the labeled data set to obtain a trained classification model; and
a test server, configured to test the trained classification model and to establish the model data set according to the test result.
In another aspect, an embodiment of this application provides an electronic device, including:
a transceiver, a memory, and one or more processors; and
one or more modules, stored in the memory and configured to be executed by the one or more processors, the one or more modules including instructions for performing the steps of the above method.
In another aspect, an embodiment of this application provides a computer program product for use with an electronic device. The computer program product includes a computer-readable storage medium and a computer program mechanism embedded therein, the computer program mechanism including instructions for performing the steps of the above method.
To achieve the above objectives, the technical solution of the embodiments of the present invention is implemented as follows:
In these embodiments, the data in a data set are clustered using a selected data feature and labeled according to the clustering result; an initialized classification model is trained on the labeled data set to obtain a trained classification model; and the trained classification model is tested to determine the model data set ultimately used for classification and recognition. This eliminates the labor and time spent on manual labeling and its verification, achieves automatic labeling of the model data set, and effectively improves the efficiency and accuracy of classification and recognition.
Brief description of the drawings
Specific embodiments of this application are described below with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the method for establishing a model data set in Embodiment 1 of this application;
Fig. 2 is a flowchart of establishing a model data set in Embodiment 1 of this application;
Fig. 3 is an architecture diagram of the cloud system for establishing a model data set in Embodiment 2 of this application;
Fig. 4 is a structural diagram of the electronic device in Embodiment 3 of this application.
Detailed description
The essence of the technical solution of the embodiments of the present invention is further explained below through specific examples.
To make the technical solution and advantages of this application clearer, exemplary embodiments of this application are described in more detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not an exhaustive list of all of them. Where no conflict arises, the embodiments in this description and the features in them may be combined with one another.
In the course of the invention, the inventors noticed the following:
Building a training set by manual labeling usually requires supervising the labeling process or screening the labeling results, which makes manual labeling costly. When a training set must be built for a specific need in order to recognize particular classes, the labor and time spent on manual labeling and its verification become high. Deep-learning-based classification methods are therefore heavily dependent on manual labeling.
To address these shortcomings, the embodiments of this application propose extracting features from the data in a data set and clustering them so that the data set is built automatically; training an initialized classification model on the training-set portion of the data set; and testing the classification accuracy of the trained classification model on the test-set portion, so as to ensure that the classification of the data in the deep-learning model data set is objective.
To facilitate implementation of this application, examples are described below.
Embodiment 1
Fig. 1 is a schematic diagram of the method for establishing a model data set in Embodiment 1 of this application. As shown in Fig. 1, the method includes:
Step 101: Cluster the data in a data set according to a selected data feature, and assign class labels to the data in the data set according to the clustering result.
Step 102: Train an initialized classification model on the labeled data set to obtain a trained classification model.
Step 103: Test the trained classification model, and establish the model data set according to the test result.
In implementation, the above steps may be executed by a cloud server. The cloud server extracts, from the data in the data set, a feature taken from a preset feature library; clusters the extracted data features with a clustering algorithm; automatically assigns class labels to the corresponding data according to the clustering result; trains a deep-learning-based classification model on the labeled data; and tests the trained classification model. If the test result satisfies the judgment condition, the data set has been classified successfully, and the labeled data set is used directly as the model data set for a deep-learning-based classification model, enabling accurate classification of the data. If the test result does not satisfy the judgment condition, classification of the data set has failed: a new feature is taken from the preset feature library and the whole process is repeated until the test result satisfies the judgment condition, at which point the model data set is established and accurate classification of the data is achieved.
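The retry loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `build_model_dataset`, the callables passed in for clustering and for training-plus-testing, and the decision to stop once every feature has been tried are all hypothetical, since the text only says the process repeats until the judgment condition is met.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical signatures, used only for this sketch.
Extractor = Callable[[object], List[float]]
ClusterFn = Callable[[List[List[float]]], List[int]]
TrainTestFn = Callable[[Sequence[object], List[int]], float]


def build_model_dataset(
    data: Sequence[object],
    feature_library: Dict[str, Extractor],
    cluster: ClusterFn,
    train_and_test: TrainTestFn,
    threshold: float,
    seed: int = 0,
) -> Optional[Tuple[str, List[int]]]:
    """Try features until a classifier trained on the cluster labels scores b > threshold."""
    rng = random.Random(seed)
    remaining = dict(feature_library)          # working copy of {f1, ..., fk}; the caller's dict is untouched
    while remaining:
        name = rng.choice(sorted(remaining))   # randomly pick f_i
        extractor = remaining.pop(name)        # delete f_i so it is not chosen again
        features = [extractor(x) for x in data]
        labels = cluster(features)             # cluster index n becomes the class label of x
        b = train_and_test(data, labels)       # split, train, classify the test set, return accuracy
        if b > threshold:                      # b > a: accept the automatically labeled data set
            return name, labels
    return None                                # assumption: give up once every feature has been tried
```

Passing clustering and training/testing in as callables keeps the loop independent of any particular feature extractor or deep-learning framework.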
In implementation, the method can be applied to automatically establishing image data sets and, depending on actual needs, to automatically establishing other types of data sets, for example text data sets. This embodiment does not specifically limit the type of data in the model data set.
In this embodiment, clustering the data in the data set according to the selected data feature includes:
selecting, from a preset feature set, a data feature to serve as the clustering basis;
extracting that data feature from each item of data in the data set; and
clustering the extracted data features.
In this embodiment, the data features in the preset feature set include one or more handcrafted features characterizing image color, edges, and texture, as well as the output features of each layer of a classification model.
In implementation, the feature set is established as follows: handcrafted features such as color histograms, HOG, and Haar features, which characterize image color, edges, texture, and so on, together with the output features of each layer of deep-learning-based classification models such as VGG16 and ResNet, are added to a feature library. The feature library is denoted {f_1, f_2, ..., f_k}, where k is the number of data features it contains.
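As an illustration of the feature library {f_1, ..., f_k}, the sketch below registers feature extractors by name. It is a simplified assumption rather than the described system: images are flattened lists of (R, G, B) pixels, only a color histogram and a mean-brightness feature are implemented (HOG and Haar features are omitted), and per-layer network outputs are supplied by the caller through the hypothetical `layer_outputs` argument.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel in 0..255
Image = List[Pixel]            # assumption: an image flattened to a list of pixels
Extractor = Callable[[Image], List[float]]


def color_histogram(image: Image, bins: int = 4) -> List[float]:
    """Per-channel color histogram with `bins` bins per channel, normalized by pixel count."""
    hist = [0.0] * (3 * bins)
    for pixel in image:
        for channel, value in enumerate(pixel):
            hist[channel * bins + value * bins // 256] += 1.0
    n = max(len(image), 1)
    return [h / n for h in hist]


def mean_brightness(image: Image) -> List[float]:
    """Average of (R + G + B) / 3 over all pixels, as a one-dimensional feature."""
    n = max(len(image), 1)
    return [sum(sum(p) / 3 for p in image) / n]


def build_feature_library(layer_outputs: Dict[str, Extractor]) -> Dict[str, Extractor]:
    """Return {name: extractor}, playing the role of {f_1, ..., f_k}.

    `layer_outputs` is a hypothetical mapping such as {"vgg16/fc7": fn}, where fn runs
    a network on the image and returns that layer's activations.
    """
    library: Dict[str, Extractor] = {
        "color_hist": color_histogram,
        "brightness": mean_brightness,
    }
    library.update(layer_outputs)
    return library
```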
In implementation, selecting the data feature that serves as the clustering basis from the preset feature set, and extracting that feature from the data in the data set, proceeds as follows:
1) Randomly select the clustering basis: randomly select a data feature f_i from the feature library as the clustering basis for labeling the data, and delete the selected feature f_i from the feature library, which then becomes {f_1, f_2, ..., f_{i-1}, f_{i+1}, ..., f_k}.
2) Extract the data features from the data set: according to the randomly selected clustering basis, extract the feature f_i of each item of data in the data set. If the selected feature f_i is a handcrafted feature such as the histogram of oriented gradients (HOG), it is extracted directly using that feature's extraction method. If f_i is the output of a particular layer of a classification model, the data in the data set are fed as input into the deep-learning-based classification model, and the output of the corresponding layer is taken as the feature of each item.
In implementation, clustering the extracted data features and assigning class labels to the data in the data set according to the clustering result proceeds as follows:
1) Cluster the data features: cluster the extracted data features with the K-Means clustering algorithm. The number of cluster centers can be set according to actual needs; here it is set to m = 10. This embodiment does not specifically limit the number of cluster centers.
2) Assign class labels: automatically label each item of data x in the data set according to the clustering result. If the feature f of data x is assigned to the n-th cluster, data x is labeled as class n.
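A minimal pure-Python sketch of both steps follows: Lloyd's K-Means with m clusters (defaulting to m = 10 as in the text), after which each sample's cluster index is used as its class label. The random initialization, the iteration cap, and keeping an empty cluster's center unchanged are assumptions; a production system would more likely call an existing K-Means implementation.

```python
import random
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], m: int = 10, iters: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's K-Means; returns the cluster index of every point."""
    rng = random.Random(seed)
    m = min(m, len(points))                          # cannot have more centers than points
    centers = [list(p) for p in rng.sample(list(points), m)]
    assign = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_assign = [
            min(range(m), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_assign == assign:                     # converged: no point changed cluster
            break
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(m):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:                              # an empty cluster keeps its old center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def label_by_cluster(
    data: Sequence[object], features: Sequence[Sequence[float]], m: int = 10
) -> List[Tuple[object, int]]:
    """Automatic labeling: sample x gets class n when its feature f(x) falls in cluster n."""
    return list(zip(data, kmeans(features, m=m)))
```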
In this embodiment, the labeled data set includes a training set, and training the initialized classification model on the labeled data set means training the initialized classification model on the training set.
In implementation, the automatically labeled data set is divided into a training set and a test set; for example, 90% of the data in the data set are randomly selected as the training set and the remaining 10% form the test set. Using the class labels from the preceding automatic labeling, the initialized deep-learning-based classification model is trained on the training-set portion to obtain a trained classification model. The proportions of the training set and test set can be set according to actual conditions; this embodiment does not specifically limit them.
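The random 90/10 split can be sketched as follows. The function name `split_dataset`, the seed handling and the rounding rule are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], train_ratio: float = 0.9, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the labeled samples and cut them into (training set, test set)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(samples)                 # copy so the caller's sequence is not reordered
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]
```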
In this embodiment, the labeled data set includes a test set, and testing the trained classification model and establishing the model data set according to the test result includes:
testing the trained classification model on the test set to obtain the classification accuracy of the trained classification model; and
establishing the model data set according to the classification accuracy.
In this embodiment, testing the trained classification model on the test set to obtain its classification accuracy includes:
classifying the data in the test set with the trained classification model to obtain classification results for the data; and
comparing the classification results with the class labels of the data in the test set to obtain the classification accuracy of the trained classification model.
In implementation, the classification model is tested as follows: the trained classification model classifies the data in the test set, and each test classification result is compared with the automatically assigned label of that item. If the test classification result for data x matches its automatically assigned label, x is considered correctly classified; otherwise, x is considered misclassified.
Further, the classification accuracy b of the trained classification model on the whole test set is calculated from the test classification results and the automatically assigned labels of all data in the test set. The classification accuracy may be calculated as the ratio of the number of correctly classified items in the test set to the total number of items in the test set, or it may be defined differently according to actual conditions; this embodiment does not specifically limit how the classification accuracy is calculated.
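Using the ratio definition above, the accuracy b can be computed as in this sketch. `model` is assumed to be any callable that maps a sample to a predicted class, and each test item is assumed to carry its automatically assigned cluster label; both interfaces are illustrative.

```python
from typing import Callable, Hashable, Sequence, Tuple


def classification_accuracy(
    model: Callable[[object], Hashable],
    test_set: Sequence[Tuple[object, Hashable]],
) -> float:
    """b = (# test samples whose prediction equals the automatic label) / (# test samples)."""
    if not test_set:
        raise ValueError("empty test set")
    correct = sum(1 for x, auto_label in test_set if model(x) == auto_label)
    return correct / len(test_set)
```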
In this embodiment, establishing the model data set according to the classification accuracy includes:
if the classification accuracy is greater than a set value, generating the model data set from the class labels of the data in the test set; or
if the classification accuracy is less than or equal to the set value, reselecting the data feature that serves as the clustering basis.
In implementation, the model data set is established from the classification accuracy as follows: the calculated classification accuracy b is compared with a preset threshold a. If b > a, the model data set is generated from the automatically assigned labels. Otherwise, a new data feature is selected as the clustering basis for labeling from the feature library from which f_i has been deleted, {f_1, f_2, ..., f_{i-1}, f_{i+1}, ..., f_k}, and the whole process is repeated until the test result satisfies b > a and the model data set is generated.
Taking the automatic establishment of an image data set as an example application scenario, Fig. 2 is a flowchart of establishing a model data set in Embodiment 1 of this application. As shown in Fig. 2, Embodiment 1 is described in detail below.
The scope of application of the embodiments includes, but is not limited to, automatically establishing image data sets. Taking the automatic establishment of an image data set as an example, the specific flow is as follows:
Step 201: Establish an image feature library. Handcrafted features and the output features of each layer of classification models are added together to an image feature library, denoted {f_1, f_2, ..., f_k}, where k is the number of image data features the library contains.
Step 202: Randomly select the clustering basis, and extract the corresponding feature from the images in the image data set. This specifically includes:
1) Randomly select an image data feature: randomly select an image data feature f_i from the image feature library as the clustering basis for labeling, and delete the selected feature f_i from the image feature library, which then becomes {f_1, f_2, ..., f_{i-1}, f_{i+1}, ..., f_k}.
2) Extract the image data features from the image data set: according to the randomly selected clustering basis, extract the feature f_i of each image in the image data set.
Step 203: Cluster the extracted image data features, and assign class labels to the images according to the clustering result. This specifically includes:
1) Feature clustering: cluster the extracted image data features with the K-Means clustering algorithm.
2) Class labeling: automatically label the images according to the clustering result. If the image data feature f of image x is assigned to the n-th cluster, image x is labeled as class n.
Step 204: Train the image classification model. Divide the automatically labeled image data set into a training set and a test set and, using the class labels from the preceding automatic labeling, train an initialized image classification model on the training-set portion to obtain a trained image classification model.
Step 205: Test the trained image classification model to obtain its classification accuracy, and determine the final model data set according to that accuracy. This specifically includes:
1) Image classification model testing: classify the images in the test set with the trained image classification model, and compare each test classification result with the automatically assigned label. If the test classification result for image x matches its automatically assigned label, x is considered correctly classified; otherwise, x is considered misclassified. From these results, the classification accuracy b of the image classification model on the whole test set is calculated.
2) Determine the final model data set from the classification accuracy of the image classification model: compare the calculated classification accuracy b with the preset threshold a. If b > a, generate the model data set from the automatically assigned labels; otherwise, return to Step 202 and select a new image data feature as the clustering basis for labeling from the feature library from which f_i has been deleted, {f_1, f_2, ..., f_{i-1}, f_{i+1}, ..., f_k}.
The above are only preferred embodiments of this application and are not intended to limit its scope of protection.
Embodiment 2
Based on the same inventive concept, an embodiment of this application further provides a cloud system for establishing a model data set. Because the principle by which this system solves the problem is similar to that of the method for establishing a model data set, its implementation can refer to the implementation of the method, and repeated details are not described again.
Fig. 3 is an architecture diagram of the cloud system for establishing a model data set in Embodiment 2 of this application. As shown in Fig. 3, the cloud system 300 for establishing a model data set may include:
a clustering server 301, configured to cluster the data in a data set according to a selected data feature and to assign class labels to the data in the data set according to the clustering result;
a training server 302, configured to train an initialized classification model on the labeled data set to obtain a trained classification model; and
a test server 303, configured to test the trained classification model and to establish the model data set according to the test result.
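The division of work among the three servers can be sketched as three cooperating components. This is an in-process illustration only: the class names, the caller-supplied clustering and training functions, and the accuracy check inside `TestServer` are assumptions, and in the described cloud system each role would run on its own server.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Extractor = Callable[[object], List[float]]
Model = Callable[[object], int]
Labeled = Sequence[Tuple[object, int]]


@dataclass
class ClusterServer:
    """Role of clustering server 301: extract the chosen feature, cluster, and label."""
    cluster_fn: Callable[[List[List[float]]], List[int]]

    def label(self, data: Sequence[object], extractor: Extractor) -> List[int]:
        features = [extractor(x) for x in data]
        return self.cluster_fn(features)   # cluster index serves as the automatic class label


@dataclass
class TrainingServer:
    """Role of training server 302; the training routine itself is supplied by the caller."""
    train_fn: Callable[[Labeled], Model]

    def train(self, train_set: Labeled) -> Model:
        return self.train_fn(train_set)


@dataclass
class TestServer:
    """Role of test server 303: accuracy b on the test set, accepted only if b > a."""
    threshold: float

    def evaluate(self, model: Model, test_set: Labeled) -> Tuple[bool, float]:
        correct = sum(1 for x, label in test_set if model(x) == label)
        b = correct / len(test_set)
        return b > self.threshold, b
```

Keeping each role behind a narrow interface mirrors the split into servers 301, 302 and 303, so any one of them could be replaced (for example, a different training backend) without touching the others.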
In this embodiment, the clustering server 301 is configured to perform:
selecting, from a preset feature set, a data feature to serve as the clustering basis;
extracting that data feature from each item of data in the data set; and
clustering the extracted data features.
In this embodiment, the data features in the preset feature set include one or more handcrafted features characterizing image color, edges, and texture, as well as the output features of each layer of a classification model.
In this embodiment, the labeled data set includes a training set, and the training server 302 is configured to train the initialized classification model on the training set.
In this embodiment, the labeled data set includes a test set, and the test server 303 is configured to perform:
testing the trained classification model on the test set to obtain the classification accuracy of the trained classification model; and
establishing the model data set according to the classification accuracy.
In this embodiment, testing the trained classification model on the test set to obtain its classification accuracy includes:
classifying the data in the test set with the trained classification model to obtain classification results for the data; and
comparing the classification results with the class labels of the data in the test set to obtain the classification accuracy of the trained classification model.
In this embodiment, establishing the model data set according to the classification accuracy includes:
if the classification accuracy is greater than a set value, generating the model data set from the class labels of the data in the test set; or
if the classification accuracy is less than or equal to the set value, reselecting the data feature that serves as the clustering basis.
Embodiment 3
Based on the same inventive concept, an embodiment of this application further provides an electronic device. Because its principle is similar to that of the method for establishing a model data set, its implementation can refer to the implementation of the method, and repeated details are not described again.
Fig. 4 is a structural diagram of the electronic device in Embodiment 3 of this application. As shown in Fig. 4, the electronic device includes: a transceiver 401, a memory 402, and one or more processors 403; and one or more modules, stored in the memory and configured to be executed by the one or more processors, the one or more modules including instructions for performing the steps of any of the above methods.
Embodiment 4
Based on same inventive concept, the embodiment of the present application additionally provides a kind of computer journey being used in combination with electronic equipment Sequence product since its principle is similar to a kind of method for building up of model data collection, is implemented to may refer to the implementation of method, Overlaps will not be repeated.The computer program product includes computer-readable storage medium and is embedded in calculating therein Machine procedure mechanism, the computer program mechanism include the instruction for performing each step in any above method.
For convenience of description, each section of apparatus described above is divided into various modules with function and describes respectively.Certainly, exist Implement each module or the function of unit can be realized in same or multiple softwares or hardware during the application.
It should be understood by those skilled in the art that, embodiments herein can be provided as method, system or computer program Product.Therefore, the reality in terms of complete hardware embodiment, complete software embodiment or combination software and hardware can be used in the application Apply the form of example.Moreover, the computer for wherein including computer usable program code in one or more can be used in the application The computer program production that usable storage medium is implemented on (including but not limited to magnetic disk storage, CD-ROM, optical memory etc.) The form of product.
The application is with reference to the flow according to the method for the embodiment of the present application, equipment (system) and computer program product Figure and/or block diagram describe.It should be understood that it can be realized by computer program instructions every first-class in flowchart and/or the block diagram The combination of flow and/or box in journey and/or box and flowchart and/or the block diagram.These computer programs can be provided The processor of all-purpose computer, special purpose computer, Embedded Processor or other programmable data processing devices is instructed to produce A raw machine so that the instruction performed by computer or the processor of other programmable data processing devices is generated for real The device of function specified in present one flow of flow chart or one box of multiple flows and/or block diagram or multiple boxes.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the present application.

Claims (10)

1. A method for establishing a model dataset, characterized by comprising:
clustering the data in a dataset according to selected data features, and labeling the data in the dataset with classes according to the clustering result;
training an initialized classification model according to the class-labeled dataset to obtain a trained classification model;
testing the trained classification model, and establishing a model dataset according to the test result.
2. The method according to claim 1, characterized in that clustering the data in the dataset according to the selected data features comprises:
selecting, from a preset feature set, the data features that serve as the basis for clustering;
extracting the selected data features from the data in the dataset;
clustering the extracted data features.
3. The method according to claim 2, characterized in that the data features in the preset feature set include one or more manually designed features characterizing the color, edges, or texture of an image, and the output features of each layer of the classification model.
4. The method according to claim 1, characterized in that the class-labeled dataset includes a training set, and training the initialized classification model according to the class-labeled dataset consists of training the initialized classification model according to the training set.
5. The method according to claim 1 or 4, characterized in that the class-labeled dataset includes a test set, and testing the trained classification model and establishing the model dataset according to the test result comprises:
testing the trained classification model according to the test set to obtain the classification accuracy of the trained classification model;
establishing the model dataset according to the classification accuracy.
6. The method according to claim 5, characterized in that testing the trained classification model according to the test set to obtain the classification accuracy of the trained classification model comprises:
classifying the data in the test set with the trained classification model to obtain classification results for the data;
comparing the classification results with the class labels of the data in the test set to obtain the classification accuracy of the trained classification model.
7. The method according to claim 5, characterized in that establishing the model dataset according to the classification accuracy comprises:
if the classification accuracy is greater than a set value, generating the model dataset according to the class labels of the data in the test set;
if the classification accuracy is less than or equal to the set value, reselecting the data features that serve as the basis for clustering.
8. A cloud system for establishing a model dataset, characterized by comprising:
a clustering server, configured to cluster the data in a dataset according to selected data features and to label the data in the dataset with classes according to the clustering result;
a training server, configured to train an initialized classification model according to the class-labeled dataset to obtain a trained classification model;
a test server, configured to test the trained classification model and to establish a model dataset according to the test result.
9. An electronic device, characterized in that the electronic device comprises:
a transceiver, a memory, and one or more processors; and
one or more modules, stored in the memory and configured to be executed by the one or more processors, the one or more modules including instructions for performing each step of the method according to any one of claims 1 to 7.
10. A computer program product for use in conjunction with an electronic device, the computer program product comprising a computer-readable storage medium and a computer program mechanism embedded therein, the computer program mechanism including instructions for performing each step of the method according to any one of claims 1 to 7.
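The method of claims 1 to 7 (select features, cluster, label by cluster, train, test, then either emit the dataset or reselect features) can be pictured with a small pure-Python sketch. The patent names no clustering algorithm, classifier, split ratio, or threshold value, so everything concrete here is an assumption: k-means stands in for the clustering step, a nearest-centroid classifier stands in for the "initialized classification model", 30% of the data is held out as the test set, and each sample is assumed to be a dict of precomputed `color`/`edge`/`texture` scalars. All function names (`extract_features`, `kmeans_labels`, `build_model_dataset`, and so on) are hypothetical. The reselection step of claim 7 is modelled by trying candidate feature tuples in order.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Dict[str, float]
Vector = List[float]

# Hypothetical preset feature set (claim 3 mentions color, edge and texture
# features); each sample is assumed to carry these as precomputed scalars.
PRESET_FEATURES = ("color", "edge", "texture")


def extract_features(samples: Sequence[Sample], names: Sequence[str]) -> List[Vector]:
    """Claim 2: choose features from the preset set and extract them per sample."""
    unknown = set(names) - set(PRESET_FEATURES)
    if unknown:
        raise ValueError(f"not in preset feature set: {sorted(unknown)}")
    return [[float(s[n]) for n in names] for s in samples]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; each point's cluster index becomes its class label."""
    centroids = [list(p) for p in random.Random(seed).sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def train_nearest_centroid(points: List[Vector], labels: List[int]) -> Dict[int, Vector]:
    """Stand-in classification model: one mean vector per class label."""
    groups: Dict[int, List[Vector]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: [sum(col) / len(ps) for col in zip(*ps)] for lab, ps in groups.items()}


def predict(model: Dict[int, Vector], p: Vector) -> int:
    return min(model, key=lambda lab: sq_dist(p, model[lab]))


def build_model_dataset(
    samples: Sequence[Sample],
    candidate_features: Sequence[Tuple[str, ...]],
    k: int,
    threshold: float,
    test_fraction: float = 0.3,
    seed: int = 0,
) -> Optional[Tuple[Tuple[str, ...], float, List[Tuple[Sample, int]]]]:
    """Claims 1-7: cluster, label, train, test; reselect features if accuracy is too low."""
    for names in candidate_features:
        points = extract_features(samples, names)
        labels = kmeans_labels(points, k, seed=seed)
        order = list(range(len(samples)))
        random.Random(seed).shuffle(order)
        n_test = max(1, int(len(order) * test_fraction))
        test_idx, train_idx = order[:n_test], order[n_test:]
        if not train_idx:
            raise ValueError("dataset too small to hold out a test set")
        model = train_nearest_centroid([points[i] for i in train_idx],
                                       [labels[i] for i in train_idx])
        # Claim 6: compare predictions with the cluster-derived labels of the test set.
        hits = sum(predict(model, points[i]) == labels[i] for i in test_idx)
        accuracy = hits / len(test_idx)
        if accuracy > threshold:  # claim 7: strictly greater than the set value
            # Claim 7 generates the model dataset from the test-set class labels.
            return names, accuracy, [(samples[i], labels[i]) for i in test_idx]
    return None  # no candidate feature selection passed the threshold
```

For example, calling `build_model_dataset(data, [("color",), ("color", "edge")], k=2, threshold=0.9)` would accept the first feature tuple whose held-out accuracy exceeds 0.9 and fall through to the next tuple otherwise. Note that this sketch trains the classifier on the same features used for clustering, so accuracy tends to be high; in the patent's setting the classification model presumably has its own layered representation (claim 3 lists its layer outputs as candidate features), which is what would make the accuracy check informative.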
CN201810096270.5A 2018-01-31 2018-01-31 The method for building up and cloud system of model data collection Pending CN108197668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810096270.5A CN108197668A (en) 2018-01-31 2018-01-31 The method for building up and cloud system of model data collection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810096270.5A CN108197668A (en) 2018-01-31 2018-01-31 The method for building up and cloud system of model data collection

Publications (1)

Publication Number Publication Date
CN108197668A (en) 2018-06-22

Family

Family ID: 62591635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810096270.5A Pending CN108197668A (en) 2018-01-31 2018-01-31 The method for building up and cloud system of model data collection

Country Status (1)

Country Link
CN (1) CN108197668A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985341A (en) * 2018-06-26 2018-12-11 四川斐讯信息技术有限公司 A kind of the training set appraisal procedure and system of neural network model
CN108985344A (en) * 2018-06-26 2018-12-11 四川斐讯信息技术有限公司 A kind of the training set optimization method and system of neural network model
CN109241997A (en) * 2018-08-03 2019-01-18 硕橙(厦门)科技有限公司 A kind of method and device generating training set
CN109299271A (en) * 2018-10-30 2019-02-01 腾讯科技(深圳)有限公司 Training sample generation, text data, public sentiment event category method and relevant device
CN109656795A (en) * 2018-12-11 2019-04-19 北京安和瑞福信息技术有限公司 Test method and device
CN110288007A (en) * 2019-06-05 2019-09-27 北京三快在线科技有限公司 The method, apparatus and electronic equipment of data mark
CN110443310A (en) * 2019-08-07 2019-11-12 浙江大华技术股份有限公司 Compare update method, server and the computer storage medium of analysis system
CN110569856A (en) * 2018-08-24 2019-12-13 阿里巴巴集团控股有限公司 sample labeling method and device, and damage category identification method and device
CN111027507A (en) * 2019-12-20 2020-04-17 中国建设银行股份有限公司 Training data set generation method and device based on video data identification
CN111079653A (en) * 2019-12-18 2020-04-28 中国工商银行股份有限公司 Automatic database sorting method and device
CN111598120A (en) * 2020-03-31 2020-08-28 宁波吉利汽车研究开发有限公司 Data labeling method, equipment and device
CN112464966A (en) * 2019-09-06 2021-03-09 富士通株式会社 Robustness estimation method, data processing method, and information processing apparatus
WO2023207184A1 (en) * 2022-04-29 2023-11-02 上海概伦电子股份有限公司 Data selection method, system and apparatus for extracting device model parameters of integrated circuit

Citations (5)

Publication number Priority date Publication date Assignee Title
CN102682601A (en) * 2012-05-04 2012-09-19 南京大学 Expressway traffic incident detection method based on optimized support vector machine (SVM)
CN103150454A (en) * 2013-03-27 2013-06-12 山东大学 Dynamic machine learning modeling method based on sample recommending and labeling
CN103793444A (en) * 2012-11-05 2014-05-14 江苏苏大大数据科技有限公司 Method for acquiring user requirements
CN107169001A (en) * 2017-03-31 2017-09-15 华东师范大学 A kind of textual classification model optimization method based on mass-rent feedback and Active Learning
CN107480696A (en) * 2017-07-12 2017-12-15 深圳信息职业技术学院 A kind of disaggregated model construction method, device and terminal device

Non-Patent Citations (2)

Title
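He Lijun (何俐珺): "Research on Weed Recognition Based on K-means Feature Learning" (基于K-means特征学习的杂草识别研究), China Master's Theses Full-text Database, Information Science and Technology Series *
Zhang Yu (张瑜) et al.: Multimedia Technology and Application (多媒体技术与应用), 31 May 2015 *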

Cited By (19)

Publication number Priority date Publication date Assignee Title
CN108985344A (en) * 2018-06-26 2018-12-11 四川斐讯信息技术有限公司 A kind of the training set optimization method and system of neural network model
CN108985341A (en) * 2018-06-26 2018-12-11 四川斐讯信息技术有限公司 A kind of the training set appraisal procedure and system of neural network model
CN109241997B (en) * 2018-08-03 2022-03-22 硕橙(厦门)科技有限公司 Method and device for generating training set
CN109241997A (en) * 2018-08-03 2019-01-18 硕橙(厦门)科技有限公司 A kind of method and device generating training set
CN110569856B (en) * 2018-08-24 2020-07-21 阿里巴巴集团控股有限公司 Sample labeling method and device, and damage category identification method and device
CN110569856A (en) * 2018-08-24 2019-12-13 阿里巴巴集团控股有限公司 sample labeling method and device, and damage category identification method and device
CN109299271A (en) * 2018-10-30 2019-02-01 腾讯科技(深圳)有限公司 Training sample generation, text data, public sentiment event category method and relevant device
CN109299271B (en) * 2018-10-30 2022-04-05 腾讯科技(深圳)有限公司 Training sample generation method, text data method, public opinion event classification method and related equipment
CN109656795B (en) * 2018-12-11 2022-06-28 北京安和瑞福信息技术有限公司 Test method and device
CN109656795A (en) * 2018-12-11 2019-04-19 北京安和瑞福信息技术有限公司 Test method and device
CN110288007A (en) * 2019-06-05 2019-09-27 北京三快在线科技有限公司 The method, apparatus and electronic equipment of data mark
CN110443310B (en) * 2019-08-07 2022-08-09 浙江大华技术股份有限公司 Updating method of comparison analysis system, server and computer storage medium
CN110443310A (en) * 2019-08-07 2019-11-12 浙江大华技术股份有限公司 Compare update method, server and the computer storage medium of analysis system
CN112464966A (en) * 2019-09-06 2021-03-09 富士通株式会社 Robustness estimation method, data processing method, and information processing apparatus
CN111079653A (en) * 2019-12-18 2020-04-28 中国工商银行股份有限公司 Automatic database sorting method and device
CN111079653B (en) * 2019-12-18 2024-03-22 中国工商银行股份有限公司 Automatic database separation method and device
CN111027507A (en) * 2019-12-20 2020-04-17 中国建设银行股份有限公司 Training data set generation method and device based on video data identification
CN111598120A (en) * 2020-03-31 2020-08-28 宁波吉利汽车研究开发有限公司 Data labeling method, equipment and device
WO2023207184A1 (en) * 2022-04-29 2023-11-02 上海概伦电子股份有限公司 Data selection method, system and apparatus for extracting device model parameters of integrated circuit

Similar Documents

Publication Publication Date Title
CN108197668A (en) The method for building up and cloud system of model data collection
CN110610193A (en) Method and device for processing labeled data
CN110472665A (en) Model training method, file classification method and relevant apparatus
WO2017088537A1 (en) Component classification method and apparatus
CN111723856B (en) Image data processing method, device, equipment and readable storage medium
CN110378343A (en) A kind of finance reimbursement data processing method, apparatus and system
CN107545038B (en) Text classification method and equipment
CN104796300B (en) A kind of packet feature extracting method and device
CN105989001B (en) Image search method and device, image search system
CN105678344A (en) Intelligent classification method for power instrument equipment
CN110264274A (en) Objective group's division methods, model generating method, device, equipment and storage medium
CN106203103A (en) The method for detecting virus of file and device
CN108241892A (en) A kind of Data Modeling Method and device
CN113961473A (en) Data testing method and device, electronic equipment and computer readable storage medium
CN112036166A (en) Data labeling method and device, storage medium and computer equipment
CN114066848A (en) FPCA appearance defect visual inspection system
CN110851817A (en) Terminal type identification method and device
CN112926621A (en) Data labeling method and device, electronic equipment and storage medium
CN111353689A (en) Risk assessment method and device
CN111414930B (en) Deep learning model training method and device, electronic equipment and storage medium
CN114639152A (en) Multi-modal voice interaction method, device, equipment and medium based on face recognition
CN117216051A (en) Method and device for determining data labeling quality for training large language model
CN111325255B (en) Specific crowd delineating method and device, electronic equipment and storage medium
CN116152609B (en) Distributed model training method, system, device and computer readable medium
CN108427968A (en) Augmented reality implementation method applied to wechat small routine

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180622)