CN104063474A - Sample data collection system - Google Patents

Publication number
CN104063474A
Authority
CN
China
Prior art keywords
sample data
database
sample
characteristic
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410307397.9A
Other languages
Chinese (zh)
Inventor
张鹏
张美琦
张爱华
张朝阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing 58 Information Technology Co Ltd
Original Assignee
Beijing 58 Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing 58 Information Technology Co Ltd
Priority to CN201410307397.9A
Publication of CN104063474A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Abstract

The invention discloses a sample data collection system. The system comprises a database interface, a database and a feature extraction module; the database interface is used for providing an interface for accessing the database; the database, which is connected with the database interface, is used for storing received sample data via the database interface and sending corresponding sample data according to the request of a user; the feature extraction module, which is connected with the database interface, is used for obtaining the sample data in the database via the database interface, extracting the sample data as feature data according to a preset logic, storing the feature data and sending the corresponding feature data according to the request of the user. By means of the technical scheme of the invention, the sample data collection system can reduce the cost of obtaining sample data, save the time of development and further increase the accuracy and effectiveness of sample data.

Description

Sample Data Collection system
Technical field
The present invention relates to the field of computer technology, and in particular to a sample data collection system.
Background technology
At present, a classified information website may rely on many different systems to identify low-quality information. Each system uses a different method and is developed by different staff. However the information is identified, the only way to find a recognition method is to analyze concrete samples. In practice, finding representative samples while also keeping them fresh has proven very difficult.
Sample data collection can provide representative samples and ensure that they are sufficient in number, fresh, and accurate, which greatly reduces the workload of subsequent identification work. In the prior art, samples are collected separately for each problem that needs to be identified. These samples may come from databases, history logs, information obtained by mining, or user complaints. Because fresh samples are needed for every round of development or analysis, the data must be exported again each time from the databases, history logs, mined information, or user complaints.
Sample data collection in the prior art therefore has the following problems: 1. because the existing collection means are not systematic, the accuracy of the sample data cannot be guaranteed; 2. samples obtained with the existing collection means have poor timeliness, and the latest samples cannot be obtained in real time; 3. because the existing means collect samples for each specific problem to be identified, time may have to be spent organizing and developing sample-extraction code for each technology and product; 4. the classification system for samples is incomplete.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a sample data collection system that overcomes the above problems or at least partially solves them.
The invention provides a sample data collection system comprising: a database interface for providing an interface for accessing a database; the database, connected to the database interface, for storing sample data received through the database interface and sending the corresponding sample data in response to a user's request; and a feature extraction module, connected to the database interface, for obtaining the sample data in the database through the database interface, extracting the sample data into feature data according to preset logic, storing the feature data, and sending the corresponding feature data in response to a user's request.
Preferably, the system further comprises a supplementary module, connected to the database, for compiling statistics on and managing the sample data stored in the database.
Preferably, the supplementary module comprises: a correction submodule for automatically correcting inaccurate sample data in the database; an overtime data deletion submodule for automatically deleting sample data that has been stored in the database for longer than a predetermined time; an old data deletion submodule for automatically deleting sample data in the database that has already been extracted into feature data; and a statistics submodule for automatically compiling periodic statistics on the different categories of sample data in the database and, when the amount of sample data falls below a preset threshold, automatically sending the user a notice that the sample data is insufficient.
Preferably, the old data deletion submodule is specifically configured to compare the write time of the feature data with the write time of the corresponding sample data stored in the database. If the write time of the feature data is earlier than that of the sample data, the sample data is retained. If the write time of the feature data is later than that of the sample data, the submodule further judges whether new feature data can be supplemented for this sample data; if so, the sample data is retained, and otherwise it is deleted.
Preferably, the system further comprises a back-end administration module, connected to the database interface, for classifying the sample data in the database through the database interface and/or creating new sample data categories.
Preferably, the back-end administration module is further configured to proofread the sample data according to the user's operations and to correct inaccurate sample data.
Preferably, the feature extraction module is specifically configured to collect and/or extract one or more items of feature data from the sample data according to the preset logic, and to consolidate and store the feature data corresponding to the sample data.
Preferably, the sample data comprises a sample identifier (ID) and a sample category.
Preferably, the feature data comprises a description of a certain class of behavior and the period over which that class of behavior is sampled.
Preferably, the sample data collection system is used to provide sample data and feature data for information identification on a classified information website.
The beneficial effects of the present invention are as follows:
By means of the sample data collection system of the embodiments of the present invention, the cost of obtaining sample data can be reduced and development time saved. The accuracy and timeliness of the sample data are further improved and the quantity of samples is assured, which greatly helps the analysis and development of later products and technologies.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 is a schematic structural diagram of the sample data collection system of an embodiment of the present invention;
Fig. 2 is a schematic diagram of a preferred structure of the sample data collection system of an embodiment of the present invention.
Embodiment
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
The prior-art means of collecting sample data are not systematic, which leads to low sample data accuracy, poor timeliness of the samples obtained, and the possible need to spend time organizing and developing sample-extraction code for each technology and product. To solve these problems, the invention provides a sample data collection system, which is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.
According to an embodiment of the invention, a sample data collection system is provided. The system can be used to provide sample data and feature data for information identification on a classified information website. Fig. 1 is a schematic structural diagram of the sample data collection system of the embodiment. As shown in Fig. 1, the system comprises a database interface 10, a database 12, and a feature extraction module 14. Each module of the embodiment is described in detail below.
Database interface 10 provides an interface for accessing database 12.
Database 12 is connected to database interface 10. It stores the sample data received through database interface 10 and sends the corresponding sample data in response to a user's request. The sample data comprises a sample identifier (ID) and a sample category.
Feature extraction module 14 is connected to database interface 10. It obtains the sample data in database 12 through database interface 10, extracts the sample data into feature data according to preset logic, stores the feature data, and sends the corresponding feature data in response to a user's request. The feature data comprises a description of a certain class of behavior and the period over which that class of behavior is sampled.
Specifically, feature extraction module 14 collects and/or extracts one or more items of feature data from the sample data according to the preset logic, and consolidates and stores the feature data corresponding to the sample data.
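The two record types described above can be pictured with a minimal Python sketch. The class names, the field names, and the written_at timestamps are illustrative assumptions; the text itself only requires a sample ID and category for sample data, and a behavior description and sampling period for feature data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Sample:
    """One stored sample: an identifier plus its category."""
    sample_id: str
    category: str
    written_at: datetime  # assumed field: when the sample was stored


@dataclass(frozen=True)
class Feature:
    """Feature data derived from one sample."""
    sample_id: str
    behavior: str               # description of a class of behavior
    sampling_period: timedelta  # period over which the behavior is sampled
    written_at: datetime        # assumed field: when the feature was stored


# Hypothetical example values, not taken from the patent.
sample = Sample("s-001", "duplicate-posting", datetime(2014, 6, 1, 12, 0))
feature = Feature(sample.sample_id, "posts the same listing repeatedly",
                  timedelta(days=1), datetime(2014, 6, 1, 12, 5))
```

Keeping the sample ID on the feature record is one simple way to relate feature data back to the sample it was extracted from.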
To manage the sample data better, the system of the embodiment also comprises a supplementary module and a back-end administration module. Specifically:
The supplementary module is connected to database 12 and is used to compile statistics on and manage the sample data stored in database 12.
The back-end administration module is connected to database interface 10 and is used to classify the sample data in database 12 through database interface 10 and/or to create new sample data categories.
The back-end administration module is further used to proofread the sample data according to the user's operations and to correct inaccurate sample data.
The supplementary module specifically comprises:
a correction submodule, for automatically correcting inaccurate sample data in database 12;
an overtime data deletion submodule, for automatically deleting sample data that has been stored in database 12 for longer than a predetermined time;
an old data deletion submodule, for automatically deleting sample data in database 12 that has already been extracted into feature data, the old data deletion submodule being specifically configured as follows:
the write time of the feature data is compared with the write time of the corresponding sample data stored in database 12. If the write time of the feature data is earlier than that of the sample data, the sample data is retained. If the write time of the feature data is later than that of the sample data, the submodule further judges whether new feature data can be supplemented for this sample data; if so, the sample data is retained, and otherwise it is deleted;
a statistics submodule, for automatically compiling periodic statistics on the different categories of sample data in database 12 and, when the amount of sample data falls below a preset threshold, automatically sending the user a notice that the sample data is insufficient.
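The retention rule of the old data deletion submodule can be sketched as a small decision function. This is a hedged illustration: the function name and the can_supplement flag are hypothetical, and the case where both write times are equal is not addressed in the text, so the sketch arbitrarily treats it like the "later" case.

```python
from datetime import datetime


def should_retain_sample(sample_written_at: datetime,
                         feature_written_at: datetime,
                         can_supplement: bool) -> bool:
    """Return True if the sample should be kept, False if it may be deleted."""
    # Feature written before the sample: the sample is newer than the
    # feature derived so far, so it is kept.
    if feature_written_at < sample_written_at:
        return True
    # Feature written after the sample (or, by assumption, at the same time):
    # keep the sample only if new feature data can still be supplemented.
    return can_supplement


t_sample = datetime(2014, 6, 1)
print(should_retain_sample(t_sample, datetime(2014, 5, 30), False))  # True
print(should_retain_sample(t_sample, datetime(2014, 6, 2), True))    # True
print(should_retain_sample(t_sample, datetime(2014, 6, 2), False))   # False
```

How the system decides whether new feature data can be supplemented is left open by the text, so the sketch takes it as an input rather than computing it.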
The technical solution of the embodiment of the invention is described in detail below with reference to the drawings.
Fig. 2 is a schematic diagram of a preferred structure of the sample data collection system of the embodiment of the invention. As shown in Fig. 2, in this embodiment the system consists of five modules: the database, the database interface, the feature extraction module, the supplementary module, and the back-end administration module.
The database is connected to the database interface and the supplementary module, and stores sample information such as the sample ID and the sample category.
The database interface is connected to the feature extraction module, the back-end administration module, and the database. It provides an interface for accessing the database and supports operations such as adding, deleting, modifying, and querying records. For example, sample information is written into the database through the database interface, and a requesting party obtains the sample information stored in the database through the database interface.
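The add, delete, modify, and query operations of the database interface can be sketched with Python's standard sqlite3 module. The patent does not name a database product or a schema, so the class name, the table layout, and the method names here are all assumptions made for illustration.

```python
import sqlite3


class DatabaseInterface:
    """Single access point to the sample database (illustrative sketch)."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS samples ("
            "sample_id TEXT PRIMARY KEY, "
            "category TEXT NOT NULL, "
            "written_at TEXT NOT NULL)"
        )

    def add(self, sample_id, category, written_at):
        # 'with' commits on success and rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT INTO samples VALUES (?, ?, ?)",
                (sample_id, category, written_at),
            )

    def delete(self, sample_id):
        with self._conn:
            self._conn.execute(
                "DELETE FROM samples WHERE sample_id = ?", (sample_id,)
            )

    def update_category(self, sample_id, category):
        with self._conn:
            self._conn.execute(
                "UPDATE samples SET category = ? WHERE sample_id = ?",
                (category, sample_id),
            )

    def query(self, category):
        rows = self._conn.execute(
            "SELECT sample_id, category, written_at FROM samples "
            "WHERE category = ? ORDER BY sample_id",
            (category,),
        )
        return rows.fetchall()


db = DatabaseInterface()
db.add("s-001", "spam", "2014-06-01T12:00:00")
db.add("s-002", "spam", "2014-06-02T09:30:00")
db.update_category("s-002", "fraud")
db.delete("s-001")
print(db.query("fraud"))
```

Routing every read and write through one class mirrors the role the text gives the database interface: the other modules never touch the database directly.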
The feature extraction module is connected to the database interface and obtains the sample information stored in the database through it. According to preset logic, the module converts, summarizes, or extracts the information of a concrete sample into one or more features, for example feature A. Each feature represents a certain class of behavior, the period over which that behavior is sampled, and so on. The feature extraction module can store the features obtained by conversion, summarization, or extraction in its own internal storage space. A requesting party can obtain the features stored in the feature extraction module.
For example, when a piece of sample information is written into the sample database, it is just a single record. The feature extraction module collects the different features of the sample, obtaining them by conversion, summarization, or extraction, and stores them so that later analysis of the sample can reach a judgment as quickly as possible.
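As a rough illustration of extraction "according to preset logic", the sketch below models the preset logic as a list of rule functions applied to one sample's information. The two rules, their thresholds, and the field names (phone_post_count, description) are invented for the example and do not come from the patent.

```python
from collections import defaultdict


def repeated_phone(info):
    # Hypothetical rule: the same contact number appears in many posts.
    if info.get("phone_post_count", 0) > 5:
        return "repeated-contact-number"
    return None


def short_description(info):
    # Hypothetical rule: the listing description is unusually short.
    if len(info.get("description", "")) < 10:
        return "very-short-description"
    return None


PRESET_LOGIC = [repeated_phone, short_description]


class FeatureExtractor:
    """Applies preset rules to sample information and stores the results."""

    def __init__(self, rules):
        self._rules = list(rules)
        self._store = defaultdict(list)  # internal storage: sample_id -> features

    def extract(self, sample_id, info):
        features = []
        for rule in self._rules:
            feature = rule(info)
            if feature is not None:
                features.append(feature)
        self._store[sample_id].extend(features)
        return features

    def features_for(self, sample_id):
        # Served to a requesting party; returns a copy of the stored list.
        return list(self._store.get(sample_id, []))


extractor = FeatureExtractor(PRESET_LOGIC)
extractor.extract("s-001", {"phone_post_count": 9, "description": "call me"})
print(extractor.features_for("s-001"))
```

Because the features are computed once and stored, a later analysis can read them directly instead of re-deriving them from the raw sample.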
The supplementary module is connected to the database and is used to compile statistics on and clean the sample information stored in the database. Specifically:
1. The supplementary module can correct inaccurate sample data. For example, sample information 1 is written into the database, but after identification it is determined that this sample information should not have been written. The supplementary module receives a delete instruction (which contains the sample information, such as the sample ID and sample category) and deletes sample information 1 from the database;
2. The supplementary module can eliminate old data. For example, if sample information 1 has been stored in the database beyond the limit (30 in this example), the supplementary module can delete it. The time at which a piece of sample information is written can be recorded when it is written. In this embodiment, the sample information can be marked with a time stamp.
3. When the feature extraction module obtains new features by conversion, summarization, extraction, or similar means, some old data loses its meaning, and the supplementary module eliminates it. In practice, old data can be eliminated as follows: the supplementary module compares the write time of the feature with the write time of the sample information stored in the database. If the write time of the feature is earlier than that of the sample information, the sample information can be retained. If the write time of the feature is later than that of the sample information, the module judges whether relevant information (for example, new feature information based on the sample data) needs to be supplemented for this feature; if so, the sample information is retained, and otherwise it is deleted. Supplementing here means writing the relevant information to the feature extraction module.
4. The supplementary module compiles statistics on the sample data. It periodically counts the amount of data in each sample category and sends the statistics to each requesting party as required; it can also send a requesting party a notice that the data volume is insufficient.
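Items 2 and 4 of the list above can be sketched as two small functions, one for expiry by time stamp and one for the low-volume check. The in-memory dictionary, the function names, and the 30-day default are assumptions; in particular, the text gives the limit as 30 without stating a unit.

```python
from collections import Counter
from datetime import datetime, timedelta


def expired_ids(samples, now, max_age=timedelta(days=30)):
    """IDs of samples stored for longer than max_age (assumed unit: days)."""
    return [sid for sid, (_, written_at) in samples.items()
            if now - written_at > max_age]


def low_volume_categories(samples, threshold, categories):
    """Categories whose sample count is below the preset threshold."""
    counts = Counter(category for category, _ in samples.values())
    return sorted(c for c in categories if counts[c] < threshold)


# sample_id -> (category, write time); hypothetical data.
samples = {
    "a": ("spam", datetime(2014, 6, 1)),
    "b": ("spam", datetime(2014, 7, 10)),
    "c": ("fraud", datetime(2014, 7, 14)),
}
now = datetime(2014, 7, 15)

print(expired_ids(samples, now))                                     # ['a']
print(low_volume_categories(samples, 2, ["spam", "fraud", "adult"]))  # ['adult', 'fraud']
```

In a running system these checks would be scheduled periodically, and the second result would drive the insufficient-data notice sent to the requesting party.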
The back-end administration module is connected to the database interface and is used to manage sample categories, proofread sample information, and correct inaccurate data, for example when a new sample category is found. This module provides an interface for operating on the database manually: through it, sample categories can be managed by hand, sample information can be proofread, and inaccurate data can be corrected.
In summary, by means of the technical solution of the embodiments of the invention, the cost of obtaining sample data can be reduced and development time saved. The accuracy and timeliness of the sample data are further improved and the quantity of samples is assured, which greatly helps the analysis and development of later products and technologies.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is also intended to include them.
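The manual operations of the back-end administration module (creating a category, classifying a sample, correcting a wrong classification) can be sketched as follows. The class and method names are hypothetical, and a plain dictionary stands in for the database interface to keep the example short.

```python
class BackendAdministration:
    """Manual category management and correction (illustrative sketch)."""

    def __init__(self):
        self.categories = set()
        self.samples = {}  # sample_id -> category; stand-in for the database

    def create_category(self, name):
        self.categories.add(name)

    def classify(self, sample_id, category):
        if category not in self.categories:
            raise ValueError(f"unknown category: {category}")
        self.samples[sample_id] = category

    def correct(self, sample_id, category):
        # Proofreading found a wrong category on an existing sample.
        if sample_id not in self.samples:
            raise KeyError(sample_id)
        self.classify(sample_id, category)


admin = BackendAdministration()
admin.create_category("spam")
admin.classify("s-001", "spam")
admin.create_category("fraud")   # a newly found sample category
admin.correct("s-001", "fraud")
print(admin.samples)
```

Rejecting unknown categories is a design choice of this sketch; it forces a new category to be created explicitly before any sample can be filed under it.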

Claims (10)

1. A sample data collection system, characterized in that it comprises:
a database interface, for providing an interface for accessing a database;
the database, connected to said database interface, for storing sample data received through said database interface and sending the corresponding sample data in response to a user's request; and
a feature extraction module, connected to said database interface, for obtaining said sample data in said database through said database interface, extracting said sample data into feature data according to preset logic, storing the feature data, and sending the corresponding feature data in response to a user's request.
2. The system as claimed in claim 1, characterized in that said system further comprises:
a supplementary module, connected to said database, for compiling statistics on and managing said sample data stored in said database.
3. The system as claimed in claim 2, characterized in that said supplementary module specifically comprises:
a correction submodule, for automatically correcting inaccurate sample data in said database;
an overtime data deletion submodule, for automatically deleting sample data that has been stored in said database for longer than a predetermined time;
an old data deletion submodule, for automatically deleting sample data in said database that has already been extracted into feature data; and
a statistics submodule, for automatically compiling periodic statistics on the different categories of sample data in said database and, when the amount of sample data falls below a preset threshold, automatically sending the user a notice that the sample data is insufficient.
4. The system as claimed in claim 3, characterized in that said old data deletion submodule is specifically configured to:
compare the write time of the feature data with the write time of the corresponding sample data stored in said database; if the write time of said feature data is earlier than the write time of said sample data, determine that said sample data is retained; if the write time of said feature data is later than the write time of said sample data, further judge whether new feature data can be supplemented for this sample data, and if so retain said sample data, otherwise delete said sample data.
5. The system as claimed in claim 1, characterized in that said system further comprises:
a back-end administration module, connected to said database interface, for classifying said sample data in said database through said database interface and/or creating new sample data categories.
6. The system as claimed in claim 5, characterized in that the back-end administration module is further configured to proofread said sample data according to the user's operations and to correct inaccurate sample data.
7. The system as claimed in any one of claims 1 to 6, characterized in that the feature extraction module is specifically configured to collect and/or extract one or more items of feature data from said sample data according to the preset logic, and to consolidate and store the feature data corresponding to said sample data.
8. The system as claimed in any one of claims 1 to 6, characterized in that said sample data comprises a sample identifier (ID) and a sample category.
9. The system as claimed in any one of claims 1 to 6, characterized in that said feature data comprises a description of a certain class of behavior and the period over which that class of behavior is sampled.
10. The system as claimed in any one of claims 1 to 6, characterized in that said sample data collection system is used to provide said sample data and said feature data for information identification on a classified information website.
CN201410307397.9A 2014-06-30 2014-06-30 Sample data collection system Pending CN104063474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410307397.9A CN104063474A (en) 2014-06-30 2014-06-30 Sample data collection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410307397.9A CN104063474A (en) 2014-06-30 2014-06-30 Sample data collection system

Publications (1)

Publication Number Publication Date
CN104063474A true CN104063474A (en) 2014-09-24

Family

ID=51551188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410307397.9A Pending CN104063474A (en) 2014-06-30 2014-06-30 Sample data collection system

Country Status (1)

Country Link
CN (1) CN104063474A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844553A (en) * 2016-12-30 2017-06-13 晶赞广告(上海)有限公司 Data snooping and extending method and device based on sample data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1952929A (en) * 2005-10-20 2007-04-25 关涛 Extraction method and system of structured data of internet based on sample & faced to regime
CN101710331A (en) * 2008-10-23 2010-05-19 中国科学院地理科学与资源研究所 System and method for layering population sample survey sample
CN102142082A (en) * 2011-04-08 2011-08-03 南京邮电大学 Virtual sample based kernel discrimination method for face recognition
CN103763124A (en) * 2013-12-26 2014-04-30 孙伟力 Internet user behavior analyzing and early-warning system and method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844553A (en) * 2016-12-30 2017-06-13 晶赞广告(上海)有限公司 Data snooping and extending method and device based on sample data
CN106844553B (en) * 2016-12-30 2020-05-01 晶赞广告(上海)有限公司 Data detection and expansion method and device based on sample data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140924
