CN107515908A - Data processing method and device - Google Patents
- Publication number
- CN107515908A CN201710683566.2A
- Authority
- CN
- China
- Prior art keywords
- cluster
- row
- information
- combination
- field
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data processing method and device. The method includes: obtaining spreadsheet data to be processed; querying the columns in the spreadsheet data to be processed that contain non-numeric information, and digitizing the queried columns; obtaining multiple cluster combinations from the spreadsheet data to be processed, where each cluster combination includes at least one cluster field; extracting one cluster combination, querying the corresponding information in the processed spreadsheet data according to that cluster combination, performing cluster analysis on the corresponding information to obtain a specified number of cluster samples, and computing and saving the proportion of the corresponding information that each cluster sample accounts for; computing and saving, for each cluster field in the cluster combination, the proportion that each item of corresponding information accounts for within each cluster sample; and returning to the operation of extracting one cluster combination until all preset cluster combinations have been processed. The invention improves data exploration efficiency, statistical efficiency, automated batch processing capability, and information readability.
Description
Technical field
Embodiments of the present invention relate to data mining technology, and in particular to a data processing method and device.
Background
In recent years, with the rapid development of big data technology, mining the value of data has become an indispensable part of business and government management. At present, there are generally two modes of mining data value: traditional statistical analysis and the newer machine learning.
Statistical analysis is the familiar grouping and aggregation analysis. Its results usually include statistics such as "sum", "difference", "mean", "distribution probability", and "correlation coefficient", and they are usually supplied to business decision-makers as statistical reports that serve as the data basis for decisions. Cluster analysis is an unsupervised machine learning algorithm and an exploratory data analysis method. It is generally used to group and classify objects that appear unordered, in order to better understand the objects under study. A clustering result requires high similarity between objects within a group and low similarity between objects in different groups. Statistical analysis is relatively subjective and is hard to use for prospective analysis. The results of cluster analysis usually lack quantitative analysis and, because specific data analysis is missing, are difficult to use directly to guide decisions.
No effective solution to these problems has yet been proposed.
Summary of the invention
The present invention provides a data processing method and device that improve data exploration efficiency, statistical efficiency, automated batch processing capability, and information readability.
In a first aspect, an embodiment of the present invention provides a data processing method, including:
Obtaining spreadsheet data to be processed;
Querying the columns in the spreadsheet data to be processed that contain non-numeric information, and digitizing the queried columns to generate processed spreadsheet data;
Obtaining multiple cluster combinations from the spreadsheet data to be processed, where each cluster combination includes at least one cluster field and each cluster field is a column field in the spreadsheet data to be processed;
Extracting one cluster combination, querying the corresponding information in the processed spreadsheet data according to the cluster combination, performing cluster analysis on the corresponding information to obtain a specified number of cluster samples, and computing and saving the proportion of the corresponding information that each cluster sample accounts for; computing and saving, for each cluster field in the cluster combination, the proportion that each item of corresponding information accounts for within each cluster sample;
Returning to the operation of extracting one cluster combination until all preset cluster combinations have been processed.
In a second aspect, an embodiment of the present invention also provides a data processing device, including:
A to-be-processed spreadsheet data acquisition module, for obtaining the spreadsheet data to be processed;
A spreadsheet data digitization module, for querying the columns in the spreadsheet data to be processed that contain non-numeric information and digitizing the queried columns to generate processed spreadsheet data;
A cluster combination acquisition module, for obtaining multiple cluster combinations from the spreadsheet data to be processed, where each cluster combination includes at least one cluster field and each cluster field is a column field in the spreadsheet data to be processed;
A cluster statistical analysis module, for extracting one cluster combination, querying the corresponding information in the processed spreadsheet data according to the cluster combination, performing cluster analysis on the corresponding information to obtain a specified number of cluster samples, and computing and saving the proportion of the corresponding information that each cluster sample accounts for; and for computing and saving, for each cluster field in the cluster combination, the proportion that each item of corresponding information accounts for within each cluster sample;
A loop module, for returning to the operation of extracting one cluster combination until all preset cluster combinations have been processed.
By digitizing the non-numeric information in the spreadsheet data, batch-processing multiple cluster combinations in a loop during the cluster statistical analysis, and performing statistical analysis on the cluster analysis results, the present invention addresses the poor readability of data analysis results, the low efficiency of manually traversing all cluster combinations in cluster analysis, and the lack of quantitative and further statistical analysis of cluster analysis results. It thereby improves data exploration efficiency, statistical efficiency, automated batch processing capability, and information readability.
Brief description of the drawings
Fig. 1a is a flow chart of a data processing method in Embodiment one of the present invention;
Fig. 1b is a schematic diagram of typical original spreadsheet data in Embodiment one of the present invention;
Fig. 1c is a schematic diagram of spreadsheet data to be processed in Embodiment one of the present invention;
Fig. 1d is a schematic diagram of cluster combinations in Embodiment one of the present invention;
Fig. 1e is a schematic diagram of a cluster statistical analysis result in Embodiment one of the present invention;
Fig. 1f is a schematic diagram of a cluster statistical analysis result in Embodiment one of the present invention;
Fig. 1g is a schematic diagram of the workbook naming format in Embodiment one of the present invention;
Fig. 1h is a schematic diagram of the second worksheet saved to the first workbook in Embodiment one of the present invention;
Fig. 1i is a schematic diagram of the third worksheet saved to the first workbook in Embodiment one of the present invention;
Fig. 2 is a flow chart of a preferred data processing method in Embodiment two of the present invention;
Fig. 3 is a structural diagram of a data processing device in Embodiment three of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention rather than the entire structure.
Embodiment one
Fig. 1a is a flow chart of a data processing method provided by Embodiment one of the present invention. This embodiment is applicable to data mining. The method can be performed by a data processing device, which is implemented in software and/or hardware. The technical solution of this embodiment includes the following steps:
S110: Obtain the spreadsheet data to be processed.
The spreadsheet data to be processed is obtained by preprocessing the original spreadsheet data, and it is the data basis for the subsequent cluster analysis and statistical analysis. The preprocessing steps are as follows:
Step 1: Obtain the original spreadsheet data.
For example, Fig. 1b shows typical original spreadsheet data. The first row is the header row, which gives the meaning of each column field; each subsequent row holds the actual raw data of one sample.
Specifically, Fig. 1b records case information. From left to right, the column fields in the header row are crime state, crime location region, loss value, means feature, case state, position, and incident region. Each row below is filled in with the information that matches the meaning of each column field, according to the actual situation, and forms one raw data sample. The N (N ≥ 1) raw data samples formed in this way make up the original spreadsheet data.
Step 2: Delete from the original spreadsheet data the column fields that do not need cluster statistical analysis, together with the information under those column fields.
For example, when case features are analyzed from the original spreadsheet data, each column field in Fig. 1b is assessed to decide whether it should undergo cluster statistical analysis. Because the case state has little relevance to case features, it is judged not to need this analysis, so that column field and the information under it are deleted from the original spreadsheet data.
Deciding in advance which column fields undergo cluster statistical analysis reduces the subsequent data volume and improves data processing efficiency.
Step 3: Split the column fields in the original spreadsheet data that contain multiple sub-fields, generating the spreadsheet data to be processed.
For example, as shown in Fig. 1c, the case analysis requires the crime time to be a key object of study for case feature analysis. The crime time column field in Fig. 1b is therefore split into year, month, day, week, hour, and minute fields.
Further splitting column fields that carry rich information makes the resulting cluster statistical analysis more accurate.
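The preprocessing in steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `preprocess`, the in-memory list-of-rows layout, and the timestamp format `%Y-%m-%d %H:%M` are all hypothetical, since the source does not specify how the crime time is stored.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def preprocess(header, rows, drop_fields, time_field,
               time_format="%Y-%m-%d %H:%M"):
    """Drop unneeded columns and split one timestamp column into sub-fields.

    The timestamp format is an assumption; adjust it to the real data.
    """
    time_col = header.index(time_field)
    # Step 2: keep only columns that take part in the analysis.
    keep = [i for i, name in enumerate(header)
            if name not in drop_fields and name != time_field]
    # Step 3: the timestamp column becomes six sub-field columns.
    parts = ["year", "month", "day", "week", "hour", "minute"]
    new_header = [header[i] for i in keep] + [f"{time_field}_{p}" for p in parts]
    new_rows = []
    for row in rows:
        dt = datetime.strptime(row[time_col], time_format)
        split = [dt.year, dt.month, dt.day, WEEKDAYS[dt.weekday()],
                 dt.hour, dt.minute]
        new_rows.append([row[i] for i in keep] + split)
    return new_header, new_rows

# Illustrative data only; the field names follow the example in the text.
header = ["crime state", "case state", "crime time"]
rows = [["accomplished offence", "closed", "2017-08-08 14:30"]]
new_header, new_rows = preprocess(header, rows, {"case state"}, "crime time")
```

The week sub-field stays textual here, so it is one of the columns that the later digitization step would convert to numbers.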
S120: Query the columns in the spreadsheet data to be processed that contain non-numeric information, and digitize the queried columns to generate processed spreadsheet data.
Non-numeric information includes Chinese character information, null values, and empty strings. Digitization converts non-numeric information into numeric information. The conversion includes the following steps:
Step 1: Group the Chinese character information by field, delete the duplicate information in each group, and sort the remaining distinct information.
For example, as shown in Fig. 1c, the columns of non-numeric information in the spreadsheet data to be processed are first grouped by crime state, means feature, position, and crime time_week, and the duplicates in each group are deleted. In the crime time_week field, for instance, Tuesday, Thursday, and Saturday each appear more than once, so the duplicates are deleted and only one entry of each is kept. Then, group by group, the de-duplicated non-numeric information is sorted. The sort can be, but is not limited to, pinyin order of the Chinese characters. Sorting the crime time_week field in pinyin order, for instance, gives Tuesday, Saturday, Sunday, Thursday, Friday, Monday. Likewise, sorting the means feature field in pinyin order gives stealing means, looking for object, preparation means, organizational form.
Step 2: Number the distinct information according to the sort order, and save the numbers to a second dictionary variable.
A dictionary is a way of storing data. It is equivalent to a two-column array whose columns are called keys and items; keys cannot repeat, while items can.
For example, as shown in Fig. 1c, the column fields crime state, means feature, crime time_week, and so on are the keys, and the sequence numbers of their corresponding columns are the items. Optionally, following the sort order from step 1: under the crime state field, accomplished offence is numbered 1; under the means feature field, stealing means, looking for object, preparation means, and organizational form are numbered 1, 2, 3, 4 in turn; under the crime time_week field, Tuesday, Saturday, Sunday, Thursday, Friday, and Monday are numbered 1, 2, 3, 4, 5, 6. These numbers are saved to the second dictionary variable, which records which columns of the spreadsheet data to be processed contain Chinese character information and, for those columns, the correspondence between the Chinese character information and the numbers.
Step 3: Convert the Chinese character information into the corresponding numbers according to the information in the second dictionary variable.
Step 4: Convert the null values and empty strings into a specific number.
For example, all null values and empty strings in the non-numeric information are converted into a fixed number. The crime state field and the position field in Fig. 1c contain null values, which are converted into the fixed number. Preferably, the fixed number can be set to -1000, -2000, or -10000; the value is not specifically limited, provided it differs from every number already in use. The second dictionary variable obtained in step 2 records which columns of the spreadsheet data to be processed contain Chinese character information, and the correspondence between the Chinese character information and the numbers in those columns. The Chinese character information is therefore converted into the corresponding numbers assigned in step 2, according to the second dictionary variable. As shown in Fig. 1c, every occurrence of accomplished offence in the crime state field is converted into 2 according to the numbering in step 2, and the other Chinese character information is handled in the same way.
Digitizing the non-numeric information in the spreadsheet data to be processed avoids additional manual data processing and reduces the difficulty of handling the data by hand. The non-numeric information is de-duplicated by group, converted directly into the corresponding numbers, and stored in dictionary form. When the results of the subsequent cluster statistical analysis are output, the dictionary is used to convert the numbers back into the corresponding Chinese characters or other non-numeric information (such as null values and empty strings), which improves the readability of the final cluster statistical analysis results. In addition, because all of the data is digitized, it provides the data support needed by the subsequent cluster statistical analysis.
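The four digitization steps can be sketched as below. This is a hedged illustration: the names `build_code_dictionary`, `digitize`, `decode`, and `NULL_CODE` are hypothetical, and Python's built-in `sorted()` orders strings by code point, which stands in for the pinyin ordering mentioned in the text (a real pinyin sort would need an additional library).

```python
NULL_CODE = -1000  # fixed number for null values and empty strings

def build_code_dictionary(header, rows):
    """Steps 1-2: per text column, de-duplicate, sort, and number the values."""
    codes = {}
    for col, name in enumerate(header):
        texts = {row[col] for row in rows
                 if isinstance(row[col], str) and row[col] != ""}
        if texts:
            # Assumption: code-point order replaces the pinyin order of the text.
            codes[name] = {text: n for n, text in enumerate(sorted(texts), start=1)}
    return codes

def digitize(header, rows, codes):
    """Steps 3-4: replace text by its number and nulls by NULL_CODE."""
    result = []
    for row in rows:
        new_row = []
        for name, cell in zip(header, row):
            if cell is None or cell == "":
                new_row.append(NULL_CODE)
            elif name in codes and cell in codes[name]:
                new_row.append(codes[name][cell])
            else:
                new_row.append(cell)  # already numeric
        result.append(new_row)
    return result

def decode(name, value, codes):
    """Translate a number back to its text when results are output."""
    if value == NULL_CODE:
        return ""
    reverse = {code: text for text, code in codes.get(name, {}).items()}
    return reverse.get(value, value)

# Illustrative data only.
header = ["means feature", "loss value"]
rows = [["b", 100], ["a", 200], [None, 300], ["b", ""]]
codes = build_code_dictionary(header, rows)
digitized = digitize(header, rows, codes)
```

Keeping the code dictionary alongside the digitized table is what makes the reverse translation possible at output time.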
Further, before querying the columns in the spreadsheet data to be processed that contain non-numeric information and digitizing the queried columns, the method also includes:
Saving the column fields in the header row of the spreadsheet data to be processed, together with the sequence numbers of their columns, to a first dictionary variable.
For example, as shown in Fig. 1c, the column fields crime state, means feature, position, and so on are the keys, and the sequence numbers of their columns are the items. The N = 12 column fields crime state, crime location region, loss value, means feature, position, incident region, crime time_year, crime time_month, crime time_day, crime time_week, crime time_hour, and crime time_minute, together with their column sequence numbers 1, 2, ..., 11, 12, are saved to the first dictionary variable.
Storing the column field information of the header row in dictionary form serves two purposes. First, it is convenient to call later in the program when cluster fields are assembled into cluster combinations: the cluster fields are column fields, and a cluster combination is composed of the sequence numbers of the columns of its cluster fields. Second, it is convenient to call when the cluster statistical analysis results are output: using the dictionary, the column field information is automatically translated back into the original Chinese column names, so that all column fields in the final analysis results carry their original Chinese column names, which improves the readability of the analysis results.
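The first dictionary variable and its two uses can be sketched as follows. The function names `build_column_index` and `column_names` are hypothetical, and the header shown is an illustrative subset of the twelve fields.

```python
def build_column_index(header):
    """Map each header-row field name (key) to its 1-based column number (item)."""
    return {name: number for number, name in enumerate(header, start=1)}

def column_names(numbers, column_index):
    """Translate column numbers back to field names for readable output."""
    by_number = {number: name for name, number in column_index.items()}
    return [by_number[n] for n in numbers]

header = ["crime state", "crime location region", "loss value"]
column_index = build_column_index(header)

# A cluster combination is stored as column numbers and named on output.
combination = (1, 3)
readable = column_names(combination, column_index)
```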
S130: Obtain multiple cluster combinations from the spreadsheet data to be processed, where each cluster combination includes at least one cluster field and each cluster field is a column field in the spreadsheet data to be processed.
Obtaining the cluster combinations includes the following steps:
Step 1: Obtain the column fields in the first dictionary variable.
For example, the first dictionary variable contains the N = 12 column fields crime state, crime location region, loss value, means feature, position, incident region, crime time_year, crime time_month, crime time_day, crime time_week, crime time_hour, and crime time_minute, together with their column sequence numbers 1, 2, ..., 11, 12.
Step 2: Combine the column fields in different ways to generate multiple cluster combinations, and save the cluster combinations to a first worksheet.
For example, starting from the N = 12 column fields obtained in step 1 and taking the actual situation into account, the program combines the column sequence numbers of the column fields in different ways. If the number of obtainable cluster combinations is T, then $1 \le T \le 2^N - 1$, where $2^N - 1 = \sum_{k=1}^{N} \binom{N}{k}$. As shown in Fig. 1d, the T cluster combinations obtained are saved to the first worksheet. For ease of reading, the cluster combinations in the table are shown as the column fields automatically translated from the column sequence numbers. The subsequent cluster analysis reads this worksheet and batch-processes all cluster combinations in a loop.
In the prior art, cluster combinations are configured manually one at a time and cluster analysis is run on them one at a time. By contrast, this embodiment reads a worksheet generated from multiple cluster combinations and batch-processes all of them in a loop, which greatly improves the data exploration efficiency of cluster analysis.
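Enumerating every non-empty combination of column numbers can be sketched with the standard library. The function name `all_cluster_combinations` is hypothetical; in practice only a subset of these combinations may be kept, which is why T can be smaller than the upper bound.

```python
from itertools import combinations

def all_cluster_combinations(column_numbers):
    """Yield every non-empty combination of the given column numbers."""
    columns = sorted(column_numbers)
    for size in range(1, len(columns) + 1):
        yield from combinations(columns, size)

small = list(all_cluster_combinations([1, 2, 3]))
# With the N = 12 columns of the example, the upper bound on T is reached.
upper_bound = len(list(all_cluster_combinations(range(1, 13))))
```

For three columns this yields 7 combinations, and for twelve columns it yields 2**12 - 1 = 4095, matching the bound on T.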
S140: Extract one cluster combination, query the corresponding information in the processed spreadsheet data according to the cluster combination, perform cluster analysis on the corresponding information to obtain a specified number of cluster samples, and compute and save the proportion of the corresponding information that each cluster sample accounts for; compute and save, for each cluster field in the cluster combination, the proportion that each item of corresponding information accounts for within each cluster sample.
Clustering can be defined as follows. In a data space A, the training sample set X consists of M given training samples, $X = (x_1, x_2, \ldots, x_i, \ldots, x_{M-1}, x_M)$, where each training sample is $x_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{i,N-1}, x_{iN})$, $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, N$. Here i indexes the training samples and j indexes the column fields, that is, the cluster fields. The training sample set X is equivalent to an M × N matrix. The goal of clustering is to divide X into K classes based on the similarity between training samples. Indices of similarity include similarity coefficients and distance measures; the distance measures include Euclidean distance, squared Euclidean distance, Manhattan distance, Chebyshev distance, chi-square distance, and so on. The smaller the "distance" between samples, the more similar they are; the larger the "correlation coefficient", the more similar they are. Clustering methods include, but are not limited to, K-means, K-medoids, CLARA (Clustering LARge Applications), and FCM.
For example, the technical solution of this embodiment uses the K-means++ algorithm to perform cluster analysis on the corresponding information and obtain the specified number of cluster samples, in the following steps:
Step 1: Randomly select K items from the corresponding information as cluster centers.
Step 2: Assign each remaining item of the corresponding information to the cluster of its nearest cluster center, according to the minimum-distance principle, to obtain K cluster samples.
Step 3: Take the sample mean of each of the K cluster samples as the new cluster center.
Step 4: Return to the operation of assigning the items to the cluster of the nearest cluster center to obtain K cluster samples, until the cluster centers no longer change; the current K cluster samples are then taken as the specified cluster samples.
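The four steps above can be sketched in plain Python. Note the assumptions: the sketch follows the steps exactly as listed, with purely random initial centers, so it is ordinary K-means iteration and does not include the distance-weighted seeding that distinguishes K-means++. The function name `kmeans`, the `seed` and `max_iter` parameters, and the use of squared Euclidean distance are illustrative choices.

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster tuples of numbers into k groups; returns (clusters, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # Step 1: random initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Step 3: the mean of each cluster becomes its new center
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return clusters, centers

# Illustrative data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, centers = kmeans(points, 2)
```

Each point here would be one row of the digitized spreadsheet restricted to the columns of the current cluster combination.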
For example, as shown in Fig. 1e, with K = 5 and M = 2745 training samples, the extracted cluster combination consists of the cluster fields "A = crime time" and "B = crime location region". After processing with the K-means++ algorithm, the division is: class 1 cluster sample, 525 items; class 2, 496 items; class 3, 498 items; class 4, 472 items; class 5, 754 items. The proportion of the training samples that each cluster sample accounts for is then computed, giving 19.12568%, 18.06922%, 18.14208%, 17.1949%, and 27.46812% in turn. These statistics are saved to a second worksheet.
As shown in Fig. 1f, take the class 2 cluster sample, which has 496 items. Under cluster field A it contains 11 items of corresponding information, with counts of 35, 50, 64, 64, 61, 52, 36, 51, 31, 29, and 23. The proportion of the cluster sample that each item under cluster field A accounts for is computed, giving 7.056452%, 10.08065%, 12.90323%, 12.90323%, 12.29839%, 10.48387%, 7.258065%, 10.28226%, 6.25%, 5.846774%, and 4.637097% in turn. Under cluster field B it contains 2 items of corresponding information, with counts of 262 and 234, and the corresponding proportions are 52.82258% and 47.17742%. These statistics are saved to a third worksheet.
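The two statistics can be sketched as below, using the counts quoted in the text as input. The function names `cluster_proportions` and `field_proportions` are hypothetical, and the label and value lists are synthetic stand-ins built only to reproduce those counts.

```python
from collections import Counter

def cluster_proportions(labels):
    """Percentage of all samples that falls into each cluster."""
    total = len(labels)
    return {c: 100.0 * n / total for c, n in sorted(Counter(labels).items())}

def field_proportions(values, labels, cluster):
    """Percentage of one cluster's samples taken by each value of a field."""
    members = [v for v, label in zip(values, labels) if label == cluster]
    return {v: 100.0 * n / len(members)
            for v, n in sorted(Counter(members).items())}

# Cluster sizes from the example: 525, 496, 498, 472, 754 (M = 2745).
sizes = {1: 525, 2: 496, 3: 498, 4: 472, 5: 754}
labels = [c for c, n in sizes.items() for _ in range(n)]
by_cluster = cluster_proportions(labels)

# Field B inside class 2: two distinct values with counts 262 and 234.
labels_2 = [2] * 496
field_b = [1] * 262 + [2] * 234
by_value = field_proportions(field_b, labels_2, cluster=2)
```

Rounded to five decimal places, `by_cluster[1]` is 19.12568 and `by_value[1]` is 52.82258, in line with the percentages quoted in the text.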
For ease of reading, the second and third worksheets present the numeric information converted back into the corresponding Chinese characters or other non-numeric information (such as null values and empty strings).
In the prior art, cluster analysis results must be followed by a separate statistical analysis. By contrast, this embodiment provides a method that obtains the cluster analysis and statistical analysis results together, combining the two and thereby improving data analysis efficiency.
S150: Return to the operation of extracting one cluster combination until all preset cluster combinations have been processed.
The technical solution of this embodiment digitizes the non-numeric information in the spreadsheet data, batch-processes multiple cluster combinations in a loop during the cluster statistical analysis, and performs statistical analysis on the cluster analysis results. It thereby addresses the poor readability of data analysis results, the low efficiency of manually traversing all cluster combinations in cluster analysis, and the lack of quantitative and further statistical analysis of cluster analysis results, and it improves data exploration efficiency, statistical efficiency, automated batch processing capability, and information readability.
On the basis of the above technical solution, after the proportion that each item of corresponding information under each cluster field in the cluster combination accounts for within each cluster sample has been computed and saved to the third worksheet, the method also includes the following step:
Saving the second worksheet and the third worksheet to a first workbook named after the cluster fields.
The naming format of the first workbook is: cluster field 1_cluster field 2_..._cluster field N_number of clusters.
For example, as shown in Fig. 1g, the workbook that stores the cluster statistical analysis results is named according to this format, for example, crime time_crime location region_5.
With this naming format, no two workbooks share a name while all cluster combinations are batch-processed in a loop, and the contents of a workbook are clear from its name.
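The naming format can be sketched as a small helper. The function name `workbook_name` is hypothetical, and the sketch returns only the base name, since the source does not state a file extension.

```python
def workbook_name(cluster_fields, cluster_count):
    """Join the cluster field names and the number of clusters with underscores."""
    return "_".join(list(cluster_fields) + [str(cluster_count)])

name = workbook_name(["crime time", "crime location region"], 5)
```

Because each cluster combination has a distinct set of fields, the names stay distinct as long as each combination is run with one cluster count.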
For example, as shown in Fig. 1h, the extracted cluster combination consists of the cluster fields "A = crime time" and "B = crime location region". Worksheet Sheet1 is the second worksheet and stores the proportion of the training samples that each cluster sample accounts for. As shown in Fig. 1i, worksheet "class 2" is the third worksheet and stores the proportion of the class 2 cluster sample that each item of corresponding information under cluster fields A and B accounts for. Worksheets Sheet1 and "class 2" are saved to the workbook named "crime time_crime location region_5".
The cluster statistical analysis workbooks generated from the cluster combinations, as shown in Fig. 1g, are produced under the specified save directory. The whole process is automated batch processing, so the user can quickly and conveniently view the contents of all the cluster statistical analysis workbooks at once, which greatly improves data analysis efficiency.
In addition, because both input and output are spreadsheet files, compatibility and readability are good.
Embodiment two
Fig. 2 is a flow chart of a preferred data processing method provided by Embodiment two of the present invention. This embodiment is applicable to data mining. The method can be performed by a data processing device, which is implemented in software and/or hardware. The technical solution of this embodiment includes the following steps:
S210: Obtain the original spreadsheet data and preprocess it to generate the spreadsheet data to be processed.
Preprocessing the raw data decides in advance which column fields undergo cluster statistical analysis, which reduces the subsequent data volume and improves data processing efficiency. Further splitting column fields that carry rich information makes the resulting cluster statistical analysis more accurate.
S220: Save the column fields in the header row of the to-be-processed spreadsheet data, together with the index of each corresponding column, to a first dictionary table variable.
Storing the header-row column fields in dictionary form serves two purposes. First, it makes them easy to call later when cluster fields are combined into cluster combinations; a cluster field is a column field, and specifically a cluster combination is composed of the column indices that correspond to its cluster fields. Second, it makes them easy to call when cluster statistical analysis results are output: according to the dictionary, each column index is automatically translated back to the original Chinese column name, so that all column fields in the final analysis results appear under their original Chinese column names, which improves the readability of the results.
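The role of the first dictionary table variable can be illustrated with a minimal sketch. It assumes the header row is already available as a Python list; the column names and the helper `build_header_dict` are hypothetical and are not taken from the patent (in the patent the names would be the original Chinese column names).

```python
def build_header_dict(header_row):
    """Map each column's 0-based index to its original header text."""
    return {index: name for index, name in enumerate(header_row)}


# Hypothetical header row, standing in for the original Chinese column names.
header = ["crime time", "crime location area", "case type"]
first_dict = build_header_dict(header)

# A cluster combination is held as column indices during processing...
combination = (0, 1)
# ...and translated back to readable column names when results are output.
readable = [first_dict[i] for i in combination]
```

Keeping indices internally and names only at the output boundary is one simple way to get both purposes described above from a single dictionary.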
S230: Query the columns of the to-be-processed spreadsheet data that contain non-numeric information, replace null values and empty strings with -1000, and convert Chinese character information to numeric codes according to a second dictionary table variable.
Digitizing the non-numeric information in the to-be-processed spreadsheet data removes the need for separate manual data processing and makes the data easier to handle. The non-numeric information is grouped, deduplicated, and converted directly to corresponding numeric codes, and the correspondence is stored in dictionary form. According to this dictionary, when the results of the subsequent cluster statistical analysis are output, the converted numeric values can be restored to the corresponding Chinese characters or other non-numeric information (such as null values and empty strings), which improves the readability of the final cluster statistical analysis results. In addition, because all data is digitized, the subsequent cluster statistical analysis has the data support it needs.
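The digitization step can be sketched as follows, assuming the data rows are plain Python lists. Only the value -1000 comes from the text; the function names, the sample data, and the choice to start codes at 0 are assumptions made for illustration.

```python
MISSING_CODE = -1000  # value the text gives for null values and empty strings


def is_number(value):
    """True for int/float cells (bool is excluded on purpose)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def digitize(rows):
    """Encode non-numeric cells column by column.

    Returns the encoded rows and a second dictionary of the form
    {column index: {original text: numeric code}}.
    """
    second_dict = {}
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Group by field, deduplicate, then sort the distinct texts.
        texts = {row[col] for row in rows
                 if row[col] not in (None, "") and not is_number(row[col])}
        if texts:
            # Assumption: codes follow the sort order and start at 0.
            second_dict[col] = {t: code for code, t in enumerate(sorted(texts))}

    encoded = []
    for row in rows:
        new_row = []
        for col, value in enumerate(row):
            if value is None or value == "":
                new_row.append(MISSING_CODE)
            elif value in second_dict.get(col, {}):
                new_row.append(second_dict[col][value])
            else:
                new_row.append(value)  # already numeric
        encoded.append(new_row)
    return encoded, second_dict


# Hypothetical sample rows: two text columns and one numeric column.
sample = [["night", "north", 3], ["day", "", 5], [None, "north", 7]]
encoded_rows, second_dict = digitize(sample)
```

Because null values and empty strings share one code in this sketch, they cannot be told apart after encoding; a variant that needs to restore them separately would have to reserve two distinct codes.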
S240: Obtain the multiple column fields in the first dictionary table variable, combine them in different ways to generate multiple cluster combinations, and save the combinations to a first worksheet.
For ease of reading, the cluster combinations shown in this worksheet are automatically translated from column indices to the corresponding column fields. The subsequent cluster analysis reads this worksheet and batch-processes all cluster combinations in a loop.
In the prior art, cluster combinations are configured manually one at a time and cluster analysis is run on them one at a time. By contrast, this embodiment reads a worksheet generated from multiple cluster combinations and batch-processes all of them in a loop, which greatly speeds up exploratory cluster analysis.
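The text does not say how the "different combinations" are enumerated. The sketch below assumes they are all subsets of the column indices within a size range; `cluster_combinations`, `min_size`, and `max_size` are hypothetical names, and the column names are placeholders.

```python
from itertools import combinations


def cluster_combinations(first_dict, min_size=1, max_size=None):
    """Enumerate cluster combinations as tuples of column indices."""
    indices = sorted(first_dict)
    if max_size is None:
        max_size = len(indices)
    combos = []
    for size in range(min_size, max_size + 1):
        combos.extend(combinations(indices, size))
    return combos


first_dict = {0: "crime time", 1: "crime location area", 2: "case type"}
pairs = cluster_combinations(first_dict, min_size=2, max_size=2)

# Rows as they might be written to the first worksheet: names, not indices.
worksheet_rows = [[first_dict[i] for i in combo] for combo in pairs]
```

The number of subsets grows exponentially with the number of columns, so bounding the combination size is a practical choice when many column fields are present.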
S250: Extract one cluster combination, query the corresponding information in the processed spreadsheet data according to that cluster combination, and perform cluster analysis on the corresponding information to obtain the specified plurality of cluster samples.
S260: Count the proportion of the corresponding information that each cluster sample accounts for, and save it to a second worksheet.
S270: Count the proportion that the corresponding information under each cluster field of the cluster combination accounts for within each cluster sample, and save it to a third worksheet.
For ease of reading, the second and third worksheets present the converted numeric values restored to the corresponding Chinese characters or other non-numeric information (such as null values and empty strings).
In the prior art, a further statistical analysis still has to be carried out on the cluster analysis results afterwards. By contrast, this embodiment provides a method that obtains the cluster analysis and statistical analysis results at the same time, combining the two and thereby improving data analysis efficiency.
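One reading of the two statistics in S260 and S270 is sketched below: the share of all rows that falls into each cluster, and, within each cluster, the share of each value of one cluster field. This interpretation, the function names, and the sample data are assumptions rather than the patent's stated formulas.

```python
from collections import Counter


def cluster_shares(labels):
    """Share of all rows assigned to each cluster (second-worksheet style)."""
    total = len(labels)
    counts = Counter(labels)
    return {cluster: counts[cluster] / total for cluster in sorted(counts)}


def field_value_shares(values, labels):
    """Within each cluster, the share of each value of one cluster field
    (third-worksheet style). `values` and `labels` are aligned per row."""
    per_cluster = {}
    for value, label in zip(values, labels):
        per_cluster.setdefault(label, Counter())[value] += 1
    return {
        label: {v: n / sum(counter.values()) for v, n in counter.items()}
        for label, counter in per_cluster.items()
    }


# Hypothetical cluster labels and one field's values for four rows.
labels = [0, 0, 1, 1]
field_a = ["day", "night", "day", "day"]

shares = cluster_shares(labels)
value_shares = field_value_shares(field_a, labels)
```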
S280: Save the second worksheet and the third worksheet to a first workbook named after the cluster fields.
With this naming scheme, no two workbooks share a name while all cluster combinations are batch-processed in a loop, and the contents of a workbook can be understood from its name. The whole process is automated batch processing, so the user can quickly and conveniently review the contents of all cluster statistical analysis workbooks at once, which greatly improves data analysis efficiency. In addition, because both input and output are spreadsheet files, compatibility and readability are good.
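A naming helper along these lines would produce one distinct workbook name per cluster combination. It is only a sketch: `workbook_name` is a hypothetical function, and the optional numeric suffix is an assumption, since this section does not state what a trailing number in a workbook name stands for.

```python
def workbook_name(first_dict, combination, suffix=None):
    """Join the cluster field names with underscores to name a workbook."""
    parts = [first_dict[i] for i in combination]
    if suffix is not None:
        parts.append(str(suffix))  # meaning of the suffix is an assumption
    return "_".join(parts)


first_dict = {0: "crime time", 1: "crime location area", 2: "case type"}
name_a = workbook_name(first_dict, (0, 1), suffix=5)
name_b = workbook_name(first_dict, (0, 2), suffix=5)
```

Since each combination is a distinct set of column fields, the joined names differ as long as the column names themselves are unique and contain no underscores that could make two joins collide.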
S290: Return to the operation of extracting one cluster combination, until all of the preset cluster combinations have been processed.
In the technical solution of this embodiment, the non-numeric information in the spreadsheet data is digitized, multiple cluster combinations are batch-processed in a loop during cluster statistical analysis, and statistical analysis is performed on the cluster analysis results. This solves the problems in cluster analysis of poorly readable analysis results, the inefficiency of traversing all cluster combinations manually, and cluster analysis results that lack quantitative analysis and further statistical analysis. It improves exploratory analysis efficiency, statistical efficiency, automated batch-processing capability, and the readability of information.
Embodiment three
Fig. 3 is a schematic structural diagram of a data processing device provided by Embodiment Three of the present invention. The specific structure of the device is as follows:
To-be-processed spreadsheet data acquisition module 310, configured to obtain the to-be-processed spreadsheet data.
Exemplarily, the to-be-processed spreadsheet data acquisition module 310 is specifically configured to:
Obtain original spreadsheet data.
Delete from the original spreadsheet data the column fields that do not need cluster statistical analysis, together with the information corresponding to those column fields.
Split the column fields of the original spreadsheet data that contain multiple subfields, generating the to-be-processed spreadsheet data.
Determining in advance whether each column field takes part in cluster statistical analysis reduces the amount of data to be processed later and improves data processing efficiency. Further splitting the column fields that contain rich information makes the cluster statistical analysis results obtained more accurate.
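The preprocessing performed by module 310 can be sketched as below, assuming the sheet is held as a header list plus row lists. The function `preprocess`, the column names, and the date/time splitter are all hypothetical; the patent does not specify which fields are dropped or how a field is split.

```python
def preprocess(header, rows, drop=(), split=None):
    """Drop unneeded columns and split columns that hold several subfields.

    drop  : column names to delete
    split : {column name: (list of new names, function value -> list of parts)}
    """
    split = split or {}
    new_header = []
    for name in header:
        if name in drop:
            continue
        if name in split:
            new_header.extend(split[name][0])
        else:
            new_header.append(name)

    new_rows = []
    for row in rows:
        new_row = []
        for name, value in zip(header, row):
            if name in drop:
                continue
            if name in split:
                new_row.extend(split[name][1](value))
            else:
                new_row.append(value)
        new_rows.append(new_row)
    return new_header, new_rows


# Hypothetical sheet: an ID column to drop and a timestamp column to split.
header = ["case id", "crime time", "area"]
rows = [["A1", "2017-08-11 14:30", "north"]]
new_header, new_rows = preprocess(
    header, rows,
    drop={"case id"},
    split={"crime time": (["crime date", "crime hour"], lambda s: s.split(" "))},
)
```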
Spreadsheet data digitization module 320, configured to query the columns of the to-be-processed spreadsheet data that contain non-numeric information and digitize the queried columns, generating the processed spreadsheet data.
The non-numeric information includes Chinese character information, null values, and empty strings.
Exemplarily, the spreadsheet data digitization module 320 is specifically configured to:
Group the Chinese character information by field, delete the duplicate information within each group, and sort the non-duplicate information.
Assign numeric codes to the non-duplicate information according to the sort order, and save the numeric codes to a second dictionary table variable.
Convert the Chinese character information to the corresponding numeric codes according to the information in the second dictionary table variable.
Convert the null values and empty strings to a specific number.
Digitizing the non-numeric information in the to-be-processed spreadsheet data removes the need for separate manual data processing and makes the data easier to handle. The non-numeric information is grouped, deduplicated, and converted directly to corresponding numeric codes, and the correspondence is stored in dictionary form. According to this dictionary, when the results of the subsequent cluster statistical analysis are output, the converted numeric values can be restored to the corresponding Chinese characters or other non-numeric information (such as null values and empty strings), which improves the readability of the final cluster statistical analysis results. In addition, because all data is digitized, the subsequent cluster statistical analysis has the data support it needs.
The device further includes a first dictionary table generation module 300 which, before the spreadsheet data digitization module queries the columns of the to-be-processed spreadsheet data that contain non-numeric information and digitizes the queried columns, is specifically configured to:
Save the column fields in the header row of the to-be-processed spreadsheet data, together with the index of each corresponding column, to a first dictionary table variable.
Storing the header-row column fields in dictionary form serves two purposes. First, it makes them easy to call later when cluster fields are combined into cluster combinations; a cluster field is a column field, and specifically a cluster combination is composed of the column indices that correspond to its cluster fields. Second, it makes them easy to call when cluster statistical analysis results are output: according to the dictionary, each column index is automatically translated back to the original Chinese column name, so that all column fields in the final analysis results appear under their original Chinese column names, which improves the readability of the results.
Cluster combination acquisition module 330, configured to obtain multiple cluster combinations according to the to-be-processed spreadsheet data, wherein each cluster combination includes at least one cluster field and each cluster field is a column field of the to-be-processed spreadsheet data.
Exemplarily, the cluster combination acquisition module 330 is specifically configured to:
Obtain the multiple column fields in the first dictionary table variable.
Combine the multiple column fields in different ways to generate multiple cluster combinations, and save the cluster combinations to a first worksheet.
In the prior art, cluster combinations are configured manually one at a time and cluster analysis is run on them one at a time. By contrast, this embodiment reads a worksheet generated from multiple cluster combinations and batch-processes all of them in a loop, which greatly speeds up exploratory cluster analysis.
Cluster statistical analysis module 340, configured to extract one cluster combination, query the corresponding information in the processed spreadsheet data according to that cluster combination, perform cluster analysis on the corresponding information to obtain the specified plurality of cluster samples, and count and save the proportion of the corresponding information that each cluster sample accounts for; and to count and save the proportion that the corresponding information under each cluster field of the cluster combination accounts for within each cluster sample.
Exemplarily, the cluster statistical analysis module 340 is specifically configured to:
Randomly select K items from the corresponding information as cluster centers.
Assign the remaining items of the corresponding information to the cluster of the nearest cluster center according to the minimum-distance principle, obtaining K cluster samples.
Take the sample mean of each of the K cluster samples as the new cluster center.
Return to the operation of assigning the remaining items of the corresponding information to the cluster of the nearest cluster center according to the minimum-distance principle to obtain K cluster samples, until the cluster centers no longer change, and take the current K cluster samples as the specified plurality of cluster samples.
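The steps above describe the standard K-means iteration. A minimal pure-Python sketch follows, assuming each row of corresponding information is a tuple of numbers. The squared Euclidean distance, the `max_iter` safety cap, the fixed random seed, and the handling of an empty cluster (its center is kept) are assumptions that the text does not specify.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, centers) after iterating until centers stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # K randomly chosen initial centers
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        labels = []
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
            labels.append(nearest)
        # Recompute each center as the mean of its cluster.
        new_centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: centers no longer change
            break
        centers = new_centers
    return labels, centers


# Hypothetical, well-separated sample data with two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
```

Because the initial centers are random, the numbering of the clusters can differ between runs; only the grouping itself is meaningful.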
Exemplarily, the cluster statistical analysis module 340 is further configured to:
Save the counted proportion of the corresponding information that each cluster sample accounts for to a second worksheet; save the counted proportion that the corresponding information under each cluster field of the cluster combination accounts for within each cluster sample to a third worksheet.
For ease of reading, the second and third worksheets present the converted numeric values restored to the corresponding Chinese characters or other non-numeric information (such as null values and empty strings).
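Restoring readable values before writing the worksheets amounts to inverting the per-field code dictionary. The sketch below assumes a mapping of the form {original text: numeric code} for one field and uses -1000 as the missing-value code mentioned earlier in the text; `decode_value` and `missing_text` are hypothetical names.

```python
MISSING_CODE = -1000  # code used in the text for null values and empty strings


def decode_value(code, field_codes, missing_text=""):
    """Translate one numeric code back to its original text.

    field_codes : {original text: numeric code} for a single field
    Codes that are not in the dictionary (true numbers) pass through unchanged.
    """
    if code == MISSING_CODE:
        return missing_text
    reverse = {number: text for text, number in field_codes.items()}
    return reverse.get(code, code)


# Hypothetical code dictionary for one field.
field_codes = {"day": 0, "night": 1}
decoded = [decode_value(c, field_codes) for c in [1, 0, -1000, 7]]
```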
In the prior art, a further statistical analysis still has to be carried out on the cluster analysis results afterwards. By contrast, this embodiment provides a method that obtains the cluster analysis and statistical analysis results at the same time, combining the two and thereby improving data analysis efficiency.
Loop module 350, configured to return to the operation of extracting one cluster combination, until all of the preset cluster combinations have been processed.
In the technical solution of this embodiment, the non-numeric information in the spreadsheet data is digitized, multiple cluster combinations are batch-processed in a loop during cluster statistical analysis, and statistical analysis is performed on the cluster analysis results. This solves the problems in cluster analysis of poorly readable analysis results, the inefficiency of traversing all cluster combinations manually, and cluster analysis results that lack quantitative analysis and further statistical analysis. It improves exploratory analysis efficiency, statistical efficiency, automated batch-processing capability, and the readability of information.
On the basis of the above technical solution, the cluster statistical analysis module 340 is further configured to:
After counting the proportion that the corresponding information under each cluster field of the cluster combination accounts for within each cluster sample and saving it to the third worksheet, save the second worksheet and the third worksheet to a first workbook named after the cluster fields.
With this naming scheme, no two workbooks share a name while all cluster combinations are batch-processed in a loop, and the contents of a workbook can be understood from its name. Cluster statistical analysis workbooks generated from the multiple cluster combinations are produced under the specified save directory. The whole process is automated batch processing, so the user can quickly and conveniently review the contents of all cluster statistical analysis workbooks at once, which greatly improves data analysis efficiency.
In addition, because both input and output are spreadsheet files, compatibility and readability are good.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the scope of protection of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the appended claims.
Claims (12)
- 1. A data processing method, characterized by comprising: obtaining to-be-processed spreadsheet data; querying the columns of the to-be-processed spreadsheet data that contain non-numeric information and digitizing the queried columns to generate processed spreadsheet data; obtaining multiple cluster combinations according to the to-be-processed spreadsheet data, wherein each cluster combination comprises at least one cluster field and each cluster field is a column field of the to-be-processed spreadsheet data; extracting one cluster combination, querying the corresponding information in the processed spreadsheet data according to the cluster combination, performing cluster analysis on the corresponding information to obtain a specified plurality of cluster samples, and counting and saving the proportion of the corresponding information that each cluster sample accounts for; counting and saving the proportion that the corresponding information under each cluster field of the cluster combination accounts for within each cluster sample; and returning to the operation of extracting one cluster combination until all of the preset cluster combinations have been processed.
- 2. The method according to claim 1, characterized in that obtaining to-be-processed spreadsheet data comprises: obtaining original spreadsheet data; deleting from the original spreadsheet data the column fields that do not need cluster statistical analysis and the information corresponding to those column fields; and splitting the column fields of the original spreadsheet data that contain multiple subfields to generate the to-be-processed spreadsheet data.
- 3. The method according to claim 1, characterized in that, before querying the columns of the to-be-processed spreadsheet data that contain non-numeric information and digitizing the queried columns, the method further comprises: saving the column fields in the header row of the to-be-processed spreadsheet data and the index of each corresponding column to a first dictionary table variable; and obtaining multiple cluster combinations according to the to-be-processed spreadsheet data comprises: obtaining the multiple column fields in the first dictionary table variable; and combining the multiple column fields in different ways to generate multiple cluster combinations, and saving the multiple cluster combinations to a first worksheet.
- 4. The method according to claim 1, characterized in that the non-numeric information comprises Chinese character information, null values, and empty strings; and querying the columns of the to-be-processed spreadsheet data that contain non-numeric information and digitizing the queried columns to generate processed spreadsheet data comprises: grouping the Chinese character information by field, deleting the duplicate information within each group, and sorting the non-duplicate information; assigning numeric codes to the non-duplicate information according to the sort order, and saving the numeric codes to a second dictionary table variable; converting the Chinese character information to the corresponding numeric codes according to the information in the second dictionary table variable; and converting the null values and empty strings to a specific number.
- 5. The method according to claim 1, characterized in that performing cluster analysis on the corresponding information to obtain a specified plurality of cluster samples comprises: randomly selecting K items from the corresponding information as cluster centers; assigning the remaining items of the corresponding information to the cluster of the nearest cluster center according to the minimum-distance principle to obtain K cluster samples; taking the sample mean of each of the K cluster samples as the new cluster center; and returning to the operation of assigning the remaining items of the corresponding information to the cluster of the nearest cluster center according to the minimum-distance principle to obtain K cluster samples, until the cluster centers no longer change, and taking the current K cluster samples as the specified plurality of cluster samples.
- 6. The method according to claim 1, characterized in that the counted proportion of the corresponding information that each cluster sample accounts for is saved to a second worksheet; the counted proportion that the corresponding information under each cluster field of the cluster combination accounts for within each cluster sample is saved to a third worksheet; and after counting the proportion that the corresponding information under each cluster field of the cluster combination accounts for within each cluster sample and saving it to the third worksheet, the method further comprises: saving the second worksheet and the third worksheet to a first workbook named after the cluster fields.
- 7. A data processing device, characterized by comprising: a to-be-processed spreadsheet data acquisition module, configured to obtain to-be-processed spreadsheet data; a spreadsheet data digitization module, configured to query the columns of the to-be-processed spreadsheet data that contain non-numeric information and digitize the queried columns to generate processed spreadsheet data; a cluster combination acquisition module, configured to obtain multiple cluster combinations according to the to-be-processed spreadsheet data, wherein each cluster combination comprises at least one cluster field and each cluster field is a column field of the to-be-processed spreadsheet data; a cluster statistical analysis module, configured to extract one cluster combination, query the corresponding information in the processed spreadsheet data according to the cluster combination, perform cluster analysis on the corresponding information to obtain a specified plurality of cluster samples, and count and save the proportion of the corresponding information that each cluster sample accounts for, and to count and save the proportion that the corresponding information under each cluster field of the cluster combination accounts for within each cluster sample; and a loop module, configured to return to the operation of extracting one cluster combination until all of the preset cluster combinations have been processed.
- 8. The device according to claim 7, characterized in that the to-be-processed spreadsheet data acquisition module is configured to: obtain original spreadsheet data; delete from the original spreadsheet data the column fields that do not need cluster statistical analysis and the information corresponding to those column fields; and split the column fields of the original spreadsheet data that contain multiple subfields to generate the to-be-processed spreadsheet data.
- 9. The device according to claim 7, characterized by further comprising a first dictionary table variable generation module, configured to, before the spreadsheet data digitization module queries the columns of the to-be-processed spreadsheet data that contain non-numeric information and digitizes the queried columns, save the column fields in the header row of the to-be-processed spreadsheet data and the index of each corresponding column to a first dictionary table variable; wherein the cluster combination acquisition module is configured to: obtain the multiple column fields in the first dictionary table variable; and combine the multiple column fields in different ways to generate multiple cluster combinations, and save the multiple cluster combinations to a first worksheet.
- 10. The device according to claim 7, characterized in that the non-numeric information comprises Chinese character information, null values, and empty strings; and the spreadsheet data digitization module is configured to: group the Chinese character information by field, delete the duplicate information within each group, and sort the non-duplicate information; assign numeric codes to the non-duplicate information according to the sort order, and save the numeric codes to a second dictionary table variable; convert the Chinese character information to the corresponding numeric codes according to the information in the second dictionary table variable; and convert the null values and empty strings to a specific number.
- 11. The device according to claim 7, characterized in that the cluster statistical analysis module is configured to: randomly select K items from the corresponding information as cluster centers; assign the remaining items of the corresponding information to the cluster of the nearest cluster center according to the minimum-distance principle to obtain K cluster samples; take the sample mean of each of the K cluster samples as the new cluster center; and return to the operation of assigning the remaining items of the corresponding information to the cluster of the nearest cluster center according to the minimum-distance principle to obtain K cluster samples, until the cluster centers no longer change, and take the current K cluster samples as the specified plurality of cluster samples.
- 12. The device according to claim 7, characterized in that the cluster statistical analysis module is configured to: save the counted proportion of the corresponding information that each cluster sample accounts for to a second worksheet; and save the counted proportion that the corresponding information under each cluster field of the cluster combination accounts for within each cluster sample to a third worksheet; and the cluster statistical analysis module is further configured to: after counting the proportion that the corresponding information under each cluster field of the cluster combination accounts for within each cluster sample and saving it to the third worksheet, save the second worksheet and the third worksheet to a first workbook named after the cluster fields.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710683566.2A CN107515908A (en) | 2017-08-11 | 2017-08-11 | A kind of data processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107515908A true CN107515908A (en) | 2017-12-26 |
Family
ID=60722127
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710683566.2A Pending CN107515908A (en) | 2017-08-11 | 2017-08-11 | A kind of data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107515908A (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090271359A1 (en) * | 2008-04-24 | 2009-10-29 | Lexisnexis Risk & Information Analytics Group Inc. | Statistical record linkage calibration for reflexive and symmetric distance measures at the field and field value levels without the need for human interaction |
CN101770446A (en) * | 2008-12-26 | 2010-07-07 | 北大方正集团有限公司 | Method and system for identifying form in layout file |
US9501737B1 (en) * | 2012-11-09 | 2016-11-22 | Colin James, III | Method and system for prediction of time series by kanban neuron model |
CN103020122A (en) * | 2012-11-16 | 2013-04-03 | 哈尔滨工程大学 | Transfer learning method based on semi-supervised clustering |
CN103812961A (en) * | 2013-11-01 | 2014-05-21 | 北京奇虎科技有限公司 | Method and device for recognizing Internet protocol (IP) addresses of designated class and defending method and system |
CN103617249A (en) * | 2013-11-22 | 2014-03-05 | 烟台大学 | Bidirectional clustering detection method for local similarity submatrices in data matrix |
CN103761337A (en) * | 2014-02-18 | 2014-04-30 | 上海锦恩信息科技有限公司 | Method and system for processing unstructured data |
CN104462301A (en) * | 2014-11-28 | 2015-03-25 | 北京奇虎科技有限公司 | Network data processing method and device |
CN105069521A (en) * | 2015-07-24 | 2015-11-18 | 许继集团有限公司 | Photovoltaic power plant output power prediction method based on weighted FCM clustering algorithm |
CN105117810A (en) * | 2015-09-24 | 2015-12-02 | 国网福建省电力有限公司泉州供电公司 | Residential electricity consumption mid-term load prediction method under multistep electricity price mechanism |
CN106682411A (en) * | 2016-12-22 | 2017-05-17 | 浙江大学 | Method for converting physical examination diagnostic data into disease label |
Non-Patent Citations (1)
Title |
---|
张林泉: "大数据及其在统计分析中的应用研究", 《哈尔滨职业技术学院学报》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110719145A (en) * | 2018-07-13 | 2020-01-21 | 深圳兆日科技股份有限公司 | Method and device for sending read receipt and computer readable storage medium |
CN110719145B (en) * | 2018-07-13 | 2022-04-19 | 深圳兆日科技股份有限公司 | Method and device for sending read receipt and computer readable storage medium |
CN109359054A (en) * | 2018-12-18 | 2019-02-19 | 浙江诺诺网络科技有限公司 | Generate method, apparatus, equipment and the storage medium of coverage rate statistical report |
CN111190898A (en) * | 2019-11-25 | 2020-05-22 | 泰康保险集团股份有限公司 | Data processing method and device, electronic equipment and storage medium |
CN111639077A (en) * | 2020-05-15 | 2020-09-08 | 杭州数梦工场科技有限公司 | Data management method and device, electronic equipment and storage medium |
CN111639077B (en) * | 2020-05-15 | 2024-03-22 | 杭州数梦工场科技有限公司 | Data management method, device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107515908A (en) | A kind of data processing method and device | |
CN109948023A (en) | Recommended acquisition methods, device and storage medium | |
CN110197286A (en) | Active learning classification method based on Gaussian mixture model and sparse Bayesian | |
CN109857871B (en) | User relationship discovery method based on social network mass contextual data | |
CN108984642B (en) | Printed fabric image retrieval method based on Hash coding | |
CN107423820B (en) | Knowledge graph representation learning method combined with entity hierarchy categories | |
DE202015009255U1 (en) | Automatic image organization | |
WO2005001838A1 (en) | Apparatus and method for automatic video summarization using fuzzy one-class support vector machines | |
CN105681908A (en) | Broadcast television system based on individual watching behaviour and personalized programme recommendation method thereof | |
CN113868235A (en) | Big data-based information retrieval and analysis system | |
US11010543B1 (en) | Systems and methods for table extraction in documents | |
Wang et al. | The monkeytyping solution to the youtube-8m video understanding challenge | |
CN116455861B (en) | Big data-based computer network security monitoring system and method | |
CN1669301A (en) | Method and apparatus for optimizing video processing system design using a probabilistic method to fast direct local search | |
CN109345684A (en) | Multinational banknote serial number recognition method based on GMDH-SVM | |
CN112784549A (en) | Method, device and storage medium for generating chart | |
CN107451617A (en) | Graph transduction semi-supervised classification method | |
CN113705215A (en) | Meta-learning-based large-scale multi-label text classification method | |
CN112612948B (en) | Deep reinforcement learning-based recommendation system construction method | |
CN113034193A (en) | Working method for modeling of APP2VEC in wind control system | |
Sun et al. | ESinGAN: Enhanced single-image GAN using pixel attention mechanism for image super-resolution | |
CN110175289A (en) | Mixed recommendation method based on cosine similarity collaborative filtering | |
Kim et al. | Digitalizing scheme of handwritten Hanja historical documents | |
CN108710562A (en) | Method, apparatus and device for merging exception records | |
Keim et al. | Mail explorer-spatial and temporal exploration of electronic mail |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned | Effective date of abandoning: 20210702 |