CN107368526A - Data processing method and device - Google Patents
Data processing method and device
- Publication number
- CN107368526A (application number CN201710433424.0A)
- Authority
- CN
- China
- Prior art keywords
- link
- industrial chain
- sorted
- data
- target patent
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Abstract
The present invention relates to the field of data processing, and in particular to a data processing method and device. Target patent data to be classified is obtained. Based on a pre-trained classification model, whether the target patent data matches a link of a predefined industrial chain is determined according to the probabilistic topic words of the acquired target patent data and a preset matching condition; the pre-trained classification model is obtained by training on the patent sample data collected for the links of each industrial chain using a preset classification algorithm. If the match succeeds, the correspondence between the target patent data and the link of the industrial chain is obtained, and the target patent data is classified into the corresponding link according to that correspondence. By matching with the classification model, patent data is classified by industrial-chain link, which meets users' needs when searching for patents within a given link and improves search efficiency.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a data processing method and device.
Background
With the advance of science and technology, communication technology has also developed rapidly, and to make data easier for users to view and search, the classification and processing of massive amounts of data has become increasingly important.
In the prior art, patent data is generally classified along simple dimensions such as applicant, filing date or keyword, so that users can view the patent data falling under each of these categories.
For example, when a user searches for patents, the system receives the search expression entered by the user, queries the database for records matching the information in that expression, and outputs and displays the matching results, so that the user can view the patent data related to the search expression under the above categories.
However, the prior art can only classify patent data by simple criteria such as applicant or filing date, and cannot classify it by the link of an industrial chain. Moreover, if the keywords in the user's search expression are poorly chosen, a keyword search may return a very large and imprecise set of patents, and the user may need to spend considerable time finding the patent data for the desired industrial-chain link. This is very inconvenient and fails to meet users' search needs well.
Summary of the invention
Embodiments of the present invention provide a data processing method and device, to solve the problem that the prior art cannot meet users' patent search needs within the links of an industrial chain, and to improve users' search efficiency.
The specific technical solutions provided by the embodiments of the present invention are as follows:
A data processing method, including:
obtaining target patent data to be classified;
based on a pre-trained classification model, determining, according to the probabilistic topic words of the acquired target patent data and a preset matching condition, whether the target patent data matches a link of a predefined industrial chain; wherein the pre-trained classification model is obtained by training on the patent sample data collected for the links of each industrial chain using a preset classification algorithm;
if the match succeeds, obtaining the correspondence between the target patent data and the link of the industrial chain, and classifying the target patent data into the corresponding link according to the correspondence.
Preferably, determining whether the target patent data to be classified matches a link of a predefined industrial chain specifically includes:
according to the probabilistic topic words of the target patent data, calculating the posterior probability that the target patent data belongs to each link of the industrial chain, and judging whether that posterior probability exceeds a preset threshold; or,
judging whether the number of probabilistic topic words extracted from the target patent data exceeds a preset number.
Preferably, the pre-trained classification model includes a first classification model and a second classification model, where the first classification model is trained on the predefined industrial chains and the second classification model is trained on the links of the predefined industrial chains;
determining, based on the pre-trained classification model and according to the probabilistic topic words of the acquired target patent data and a preset matching condition, whether the target patent data matches a link of a predefined industrial chain specifically includes:
obtaining the probabilistic topic words of the target patent data to be classified;
based on the first classification model, determining, according to the probabilistic topic words and the preset matching condition, whether the target patent data matches a predefined industrial chain;
after a match with an industrial chain is determined, based on the second classification model, determining, according to the probabilistic topic words and the preset matching condition, whether the target patent data matches a link within the matched industrial chain.
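The two-stage matching described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each trained model is assumed to be exposed as a function returning posterior probabilities per class, only the posterior-threshold form of the matching condition is used, and the chain names, link names and stub models are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: any callable mapping topic words to {class: posterior probability}.
Model = Callable[[List[str]], Dict[str, float]]


def best(posteriors: Dict[str, float]) -> Tuple[str, float]:
    """Return the class with the highest posterior and its probability."""
    return max(posteriors.items(), key=lambda kv: kv[1])


def classify_two_stage(
    topic_words: List[str],
    chain_model: Model,
    link_models: Dict[str, Model],
    threshold: float,
) -> Optional[Tuple[str, str]]:
    """Stage 1 picks an industrial chain; stage 2 picks a link inside that chain.

    Returns (chain, link), or None if either stage fails the threshold test.
    """
    chain, p_chain = best(chain_model(topic_words))
    if p_chain <= threshold:
        return None  # no industrial chain matched
    link, p_link = best(link_models[chain](topic_words))
    if p_link <= threshold:
        return None  # a chain matched, but no link inside it did
    return chain, link


# Hypothetical stub models standing in for trained classifiers.
def chain_model(words: List[str]) -> Dict[str, float]:
    if "sensor" in words:
        return {"IoT": 0.9, "education": 0.1}
    return {"IoT": 0.4, "education": 0.6}


link_models: Dict[str, Model] = {
    "IoT": lambda words: {"chip supplier": 0.15, "sensor supplier": 0.85},
    "education": lambda words: {"online courses": 0.7, "textbooks": 0.3},
}

print(classify_two_stage(["sensor", "humidity"], chain_model, link_models, threshold=0.8))
# ('IoT', 'sensor supplier')
```

Keeping one link model per chain means the second stage only compares links that belong to the chain already chosen, which is the efficiency argument the text makes for combining the two models.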
Preferably, the method further includes:
if the match fails, obtaining the patent feature words of the target patent data according to a preset feature extraction method, and matching the patent feature words against the keywords of each preset industrial-chain link to obtain the correspondence between the target patent data and a link of the industrial chain.
Preferably, obtaining the patent feature words of the target patent data according to a preset feature extraction method, matching them against the keywords of each preset industrial-chain link, and obtaining the correspondence between the target patent data and a link specifically includes:
obtaining the patent feature words of the target patent data according to the preset feature extraction method;
comparing the patent feature words with the keywords of each preset industrial-chain link, counting, for each link, how many patent feature words are identical to its keywords, and determining the link with the largest count;
obtaining the correspondence between the target patent data and a link of the industrial chain according to the link with the largest count.
Preferably, the classification model is trained as follows:
collecting the patent sample data for the links of each industrial chain, extracting the characteristic part of each patent sample, and using that characteristic part as a characteristic index;
according to the characteristic indexes of the patent samples, dividing the value range of each characteristic index; training the classification model according to the division of each value range and the preset classification algorithm; calculating the probability that each characteristic index belongs to each industrial-chain link; and classifying each patent sample into the link with the largest probability.
Preferably, the characteristic index includes one or any combination of the following: International Patent Classification (IPC) class, patent title, abstract, patent feature words.
Preferably, before the value range of each characteristic index is divided, the method further includes:
normalizing the values of each characteristic index into the same preset value range.
A data processing device, including:
an acquiring unit, configured to obtain target patent data to be classified;
a matching unit, configured to determine, based on a pre-trained classification model and according to the probabilistic topic words of the acquired target patent data and a preset matching condition, whether the target patent data matches a link of a predefined industrial chain; wherein the pre-trained classification model is obtained by training on the patent sample data collected for the links of each industrial chain using a preset classification algorithm;
a classifying unit, configured to, if the match succeeds, obtain the correspondence between the target patent data and the link of the industrial chain, and classify the target patent data into the corresponding link according to the correspondence.
Preferably, when determining whether the target patent data to be classified matches a link of a predefined industrial chain, the matching unit is specifically configured to:
according to the probabilistic topic words of the target patent data, calculate the posterior probability that the target patent data belongs to each link of the industrial chain, and judge whether that posterior probability exceeds a preset threshold; or,
judge whether the number of probabilistic topic words extracted from the target patent data exceeds a preset number.
Preferably, the pre-trained classification model includes a first classification model and a second classification model, where the first classification model is trained on the predefined industrial chains and the second classification model is trained on the links of the predefined industrial chains;
when determining, based on the pre-trained classification model and according to the probabilistic topic words of the acquired target patent data and a preset matching condition, whether the target patent data matches a link of a predefined industrial chain, the matching unit is specifically configured to:
obtain the probabilistic topic words of the target patent data to be classified;
based on the first classification model, determine, according to the probabilistic topic words and the preset matching condition, whether the target patent data matches a predefined industrial chain;
after a match with an industrial chain is determined, based on the second classification model, determine, according to the probabilistic topic words and the preset matching condition, whether the target patent data matches a link within the matched industrial chain.
Preferably, the classifying unit is further configured to:
if the match fails, obtain the patent feature words of the target patent data according to a preset feature extraction method, and match the patent feature words against the keywords of each preset industrial-chain link to obtain the correspondence between the target patent data and a link of the industrial chain.
Preferably, when obtaining the patent feature words of the target patent data according to a preset feature extraction method, matching them against the keywords of each preset industrial-chain link, and obtaining the correspondence between the target patent data and a link, the classifying unit is specifically configured to:
obtain the patent feature words of the target patent data according to the preset feature extraction method;
compare the patent feature words with the keywords of each preset industrial-chain link, count, for each link, how many patent feature words are identical to its keywords, and determine the link with the largest count;
obtain the correspondence between the target patent data and a link of the industrial chain according to the link with the largest count.
Preferably, the classification model is trained by means of:
a collecting unit, configured to collect the patent sample data for the links of each industrial chain, extract the characteristic part of each patent sample, and use that characteristic part as a characteristic index;
a training unit, configured to divide the value range of each characteristic index according to the characteristic indexes of the patent samples, train the classification model according to the division of each value range and the preset classification algorithm, calculate the probability that each characteristic index belongs to each industrial-chain link, and classify each patent sample into the link with the largest probability.
Preferably, the characteristic index includes one or any combination of the following: International Patent Classification (IPC) class, patent title, abstract, patent feature words.
Preferably, the device further includes:
a normalizing unit, configured to normalize the values of each characteristic index into the same preset value range before the value range of each characteristic index is divided.
In the embodiments of the present invention, target patent data to be classified is obtained; based on a pre-trained classification model, whether the target patent data matches a link of a predefined industrial chain is determined according to its probabilistic topic words and a preset matching condition, where the pre-trained classification model is obtained by training on the patent sample data collected for the links of each industrial chain using a preset classification algorithm; if the match succeeds, the correspondence between the target patent data and the link is obtained, and the target patent data is classified into the corresponding link according to that correspondence. In this way, the probabilistic topic words of the target patent data are extracted and the classification model classifies the data on their basis, so that patent data is organized by industrial-chain link. This meets users' patent search needs within industrial-chain links: when searching for patents, users can obtain the patent data under each link, which makes the results easier to browse and distinguish, improves search accuracy and meets user needs.
Brief description of the drawings
Fig. 1 is an overview flowchart of the data processing method in an embodiment of the present invention;
Fig. 2 is a detailed flowchart of the data processing method in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the data processing device in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To meet users' patent search needs within industrial-chain links and improve their search efficiency and accuracy, the embodiments of the present invention pre-train a classification model, match the target patent data to be classified against the links of each predefined industrial chain, and then classify the target patent data into the corresponding link, thereby meeting users' patent search needs within industrial-chain links.
The solution of the present invention is described in detail below through specific embodiments; of course, the present invention is not limited to the following embodiments.
As shown in Fig. 1, in an embodiment of the present invention, the specific flow of the data processing method is as follows:
Step 100: Obtain the target patent data to be classified.
In practice, when searching for patents a user may, for example, enter keywords; the system queries for matches according to those keywords and displays the results, which may be grouped by applicant, filing date and so on, so the user can browse the patent data under each applicant or filing date. However, these are only simple secondary classifications: they cannot group patents by industrial-chain link and so do not meet the user's need. In the embodiments of the present invention, patent data is classified by the links of predefined industrial chains, which meets users' patent search needs within each link more effectively.
The target patent data to be classified is, for example, newly published patent data, or all the patent data a user retrieves in a search.
Step 110: Based on a pre-trained classification model, determine, according to the probabilistic topic words of the acquired target patent data and a preset matching condition, whether the target patent data matches a link of a predefined industrial chain.
The pre-trained classification model is obtained by training on the patent sample data collected for the links of each industrial chain using a preset classification algorithm.
The pre-trained classification model includes a first classification model, trained on the predefined industrial chains, and a second classification model, trained on the links of the predefined industrial chains.
The classification model may be, for example, a naive Bayes classification model; other classification models may of course be used, and the embodiments of the present invention do not limit this.
That is, in the embodiments of the present invention, industrial chains and their links are predefined, and classification models are trained separately for the industrial-chain classes and for the link classes. An industrial chain and its links can be understood as the technical fields, technical stages or technical methods involved.
The embodiments of the present invention do not limit how industrial chains and their links are defined: existing classifications of industrial chains and links may be used, or industrial chains and their links may be redefined according to actual conditions.
For example, industrial chains divided by industry classification may include an agricultural industrial chain, a forestry industrial chain, an IT industrial chain, an education industrial chain, and so on.
As another example, the links of the Internet of Things industrial chain may be defined as: chip suppliers, sensor suppliers, wireless module (including antenna) manufacturers, network operators (including SIM card business), platform service providers, system and software developers, smart hardware manufacturers, and system integration and application service providers.
Step 110 specifically includes:
First, obtain the probabilistic topic words of the target patent data to be classified.
A probabilistic topic word is a word whose posterior probability of belonging to a certain class exceeds a set value.
The set value is, for example, a value greater than 0.5, so usually only a few probabilistic topic words are extracted from a given piece of target patent data.
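A minimal Python sketch of extracting probabilistic topic words follows. The text does not say how the word posterior is computed, so this sketch assumes P(class | word) is estimated as the fraction of a word's occurrences in labelled sample patents that fall in each class; the counts, class names and the set value of 0.6 (an arbitrary choice above 0.5) are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def word_posteriors(word_class_counts: Dict[str, Counter]) -> Dict[str, Dict[str, float]]:
    """Estimate P(class | word) from how often each word occurs in each class's samples."""
    posteriors = {}
    for word, counts in word_class_counts.items():
        total = sum(counts.values())
        posteriors[word] = {cls: n / total for cls, n in counts.items()}
    return posteriors


def topic_words(doc_words: List[str], posteriors: Dict[str, Dict[str, float]],
                setting_value: float = 0.6) -> List[str]:
    """Keep words whose posterior for some class exceeds the set value."""
    selected = []
    for word in dict.fromkeys(doc_words):  # de-duplicate, preserve first-seen order
        probs = posteriors.get(word)
        if probs and max(probs.values()) > setting_value:
            selected.append(word)
    return selected


# Hypothetical word counts from labelled sample patents.
counts = {
    "sensor": Counter({"IoT": 9, "education": 1}),
    "method": Counter({"IoT": 5, "education": 5}),
    "antenna": Counter({"IoT": 4}),
}
post = word_posteriors(counts)
print(topic_words(["the", "sensor", "method", "antenna", "sensor"], post))
# ['sensor', 'antenna']
```

A generic word such as "method" splits evenly across classes and is dropped, which is why only a few topic words survive per patent.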
Next, based on the first classification model, determine, according to the probabilistic topic words and the preset matching condition, whether the target patent data matches a predefined industrial chain.
For example, the target patent data is input into the first classification model, which analyzes it and matches it against each industrial chain in turn.
Then, after a match with an industrial chain is determined, based on the second classification model, determine, according to the probabilistic topic words and the preset matching condition, whether the target patent data matches a link within the matched industrial chain.
That is, in the embodiments of the present invention, the target patent data is first matched against each industrial chain; once an industrial chain matches, the target patent data is then matched against each link of that chain.
Specifically, whether the target patent data matches a link of a predefined industrial chain can be determined in either of the following two ways:
First way: according to the probabilistic topic words of the target patent data, calculate the posterior probability that the target patent data belongs to each industrial-chain link, and judge whether that posterior probability exceeds a preset threshold.
Second way: judge whether the number of probabilistic topic words extracted from the target patent data exceeds a preset number.
That is, the preset matching condition may be either that the posterior probability of the target patent data belonging to an industrial-chain link exceeds a preset threshold, or that the number of probabilistic topic words extracted from the target patent data exceeds a preset number.
In the embodiments of the present invention, not only is a first classification model built for the industrial chains, but a second classification model is also built within each industrial chain for its links. When target patent data is classified, combining the first and second classification models in this way is more efficient, and the target patent data can ultimately be classified into a link of an industrial chain.
Step 120: If the match succeeds, obtain the correspondence between the target patent data and the link of the industrial chain, and classify the target patent data into the corresponding link according to the correspondence.
Step 120 specifically includes:
First, if the match succeeds, obtain the correspondence between the target patent data and the link of the industrial chain.
Specifically, if the posterior probability that the target patent data belongs to a link exceeds the preset threshold, or the number of probabilistic topic words extracted from the target patent data exceeds the preset number, the match is determined to have succeeded, and the correspondence between the target patent data and the link is obtained.
For example, suppose the target patent data is a and the industrial chain has two links, link 1 and link 2. a is matched against link 1 and link 2 with a preset threshold of 0.8. If the posterior probability that a belongs to link 1 is 0.5 and that it belongs to link 2 is 0.9, the correspondence obtained is that a corresponds to link 2.
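The two matching conditions and the link assignment in this example can be sketched in Python as below. The function names and the preset number are hypothetical, and the posteriors are taken as given, since computing them is the classification model's job. Choosing the highest-posterior link when more than one passes the threshold is an assumption; the text only covers the case where a single link passes.

```python
from typing import Dict, List, Optional


def posterior_condition(posteriors: Dict[str, float], threshold: float) -> bool:
    """First way: some link's posterior probability exceeds the preset threshold."""
    return max(posteriors.values()) > threshold


def topic_count_condition(topic_words: List[str], preset_number: int) -> bool:
    """Second way: more probabilistic topic words were extracted than the preset number."""
    return len(topic_words) > preset_number


def assign_link(posteriors: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the link with the highest posterior if the first way succeeds, else None."""
    if not posterior_condition(posteriors, threshold):
        return None  # match failed; fall back to keyword matching
    return max(posteriors, key=posteriors.get)


# The worked example: patent a, two links, threshold 0.8.
print(assign_link({"link 1": 0.5, "link 2": 0.9}, threshold=0.8))  # link 2
```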
Then, classify the target patent data into the corresponding link according to the correspondence.
Further, a failed match can also be handled: if the match fails, obtain the patent feature words of the target patent data according to a preset feature extraction method, and match the patent feature words against the keywords of each preset industrial-chain link to obtain the correspondence between the target patent data and a link of the industrial chain.
Specifically:
First, determine that the match has failed.
Specifically, if the posterior probability that the target patent data belongs to a link does not exceed the preset threshold, or the number of probabilistic topic words extracted from the target patent data does not exceed the preset number, the match is determined to have failed.
Then, obtain the patent feature words of the target patent data according to the preset feature extraction method.
The embodiments of the present invention do not limit the preset feature extraction method either; it may be, for example, an information-gain feature extraction method or a word-frequency method. These methods usually extract more patent feature words than there are probabilistic topic words.
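As a rough illustration of the word-frequency option, the sketch below treats the most frequent non-stop-words in a patent's text as its feature words. The stop-word list, the tokenizer and the top_k cut-off are hypothetical choices, not taken from the text; an information-gain method would instead rank words by how much they reduce class uncertainty across labelled samples.

```python
import re
from collections import Counter
from typing import List, Set

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS: Set[str] = {"a", "an", "the", "and", "of", "to", "for", "in", "is", "by", "with"}


def feature_words(text: str, top_k: int = 5) -> List[str]:
    """Word-frequency feature extraction: the top_k most frequent non-stop-words."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]
    # most_common keeps first-encountered order among words with equal counts
    return [word for word, _ in Counter(tokens).most_common(top_k)]


text = ("A wireless sensor module with an antenna. "
        "The sensor module sends sensor data over the wireless link.")
print(feature_words(text, top_k=3))  # ['sensor', 'wireless', 'module']
```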
Next, compare the patent feature words with the keywords of each preset industrial-chain link, count, for each link, how many patent feature words are identical to its keywords, and determine the link with the largest count.
The keywords of an industrial-chain link may be obtained with an existing keyword extraction algorithm or configured manually; the embodiments of the present invention do not limit this.
Finally, obtain the correspondence between the target patent data and a link of the industrial chain according to the link with the largest count.
That is, the industrial-chain link whose keywords coincide with the largest number of patent feature words of the target patent data is taken as the link corresponding to that target patent data.
For example, suppose the target patent data has 3 patent feature words and is matched against 2 industrial-chain links, each of which also has 3 keywords. The 3 patent feature words are compared with the keywords of each link: if 2 of them are identical to keywords of the first link and 3 are identical to keywords of the second link, a correspondence between the target patent data and the second link is established.
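The keyword-overlap fallback, including the worked example above, can be sketched as follows. The link names and keyword sets are hypothetical; ties are resolved in favour of the first link listed, and returning None when no keyword is shared at all is an assumption the text does not address.

```python
from typing import Dict, List, Optional, Set


def best_link_by_keywords(feature_words: List[str],
                          link_keywords: Dict[str, Set[str]]) -> Optional[str]:
    """Return the link whose keywords coincide with the most patent feature words."""
    words = set(feature_words)
    # Number of feature words identical to each link's keywords.
    overlaps = {link: len(words & keywords) for link, keywords in link_keywords.items()}
    best = max(overlaps, key=overlaps.get)  # ties: first link listed wins
    # Assumption: with no shared keyword at all, leave the patent unclassified.
    return best if overlaps[best] > 0 else None


# The worked example: 3 feature words, 2 links with 3 keywords each (words are hypothetical).
links = {
    "link 1": {"antenna", "module", "chip"},
    "link 2": {"antenna", "module", "rf"},
}
print(best_link_by_keywords(["antenna", "module", "rf"], links))  # link 2
```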
Thus, in the embodiments of the present invention, target patent data is classified by the classification model and the probabilistic topic words; when this classification fails, the patent feature words of the target patent data are extracted and matched against the keywords of each industrial-chain link, so that the target patent data is finally classified into a link of an industrial chain. This improves the efficiency and accuracy of patent classification and meets users' patent search needs within industrial-chain links: when searching for patents, users can obtain the patent data under each link, which makes the results easier to browse and distinguish, improves search accuracy and meets user needs.
The training of the classification model is briefly described below:
First, collect the patent sample data for the links of each industrial chain, extract the characteristic part of each patent sample, and use that characteristic part as a characteristic index.
The characteristic index includes one or any combination of the following: International Patent Classification (IPC) class, patent title, abstract, patent feature words. The characteristic index may of course be another characteristic part of the patent; the embodiments of the present invention do not limit this.
Specifically, taking each industrial-chain link as a unit, collect the patent sample data corresponding to each link, and evaluate each link's patent samples on, for example, IPC class, patent title, abstract and patent feature words.
Further, in the patent sample data of the link of some industrial chains, patent content feature is obvious, from plucking
Want, be that can be seen that the degree of association with the link of industrial chain in title, then carry patent name, summary as the weight of characteristic index
It is high;For that in the patent sample data of the link of some industrial chains, patent content feature unobvious, then IPC can be classified, specially
Sharp Feature Words improve as the weight of characteristic index.
After different weights are assigned to the feature indices, training is performed on them, which makes the classifier of the resulting classification model more accurate.
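The weighting rule above can be sketched as follows. This sketch assumes each feature index has already been reduced to a numeric score. The weight values and the names `DISTINCTIVE_CONTENT`, `WEAK_CONTENT`, and `weight_feature_indices` are hypothetical. The description states only which indices receive higher weight, not by how much.

```python
from typing import Dict

# Hypothetical weight profiles: the source says which indices get higher
# weight for each kind of link but gives no numbers, so these are illustrative.
DISTINCTIVE_CONTENT = {"ipc": 0.5, "title": 1.5, "abstract": 1.5, "feature_words": 0.5}
WEAK_CONTENT = {"ipc": 1.5, "title": 0.5, "abstract": 0.5, "feature_words": 1.5}


def weight_feature_indices(scores: Dict[str, float], content_distinctive: bool) -> Dict[str, float]:
    """Scale each feature-index score by the weight profile chosen for the link."""
    profile = DISTINCTIVE_CONTENT if content_distinctive else WEAK_CONTENT
    return {name: value * profile[name] for name, value in scores.items()}
```

For a link with distinctive content, title and abstract scores are amplified. Otherwise the IPC class and feature-word scores dominate.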
Then, the value range of each feature index is divided into intervals according to the feature indices of the patent samples. The classification model is trained from this division with a preset classification algorithm. Training calculates the probability that each feature index belongs to each industrial chain link, and each patent sample is assigned to the link with the largest probability.
Further, before the value range of each feature index is divided, the method also includes normalizing the value of each feature index into the same preset value range.
For example, if the preset value range is (0, 1], the values can be processed with a preset normalization formula such as y = [x - MinValue(x)] / [MaxValue(x) - MinValue(x)]. Here x is the value of any feature index before normalization, MinValue(x) and MaxValue(x) are the minimum and maximum of x, and y is the normalized value of that feature index.
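A minimal sketch of the normalization formula above follows. The formula maps the minimum value to 0, so its output lies in [0, 1] rather than strictly (0, 1]. When all values are equal, the formula divides by zero. Mapping that case to 1.0 is an assumption of this sketch, not something the description specifies.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Apply y = (x - min) / (max - min) to every value of one feature index."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Formula is undefined for a constant column; 1.0 is an assumed fallback.
        return [1.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]
```

For instance, `min_max_normalize([2, 4, 6])` returns `[0.0, 0.5, 1.0]`.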
Taking a naive Bayes classification model as an example of the classification model, the procedure is as follows:
1) Call the naive Bayes toolkit of Spark MLlib.
2) Divide the value range of each feature index. Based on this division, calculate P(yj > ajk | Ci), P(Ci), and P(yj > ajk).
Here, yj is the value of the j-th feature index, j = 1, 2, ..., N, and N is the total number of feature indices. ajk is the value of the k-th division point of the j-th feature index yj, with 0 < ajk ≤ 1 and k a positive integer. Ci indicates whether the data belongs to a given industrial chain link, with i = 1, 2: C1 means belonging to that link, and C2 means not belonging to it. P(Ci) is the probability that a patent belongs to class Ci. P(yj > ajk | Ci) is the conditional probability that the feature index value satisfies yj > ajk within class Ci. P(yj > ajk) is the probability that the feature index value satisfies yj > ajk.
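The three quantities defined above can be estimated by simple counting over the training set, as in this sketch. The sample layout is an assumption: a list of normalized feature values paired with a class label 1 (C1) or 2 (C2). Feature indices are 0-based in code. The function names are hypothetical.

```python
from typing import List, Tuple

# (normalized feature values y_1..y_N, class i in {1, 2}); layout is assumed.
Sample = Tuple[List[float], int]


def prior(samples: List[Sample], ci: int) -> float:
    """P(Ci): fraction of training samples labeled with class ci."""
    return sum(1 for _, c in samples if c == ci) / len(samples)


def p_exceeds(samples: List[Sample], j: int, a_jk: float) -> float:
    """P(yj > ajk): fraction of all samples whose j-th feature exceeds a_jk."""
    return sum(1 for y, _ in samples if y[j] > a_jk) / len(samples)


def p_exceeds_given(samples: List[Sample], j: int, a_jk: float, ci: int) -> float:
    """P(yj > ajk | Ci): the same fraction, restricted to samples of class ci."""
    in_class = [y for y, c in samples if c == ci]
    return sum(1 for y in in_class if y[j] > a_jk) / len(in_class)
```

These are raw frequency estimates. `p_exceeds_given` raises `ZeroDivisionError` if a class has no samples.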
For example, suppose each patent sample has four feature indices, Y = {y1, y2, y3, y4}, and the four feature indices of all patent samples form the training data set. By Bayes' theorem, P(Ci | yj) = P(yj | Ci) × P(Ci) / P(yj), where i = 1, 2 and j = 1, 2, 3, 4. For any feature index yj, P(yj), P(Ci), and the conditional probability P(yj | Ci) can be computed directly from the training data set.
3) Train the classification model. Calculate the probability that each feature index belongs to each industrial chain link, and assign each patent sample to the industrial chain link with the largest probability.
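Steps 2) and 3) can be sketched as a small naive Bayes classifier over thresholded features. This is a plain-Python illustration, not the Spark MLlib implementation. It makes several assumptions:

- Each feature index uses a single division point.
- Each link is its own class; labels could equally be "C1"/"C2".
- Laplace smoothing (`alpha`) is added. The description does not mention it.

Normalizing the scores over all classes plays the role of the P(yj > ajk) denominator.

```python
from typing import Dict, List, Tuple

Sample = Tuple[List[float], str]  # (normalized feature values, link label)


def train_nb(samples: List[Sample], thresholds: List[float], alpha: float = 1.0):
    """Estimate P(Ci) and smoothed P(yj > a_j | Ci) for every label Ci."""
    priors: Dict[str, float] = {}
    cond: Dict[str, List[float]] = {}
    for label in sorted({lab for _, lab in samples}):
        rows = [y for y, lab in samples if lab == label]
        priors[label] = len(rows) / len(samples)
        # Laplace smoothing (an addition, not in the source) keeps estimates in (0, 1).
        cond[label] = [
            (sum(1 for y in rows if y[j] > a) + alpha) / (len(rows) + 2 * alpha)
            for j, a in enumerate(thresholds)
        ]
    return priors, cond


def classify_nb(y: List[float], thresholds: List[float], priors, cond):
    """Return the label with the largest posterior and all normalized posteriors."""
    scores = {}
    for label, p in priors.items():
        score = p
        for j, a in enumerate(thresholds):
            p_gt = cond[label][j]
            # Naive independence assumption: one factor per feature index.
            score *= p_gt if y[j] > a else 1.0 - p_gt
        scores[label] = score
    total = sum(scores.values())
    posteriors = {label: s / total for label, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors
```

With two "battery" samples above both thresholds and two "motor" samples below them, a new patent at `[0.85, 0.9]` is assigned to "battery" with a posterior of 0.9.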
Further, a number of iterations can be set, and the accuracy of the naive Bayes classifier is calculated or evaluated. When the accuracy exceeds a set threshold, the final naive Bayes classifier is obtained, i.e., training of the classification model is complete.
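The acceptance rule above (stop once accuracy exceeds a set threshold, within a set number of iterations) can be sketched as follows. The description does not say what changes between iterations. `train_fn` is therefore a hypothetical hook that receives the iteration number and returns a model. `evaluate_fn` is a hypothetical evaluation on held-out data.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

M = TypeVar("M")


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of samples whose predicted link equals the true link."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def train_until_accurate(
    train_fn: Callable[[int], M],
    evaluate_fn: Callable[[M], float],
    max_iterations: int,
    threshold: float,
) -> Optional[Tuple[M, float]]:
    """Accept the first model whose accuracy exceeds the threshold."""
    for iteration in range(max_iterations):
        model = train_fn(iteration)
        acc = evaluate_fn(model)
        if acc > threshold:
            return model, acc
    return None  # no model reached the threshold within the iteration budget
```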
In practice, two components of the Hadoop distributed system architecture are widely used for big data analysis: the Hadoop Distributed File System (HDFS) and the distributed computing framework MapReduce. Spark is a general-purpose parallel framework similar to Hadoop MapReduce, open-sourced by UC Berkeley's AMP Lab. Spark has the advantages of Hadoop MapReduce. Unlike MapReduce, however, Spark can keep intermediate job output in memory, so it no longer needs to read and write HDFS between jobs. Spark is therefore better suited to data mining, machine learning, and other algorithms that need iterative MapReduce.
Therefore, in the embodiments of the present invention, the naive Bayes classifier is trained with the naive Bayes toolkit, taking full advantage of Spark's in-memory computing. The parallelized interface provided by Spark MLlib is called, and the feature indices of the selected samples are fed into Spark MLlib's naive Bayes algorithm interface. The number of iterations is set, and Spark MLlib performs the iterative computation automatically. When the iterations complete, the naive Bayes classifier is obtained. This makes the mining process of matching patent data to industrial chain links more intelligent, and the mined combinations of feature indices more comprehensive. Because Spark computes in memory, calculation is faster, which substantially shortens the time needed to build the naive Bayes classifier.
The above training method applies to both the first classification model and the second classification model. In the embodiments of the present invention, a first classification model is built for each industrial chain. To further improve the accuracy of matching patent data to the individual links within an industrial chain, a second classification model is also built inside each industrial chain by training on the patent sample data of its links. The second classification model matches patent data to the links within the industrial chain, improving classification accuracy and efficiency.
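The two-level arrangement above (an industrial chain first, then a link inside that chain) can be sketched as follows. A "model" here is assumed to be any callable that returns a posterior probability per label. A single shared threshold serves as the matching condition at both levels, which is a simplification. The function name and the example chains and links are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Any callable returning a posterior per label stands in for a trained model.
Model = Callable[[List[float]], Dict[str, float]]


def two_stage_classify(
    features: List[float],
    chain_model: Model,
    link_models: Dict[str, Model],
    threshold: float,
) -> Optional[Tuple[str, str]]:
    """Match an industrial chain first, then a link inside that chain."""
    chain_post = chain_model(features)
    chain = max(chain_post, key=chain_post.get)
    if chain_post[chain] <= threshold:
        return None  # first model: no industrial chain matched
    link_post = link_models[chain](features)  # second model of that chain only
    link = max(link_post, key=link_post.get)
    if link_post[link] <= threshold:
        return None  # chain matched, but no link inside it did
    return chain, link
```

Only the second-level model of the matched chain is evaluated, so each link model needs to distinguish only the links of its own chain.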
The above embodiment is described further below using a specific application scenario. Referring to Fig. 2, in the embodiment of the present invention, the data processing method is executed as follows:
Step 200: Obtain the target patent data to be classified.
Step 201: Based on the pre-trained classification model, match the target patent data to be classified against the links of the predefined industrial chains.
The pre-trained classification model is preferably a naive Bayes classification model. It includes a first classification model and a second classification model.
Specifically, the first classification model judges, according to the probabilistic topics and a preset matching condition, whether the target patent data to be classified matches a predefined industrial chain. After a successful match with an industrial chain, the second classification model judges, according to the same topics and matching condition, whether the target patent data matches a link in the matched industrial chain.
Step 202: Determine whether the match succeeded. If so, go to Step 203; otherwise, go to Step 204.
Step 203: Obtain the correspondence between the target patent data to be classified and the industrial chain link.
Step 204: Obtain the patent feature words of the target patent data to be classified.
Step 205: Match the patent feature words against the keywords of the preset industrial chain links to obtain the correspondence between the target patent data to be classified and an industrial chain link.
Step 206: According to the correspondence, assign the target patent data to be classified to the corresponding industrial chain link.
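The control flow of Steps 201 to 206 can be sketched as a function that tries model-based matching first and falls back to keyword matching. The patent record layout (a dict with an `"id"` field), the callables `model_match` and `keyword_match`, and the `index` dictionary standing in for the classified store are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Link = Tuple[str, str]   # (industrial chain, link)
Patent = Dict[str, str]  # hypothetical record with at least an "id" field


def classify_patent(
    patent: Patent,
    model_match: Callable[[Patent], Optional[Link]],
    keyword_match: Callable[[Patent], Optional[Link]],
    index: Dict[Link, List[str]],
) -> Optional[Link]:
    """Steps 201-206: try the classification models, fall back to keywords."""
    # Steps 201-203: model matching; a non-None result is the correspondence.
    link = model_match(patent)
    if link is None:
        link = keyword_match(patent)  # Steps 204-205: keyword fallback
    if link is not None:
        index.setdefault(link, []).append(patent["id"])  # Step 206: file it
    return link
```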
In the embodiments of the present invention, the first classification model (for industrial chains) is combined with the second classification model (for industrial chain links). Classification-model matching is further combined with keyword matching. Together these further improve the efficiency and accuracy of classifying target patent data. Target patent data can thus be assigned accurately to the corresponding industrial chain link, achieving classification of patent data by industrial chain link.
Based on the above embodiments, and referring to Fig. 3, the data processing apparatus in the embodiments of the present invention specifically includes:
an acquiring unit 30, configured to obtain target patent data to be classified;
a matching unit 31, configured to judge whether the target patent data to be classified matches a link of a predefined industrial chain. The judgment is based on a pre-trained classification model, the probabilistic topics of the obtained target patent data, and a preset matching condition. The pre-trained classification model is obtained by training with a preset classification algorithm on the collected patent sample data in the links of each industrial chain;
a classification unit 32, configured to, if the match is successful, obtain the correspondence between the target patent data to be classified and the industrial chain link, and to assign the target patent data to the corresponding industrial chain link according to that correspondence.
Preferably, when judging whether the target patent data to be classified matches a link of a predefined industrial chain, the matching unit 31 is specifically configured to:
calculate, according to the probabilistic topics of the target patent data to be classified, the posterior probability that the target patent data belongs to an industrial chain link, and judge whether this posterior probability is greater than a preset threshold; or
judge whether the number of probabilistic topics extracted from the target patent data to be classified is greater than a preset number.
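The two alternative matching conditions above can be sketched as two small predicates. Posteriors are assumed to arrive as a label-to-probability dictionary and topics as a list of strings. Threshold and preset number are caller-supplied. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def match_by_posterior(posteriors: Dict[str, float], threshold: float) -> Optional[str]:
    """Condition 1: the best link matches only if its posterior exceeds the threshold."""
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] > threshold else None


def match_by_topic_count(topics: List[str], preset_number: int) -> bool:
    """Condition 2: more probabilistic topics than the preset number were extracted."""
    return len(topics) > preset_number
```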
Preferably, the pre-trained classification model includes a first classification model and a second classification model. The first classification model is trained per predefined industrial chain, and the second classification model is trained per link of a predefined industrial chain.
When judging whether the target patent data to be classified matches a link of a predefined industrial chain (based on the pre-trained classification model, the probabilistic topics of the obtained target patent data, and the preset matching condition), the matching unit 31 is specifically configured to:
obtain the probabilistic topics of the target patent data to be classified;
judge, based on the first classification model and according to the probabilistic topics and the preset matching condition, whether the target patent data to be classified matches a predefined industrial chain;
after a successful match with an industrial chain, judge, based on the second classification model and according to the probabilistic topics and the preset matching condition, whether the target patent data to be classified matches a link in the matched industrial chain.
Preferably, the classification unit 32 is further configured to:
if the match fails, obtain the patent feature words of the target patent data to be classified according to a preset feature extraction method. Then match the patent feature words against the keywords of each preset industrial chain link, and obtain the correspondence between the target patent data and an industrial chain link.
Preferably, for the keyword fallback (extracting the patent feature words, matching them against the keywords of each preset industrial chain link, and obtaining the correspondence), the classification unit 32 is specifically configured to:
obtain the patent feature words of the target patent data to be classified according to the preset feature extraction method;
compare the patent feature words with the keywords of each preset industrial chain link, count for each link how many patent feature words are identical to its keywords, and determine the industrial chain link with the largest count;
obtain the correspondence between the target patent data to be classified and an industrial chain link from the link with the largest count.
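The keyword fallback above can be sketched as a set-overlap count. This sketch assumes "identical number" means the count of distinct feature words that also appear among a link's keywords. It also assumes a zero count means no match. Ties resolve to the first link in dictionary order. The function name and the example keywords are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def best_link_by_keywords(
    feature_words: Iterable[str], link_keywords: Dict[str, Set[str]]
) -> Optional[str]:
    """Return the link whose keywords share the most words with the patent."""
    words = set(feature_words)
    counts = {link: len(words & keywords) for link, keywords in link_keywords.items()}
    if not counts or max(counts.values()) == 0:
        return None  # no keyword in common with any link
    return max(counts, key=counts.get)
```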
Preferably, the classification model is trained by:
a collection unit 33, configured to collect the patent sample data in the links of each industrial chain, extract the feature parts of each patent sample, and use these feature parts as feature indices;
a training unit 34, configured to divide the value range of each feature index according to the feature indices of the patent samples, and to train the classification model from this division with a preset classification algorithm. Training calculates the probability that each feature index belongs to each industrial chain link, and each patent sample is assigned to the link with the largest probability.
Preferably, the feature indices include one or any combination of the following: International Patent Classification (IPC) class, patent title, abstract, and patent feature words.
Preferably, for use before the value range of each feature index is divided, the apparatus further includes:
a normalization unit 35, configured to normalize the value of each feature index into the same preset value range.
In summary, the embodiments of the present invention work as follows. Target patent data to be classified is obtained. A pre-trained classification model judges, according to the probabilistic topics of the obtained target patent data and a preset matching condition, whether the data matches a link of a predefined industrial chain. The pre-trained model is obtained by training with a preset classification algorithm on the collected patent sample data in the links of each industrial chain. If the match succeeds, the correspondence between the target patent data and the industrial chain link is obtained, and the data is assigned to that link. Because the probabilistic topics are extracted and the classification model classifies the data according to them, patent data is organized by industrial chain link. This meets users' need to search for patents by industrial chain link. When searching, a user can obtain the patent data under each link, which makes results easy to view and distinguish, improves search accuracy, and meets user requirements.
In addition, when classification fails, the patent feature words of the target patent data are extracted according to a preset feature extraction method and matched against the keywords of each industrial chain link. Combining classification-model matching with keyword matching in this way improves the accuracy and efficiency of classifying target patent data.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. The present invention may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code. Such media include, but are not limited to, magnetic disk storage, CD-ROM, and optical storage.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. Each flow and/or block in the flowcharts and/or block diagrams, and any combination of them, can be implemented by computer program instructions. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine. The instructions executed by that processor then produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner. The instructions stored in that memory then produce an article of manufacture including an instruction apparatus. The instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce a computer-implemented process. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art who learn the basic inventive concept can make other changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the embodiments of the present invention without departing from their spirit and scope. If these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.
Claims (16)
- 1. A data processing method, characterized by comprising: obtaining target patent data to be classified; judging, based on a pre-trained classification model and according to the probabilistic topics of the obtained target patent data to be classified and a preset matching condition, whether the target patent data to be classified matches a link of a predefined industrial chain, wherein the pre-trained classification model is obtained by training with a preset classification algorithm on collected patent sample data in the links of each industrial chain; and, if the match is successful, obtaining the correspondence between the target patent data to be classified and the industrial chain link, and assigning the target patent data to be classified to the corresponding industrial chain link according to the correspondence.
- 2. The method according to claim 1, characterized in that judging whether the target patent data to be classified matches a link of a predefined industrial chain specifically comprises: calculating, according to the probabilistic topics of the target patent data to be classified, the posterior probability that the target patent data to be classified belongs to an industrial chain link, and judging whether this posterior probability is greater than a preset threshold; or judging whether the number of probabilistic topics extracted from the target patent data to be classified is greater than a preset number.
- 3. The method according to claim 2, characterized in that the pre-trained classification model comprises a first classification model and a second classification model, wherein the first classification model is trained per predefined industrial chain and the second classification model is trained per link of a predefined industrial chain; and judging, based on the pre-trained classification model and according to the probabilistic topics of the obtained target patent data to be classified and the preset matching condition, whether the target patent data to be classified matches a link of a predefined industrial chain specifically comprises: obtaining the probabilistic topics of the target patent data to be classified; judging, based on the first classification model and according to the probabilistic topics and the preset matching condition, whether the target patent data to be classified matches a predefined industrial chain; and, after a successful match with an industrial chain, judging, based on the second classification model and according to the probabilistic topics and the preset matching condition, whether the target patent data to be classified matches a link in the matched industrial chain.
- 4. The method according to claim 2, characterized by further comprising: if the match fails, obtaining the patent feature words of the target patent data to be classified according to a preset feature extraction method, matching the patent feature words against the keywords of each preset industrial chain link, and obtaining the correspondence between the target patent data to be classified and an industrial chain link.
- 5. The method according to claim 4, characterized in that obtaining the patent feature words of the target patent data to be classified according to the preset feature extraction method, matching them against the keywords of each preset industrial chain link, and obtaining the correspondence between the target patent data to be classified and an industrial chain link specifically comprises: obtaining the patent feature words of the target patent data to be classified according to the preset feature extraction method; comparing the patent feature words with the keywords of each preset industrial chain link, counting for each link the number of patent feature words identical to its keywords, and determining the industrial chain link with the largest count; and obtaining the correspondence between the target patent data to be classified and an industrial chain link according to the link with the largest count.
- 6. The method according to any one of claims 1 to 5, characterized in that the classification model is trained by: collecting the patent sample data in the links of each industrial chain, extracting the feature parts of each patent sample, and using these feature parts as feature indices; and dividing the value range of each feature index according to the feature indices of each patent sample, training the classification model with a preset classification algorithm based on this division, calculating the probability that each feature index belongs to each industrial chain link, and assigning each patent sample to the industrial chain link with the largest probability.
- 7. The method according to claim 6, characterized in that the feature indices include one or any combination of the following: International Patent Classification (IPC) class, patent title, abstract, and patent feature words.
- 8. The method according to claim 6, characterized by further comprising, before dividing the value range of each feature index: normalizing the value of each feature index into the same preset value range.
- 9. A data processing apparatus, characterized by comprising: an acquiring unit, configured to obtain target patent data to be classified; a matching unit, configured to judge, based on a pre-trained classification model and according to the probabilistic topics of the obtained target patent data to be classified and a preset matching condition, whether the target patent data to be classified matches a link of a predefined industrial chain, wherein the pre-trained classification model is obtained by training with a preset classification algorithm on collected patent sample data in the links of each industrial chain; and a classification unit, configured to, if the match is successful, obtain the correspondence between the target patent data to be classified and the industrial chain link, and assign the target patent data to be classified to the corresponding industrial chain link according to the correspondence.
- 10. The apparatus according to claim 9, characterized in that, when judging whether the target patent data to be classified matches a link of a predefined industrial chain, the matching unit is specifically configured to: calculate, according to the probabilistic topics of the target patent data to be classified, the posterior probability that the target patent data to be classified belongs to an industrial chain link, and judge whether this posterior probability is greater than a preset threshold; or judge whether the number of probabilistic topics extracted from the target patent data to be classified is greater than a preset number.
- 11. The apparatus according to claim 10, characterized in that the pre-trained classification model comprises a first classification model and a second classification model, wherein the first classification model is trained per predefined industrial chain and the second classification model is trained per link of a predefined industrial chain; and, when judging, based on the pre-trained classification model and according to the probabilistic topics of the obtained target patent data to be classified and the preset matching condition, whether the target patent data to be classified matches a link of a predefined industrial chain, the matching unit is specifically configured to: obtain the probabilistic topics of the target patent data to be classified; judge, based on the first classification model and according to the probabilistic topics and the preset matching condition, whether the target patent data to be classified matches a predefined industrial chain; and, after a successful match with an industrial chain, judge, based on the second classification model and according to the probabilistic topics and the preset matching condition, whether the target patent data to be classified matches a link in the matched industrial chain.
- 12. The apparatus according to claim 10, characterized in that the classification unit is further configured to: if the match fails, obtain the patent feature words of the target patent data to be classified according to a preset feature extraction method, match the patent feature words against the keywords of each preset industrial chain link, and obtain the correspondence between the target patent data to be classified and an industrial chain link.
- 13. The apparatus according to claim 12, characterized in that, when obtaining the patent feature words of the target patent data to be classified according to the preset feature extraction method, matching them against the keywords of each preset industrial chain link, and obtaining the correspondence between the target patent data to be classified and an industrial chain link, the classification unit is specifically configured to: obtain the patent feature words of the target patent data to be classified according to the preset feature extraction method; compare the patent feature words with the keywords of each preset industrial chain link, count for each link the number of patent feature words identical to its keywords, and determine the industrial chain link with the largest count; and obtain the correspondence between the target patent data to be classified and an industrial chain link according to the link with the largest count.
- 14. The apparatus according to any one of claims 9 to 13, characterized in that, for training the classification model, the apparatus comprises: a collection unit, configured to collect the patent sample data in the links of each industrial chain, extract the feature parts of each patent sample, and use these feature parts as feature indices; and a training unit, configured to divide the value range of each feature index according to the feature indices of each patent sample, train the classification model with a preset classification algorithm based on this division, calculate the probability that each feature index belongs to each industrial chain link, and assign each patent sample to the industrial chain link with the largest probability.
- 15. The apparatus according to claim 14, characterized in that the feature indices include one or any combination of the following: International Patent Classification (IPC) class, patent title, abstract, and patent feature words.
- 16. The apparatus according to claim 14, characterized by further comprising a normalization unit, configured to normalize the value of each feature index into the same preset value range before the value range of each feature index is divided.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710433424.0A CN107368526A (en) | 2017-06-09 | 2017-06-09 | A kind of data processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710433424.0A CN107368526A (en) | 2017-06-09 | 2017-06-09 | A kind of data processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107368526A | 2017-11-21 |
Family ID: 60306149
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710433424.0A Pending CN107368526A (en) | 2017-06-09 | 2017-06-09 | A kind of data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107368526A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020138320A1 (en) * | 2000-12-26 | 2002-09-26 | Appareon | System, method and article of manufacture for global, device-independent deployment of a supply chain management system |
CN103324628A (en) * | 2012-03-21 | 2013-09-25 | 腾讯科技(深圳)有限公司 | Industry classification method and system for text publishing |
CN105677907A (en) * | 2016-02-16 | 2016-06-15 | 大连理工大学 | Patent technology evolution analysis method and system |
2017
- 2017-06-09: CN201710433424.0A (CN107368526A) filed in CN, status: Pending
Non-Patent Citations (5)
Title |
---|
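严泰来 et al.: 《遥感技术与农业应用》 (Remote Sensing Technology and Agricultural Applications), 31 July 2008, Beijing: China Agricultural University Press * |
周亦鹏: 《软件人主题分析和信息检索技术》 (Softman Topic Analysis and Information Retrieval Technology), 31 August 2012, Beijing: Beijing University of Posts and Telecommunications Press * |
周宁: 《信息组织 第2版》 (Information Organization, 2nd Edition), 30 November 2004 * |
张端阳 et al.: "面向专利集成的专利技术相关性测度方法研究" (Research on a Method for Measuring Patent Technology Correlation for Patent Integration), 《情报杂志》 (Journal of Intelligence) * |
韦鹏程 et al.: 《大数据巨量分析与机器学习的整合与开发》 (Integration and Development of Big Data Mass Analytics and Machine Learning), 31 May 2017, Chengdu: University of Electronic Science and Technology of China Press * |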
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108536800A (en) * | 2018-04-03 | 2018-09-14 | 有米科技股份有限公司 | File classification method, system, computer equipment and storage medium |
CN109543042A (en) * | 2018-12-01 | 2019-03-29 | 南京鸿越科技有限公司 | Patent automatic classifying system |
CN110781955A (en) * | 2019-10-24 | 2020-02-11 | 中国银联股份有限公司 | Method and device for classifying label-free objects and detecting nested codes and computer-readable storage medium |
CN112785441A (en) * | 2020-04-20 | 2021-05-11 | 招商证券股份有限公司 | Data processing method and device, terminal equipment and storage medium |
CN112785441B (en) * | 2020-04-20 | 2023-12-05 | 招商证券股份有限公司 | Data processing method, device, terminal equipment and storage medium |
CN116010600A (en) * | 2023-01-09 | 2023-04-25 | 北京天融信网络安全技术有限公司 | Log classification method, device, equipment and medium |
CN116010600B (en) * | 2023-01-09 | 2023-09-26 | 北京天融信网络安全技术有限公司 | Log classification method, device, equipment and medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107368526A (en) | A kind of data processing method and device | |
CN109325691B (en) | Abnormal behavior analysis method, electronic device and computer program product | |
JP7090936B2 (en) | ESG-based corporate evaluation execution device and its operation method | |
CN107545038B (en) | Text classification method and equipment | |
CN105574544A (en) | Data processing method and device | |
CN107193915A (en) | A kind of company information sorting technique and device | |
CN113095927B (en) | Method and equipment for identifying suspected transactions of backwashing money | |
CN112733146B (en) | Penetration testing method, device and equipment based on machine learning and storage medium | |
CN113221960B (en) | Construction method and collection method of high-quality vulnerability data collection model | |
CN110110663A (en) | A kind of age recognition methods and system based on face character | |
CN109067800A (en) | A kind of cross-platform association detection method of firmware loophole | |
CN108229170A (en) | Utilize big data and the software analysis method and device of neural network | |
CN111047173B (en) | Community credibility evaluation method based on improved D-S evidence theory | |
CN106778851A (en) | Social networks forecasting system and its method based on Mobile Phone Forensics data | |
CN111510368A (en) | Family group identification method, device, equipment and computer readable storage medium | |
CN115563610A (en) | Method and device for training and identifying intrusion detection model | |
CN110807159B (en) | Data marking method and device, storage medium and electronic equipment | |
CN111325255B (en) | Specific crowd delineating method and device, electronic equipment and storage medium | |
CN110196911B (en) | Automatic classification management system for civil data | |
CN111753998A (en) | Model training method, device and equipment for multiple data sources and storage medium | |
CN108805152A (en) | A kind of scene classification method and device | |
CN104331507B (en) | Machine data classification is found automatically and the method and device of classification | |
CN110413307A (en) | Correlating method, device and the electronic equipment of code function | |
CN107291722B (en) | Descriptor classification method and device | |
CN113282686B (en) | Association rule determining method and device for unbalanced sample |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2017-11-21 |