CN107168992A - Artificial-intelligence-based article classification method and apparatus, device, and computer-readable medium - Google Patents

Artificial-intelligence-based article classification method and apparatus, device, and computer-readable medium

Info

Publication number
CN107168992A
Authority
CN
China
Prior art keywords
level
article
target article
classification
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710196073.6A
Other languages
Chinese (zh)
Inventor
陈亮宇
肖欣延
吕亚娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201710196073.6A
Publication of CN107168992A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes

Abstract

The present invention provides an artificial-intelligence-based article classification method and apparatus, a device, and a computer-readable medium. The method includes: obtaining the text of a target article; performing word segmentation on the text of the target article at at least two different segmentation granularities to obtain the tokens corresponding to each granularity; predicting, according to the tokens of the target article at each granularity and a pre-trained label scoring model for each level of a target classification system, the similarity between the target article and each topic category at each level; and classifying the target article at each level according to these similarities and a preset similarity threshold. The technical solution not only classifies articles with higher accuracy but also classifies them automatically, which saves time and labor and makes article classification highly efficient.

Description

Artificial-intelligence-based article classification method and apparatus, device, and computer-readable medium
【Technical field】
The present invention relates to the field of computer application technology, and in particular to an artificial-intelligence-based article classification method and apparatus, a device, and a computer-readable medium.
【Background】
Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. AI is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Research in the field includes robotics, language recognition, image recognition, natural language processing, and expert systems.
With the development of network technology, electronic media on the network are used more and more widely. To manage the various news and information items on the network effectively, they are usually divided into different topic categories.
In the prior art, the topic categories of news articles are managed by setting up a hierarchical topic classification system. For example, news is divided into topic categories such as "entertainment", "sports", and "education". On this basis the categories can be subdivided further; for example, "sports" can be divided into "football", "basketball", "badminton", and so on. Articles can then be displayed to users by category, and users can select the topic categories they are interested in from the topic classification system. To classify the news articles on the network, the prior art mostly relies on manual labeling. For example, before an article is published, website staff subjectively classify it according to its title and attach the label of the corresponding topic category.
However, having staff subjectively classify an article according to its title is not only time-consuming and laborious, but also yields very poor classification accuracy.
【Summary of the invention】
The present invention provides an artificial-intelligence-based article classification method and apparatus, a device, and a computer-readable medium, for improving the accuracy of article classification.
The present invention provides an artificial-intelligence-based article classification method, the method including:
obtaining the text of a target article;
performing word segmentation on the text of the target article at at least two different segmentation granularities to obtain the tokens corresponding to each segmentation granularity;
predicting, according to the tokens of the target article at each segmentation granularity and a pre-trained label scoring model for each level of a target classification system, the similarity between the target article and each topic category at each level;
classifying the target article at each level according to the similarity between the target article and each topic category at each level and a preset similarity threshold.
Further optionally, in the method described above, after the target article is classified at each level according to the similarity between the target article and each topic category at each level and the preset similarity threshold, the method further includes:
verifying the classification of the target article at each level.
Further optionally, in the method described above, verifying the classification of the target article at each level specifically includes at least one of the following:
detecting whether the classifications of the target article at the different levels conflict; if they conflict, cancelling the classification of the target article at the downstream level;
if the target article is classified into a particular topic category at a particular level, detecting whether the frequency of occurrence of a particular keyword in the target article reaches a preset frequency threshold; if not, cancelling the classification of the target article into the particular topic category at the particular level; and
if the target article is classified into a particular topic category at a particular level, detecting whether a particular keyword occurs in the target article; if it occurs, cancelling the classification of the target article into the particular topic category at the particular level.
Further optionally, in the method described above, before the similarity between the target article and each topic category at each level is predicted according to the tokens of the target article at each segmentation granularity and the pre-trained label scoring model for each level of the target classification system, the method further includes:
crawling a number of training corpora from various information websites, each training corpus including a training article and the original category of the training article on the corresponding information website;
mapping the original category of the training article in each training corpus on the corresponding information website to a topic category in the target classification system;
performing word segmentation on the text of each training corpus at at least two different segmentation granularities to obtain a number of positive training examples;
constructing, from the positive training examples, multiple unrelated topic categories at each level for the training corpus in each positive example, thereby generating a number of negative training examples;
training the label scoring model of each level using the positive training examples and the negative training examples.
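The negative-example construction step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_negative_examples`, the tuple layout, and the choice to treat every other category at the same level as "unrelated" are all assumptions made for the example.

```python
import random


def build_negative_examples(positives, level_categories, num_negatives=2, seed=0):
    """Pair each positive example's tokens with categories it does NOT belong to.

    positives: list of (tokens, level, category) tuples (hypothetical layout).
    level_categories: dict mapping level -> list of topic categories at that level.
    Assumption: any other category at the same level counts as unrelated.
    """
    rng = random.Random(seed)
    negatives = []
    for tokens, level, category in positives:
        candidates = [c for c in level_categories[level] if c != category]
        count = min(num_negatives, len(candidates))
        for wrong in rng.sample(candidates, count):
            negatives.append((tokens, level, wrong, 0))  # label 0 marks a negative
    return negatives


positives = [(["nba", "finals"], 1, "sports")]
level_categories = {1: ["entertainment", "sports", "education"]}
negatives = build_negative_examples(positives, level_categories)
```

A real system would likely need a stricter notion of "unrelated" than simple inequality, for instance excluding categories that are semantically close to the true one.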
The present invention also provides an artificial-intelligence-based article classification apparatus, the apparatus including:
an acquisition module, for obtaining the text of a target article;
a segmentation module, for performing word segmentation on the text of the target article at at least two different segmentation granularities to obtain the tokens corresponding to each segmentation granularity;
a prediction module, for predicting, according to the tokens of the target article at each segmentation granularity and a pre-trained label scoring model for each level of a target classification system, the similarity between the target article and each topic category at each level;
a classification module, for classifying the target article at each level according to the similarity between the target article and each topic category at each level and a preset similarity threshold.
Further optionally, the apparatus described above further includes:
a verification module, for verifying the classification of the target article at each level.
Further optionally, in the apparatus described above,
the verification module is specifically configured to perform at least one of the following:
detecting whether the classifications of the target article at the different levels conflict; if they conflict, cancelling the classification of the target article at the downstream level;
if the target article is classified into a particular topic category at a particular level, detecting whether the frequency of occurrence of a particular keyword in the target article reaches a preset frequency threshold; if not, cancelling the classification of the target article into the particular topic category at the particular level; and
if the target article is classified into a particular topic category at a particular level, detecting whether a particular keyword occurs in the target article; if it occurs, cancelling the classification of the target article into the particular topic category at the particular level.
Further optionally, the apparatus described above further includes:
a crawling module, for crawling a number of training corpora from various information websites, each training corpus including a training article and the original category of the training article on the corresponding information website;
a mapping module, for mapping the original category of the training article in each training corpus on the corresponding information website to a topic category in the target classification system;
a positive example generation module, for performing word segmentation on the text of each training corpus at at least two different segmentation granularities to obtain a number of positive training examples;
a negative example generation module, for constructing, from the positive training examples, multiple unrelated topic categories at each level for the training corpus in each positive example, thereby generating a number of negative training examples;
a training module, for training the label scoring model of each level using the positive training examples and the negative training examples.
The present invention also provides a computer device, the device including:
one or more processors;
a memory, for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the artificial-intelligence-based article classification method described above.
The present invention also provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, it implements the artificial-intelligence-based article classification method described above.
The artificial-intelligence-based article classification method and apparatus, device, and computer-readable medium of the present invention obtain the text of a target article; perform word segmentation on the text at at least two different segmentation granularities to obtain the tokens corresponding to each granularity; predict, according to these tokens and the pre-trained label scoring model for each level of the target classification system, the similarity between the target article and each topic category at each level; and classify the target article at each level according to these similarities and a preset similarity threshold. Because the text of the target article is segmented at at least two different granularities, the information about the target article that is fed into the label scoring model is very rich, so the similarity between the target article and each topic category at each level can be predicted accurately, and the target article can in turn be classified accurately at each level. The technical solution of the present invention therefore not only classifies articles with higher accuracy but also classifies them automatically, which saves time and labor and makes article classification highly efficient.
【Brief description of the drawings】
Fig. 1 is a flow chart of an embodiment of the artificial-intelligence-based article classification method of the present invention.
Fig. 2 is a structural diagram of embodiment one of the artificial-intelligence-based article classification apparatus of the present invention.
Fig. 3 is a structural diagram of embodiment two of the artificial-intelligence-based article classification apparatus of the present invention.
Fig. 4 is a structural diagram of an embodiment of the computer device of the present invention.
Fig. 5 is an example diagram of a computer device provided by the present invention.
【Detailed description】
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flow chart of an embodiment of the artificial-intelligence-based article classification method of the present invention. As shown in Fig. 1, the method of this embodiment may specifically include the following steps:
100. Obtain the text of the target article;
The method of the present invention is executed by an artificial-intelligence-based article classification apparatus, which may be a physical electronic device or a device integrated in software.
The target article in this embodiment is the article corresponding to a news item on the network. To manage the news on the network effectively, after each news item is published, its article is taken as a target article and classified using the artificial-intelligence-based article classification method of this embodiment.
101. Perform word segmentation on the text of the target article at at least two different segmentation granularities to obtain the tokens corresponding to each granularity;
In this embodiment, the text of the target article is segmented at at least two different granularities; for example, the granularities may include a basic granularity and a mixed granularity. Segmenting the text at several granularities yields token information at each granularity, so that the target article can be classified more accurately.
It should be noted that the text of the target article includes not only the title but also the body. In this embodiment, when the text is segmented at a given granularity, the title and the body of the target article are segmented separately, yielding the title tokens and the body tokens of the target article at that granularity.
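Segmenting the title and body separately at two granularities can be sketched as below. This is a toy stand-in under stated assumptions: it operates on whitespace-delimited English text and merges known two-word phrases for the coarser granularity, whereas the patent targets Chinese text and does not specify a segmenter. The names `segment_basic`, `segment_mixed`, `segment_article`, and the `phrases` set are hypothetical.

```python
def segment_basic(text):
    """Fine granularity: one token per whitespace-separated word."""
    return text.lower().split()


def segment_mixed(text, phrases):
    """Coarser granularity: merge known two-word phrases into a single token."""
    words = segment_basic(text)
    tokens, i = [], 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and pair in phrases:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def segment_article(title, body, phrases):
    """Segment the title and the body separately at each granularity."""
    return {
        "basic": {"title": segment_basic(title), "body": segment_basic(body)},
        "mixed": {"title": segment_mixed(title, phrases),
                  "body": segment_mixed(body, phrases)},
    }


tokens = segment_article("World Cup final", "The world cup final was close",
                         phrases={"world cup"})
```

The result keeps one token list per (granularity, title-or-body) pair, which is the shape a downstream scoring model would consume.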
102. According to the tokens of the target article at each segmentation granularity and the pre-trained label scoring model for each level of the target classification system, predict the similarity between the target article and each topic category at each level;
The target classification system of this embodiment is the basis on which the target article is classified. To manage articles, a website can predefine how many levels of topic categories the target classification system contains and which topic categories each level includes. In this embodiment, a label scoring model is trained in advance for each level of the target classification system, and the label scoring model of each level contains a pre-trained one-dimensional vector for each topic category of that level. To predict the similarity between the target article and each topic category at a level, the tokens at each segmentation granularity are input to the label scoring model of that level, which then uses its internal pre-trained category vectors to predict the similarity between the target article and each topic category at that level. The tokens input to the label scoring model may be represented as word vectors, and the word vector of each word can also be determined by training in advance. For example, the word vectors of words with the same meaning should have a relatively high similarity; if their similarity is not high, the values in the word vectors can be adjusted so that the similarity of the two vectors increases. Likewise, the word vectors of words with different meanings should have a relatively low similarity; if their similarity is high, the values can be adjusted so that the similarity of the two vectors decreases. Through continuous training and adjustment, the word vector of each word can be determined.
When the tokens at each segmentation granularity are input to the label scoring model, they can be input in separate fields according to the segmentation granularity and according to whether they come from the title or the body of the target article. For example, when two segmentation granularities are applied to the title and the body of the target article, the input tokens fall into four fields: (title tokens at granularity 1), (body tokens at granularity 1), (title tokens at granularity 2), and (body tokens at granularity 2). The label scoring model then predicts the similarity between the target article and each topic category at the level from the tokens at the various granularities. In practice, the more segmentation granularities are used, the richer the token information becomes; although the computation is slightly more complex, the similarities predicted by the label scoring model are more accurate.
The label scoring model of this embodiment may be obtained by training with the architecture of a bag-of-words (BoW) model or a convolutional neural network (CNN) model.
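The four-field input and the per-category similarity output can be illustrated with a small bag-of-words style sketch. The patent does not specify how field vectors are combined or which similarity function is used; averaging word vectors per field, averaging the fields, and scoring with cosine similarity are assumptions made here, and all vectors below are hand-written toy values rather than trained ones.

```python
import math


def mean_vector(tokens, word_vectors, dim):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_level(fields, word_vectors, category_vectors, dim):
    """Return one similarity per topic category of a level (assumed scoring rule)."""
    field_vecs = [mean_vector(toks, word_vectors, dim) for toks in fields.values()]
    article = [sum(v[i] for v in field_vecs) / len(field_vecs) for i in range(dim)]
    return {cat: cosine(article, vec) for cat, vec in category_vectors.items()}


# Four input fields: (granularity, title/body) -> tokens.
fields = {
    ("g1", "title"): ["nba", "finals"],
    ("g1", "body"): ["nba", "game"],
    ("g2", "title"): ["nba finals"],
    ("g2", "body"): ["nba", "game"],
}
word_vectors = {"nba": [1.0, 0.0], "finals": [0.8, 0.2], "nba finals": [0.9, 0.1]}
category_vectors = {"sports": [1.0, 0.0], "entertainment": [0.0, 1.0]}
scores = score_level(fields, word_vectors, category_vectors, dim=2)
```

The model outputs exactly one similarity per category of the level, which matches the behavior described for step 103.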
103. Classify the target article at each level according to the similarity between the target article and each topic category at each level and a preset similarity threshold.
For each level of the target classification system, the label scoring model predicts the similarity between the target article and each topic category at that level. That is, the label scoring model of a level outputs as many similarities as the level has topic categories, one for each topic category. Each similarity output by the model is then compared with the similarity threshold. If the similarity between the target article and a topic category of the level is greater than or equal to the preset similarity threshold, the target article is assigned to that topic category at that level; otherwise, if the similarity is less than the preset similarity threshold, the target article is not assigned to that topic category. In this embodiment, the assignment can be recorded by attaching the label of the topic category to the target article.
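The threshold comparison in step 103 amounts to a multi-label selection per level. A minimal sketch, assuming the similarities are already available as a nested dictionary (the function name `classify` and the data layout are hypothetical):

```python
def classify(similarities_by_level, threshold):
    """Keep every category whose similarity is >= threshold, level by level."""
    return {
        level: sorted(cat for cat, sim in sims.items() if sim >= threshold)
        for level, sims in similarities_by_level.items()
    }


similarities_by_level = {
    1: {"sports": 0.91, "entertainment": 0.62, "education": 0.10},
    2: {"basketball": 0.88, "football": 0.30},
}
labels = classify(similarities_by_level, threshold=0.6)
strict_labels = classify(similarities_by_level, threshold=0.95)
```

With a single shared threshold an article can receive several labels at one level, or none at all if no similarity reaches the threshold.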
With the technical solution of this embodiment, the target article may be assigned to multiple topic categories in the target classification system, or it may not be assigned to any topic category.
Optionally, in the target classification system of this embodiment, the preset similarity threshold may be the same for all topic categories at all levels; or it may be the same for all topic categories within a level but differ between levels; or it may differ for each topic category. Alternatively, a larger or smaller preset similarity threshold may be set only for certain special topic categories, with the same threshold used for all others. For example, the threshold of a topic category that is easily misclassified can be controlled independently and increased. If the topic category "funny" is relatively difficult to classify, the target article is assigned to it only when the label scoring model predicts a sufficiently high similarity between the target article and that category.
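The threshold options described above (one global value, one per level, or one per category) can be sketched as a simple lookup. The precedence order used here, category override first, then level, then the global default, is an assumption; the patent lists the options without ranking them, and the numeric values are illustrative only.

```python
def resolve_threshold(level, category, default, per_level=None, per_category=None):
    """Pick the similarity threshold for one (level, category) pair.

    Assumed precedence: per-category override > per-level value > global default.
    """
    per_level = per_level or {}
    per_category = per_category or {}
    if (level, category) in per_category:
        return per_category[(level, category)]
    return per_level.get(level, default)


per_level = {2: 0.7}
per_category = {(2, "funny"): 0.9}  # easily misclassified, so a stricter threshold

t_sports = resolve_threshold(1, "sports", 0.6, per_level, per_category)
t_film = resolve_threshold(2, "film", 0.6, per_level, per_category)
t_funny = resolve_threshold(2, "funny", 0.6, per_level, per_category)
```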
The artificial-intelligence-based article classification method of this embodiment obtains the text of a target article; performs word segmentation on the text at at least two different segmentation granularities to obtain the tokens corresponding to each granularity; predicts, according to these tokens and the pre-trained label scoring model for each level of the target classification system, the similarity between the target article and each topic category at each level; and classifies the target article at each level according to these similarities and a preset similarity threshold. Because the text of the target article is segmented at at least two different granularities, the information about the target article that is fed into the label scoring model is very rich, so the similarity between the target article and each topic category at each level can be predicted accurately, and the target article can in turn be classified accurately at each level. The technical solution of this embodiment therefore not only classifies articles with higher accuracy but also classifies them automatically, which saves time and labor and makes article classification highly efficient.
Further optionally, on the basis of the technical solution of the above embodiment, after step 103, "classify the target article at each level according to the similarity between the target article and each topic category at each level and a preset similarity threshold", the method further includes: verifying the classification of the target article at each level.
In this embodiment, after the target article has been classified at each level, its classification at each level can be verified to further improve classification accuracy.
For example, verifying the classification of the target article at each level may specifically include at least one of the following:
(a1) Detect whether the classifications of the target article at the different levels conflict; if they conflict, cancel the classification of the target article at the downstream level; otherwise, take no action.
The artificial-intelligence-based article classification apparatus of this embodiment can further check the correlation between the topic categories assigned to the target article at the different levels. If the topic categories at an upper and a lower level are completely unrelated, the two are considered to conflict. In that case the classification at the upstream level is retained and the classification of the target article at the downstream level is cancelled.
For example, suppose a target article has been labeled "entertainment" in the first-level classification, that is, assigned to the topic category "entertainment" at the first level, and has also been labeled "basketball" in the second-level classification, that is, assigned to the topic category "basketball" at the second level. Since "basketball" is unrelated to "entertainment", the first-level classification "entertainment" is retained and the second-level classification "basketball" is cancelled.
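Check (a1) can be sketched with a parent lookup table. Treating "unrelated" as "the second-level category's parent is not among the article's first-level labels" is an assumption for the example, and the `parent_of` mapping and function name are hypothetical.

```python
def drop_conflicting_children(level1_labels, level2_labels, parent_of):
    """Keep a second-level label only if its parent is among the first-level labels.

    The upstream (first-level) labels are always retained unchanged.
    """
    return [c for c in level2_labels if parent_of.get(c) in level1_labels]


parent_of = {"basketball": "sports", "football": "sports", "film": "entertainment"}
level1 = ["entertainment"]
level2 = ["basketball", "film"]
kept_level2 = drop_conflicting_children(level1, level2, parent_of)
```

Here "basketball" is cancelled because its parent "sports" was not assigned at the first level, while "film" survives.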
(a2) If the target article is classified into a particular topic category at a particular level, detect whether the frequency of occurrence of a particular keyword in the target article reaches a preset frequency threshold; if not, cancel the classification of the target article into the particular topic category at the particular level; if it does, take no action; and
(a3) If the target article is classified into a particular topic category at a particular level, detect whether a particular keyword occurs in the target article; if it occurs, cancel the classification of the target article into the particular topic category at the particular level; otherwise, take no action.
This embodiment can also verify the classification of the target article against the requirements of certain special topic categories. For example, regular expressions can be used for matching: an article belongs to a topic category only if it meets a certain requirement, or an article that meets a certain requirement necessarily does not belong to a topic category. For example, under verification mode (a2), the second-level topic category "film" can require that the keyword "film" occur in the target article at least a preset number of times, such as more than twice. All target articles whose second-level topic category is "film" can then be checked; if the frequency of the keyword "film" in a target article does not reach the preset frequency threshold, the classification of that article into the second-level topic category "film" is cancelled directly.
As another example, under verification mode (a3), if the second-level topic category of the target article is "panda", detect whether the keyword "live" occurs in the target article; if it does, cancel the classification of the target article into the second-level topic category "panda".
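Checks (a2) and (a3) can be sketched with Python's `re` module as follows. The rule tables, the function name `apply_keyword_rules`, and the threshold of 2 are illustrative assumptions; the patent only says that regular-expression matching may be used and that thresholds are preset.

```python
import re


def apply_keyword_rules(text, labels, min_count_rules, forbidden_rules):
    """Drop labels that fail a keyword-frequency rule or hit a forbidden keyword.

    min_count_rules: label -> (keyword, minimum number of occurrences), as in (a2).
    forbidden_rules: label -> keywords whose presence cancels the label, as in (a3).
    """
    kept = []
    for label in labels:
        rule = min_count_rules.get(label)
        if rule is not None:
            keyword, min_count = rule
            if len(re.findall(re.escape(keyword), text)) < min_count:
                continue  # (a2): keyword frequency below the threshold
        banned = forbidden_rules.get(label, ())
        if any(re.search(re.escape(k), text) for k in banned):
            continue  # (a3): a forbidden keyword occurs
        kept.append(label)
    return kept


text = "This film is a film about a panda shown on a live stream"
kept = apply_keyword_rules(
    text,
    labels=["film", "panda"],
    min_count_rules={"film": ("film", 2)},
    forbidden_rules={"panda": ["live"]},
)
```

Labels without any rule pass through unchanged, which mirrors the "otherwise, take no action" branches of (a2) and (a3).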
In practice, the three ways (a1), (a2), and (a3) of verifying the classification of the target article at each level can be used in combination. This verification further improves the accuracy of target article classification.
Further optionally, on the basis of the technical solution of the above embodiment, before step 102, "obtaining the similarity between the target article and each subject category in each level according to the segmented words corresponding to each segmentation granularity of the target article and the pre-trained marking label model of each level in the target classification system", the method may further include the following steps:
(b1) crawling several training corpora from each information website, each training corpus including a training article and the original category of the training article in the corresponding information website;
In the present embodiment, the training corpora can be crawled from information websites such as portal websites. A crawled training corpus can include the title and body text of a news article and, where necessary, the Uniform Resource Locator (URL) of the article, so that the crawled information can later be cleaned according to the URL. In addition, each article on an information website is labeled with its category in that website's own classification system. Therefore, when an article is crawled as a training corpus, its category on the information website, referred to here as the original category, must also be crawled. For example, if an education news article is crawled from Sina News as a training corpus, the original category of that article is "education".
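One way to picture a crawled training corpus is as a small record holding the fields named above. The class name `TrainingCorpus`, its field names and the example URL are assumptions made for illustration; the source does not prescribe a storage format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingCorpus:
    """One crawled article plus the category it carried on its source website."""
    title: str
    text: str
    original_category: str
    url: Optional[str] = None  # optional; kept so the corpus can be cleaned by source later


# Hypothetical record for an education article; the URL is a placeholder.
example = TrainingCorpus(
    title="New term begins",
    text="Schools reopened this week.",
    original_category="education",
    url="https://news.example.com/edu/1.html",
)
```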
(b2) mapping the original category of the training article of each training corpus in the corresponding information website to a subject category in the target classification system;
Because different information websites divide and define subject categories differently, the original category of the training article of each training corpus must be mapped to a category in the target classification system before the training corpora can be used for training under the target classification system of the present embodiment. When mapping, the name of the original category should match the name of the subject category in the target classification system wherever possible. If the target classification system contains no subject category with the same name as the original category, semantic analysis can be performed on the original category and on each subject category in the target classification system to find a subject category with the same meaning as the original category, and the original category is then mapped to that subject category. Alternatively, the original category can be mapped to a subject category in the target classification system according to the scope covered by the original category and by each subject category. For example, if the target classification system includes a humanities category but no history category, information in the source's history category can be mapped to the humanities subject category.
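The mapping order described above (same name first, then a semantic or scope-based match) can be sketched as below. The semantic analysis is replaced here by a hand-written lookup table, which is an assumption; `map_category`, `TARGET` and `FALLBACK` are hypothetical names.

```python
def map_category(original, target_categories, fallback):
    """Map a website's original category to a target subject category, or None."""
    # Prefer an identically named category in the target classification system.
    if original in target_categories:
        return original
    # Otherwise use a manually curated equivalent (stand-in for semantic analysis).
    mapped = fallback.get(original)
    return mapped if mapped in target_categories else None


# Hypothetical target system and fallback table, following the history example.
TARGET = {"education", "humanities", "real estate"}
FALLBACK = {"history": "humanities"}
```

Returning `None` for an unmappable category leaves the decision of whether to discard such a corpus to the caller.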
Further optionally, after step (b1) and before step (b2), manual sampling can be used to assess whether the category labels of a given source are accurate. For example, 10 articles are selected manually from a certain information website, and the category labels of 8 of them are found to be inaccurate. In that case, all training corpora crawled from that information website can be discarded according to their URLs.
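A possible sketch of this source-level filtering follows. The source gives only the example of 8 inaccurate labels out of 10; the cut-off `max_error_rate=0.5`, the function names and the use of the URL host to identify a website are assumptions.

```python
from urllib.parse import urlparse


def unreliable_hosts(sampled, max_error_rate=0.5):
    """sampled: iterable of (url, label_is_correct) pairs from manual review."""
    totals, errors = {}, {}
    for url, label_is_correct in sampled:
        host = urlparse(url).netloc
        totals[host] = totals.get(host, 0) + 1
        errors[host] = errors.get(host, 0) + (0 if label_is_correct else 1)
    # A host is unreliable when its sampled error rate exceeds the assumed cut-off.
    return {host for host in totals if errors[host] / totals[host] > max_error_rate}


def drop_unreliable(urls, bad_hosts):
    """Keep only the URLs whose host was not flagged as unreliable."""
    return [u for u in urls if urlparse(u).netloc not in bad_hosts]
```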
Further optionally, in the present embodiment, the training corpora can also be cleaned. For example, low-quality training corpora with missing content or titles, and corpora misclassified because of differing category definitions, are filtered out. Take a training corpus whose original category is "real estate" but whose topic and content are entirely about "shantytown renovation": during mapping, it is mapped to the "real estate" subject category of the target classification system, so when the training corpora are cleaned, information containing the keyword "shantytown renovation" can be filtered out of the "real estate" category.
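The two cleaning rules can be sketched as a single filter. The dictionary keys `title`, `text` and `category`, the table `BLOCKED_KEYWORDS` and the function `clean` are hypothetical names for illustration.

```python
# Hypothetical per-category keywords whose presence marks a corpus as misclassified.
BLOCKED_KEYWORDS = {"real estate": ["shantytown renovation"]}


def clean(corpora):
    """Drop corpora with a missing title or text, or with a blocked keyword."""
    kept = []
    for item in corpora:
        title = (item.get("title") or "").strip()
        text = (item.get("text") or "").strip()
        if not title or not text:
            continue  # low-quality corpus: title or content missing
        blocked = BLOCKED_KEYWORDS.get(item.get("category"), [])
        if any(keyword in title or keyword in text for keyword in blocked):
            continue  # misclassified because of a differing category definition
        kept.append(item)
    return kept
```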
(b3) performing word segmentation of at least two different segmentation granularities on the text of each training corpus to obtain training data of several positive examples;
For example, word segmentation of at least two different segmentation granularities can be performed on each training corpus in the manner of step 101 to obtain the training data of several positive examples. Positive-example training data is correct training data. Each piece of training data can include the segmented words of the training corpus for at least two segmentation granularities, the subject category of the training data in the target classification system, and an indication that the training data is a positive example.
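The shape of a positive example can be illustrated as follows. The segmenters are deliberately simple stand-ins and not the segmentation of step 101: whitespace tokens play the role of the finer granularity and adjacent word pairs the coarser one. All names are assumptions.

```python
def segment_fine(text):
    """Finer granularity (stand-in): lower-cased whitespace tokens."""
    return text.lower().split()


def segment_coarse(text):
    """Coarser granularity (stand-in): adjacent word pairs."""
    words = segment_fine(text)
    return [" ".join(pair) for pair in zip(words, words[1:])]


def make_positive(text, category):
    """Bundle both granularities with the mapped subject category; label 1 = positive."""
    return {
        "fine": segment_fine(text),
        "coarse": segment_coarse(text),
        "category": category,
        "label": 1,
    }
```

Keeping each granularity under its own key reflects the idea that the segmented words of different granularities are fed to the model in separate fields.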
(b4) constructing, according to the training data of the several positive examples, multiple unrelated subject categories in each level for the training corpus in each positive example's training data, thereby generating training data of several negative examples;
Training the marking label model requires negative examples as well as positive examples. Therefore, in the present embodiment, multiple unrelated subject categories in each level are constructed for the training corpus in each positive example's training data, thereby generating negative-example training data; negative-example training data is incorrect training data. For each positive example, three or four corresponding negative examples can be generated; the exact number can be set according to actual requirements. Likewise, each piece of negative-example training data can include the segmented words of the training corpus for at least two segmentation granularities, the constructed subject category in the target classification system, and an indication that the training data is a negative example.
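Negative-example construction can be sketched as sampling other categories of the same level. Treating every category other than the correct one as "unrelated" is a simplifying assumption, as are the function name `make_negatives`, the dictionary layout and the default of three negatives.

```python
import random


def make_negatives(positive, level_categories, k=3, rng=None):
    """Copy a positive example k times, each with a different wrong category."""
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch reproducible
    candidates = sorted(c for c in level_categories if c != positive["category"])
    chosen = rng.sample(candidates, min(k, len(candidates)))
    # Same segmented words, constructed (wrong) category, label 0 = negative.
    return [dict(positive, category=c, label=0) for c in chosen]
```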
(b5) training the marking label model using the training data of the several positive examples and the training data of the several negative examples.
The marking label model of each level in the present embodiment holds a one-dimensional vector for every subject category of that level. Before training, the one-dimensional vectors of all subject categories of the level can be given random initial values. When training starts, one piece of training data is input to the marking label model; the segmented words of the training corpus for the at least two segmentation granularities can likewise be input in separate fields, and each segmented word can be represented by a word vector, whose representation is described in the related embodiment above. The marking label model then predicts, from the input training data, the similarity between the training data and each subject category of the level. If the training data is a positive example, it is determined whether the similarity between the training data and the subject category specified in the positive example reaches the preset similarity threshold; if not, the one-dimensional vector of that subject category and the parameters of the marking label model are adjusted so that the output similarity between the training data and the subject category specified in the positive example increases. If the training data is a negative example, it is determined whether the similarity between the training data and the subject category specified in the negative example is below the preset similarity threshold; if not, the one-dimensional vector of that subject category and the parameters of the marking label model are adjusted so that the output similarity between the training data and the subject category specified in the negative example decreases. After training on several pieces of training data, the trained marking label model can accurately predict the similarity between a target article and each subject category in the level. At that point the parameters of the marking label model and the one-dimensional vectors of the subject categories of the level are fixed, and the corresponding marking label model is determined.
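The threshold-driven update described above can be illustrated with a deliberately reduced sketch. It assumes the article is already summarized as a fixed vector, uses the sigmoid of a dot product as the similarity, and adjusts only the category vector; the model's other parameters, which the embodiment also adjusts, are omitted. The function names, the threshold of 0.5 and the learning rate are assumptions.

```python
import math


def similarity(article_vec, category_vec):
    """Sigmoid of the dot product, so the score lies in (0, 1)."""
    dot = sum(a * c for a, c in zip(article_vec, category_vec))
    return 1.0 / (1.0 + math.exp(-dot))


def train_step(article_vec, category_vec, is_positive, threshold=0.5, lr=0.5):
    """Update category_vec in place only when the example is on the wrong side."""
    sim = similarity(article_vec, category_vec)
    if is_positive and sim < threshold:
        direction = 1.0    # positive example scored too low: raise the similarity
    elif not is_positive and sim >= threshold:
        direction = -1.0   # negative example scored too high: lower the similarity
    else:
        return sim         # already on the correct side of the threshold
    for i, a in enumerate(article_vec):
        category_vec[i] += direction * lr * a
    return sim
```

Repeating `train_step` over the positive and negative examples moves each category vector until the examples fall on the correct side of the threshold, which is the stopping behaviour the paragraph above describes.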
The artificial-intelligence-based article classification method of the above embodiment segments the text of the target article at no fewer than two different segmentation granularities, so that when the similarity between the target article and each subject category in each level is predicted, the information about the target article that is input to the marking label model is rich. The similarity between the target article and each subject category in each level can therefore be predicted accurately, and the target article can in turn be classified accurately in each level. The technical solution of the above embodiment thus not only classifies articles with high accuracy but also classifies them automatically, saving time and labor and classifying articles efficiently.
Fig. 2 is a structural diagram of embodiment one of the artificial-intelligence-based article classification device of the present invention. As shown in Fig. 2, the artificial-intelligence-based article classification device of the present embodiment can specifically include an acquisition module 10, a word segmentation module 11, a prediction module 12 and a classification module 13.
The acquisition module 10 is used to obtain the text of a target article. The word segmentation module 11 is used to perform word segmentation of at least two different segmentation granularities on the text of the target article obtained by the acquisition module 10, obtaining the segmented words corresponding to each segmentation granularity. The prediction module 12 is used to predict the similarity between the target article and each subject category in each level according to the segmented words corresponding to each segmentation granularity of the target article obtained by the word segmentation module 11 and the pre-trained marking label model of each level in the target classification system. The classification module 13 is used to classify the target article in each level according to the similarity, predicted by the prediction module 12, between the target article and each subject category in each level and a preset similarity threshold.
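The last step, carried out by the classification module, can be sketched as a per-level threshold filter over the predicted similarities. The function name `classify`, the nested-dictionary input and the rule that a similarity equal to the threshold counts as reaching it are assumptions.

```python
def classify(similarities, threshold=0.5):
    """similarities: {level: {category: score}} -> {level: [categories kept]}.

    Within each level, the kept categories are ordered from highest to lowest score.
    """
    return {
        level: sorted(
            (c for c, s in scores.items() if s >= threshold),
            key=lambda c: -scores[c],
        )
        for level, scores in similarities.items()
    }
```

Because the filter is applied level by level, an article can receive several subject categories in one level and none in another.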
The implementation principle and technical effect with which the artificial-intelligence-based article classification device of the present embodiment classifies articles using the above modules are the same as those of the related method embodiment above; see the description of that embodiment for details, which are not repeated here.
Fig. 3 is a structural diagram of embodiment two of the artificial-intelligence-based article classification device of the present invention. As shown in Fig. 3, the artificial-intelligence-based article classification device of the present embodiment, on the basis of the technical solution of the embodiment shown in Fig. 2 above, can further include the following technical solution.
As shown in Fig. 3, the artificial-intelligence-based article classification device of the present embodiment further includes a verification module 14.
The verification module 14 is used to verify the classification, in each level, of the target article obtained by the classification module 13.
Further optionally, in the artificial-intelligence-based article classification device of the present embodiment, the verification module 14 is specifically used to perform at least one of the following:
detecting whether the classifications of the target article in the levels conflict; if they conflict, cancelling the classification of the target article in the downstream level;
if the classification of the target article in a specific level is a specific subject category, detecting whether the frequency of a specific keyword in the target article reaches a preset frequency threshold, and if not, cancelling the classification of the target article under the specific subject category of the specific level; and
if the classification of the target article in a specific level is a specific subject category, detecting whether a specific keyword occurs in the target article, and if it occurs, cancelling the classification of the target article under the specific subject category of the specific level.
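The first check in the list, cancelling a downstream classification that conflicts with the upstream one, can be sketched as follows. The text of this section does not define what counts as a conflict, so the sketch assumes that a second-level category conflicts when its parent is not among the first-level categories assigned to the article. The table `PARENT` and the function `resolve_conflicts` are hypothetical.

```python
# Hypothetical two-level hierarchy: second-level category -> first-level parent.
PARENT = {"film": "entertainment", "football": "sports"}


def resolve_conflicts(level1, level2, parent=PARENT):
    """Keep only the second-level categories whose parent was assigned in level one."""
    return [c for c in level2 if parent.get(c) in level1]
```

Only the downstream (second-level) result is altered; the upstream classification is left as predicted.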
Further optionally, the artificial-intelligence-based article classification device of the present embodiment further includes:
a crawling module 15, used to crawl several training corpora from each information website, each training corpus including a training article and the original category of the training article in the corresponding information website;
a mapping module 16, used to map the original category, in the corresponding information website, of the training article of each training corpus crawled by the crawling module 15 to a subject category in the target classification system;
a positive example generation module 17, used to perform word segmentation of at least two different segmentation granularities on the text of each training corpus processed by the mapping module 16, obtaining training data of several positive examples;
a negative example generation module 18, used to construct, according to the training data of the several positive examples obtained by the positive example generation module 17, multiple unrelated subject categories in each level for the training corpus in each positive example's training data, generating training data of several negative examples;
a training module 19, used to train the marking label model of each level using the training data of the several positive examples generated by the positive example generation module 17 and the training data of the several negative examples generated by the negative example generation module 18.
Accordingly, the prediction module 12 is used to predict the similarity between the target article and each subject category in each level according to the segmented words corresponding to each segmentation granularity of the target article obtained by the word segmentation module 11 and the marking label model of each level in the target classification system pre-trained by the training module 19.
The implementation principle and technical effect with which the artificial-intelligence-based article classification device of the present embodiment classifies articles using the above modules are the same as those of the related method embodiment above; see the description of that embodiment for details, which are not repeated here.
Fig. 4 is a structural diagram of an embodiment of the computer device of the present invention. As shown in Fig. 4, the computer device of the present embodiment includes one or more processors 30 and a memory 40. The memory 40 is used to store one or more programs; when the one or more programs stored in the memory 40 are executed by the one or more processors 30, the one or more processors 30 implement the artificial-intelligence-based article classification method of the above embodiments. The embodiment shown in Fig. 4 takes a device including multiple processors 30 as an example.
For example, Fig. 5 is an example diagram of a computer device provided by the present invention. Fig. 5 shows a block diagram of an exemplary computer device 12a suitable for implementing embodiments of the present invention. The computer device 12a shown in Fig. 5 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 5, the computer device 12a takes the form of a general-purpose computing device. The components of the computer device 12a can include, but are not limited to: one or more processors 16a, a system memory 28a, and a bus 18a connecting the different system components (including the system memory 28a and the processors 16a).
The bus 18a represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12a typically includes a variety of computer-system-readable media. These media can be any available media that can be accessed by the computer device 12a, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28a can include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30a and/or a cache memory 32a. The computer device 12a may further include other removable/non-removable, volatile/non-volatile computer system storage media. Merely as an example, a storage system 34a can be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly called a "hard disk drive"). Although not shown in Fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM or other optical media), can also be provided. In these cases, each drive can be connected to the bus 18a through one or more data media interfaces. The system memory 28a can include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of Figs. 1 to 3 of the present invention described above.
A program/utility 40a having a set of (at least one) program modules 42a can be stored in, for example, the system memory 28a. Such program modules 42a include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42a generally perform the functions and/or methods of the embodiments of Figs. 1 to 3 described in the present invention.
The computer device 12a can also communicate with one or more external devices 14a (such as a keyboard, a pointing device, a display 24a, etc.), with one or more devices that enable a user to interact with the computer device 12a, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 12a to communicate with one or more other computing devices. Such communication can take place through an input/output (I/O) interface 22a. The computer device 12a can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 20a. As shown in the figure, the network adapter 20a communicates with the other modules of the computer device 12a through the bus 18a. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the computer device 12a, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
The processor 16a executes various functional applications and data processing by running programs stored in the system memory 28a, for example implementing the artificial-intelligence-based article classification method shown in the above embodiments.
The present invention also provides a computer-readable medium on which a computer program is stored; when executed by a processor, the program implements the artificial-intelligence-based article classification method shown in the above embodiments.
The computer-readable medium of the present embodiment can include the RAM 30a, and/or the cache memory 32a, and/or the storage system 34a in the system memory 28a of the embodiment shown in Fig. 5 above.
With the development of science and technology, computer programs are no longer distributed only on tangible media; they can also be downloaded directly from a network or obtained in other ways. Therefore, the computer-readable medium in the present embodiment can include not only tangible media but also intangible media.
The computer-readable medium of the present embodiment can use any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus or device.
A computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transport a program for use by or in connection with an instruction execution system, apparatus or device.
Program code contained on a computer-readable medium can be transmitted over any suitable medium, including, but not limited to, wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they can be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention can be integrated in one processing unit, each unit can exist physically on its own, or two or more units can be integrated in one unit. The integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
The above integrated unit implemented in the form of a software functional unit can be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods of the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. An article classification method based on artificial intelligence, characterized in that the method comprises:
obtaining the text of a target article;
performing word segmentation of at least two different segmentation granularities on the text of the target article to obtain the segmented words corresponding to each segmentation granularity;
predicting the similarity between the target article and each subject category in each level according to the segmented words corresponding to each segmentation granularity of the target article and a pre-trained marking label model of each level in a target classification system;
classifying the target article in each level according to the similarity between the target article and each subject category in each level and a preset similarity threshold.
2. The method according to claim 1, characterized in that, after classifying the target article in each level according to the similarity between the target article and each subject category in each level and the preset similarity threshold, the method further comprises:
verifying the classification of the target article in each level.
3. The method according to claim 2, characterized in that verifying the classification of the target article in each level specifically comprises at least one of the following:
detecting whether the classifications of the target article in the levels conflict; if they conflict, cancelling the classification of the target article in the downstream level;
if the classification of the target article in a specific level is a specific subject category, detecting whether the frequency of a specific keyword in the target article reaches a preset frequency threshold, and if not, cancelling the classification of the target article under the specific subject category of the specific level; and
if the classification of the target article in a specific level is a specific subject category, detecting whether a specific keyword occurs in the target article, and if it occurs, cancelling the classification of the target article under the specific subject category of the specific level.
4. The method according to any one of claims 1-3, characterized in that, before predicting the similarity between the target article and each subject category in each level according to the segmented words corresponding to each segmentation granularity of the target article and the pre-trained marking label model of each level in the target classification system, the method further comprises:
crawling several training corpora from each information website, each training corpus including a training article and the original category of the training article in the corresponding information website;
mapping the original category of the training article of each training corpus in the corresponding information website to a subject category in the target classification system;
performing word segmentation of at least two different segmentation granularities on the text of each training corpus to obtain training data of several positive examples;
constructing, according to the training data of the several positive examples, multiple unrelated subject categories in each level for the training corpus in each positive example's training data, generating training data of several negative examples;
training the marking label model of each level using the training data of the several positive examples and the training data of the several negative examples.
5. An article classification device based on artificial intelligence, characterized in that the device comprises:
an acquisition module, used to obtain the text of a target article;
a word segmentation module, used to perform word segmentation of at least two different segmentation granularities on the text of the target article to obtain the segmented words corresponding to each segmentation granularity;
a prediction module, used to predict the similarity between the target article and each subject category in each level according to the segmented words corresponding to each segmentation granularity of the target article and a pre-trained marking label model of each level in a target classification system;
a classification module, used to classify the target article in each level according to the similarity between the target article and each subject category in each level and a preset similarity threshold.
6. The device according to claim 5, characterized in that the device further comprises:
a verification module, used to verify the classification of the target article in each level.
7. The device according to claim 6, characterized in that the verification module is specifically used to perform at least one of the following:
detecting whether the classifications of the target article in the levels conflict; if they conflict, cancelling the classification of the target article in the downstream level;
if the classification of the target article in a specific level is a specific subject category, detecting whether the frequency of a specific keyword in the target article reaches a preset frequency threshold, and if not, cancelling the classification of the target article under the specific subject category of the specific level; and
if the classification of the target article in a specific level is a specific subject category, detecting whether a specific keyword occurs in the target article, and if it occurs, cancelling the classification of the target article under the specific subject category of the specific level.
8. The device according to any one of claims 5-7, characterized in that the device further comprises:
A crawling module, configured to crawl several training corpora from information websites, each training corpus comprising a training article and the original classification of the training article on the corresponding information website;
A mapping module, configured to map the original classification, on the corresponding information website, of the training article in each training corpus to a subject category in the target classification system;
A positive example generation module, configured to perform word segmentation at at least two different segmentation granularities on the text of each training corpus, obtaining training data for several positive examples;
A negative example generation module, configured to construct, according to the training data of the several positive examples, multiple unrelated subject categories in each level for the training corpus in the training data of each positive example, generating training data for several negative examples;
A training module, configured to train the marking label model of each level using the training data of the several positive examples and the training data of the several negative examples.
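The data preparation steps recited above (mapping a site's original classification to the target system, building positive examples, and sampling negative examples) might look roughly like the sketch below. It is illustrative only. It assumes that any other category in the same level counts as "unrelated", which a real system would likely refine, and it again uses word n-grams as a stand-in for segmentation. The names `build_examples`, `site_to_target` and `categories_by_level` are hypothetical.

```python
import random


def segment(text, granularities=(1, 2)):
    """Word n-grams as an assumed stand-in for multi-granularity segmentation."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for n in granularities
            for i in range(len(words) - n + 1)]


def build_examples(corpora, site_to_target, categories_by_level,
                   negatives_per_positive=2, seed=0):
    """corpora: iterable of (text, site, original_category).
    site_to_target: {(site, original_category): (level, target_category)}.
    Returns (positives, negatives) as lists of (tokens, level, category, label)."""
    rng = random.Random(seed)
    positives, negatives = [], []
    for text, site, original in corpora:
        target = site_to_target.get((site, original))
        if target is None:
            continue  # no mapping into the target classification system
        level, category = target
        tokens = segment(text)
        positives.append((tokens, level, category, 1))
        # Assumption: every other category in this level is unrelated.
        others = [c for c in categories_by_level[level] if c != category]
        k = min(negatives_per_positive, len(others))
        for negative in rng.sample(others, k):
            negatives.append((tokens, level, negative, 0))
    return positives, negatives
```

The resulting labelled tuples could then be fed to whatever per-level model is trained; the sketch deliberately stops before the training step because the claim does not fix a model type.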
9. A computer device, characterized in that the device comprises:
One or more processors;
A memory, configured to store one or more programs,
Wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-4.
10. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-4.
CN201710196073.6A 2017-03-29 2017-03-29 Article sorting technique and device, equipment and computer-readable recording medium based on artificial intelligence Pending CN107168992A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710196073.6A CN107168992A (en) 2017-03-29 2017-03-29 Article sorting technique and device, equipment and computer-readable recording medium based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710196073.6A CN107168992A (en) 2017-03-29 2017-03-29 Article sorting technique and device, equipment and computer-readable recording medium based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN107168992A true CN107168992A (en) 2017-09-15

Family

ID=59849772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710196073.6A Pending CN107168992A (en) 2017-03-29 2017-03-29 Article sorting technique and device, equipment and computer-readable recording medium based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN107168992A (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944032A (en) * 2017-12-13 2018-04-20 北京百度网讯科技有限公司 Method and apparatus for generating information
CN108090043A (en) * 2017-11-30 2018-05-29 北京百度网讯科技有限公司 Error correction report processing method, device and readable medium based on artificial intelligence
CN108304379A (en) * 2018-01-15 2018-07-20 腾讯科技(深圳)有限公司 A kind of article recognition methods, device and storage medium
CN108345698A (en) * 2018-03-22 2018-07-31 北京百度网讯科技有限公司 Article focus method for digging and device
CN108897871A (en) * 2018-06-29 2018-11-27 北京百度网讯科技有限公司 Document recommendation method, device, equipment and computer-readable medium
CN108932228A (en) * 2018-06-06 2018-12-04 武汉斗鱼网络科技有限公司 INDUSTRY OVERVIEW and subregion matching process, device, server and storage medium is broadcast live
CN109635260A (en) * 2018-11-09 2019-04-16 北京百度网讯科技有限公司 For generating the method, apparatus, equipment and storage medium of article template
CN109840321A (en) * 2017-11-29 2019-06-04 腾讯科技(深圳)有限公司 Text recommended method, device and electronic equipment
CN110633365A (en) * 2019-07-25 2019-12-31 北京国信利斯特科技有限公司 Word vector-based hierarchical multi-label text classification method and system
CN110941961A (en) * 2019-11-29 2020-03-31 秒针信息技术有限公司 Information clustering method and device, electronic equipment and storage medium
CN111191025A (en) * 2018-11-15 2020-05-22 腾讯科技(北京)有限公司 Method and device for determining article relevance, readable medium and electronic equipment
CN111198957A (en) * 2020-01-02 2020-05-26 北京字节跳动网络技术有限公司 Push method and device, electronic equipment and storage medium
CN111353019A (en) * 2020-02-25 2020-06-30 上海昌投网络科技有限公司 WeChat public number topic classification method and device
CN111428486A (en) * 2019-01-08 2020-07-17 北京沃东天骏信息技术有限公司 Article information data processing method, apparatus, medium, and electronic device
WO2020232898A1 (en) * 2019-05-23 2020-11-26 平安科技(深圳)有限公司 Text classification method and apparatus, electronic device and computer non-volatile readable storage medium
CN112711940A (en) * 2019-10-08 2021-04-27 台达电子工业股份有限公司 Information processing system, information processing method, and non-transitory computer-readable recording medium
CN112800083A (en) * 2021-02-24 2021-05-14 山东省建设发展研究院 Government decision-oriented government affair big data analysis method and equipment
CN112883159A (en) * 2021-02-25 2021-06-01 北京精准沟通传媒科技股份有限公司 Method, medium, and electronic device for generating hierarchical category label for domain evaluation short text
CN114417808A (en) * 2022-02-25 2022-04-29 北京百度网讯科技有限公司 Article generation method and device, electronic equipment and storage medium
CN115048525A (en) * 2022-08-15 2022-09-13 有米科技股份有限公司 Method and device for text classification and text classification model training based on multi-tuple
CN115577106A (en) * 2022-10-14 2023-01-06 北京百度网讯科技有限公司 Text classification method, device, equipment and medium based on artificial intelligence
CN117172245A (en) * 2023-05-26 2023-12-05 国家计算机网络与信息安全管理中心 Control method and control system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102479191A (en) * 2010-11-22 2012-05-30 阿里巴巴集团控股有限公司 Method and device for providing multi-granularity word segmentation result
CN106156204A (en) * 2015-04-23 2016-11-23 深圳市腾讯计算机系统有限公司 The extracting method of text label and device

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840321A (en) * 2017-11-29 2019-06-04 腾讯科技(深圳)有限公司 Text recommended method, device and electronic equipment
CN108090043A (en) * 2017-11-30 2018-05-29 北京百度网讯科技有限公司 Error correction report processing method, device and readable medium based on artificial intelligence
CN107944032A (en) * 2017-12-13 2018-04-20 北京百度网讯科技有限公司 Method and apparatus for generating information
CN107944032B (en) * 2017-12-13 2021-12-31 北京百度网讯科技有限公司 Method and apparatus for generating information
CN108304379A (en) * 2018-01-15 2018-07-20 腾讯科技(深圳)有限公司 A kind of article recognition methods, device and storage medium
CN108304379B (en) * 2018-01-15 2020-12-01 腾讯科技(深圳)有限公司 Article identification method and device and storage medium
CN108345698A (en) * 2018-03-22 2018-07-31 北京百度网讯科技有限公司 Article focus method for digging and device
CN108345698B (en) * 2018-03-22 2022-03-11 北京百度网讯科技有限公司 Method and device for mining attention points of articles
CN108932228A (en) * 2018-06-06 2018-12-04 武汉斗鱼网络科技有限公司 INDUSTRY OVERVIEW and subregion matching process, device, server and storage medium is broadcast live
CN108932228B (en) * 2018-06-06 2023-08-08 广东南方报业移动媒体有限公司 Live broadcast industry news and partition matching method and device, server and storage medium
CN108897871B (en) * 2018-06-29 2020-10-30 北京百度网讯科技有限公司 Document recommendation method, device, equipment and computer readable medium
CN108897871A (en) * 2018-06-29 2018-11-27 北京百度网讯科技有限公司 Document recommendation method, device, equipment and computer-readable medium
CN109635260B (en) * 2018-11-09 2022-07-12 北京百度网讯科技有限公司 Method, device, equipment and storage medium for generating article template
CN109635260A (en) * 2018-11-09 2019-04-16 北京百度网讯科技有限公司 For generating the method, apparatus, equipment and storage medium of article template
CN111191025B (en) * 2018-11-15 2023-12-12 深圳市雅阅科技有限公司 Method and device for determining article relevance, readable medium and electronic equipment
CN111191025A (en) * 2018-11-15 2020-05-22 腾讯科技(北京)有限公司 Method and device for determining article relevance, readable medium and electronic equipment
CN111428486A (en) * 2019-01-08 2020-07-17 北京沃东天骏信息技术有限公司 Article information data processing method, apparatus, medium, and electronic device
WO2020232898A1 (en) * 2019-05-23 2020-11-26 平安科技(深圳)有限公司 Text classification method and apparatus, electronic device and computer non-volatile readable storage medium
CN110633365A (en) * 2019-07-25 2019-12-31 北京国信利斯特科技有限公司 Word vector-based hierarchical multi-label text classification method and system
CN112711940A (en) * 2019-10-08 2021-04-27 台达电子工业股份有限公司 Information processing system, information processing method, and non-transitory computer-readable recording medium
CN110941961A (en) * 2019-11-29 2020-03-31 秒针信息技术有限公司 Information clustering method and device, electronic equipment and storage medium
CN110941961B (en) * 2019-11-29 2023-08-25 秒针信息技术有限公司 Information clustering method and device, electronic equipment and storage medium
CN111198957A (en) * 2020-01-02 2020-05-26 北京字节跳动网络技术有限公司 Push method and device, electronic equipment and storage medium
CN111353019A (en) * 2020-02-25 2020-06-30 上海昌投网络科技有限公司 WeChat public number topic classification method and device
CN112800083A (en) * 2021-02-24 2021-05-14 山东省建设发展研究院 Government decision-oriented government affair big data analysis method and equipment
CN112800083B (en) * 2021-02-24 2022-03-18 山东省住房和城乡建设发展研究院 Government decision-oriented government affair big data analysis method and equipment
CN112883159A (en) * 2021-02-25 2021-06-01 北京精准沟通传媒科技股份有限公司 Method, medium, and electronic device for generating hierarchical category label for domain evaluation short text
CN114417808B (en) * 2022-02-25 2023-04-07 北京百度网讯科技有限公司 Article generation method and device, electronic equipment and storage medium
CN114417808A (en) * 2022-02-25 2022-04-29 北京百度网讯科技有限公司 Article generation method and device, electronic equipment and storage medium
CN115048525A (en) * 2022-08-15 2022-09-13 有米科技股份有限公司 Method and device for text classification and text classification model training based on multi-tuple
CN115577106A (en) * 2022-10-14 2023-01-06 北京百度网讯科技有限公司 Text classification method, device, equipment and medium based on artificial intelligence
CN115577106B (en) * 2022-10-14 2023-12-19 北京百度网讯科技有限公司 Text classification method, device, equipment and medium based on artificial intelligence
CN117172245A (en) * 2023-05-26 2023-12-05 国家计算机网络与信息安全管理中心 Control method and control system

Similar Documents

Publication Publication Date Title
CN107168992A (en) Article sorting technique and device, equipment and computer-readable recording medium based on artificial intelligence
US10936906B2 (en) Training data acquisition method and device, server and storage medium
CN105210064B (en) Classifying resources using deep networks
CN106294344B (en) Video retrieval method and device
CN112819023B (en) Sample set acquisition method, device, computer equipment and storage medium
CN113159095A (en) Model training method, image retrieval method and device
CN106874279A (en) Generate the method and device of applicating category label
CN110598070B (en) Application type identification method and device, server and storage medium
CN104050240A (en) Method and device for determining categorical attribute of search query word
CN111027600B (en) Image category prediction method and device
CN112650923A (en) Public opinion processing method and device for news events, storage medium and computer equipment
CN107491536A (en) A kind of examination question method of calibration, examination question calibration equipment and electronic equipment
CN111539612B (en) Training method and system of risk classification model
CN108009248A (en) A kind of data classification method and system
CN113592605A (en) Product recommendation method, device, equipment and storage medium based on similar products
Rao et al. A first look: Towards explainable textvqa models via visual and textual explanations
CN116975299A (en) Text data discrimination method, device, equipment and medium
CN117132763A (en) Power image anomaly detection method, device, computer equipment and storage medium
CN110532562A (en) Neural network training method, Chinese idiom misuse detection method, device and electronic equipment
CN115392237A (en) Emotion analysis model training method, device, equipment and storage medium
CN110472063A (en) Social media data processing method, model training method and relevant apparatus
CN113705159A (en) Merchant name labeling method, device, equipment and storage medium
CN113761188A (en) Text label determination method and device, computer equipment and storage medium
CN116776157A (en) Model learning method supporting modal increase and device thereof
CN115080748B (en) Weak supervision text classification method and device based on learning with noise label

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170915
