CN107169049A - Method and device for generating label information for an application
- Publication number: CN107169049A
- Application number: CN201710279297.3A
- Authority: CN (China)
- Prior art keywords: application, level, class label, level class, word
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method and device for generating label information for an application, belonging to the technical field of data processing. The method includes: obtaining application detail information, which describes the functional characteristics of a submitted application; based on the application detail information, determining, among at least two pre-stored first-level classification labels, the specified first-level classification label to which the application belongs; performing information screening on the application detail information to obtain summary information of the application; and, based on a pre-stored word clustering result, performing keyword matching on the summary information of the application and, based on the matching result, determining the specified sub-level classification label to which the application belongs under the specified first-level classification label. Because the generation of label information is almost entirely automated, it saves a great deal of labor and time and is more intelligent. In addition, because application detail information characterizes an application's specific functions and purpose in more detail and more completely, the generated label information is more accurate.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and device for generating label information for an application.
Background
With the continuous progress of society, smart electronic products such as smartphones have gradually become indispensable portable tools in daily life. To improve the user experience and let users quickly download all kinds of applications through such products, software developers have built application resource management platforms that offer a rich set of application resources for download. An application resource management platform is essentially an application management program: it classifies a massive number of applications of different types according to the label information of each application, for example into travel, social communication, finance, and shopping, so that users can quickly find and download a particular application among them. Label information identifies an application and indicates its functions and purpose.
As described above, the label information of an application is essential to subsequent steps such as application classification and is a core part of developing an application resource management platform, so how to generate label information for each application has long been a focus for those skilled in the art. In the prior art, label information is typically generated manually. For example, when submitting an application to an application resource management platform, the application developer attaches a remark about the application's label information, and a developer on the platform side then generates label information for the application according to that remark. If the label system contains multiple levels, the generated label information includes not only the first-level classification label to which the application belongs but also the sub-level classification label to which it belongs under that first-level classification label. Label information is generated in this way for every submitted application, and the applications are then classified according to the generated label information.
In the course of implementing the present invention, the inventors found that the prior art has at least the following problems:
Because label information is generated manually and an application resource management platform hosts a massive number of applications, this way of generating label information consumes a great deal of labor and time and is not intelligent enough. In addition, because the label information depends mainly on remarks from the application's developer, and such remarks are usually inaccurate, the generated label information may be imprecise and may cover domain categories poorly.
Summary of the invention
To solve the problems of the prior art, embodiments of the present invention provide a method and device for generating label information for an application. The technical solution is as follows:
In a first aspect, a method for generating label information for an application is provided, the method including:
obtaining application detail information, the application detail information describing the functional characteristics of a submitted application;
based on the application detail information, determining, among at least two pre-stored first-level classification labels, the specified first-level classification label to which the application belongs;
performing information screening on the application detail information to obtain summary information of the application;
based on a pre-stored word clustering result, performing keyword matching on the summary information of the application and, based on the matching result, determining the specified sub-level classification label to which the application belongs under the specified first-level classification label, the word clustering result being obtained by performing word clustering on the summary information of a preset number of already submitted applications.
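The keyword-matching step can be pictured with a minimal sketch. It assumes that the word clustering result is available as one keyword set per sub-level classification label and that the best match is simply the label with the most overlapping words; the data, the `match_sub_label` helper, and the overlap-count rule are illustrative assumptions, not details given in the source.

```python
# Hypothetical word clustering result for one first-level label:
# each sub-level label maps to the keywords clustered under it.
WORD_CLUSTERS = {
    "Ticket booking": {"ticket", "movie", "concert", "seat"},
    "Home improvement": {"decoration", "furniture", "renovation"},
}

def match_sub_label(summary_tokens, clusters):
    """Return the sub-level label whose keyword set overlaps the summary most.

    Returns None when no keyword matches (or when there are no clusters).
    Counting overlapping tokens is an assumed scoring rule.
    """
    best_label, best_hits = None, 0
    for label, keywords in clusters.items():
        hits = sum(1 for token in summary_tokens if token in keywords)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label

# Tokens of a (hypothetical) application summary.
summary = ["book", "movie", "ticket", "and", "choose", "seat"]
print(match_sub_label(summary, WORD_CLUSTERS))
```

A real system would likely weight keywords or use vector similarity; the overlap count is only the simplest rule that fits the description.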
In a second aspect, a device for generating label information for an application is provided, the device including:
an acquisition module, configured to obtain application detail information, the application detail information describing the functional characteristics of a submitted application;
a first processing module, configured to determine, based on the application detail information and among at least two pre-stored first-level classification labels, the specified first-level classification label to which the application belongs;
a screening module, configured to perform information screening on the application detail information to obtain summary information of the application;
a second processing module, configured to perform, based on a pre-stored word clustering result, keyword matching on the summary information of the application and, based on the matching result, determine the specified sub-level classification label to which the application belongs under the specified first-level classification label, the word clustering result being obtained by performing word clustering on the summary information of a preset number of already submitted applications.
The technical solution provided by the embodiments of the present invention has the following beneficial effects:
For an application submitted by an application developer, the embodiments of the present invention can automatically determine the first-level classification label to which the application belongs based on its application detail information. Next, to build a hierarchical label system, summary information can be further screened from the application detail information, and keyword matching can be performed on the summary information based on a pre-stored word clustering result, so that the application is given the sub-level classification label to which it belongs under the first-level classification label. Because this generation process is almost entirely automated, it saves a great deal of labor and time and is more intelligent. In addition, because application detail information characterizes an application's specific functional characteristics and purpose in more detail and more completely, the label information generated from it is more accurate and covers domain categories well.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed to describe the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an architecture diagram of a method for generating label information for an application according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of classification label levels according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of classification label levels according to an embodiment of the present invention;
Fig. 4A is a flow chart of a method for generating label information for an application according to an embodiment of the present invention;
Fig. 4B is a schematic diagram of a label information generation process for an application according to an embodiment of the present invention;
Fig. 5 is a flow chart of a method for generating label information for an application according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the composition of a label system according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of sample data of application screenshots according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of a text region of an application screenshot according to an embodiment of the present invention;
Fig. 9A is a schematic diagram of an application screenshot according to an embodiment of the present invention;
Fig. 9B is a schematic diagram of the overall label information generation flow according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a device for generating label information for an application according to an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
Before the embodiments of the present invention are explained in detail, some terms used in the embodiments, as well as the application scenario and system architecture of the embodiments, are briefly introduced.
Application: a software program installed on a terminal device such as a smartphone or tablet computer. It can usually complete one or more specific tasks, runs in user mode, can interact with the user, and generally has a visual user interface.
Domain classification: automatically classifying an application (APP) into a domain by using its application detail information. Domain categories may include, for example, life services, employment, home improvement, automotive life, and ticket booking. Application detail information may include the application name, application introduction information, and so on. The embodiments of the present invention do not specifically limit what the domain categories and the application detail information cover.
Summary information screening: selecting the important sentences in an application's detail information (mainly the application introduction information). That is, the sentences that matter for letting users understand the application's specific functions are screened from the application detail information (mainly the application introduction information), and these screened sentences serve as the application's summary information.
Word clustering: using the word2vector (word vector) method to gather the word features under the same classification label.
OCR (Optical Character Recognition) image text recognition: using OCR technology to recognize text in pictures (in the embodiments of the present invention, specifically application screenshots).
As is well known, after developing an application, an application developer usually submits it to an application resource management platform so that it can be promoted quickly and used by more users. When submitting the application, the developer is usually required to submit, at the same time, a remark about the label information for the application's domain category, the application's introduction information, and so on.
Under this premise, problems such as the following typically arise. First, to gain more exposure for their applications, some application developers deliberately assign an application to multiple domain categories, or even to completely unrelated ones, so that one application covers too many domain categories and the purity of the domain categories is seriously disturbed. Second, developers on the platform side manually label the application according to the submitted remark, which is time-consuming and laborious, and the generated label information is largely inaccurate and covers domain categories either too broadly or too narrowly. To solve these problems, the embodiments of the present invention propose a method for automatically labeling submitted applications.
Referring to Fig. 1, the method for generating label information for an application provided by the embodiments of the present invention is divided into two main parts: one is the domain category discrimination process for the application, and the other is the label parsing process for the application.
The domain category discrimination part relies mainly on natural language processing to determine the domain category to which an application belongs. It consists of three steps. First, some applications that have already been submitted but were not labeled with the method provided by the embodiments of the present invention are selected as training samples, and their application detail information, such as application names and application introduction information, is obtained. Next, a domain classification model is trained on the obtained application detail information. Finally, each submitted application is given a preliminary category screening with this domain classification model, so that each application receives a first-level classification label.
The other part is mainly used to classify each submitted application into a finer-grained domain. This fine-grained classification mainly involves summary information screening, word2vector word clustering, OCR image text recognition, and similar techniques. That is, after domain category discrimination has determined an application's first-level classification label, the sub-level classification label to which the application belongs must also be determined. In the embodiments of the present invention, sub-level classification labels mainly include second-level and third-level classification labels. Sub-level classification labels may of course also include finer fourth-level or fifth-level classification labels; the embodiments of the present invention do not specifically limit this and are illustrated only with second-level and third-level classification labels.
There are 19 first-level classification labels, 118 second-level classification labels, and 923 third-level classification labels. The classification labels at these levels form a complete and accurate tree-structured label system that can cover all industries and the needs of different groups of users.
The relationship among first-level, second-level, and third-level classification labels is briefly illustrated with a small example. Referring to Fig. 2, the first-level classification label "Life" can contain a second-level classification label such as "Ticket booking", and the second-level classification label "Ticket booking" can contain third-level classification labels such as "Movie tickets", "Theater tickets", and "Concert tickets". As another example, referring to Fig. 3, the first-level classification label "Games" can contain a second-level classification label such as "Online games", and the second-level classification label "Online games" can contain third-level classification labels such as "Three Kingdoms", "Anime", "Journey to the West", and "Xianxia".
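As an illustration of this tree structure, the two examples from Fig. 2 and Fig. 3 can be written as a nested mapping. The English label names are renderings of the translated labels, and the `find_path` helper is a hypothetical utility added only to show how a third-level label resolves to its ancestors.

```python
# Three-level label tree holding only the examples from Fig. 2 and Fig. 3.
# first-level label -> second-level label -> list of third-level labels
LABEL_TREE = {
    "Life": {
        "Ticket booking": ["Movie tickets", "Theater tickets", "Concert tickets"],
    },
    "Games": {
        "Online games": ["Three Kingdoms", "Anime", "Journey to the West", "Xianxia"],
    },
}

def find_path(tree, third_level):
    """Return (first, second, third) for a third-level label, or None if absent."""
    for first, second_levels in tree.items():
        for second, third_levels in second_levels.items():
            if third_level in third_levels:
                return (first, second, third_level)
    return None

print(find_path(LABEL_TREE, "Concert tickets"))
```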
In summary, the embodiments of the present invention automatically label each submitted application based on its application detail information, which saves a great deal of labor and time compared with the manual label generation of the prior art. In addition, because application detail information characterizes an application's specific functions and purpose in more detail and more completely, the label information generated from it is more accurate and covers domain categories well, without the coverage being either too broad or too narrow. Moreover, even if an application developer deliberately assigns an application to multiple domain categories, or even to completely unrelated ones, the label information generation method provided by the embodiments of the present invention can correct the category of the application, for example through the domain classification model, and give it the correct classification label.
Fig. 4A is a flow chart of a method for generating label information for an application according to an embodiment of the present invention. The method is explained in detail below with reference to the label information generation flow shown in Fig. 4B. Referring to Fig. 4A, the method flow provided by the embodiment of the present invention includes:
401a. Obtain application detail information, which describes the functional characteristics of a submitted application, and, based on the application detail information, determine, among at least two pre-stored first-level classification labels, the specified first-level classification label to which the application belongs.
In the embodiments of the present invention, to keep data synchronized across the whole network in a timely manner, the full information of every application is usually pulled periodically each day, for example at midnight every day. The full information of an application refers to all information related to that application. Structured information is then extracted from the full information to obtain the application detail information of each submitted application. The application detail information may include everything related to the application, such as the application name, download count, rating, application introduction information, and application screenshots; the embodiments of the present invention do not specifically limit this. After the structured data has been extracted, and to make later data processing easier, data preprocessing can first be performed on it, for example removing noise, filtering out garbled characters and punctuation marks, and segmenting the text content of the structured data into words and filtering out stop words. Stop words are, for example, modal particles.
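The preprocessing operations listed above can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list is invented, and whitespace splitting stands in for a real Chinese word segmenter, which the source handles with a dedicated tool.

```python
import re

# Illustrative English stop-word list; the source does not give one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to"}

def preprocess(text):
    """Clean raw text and return its tokens without stop words."""
    # Replace control characters and the Unicode replacement character,
    # which typically show up as garbled text.
    text = re.sub(r"[\x00-\x1f\ufffd]", " ", text)
    # Replace punctuation marks with spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    # Whitespace splitting is a stand-in for real word segmentation.
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]

print(preprocess("Book movie tickets, and choose the seat!\x00"))
# -> ['book', 'movie', 'tickets', 'choose', 'seat']
```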
After the above processing, for an application that has been submitted but not yet labeled with the label information generation method provided by the embodiments of the present invention, the domain of the application is first determined from the obtained application detail information, that is, the application is first given a first-level classification label. Note that, to prevent an application from being exposed under irrelevant categories, the embodiments of the present invention normalize the first-level classification labels to which an application may belong, so that an application can carry at most two first-level classification labels, that is, an application can be exposed under at most two domain categories. For example, the parenting application Baby ** can belong to at most the two domain categories "Social" and "Health" at the same time.
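The two-label cap can be sketched as a small selection rule. The sketch assumes that some score per first-level classification label is already available; the `threshold` parameter and the `pick_first_level_labels` helper are hypothetical, since the source states only the maximum of two labels.

```python
MAX_FIRST_LEVEL_LABELS = 2  # the cap stated in the text

def pick_first_level_labels(scores, threshold=0.3):
    """Return at most two first-level labels, highest score first.

    `scores` maps each label to a score. `threshold` is a hypothetical
    cut-off for a second label and is not taken from the source.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    kept = [label for label, score in ranked if score >= threshold]
    kept = kept[:MAX_FIRST_LEVEL_LABELS]
    # Always keep the top-scoring label, even when it is below the cut-off.
    return kept or [ranked[0][0]]

scores = {"Social": 0.5, "Health": 0.35, "Games": 0.1, "Finance": 0.05}
print(pick_first_level_labels(scores))
```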
In the embodiments of the present invention, an application is given a first-level classification label by inputting its application detail information into a pre-trained domain classification model, which outputs the specified first-level classification label to which the application belongs. With reference to Fig. 4B, training the domain classification model generally includes the following steps:
(a) For each pre-stored first-level classification label, obtain at least one application that preliminarily belongs to that first-level classification label.
The embodiments of the present invention pre-store 19 first-level classification labels in total, that is, the domain categories to which applications belong are divided into 19 classes. Obtaining at least one application that preliminarily belongs to a first-level classification label means the following: because an application developer provides a domain classification when submitting an application, that is, a simple remark about the application's label information, the applications that preliminarily belong to a first-level classification label are those obtained by first sorting each application into the 19 domains according to the developer's remark, in order to construct a training sample set.
For example, for each of the 19 first-level classification labels, the application detail information of 1000 applications under that label is collected according to the developers' domain classifications. The application detail information of 19000 applications is thus collected in total.
In addition, training the domain classification model mainly uses the application introduction information contained in the application detail information. The application detail information referred to in the following steps (b) to (d) therefore mainly means the application introduction information it contains.
(b) Obtain the manual classification annotation results for the at least one application and, based on these results, screen out from the at least one application the training samples used for model training, a training sample being an application that is confirmed after manual classification to belong to the first-level classification label.
Because the applications under each first-level classification label in step (a) may be classified inaccurately, the applications under each classification label also need to be annotated manually. For example, if manual verification confirms that the domain classification of an application is correct, that is, the application is confirmed to belong to the first-level classification label, it is marked 1; if manual verification finds that the domain classification is wrong, that is, the application does not actually belong to the first-level classification label, it is marked 0.
Afterwards, for each first-level classification label, the applications whose preliminary classification by the application developer was correct are screened out from the collected applications according to the manual classification annotation results, and these applications serve as the training samples belonging to that first-level classification label. In other words, for a given first-level classification label, the embodiments of the present invention place only correctly classified applications in the training sample set, in order to improve the classification accuracy of the domain classification model. Constructing the training sample set in this way means that the number of training samples under each of the 19 first-level classification labels differs.
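A minimal sketch of this screening step, assuming each candidate is an (application id, first-level label) pair and the manual annotation is stored as 1 or 0 per application id; the data layout and the `build_training_set` helper are illustrative assumptions.

```python
from collections import defaultdict

def build_training_set(candidates, marks):
    """Keep only candidates whose manual annotation is 1 (correctly classified).

    `candidates` is a list of (app_id, first_level_label) pairs taken from
    developer remarks; `marks` maps app_id to the manual annotation (1 or 0).
    """
    training = defaultdict(list)
    for app_id, label in candidates:
        if marks.get(app_id) == 1:
            training[label].append(app_id)
    return dict(training)

# Hypothetical data: app2 was placed under "Life" by its developer,
# but manual verification marked that classification as wrong.
candidates = [("app1", "Life"), ("app2", "Life"), ("app3", "Games")]
marks = {"app1": 1, "app2": 0, "app3": 1}
print(build_training_set(candidates, marks))
```

Because rejected applications are simply dropped, the per-label sample counts end up unequal, which matches the observation in the text.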
(c) Perform word segmentation on the application detail information of the training samples belonging to each first-level classification label to obtain word segmentation results, and store the word segmentation results and their corresponding first-level classification labels in a specific training text in a specified format.
After the training samples are obtained, and in order to train the domain classification model, the embodiments of the present invention first perform word segmentation on the application detail information of the training samples belonging to each first-level classification label. The word segmentation targets the application introduction information contained in the application detail information.
In the embodiments of the present invention, the open-source jieba word segmentation tool, a natural language processing tool, is mainly used to segment the application introduction information. Note that if an application name is long enough to need word segmentation, the application name can also be segmented. In that case, the word segmentation results include those of both the application name and the application introduction information.
The open-source jieba word segmentation tool mainly supports three segmentation modes. The first is accurate mode, which tries to cut a sentence as precisely as possible and is mainly suited to text analysis. The second is full mode, which scans out every word that can be formed in a sentence; it is fast but cannot resolve ambiguity. The last is search engine mode, which, on the basis of accurate mode, cuts long words again to improve recall and is suited to search engine segmentation. The embodiments of the present invention segment the application introduction information in this last mode.
After word segmentation is finished, text classification is carried out with the open-source TextGrocery tool to train the domain classification model, as follows. Each training sample is stored in the specific training text train.txt in the specified format "first-level classification label name + \t + word segmentation result". That is, the word segmentation result of every application in the training sample set is stored in train.txt in this format. The embodiments of the present invention use the open-source TextGrocery tool because it classifies short texts well and is convenient to use. Other tools with functions similar to TextGrocery can of course also be used for text classification; the embodiments of the present invention do not specifically limit this.
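The training text layout can be sketched as follows. It assumes that the separator between the label name and the word segmentation result is a tab character and that the segmented words are joined with spaces; both the space-joining and the helper names are illustrative assumptions.

```python
import os
import tempfile

def format_training_line(label, tokens):
    """Build one line: label name, a tab, then the space-joined tokens (assumed layout)."""
    return label + "\t" + " ".join(tokens)

def write_training_file(path, samples):
    """Write one line per (label, tokens) training sample."""
    with open(path, "w", encoding="utf-8") as handle:
        for label, tokens in samples:
            handle.write(format_training_line(label, tokens) + "\n")

# Hypothetical segmented samples, written to a temporary train.txt.
samples = [
    ("Life", ["book", "movie", "ticket"]),
    ("Games", ["online", "battle", "arena"]),
]
path = os.path.join(tempfile.mkdtemp(), "train.txt")
write_training_file(path, samples)
```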
Note that if word segmentation has already been performed in the data preprocessing stage, this step can be skipped and the word segmentation results from the data preprocessing stage can be used directly.
(d) Perform model training based on a text classification tool function and the specific training text to obtain a trained model; then cross-test the trained model until its classification precision meets a preset condition, which yields the domain classification model.
In this step, after the specific training text train.txt is obtained, the text classification tool function grocery.train() can be called to perform model training, with train.txt as its input. In another embodiment, to ensure the classification accuracy of the trained domain classification model, the trained model can also be cross-tested until its classification precision meets the preset condition; the model obtained at that point is the domain classification model required by the embodiments of the present invention.
Cross-testing means that the training sample set and the test sample set exchange roles across rounds of model training and model testing. For example, suppose 4000 samples are used for model training and model testing, of which 3000 are used for training and the remaining 1000 for testing. Label each block of 1000 samples among the 3000 as A, B and C, and label the remaining 1000 samples as D. If A+B+C is used for model training in one round of cross-testing, D is used for model testing; in the next round, B+C+D may be used for training and A for testing, and so on.
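The block rotation described above can be sketched as follows. The function names and the default threshold are hypothetical; the sketch only enumerates the train/test splits and checks a threshold, and leaves the actual training and scoring to whatever classifier is used.

```python
def cross_test_rounds(blocks):
    """Yield (test_name, train_samples, test_samples), holding out each block once."""
    for test_name in blocks:
        train = [
            sample
            for name, samples in blocks.items()
            if name != test_name
            for sample in samples
        ]
        yield test_name, train, list(blocks[test_name])


def meets_preset_condition(accuracies, threshold=0.9):
    """True when every round's accuracy exceeds the (assumed) preset threshold."""
    return bool(accuracies) and all(acc > threshold for acc in accuracies)
```

With four blocks A, B, C and D, the generator produces four rounds, one of which trains on A+B+C and tests on D, and another of which trains on B+C+D and tests on A.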
The preset condition on classification precision may be that the classification accuracy of the trained model exceeds a preset threshold, or that its recall exceeds a preset threshold; the embodiment of the present invention does not specifically limit this.
In the embodiment of the present invention, once the domain classification model has been obtained, it can be used to classify all applications that have been submitted by application developers and are awaiting classification. Specifically, the application detail information of each application is input into the domain classification model, and the domain classification result output by the model is obtained. The domain classification result includes a probability score for each of the at least two first-level classification labels to which the application may belong. Finally, at least one first-level classification label with the highest probability score is selected from the at least two first-level classification labels and determined to be the specified first-level classification label of the application.
That is, when the domain classification model gives the domain classification result of an application, it gives the probability score of the application for each of the 19 first-level classification labels, i.e., it scores the application by domain, and the 1 to 2 first-level classification labels with the highest probability scores are finally taken as the specified first-level classification labels of the application. The application awaiting classification mentioned here refers to an application to be classified by the method provided by the embodiment of the present invention.
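Selecting the 1 to 2 highest-scoring first-level labels can be sketched as follows. The text does not say how to choose between one and two labels, so the `min_ratio` rule below (keep the runner-up only if its score is at least half of the top score) is purely an assumption for illustration, as are the function and parameter names.

```python
def pick_first_level_labels(scores, max_labels=2, min_ratio=0.5):
    """Return up to max_labels labels, ordered by descending probability score."""
    if not scores:
        return []
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_score = ranked[0][1]
    # Assumed rule: a label is kept only if it is close enough to the best score.
    return [label for label, score in ranked[:max_labels]
            if score >= min_ratio * top_score]
```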
In summary, after the domain classification in step 401, the category of each application submitted by an application developer is corrected on the basis of the classification result output by the domain classification model.
402a. Perform information screening on the application detail information of the application to obtain the summary information of the application.
In the embodiment of the present invention, after step 401 has been used to determine the domain category of each submitted application awaiting classification, sub-level classification labels, such as the second-level and third-level classification labels to which the application belongs, must be further determined on the basis of the first-level classification label. The second-level and third-level classification labels are generated mainly from the application introduction information and the application screenshots included in the application detail information. This step addresses only the approach based on the application introduction information.
The introduction information of some applications often contains a large amount of invalid and redundant information. For example, the application introduction information of "** Video Dating" includes content such as "we do not do text, we do not do short voice messages, we do not do short videos". Such content is clearly unimportant, and no effective classification label can be extracted from it. Therefore, to attach accurate sub-level label information to an application, the application introduction information must also be screened for a summary. The embodiment of the present invention mainly uses the TextRank algorithm to screen summary information from the application introduction information, which includes the following steps:
(1) Split the application introduction information included in the application detail information of the application into at least two short sentences, and calculate the similarity between any two of the at least two short sentences.
For this step, the application introduction information T is first split into at least two short sentences S_1 to S_m, giving T = [S_1, S_2, ..., S_m]. Next, for any two sentences S_i and S_j among S_1 to S_m, the similarity between them can be expressed by the following formula, which is the standard TextRank sentence similarity:
$$\mathrm{Similarity}(S_i, S_j) = \frac{\left|\{\, t_k \mid t_k \in S_i \wedge t_k \in S_j \,\}\right|}{\log\left(|S_i|\right) + \log\left(|S_j|\right)}$$
The numerator in the formula is the number of words that occur in both sentences S_i and S_j, i.e., words that belong to sentence S_i and also belong to sentence S_j; the symbol "∨" denotes disjunction and the symbol "∧" denotes conjunction. In the denominator, |S_i| and |S_j| are the numbers of words in sentences S_i and S_j respectively. t_k denotes a word, and Similarity(S_i, S_j) denotes the similarity between the two sentences S_i and S_j.
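The sentence similarity described above can be sketched as follows, assuming the standard TextRank form (the number of shared words divided by the sum of the logarithms of the two sentence lengths). The function name is hypothetical, and returning 0.0 when the denominator is not positive is a defensive choice made here, not something the text specifies.

```python
import math


def sentence_similarity(words_i, words_j):
    """TextRank-style similarity between two sentences given as word lists."""
    if not words_i or not words_j:
        return 0.0
    shared = len(set(words_i) & set(words_j))
    denominator = math.log(len(words_i)) + math.log(len(words_j))
    # Two one-word sentences give log(1) + log(1) = 0; treat them as dissimilar.
    if shared == 0 or denominator <= 0:
        return 0.0
    return shared / denominator
```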
(2) According to the similarity between any two short sentences, calculate the importance value of each of the at least two short sentences.
In the embodiment of the present invention, the importance value of each short sentence is calculated using the following formula:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$
Here WS(V_i) is the importance value of sentence V_i; d is the damping coefficient, usually 0.85; w_ji = Similarity(S_i, S_j) is the similarity between sentences S_i and S_j; WS(V_j) is the importance value of sentence V_j; and w_jk = Similarity(S_k, S_j) is the similarity between sentences S_k and S_j.
When screening summary information, the TextRank algorithm treats the at least two short sentences obtained by splitting as a weighted graph containing multiple nodes, with each sentence as a node. If two sentences are similar, a weighted edge is considered to exist between the two corresponding nodes, and its weight is the similarity. The sentence V_i in the above formula is a node in the graph. Accordingly, In(V_i) corresponds to the in-degree of a node, i.e., the edges entering the node, and in the formula ranges over the nodes those edges come from; Out(V_j) corresponds to the out-degree of a node, i.e., the edges leaving the node, and in the formula ranges over the nodes those edges lead to.
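The importance calculation on the weighted sentence graph can be sketched as a fixed number of update rounds. This is an illustrative sketch of the standard TextRank iteration under stated assumptions: `sim` is a symmetric similarity matrix with zeros on the diagonal, all scores start at 1.0, and the iteration count of 50 is an arbitrary choice made here.

```python
def rank_sentences(sim, d=0.85, iterations=50):
    """Return one importance value per sentence from a similarity matrix."""
    n = len(sim)
    scores = [1.0] * n
    # Total weight of the edges leaving each node (the inner sum in the formula).
    out_weight = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            total = 0.0
            for j in range(n):
                if j != i and sim[j][i] > 0 and out_weight[j] > 0:
                    total += sim[j][i] / out_weight[j] * scores[j]
            new_scores.append((1 - d) + d * total)
        scores = new_scores
    return scores
```

A sentence that is similar to many other sentences collects more weight than a sentence that is similar to only one, which is what makes it a summary candidate.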
(3) Sort the importance values of the short sentences in descending order, select a specified number of top-ranked short sentences from the at least two short sentences, and combine the specified number of short sentences in the order in which they appear in the application introduction information to obtain the summary information of the application.
For this step, because the basic idea of the TextRank algorithm is to use the several most important sentences as the summary, once the importance value of each short sentence has been obtained, the embodiment of the present invention sorts the importance values, for example in descending order. Based on the sorting result, the specified number of top-ranked short sentences are selected and used as the summary information of the application. The importance values can also be sorted in ascending order, in which case the specified number of short sentences ranked last are selected. The specified number may be, for example, 4 or 5; the embodiment of the present invention does not specifically limit this.
In another embodiment, to keep the content of the screened summary information coherent, the embodiment of the present invention can also combine the specified number of short sentences in the order in which they appear in the application introduction information, i.e., the summary information of the application is assembled in original text order.
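Selecting the top-ranked short sentences and then restoring their original order can be sketched as follows. The function name is hypothetical, and the default of 4 sentences simply follows the example value given in the text.

```python
def build_summary(sentences, scores, k=4):
    """Keep the k highest-scoring sentences, returned in original text order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    top_indices = sorted(ranked[:k])  # restore the order of appearance
    return [sentences[i] for i in top_indices]
```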
Note that, besides screening summary information from the application introduction information, the TextRank algorithm can also be used to screen keywords from the application introduction information for use in the subsequent sub-level label generation process. The keyword screening process is as follows. The application introduction information is split into at least two short sentences; stop words are filtered out of each sentence and only words of specified parts of speech are retained, which yields a set of sentences and a set of words. Each word is treated as a node in an unweighted graph. The window size is set to k. Suppose a sentence consists of the words w1, w2, w3, w4, w5, ..., wn in order; then [w1, w2, ..., wk], [w2, w3, ..., wk+1], [w3, w4, ..., wk+2] and so on are each a window. An unweighted edge exists between the nodes corresponding to any two words in the same window. The importance of each word can then be calculated on the unweighted graph constructed in this way. Finally, the several most important words are taken as keywords.
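The keyword screening described above can be sketched as follows. It assumes the sentences have already been segmented and filtered for stop words and parts of speech; the function names, the damping coefficient of 0.85, the iteration count, and the alphabetical tie-break are all choices made for this illustration.

```python
from itertools import combinations


def build_cooccurrence_graph(sentences, k=3):
    """Link every pair of distinct words that share a sliding window of size k."""
    graph = {}
    for words in sentences:
        for word in words:
            graph.setdefault(word, set())
        for start in range(max(1, len(words) - k + 1)):
            window = set(words[start:start + k])
            for a, b in combinations(sorted(window), 2):
                graph[a].add(b)
                graph[b].add(a)
    return graph


def rank_words(graph, d=0.85, iterations=50):
    """Unweighted TextRank: each word shares its score equally among neighbours."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        scores = {
            word: (1 - d) + d * sum(scores[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
    return scores


def top_keywords(sentences, k=3, top_n=2):
    """Return the top_n most important words (ties broken alphabetically)."""
    scores = rank_words(build_cooccurrence_graph(sentences, k))
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```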
403. Based on a pre-stored word clustering result, perform keyword matching on the summary information of the application and, based on the matching result, determine the specified sub-level classification label to which the application belongs under the specified first-level classification label.
In the embodiment of the present invention, after the summary information of the application has been screened out according to step 402 above, the sub-level classification label to which the application belongs is determined from the summary information. Specifically, word clustering is first performed on the summary information of a preset number of applications already submitted across the whole network; keyword matching is then performed on the summary information of the application against the resulting word clustering result, and sub-level classification labels are attached to the application according to the matching result. At present, word clustering is performed mainly on the summary information of about 3 million applications across the whole network in total. The detailed process is as follows:
403a. Partition the summary information of the preset number of submitted applications according to the first-level classification label to which each belongs, to obtain the training summary information of each first-level classification label.
For this step, take word clustering on the summary information of the above roughly 3 million applications as an example. Because the embodiment of the present invention pre-stores 19 first-level classification labels, this step divides the summary information of the roughly 3 million applications into 19 parts based on the 19 first-level classification labels. For a given first-level classification label, the summary information of the applications assigned to that label serves as the training summary information for subsequently training the word vector model of that label.
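The partitioning in step 403a can be sketched as a simple grouping. The record fields `first_level_labels` and `summary` are hypothetical, and placing an application with two first-level labels into both groups is an assumption; the text does not say how such applications are handled.

```python
from collections import defaultdict


def partition_summaries(apps):
    """Group application summaries by first-level classification label."""
    parts = defaultdict(list)
    for app in apps:
        # Assumption: an application with two labels contributes to both groups.
        for label in app["first_level_labels"]:
            parts[label].append(app["summary"])
    return dict(parts)
```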
403b. Perform model training based on the training summary information of each first-level classification label, to obtain a word vector model matching each first-level classification label.
For this step, as many word vector models are trained as there are first-level classification labels. In the embodiment of the present invention, the word2vector tool is used to train 19 word2vector word vector models, one on each of the 19 parts of training summary information. Word vectors are a technique that turns words into vectors while ensuring that the relative similarity between vectors correlates with semantic similarity. In other words, word2vec is an efficient algorithmic model that represents words as real-valued vectors. Drawing on ideas from deep learning, it reduces, through training, the processing of text content to vector operations in a K-dimensional vector space, and similarity in the vector space can be used to represent semantic similarity of text.
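How similarity in the vector space stands in for semantic similarity can be illustrated with a small sketch. It does not train a word2vec model: the two-dimensional vectors in the usage note are made-up values, and cosine similarity is assumed as the similarity measure because it is the usual choice with word vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(query, vectors, top_n=3):
    """Return the top_n words whose vectors are closest to the query word."""
    q = vectors[query]
    scored = [(w, cosine_similarity(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:top_n]]
```

With toy vectors such as `{"ball": [1.0, 0.1], "football": [0.9, 0.2], "basketball": [0.95, 0.1], "invoice": [0.0, 1.0]}`, the words nearest to "ball" are the two sports terms, which is the behaviour the clustering step relies on when a label name is used as the query.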
In addition, when training a word vector model, word segmentation can first be performed on the training summary information under the corresponding first-level classification label, and the resulting word segmentation result stored in an input text file (in txt format). The word2vector source code is then run with this input text as input for model training, yielding the word vector model. Once the 19 word vector models have been obtained, each of them is used to perform word clustering for the classification labels; see step 403c below for the detailed process.
403c. For each pre-stored first-level classification label, obtain the sub-level classification labels included under the first-level classification label; input the first-level classification label and the sub-level classification labels into the matching word vector model to obtain the cluster words matching the first-level classification label and the sub-level classification labels; and combine all the cluster words obtained to get the above word clustering result.
The label system is pre-stored and is divided into three layers: 19 first-level classification labels, 118 second-level classification labels and 923 third-level classification labels. In the embodiment of the present invention, for each first-level classification label, the second-level and third-level classification labels included under it are obtained; the first-level classification label and the second-level and third-level classification labels it includes are then input into the matching one of the 19 word vector models, which yields the cluster words matching these labels, i.e., a list of segmented words close to the names of these classification labels.
Note that inputting a classification label into a word vector model essentially means inputting the name of the classification label into the model. In addition, after a segmented-word list has been obtained, the cluster words appearing in it can be further checked by manual review so that the words in the list are purer; the embodiment of the present invention does not specifically limit this. Finally, the multiple segmented-word lists obtained are combined; the cluster words covered under the resulting three levels of classification labels can exceed 5000. That is, word clustering with word2vector yields the hierarchical label system shown in Fig. 6.
After word clustering is complete, keyword matching can be performed on the summary information of the application; if any keyword in the word clustering result is matched, the corresponding second-level and third-level classification labels are attached to the application. The detailed process is as follows: the summary information of the application is matched against the cluster words included in the word clustering result to obtain a matching result; if the matching result indicates that the summary information of the application includes any cluster word in the word clustering result, the sub-level classification label matching that cluster word is taken as the specified sub-level classification label of the application.
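The keyword matching step can be sketched as a dictionary lookup. The data layout is an assumption: `cluster_words` maps each cluster word under one first-level label to a (second-level label, third-level label) pair, and the sample entries use the sports example from this section as illustrative data.

```python
# Hypothetical word clustering result for a single first-level label.
CLUSTER_WORDS = {
    "football": ("sports", "ball games"),
    "volleyball": ("sports", "ball games"),
    "basketball": ("sports", "ball games"),
}


def match_sub_level_labels(summary_words, cluster_words):
    """Return the distinct (second-level, third-level) labels hit by the summary."""
    matched = []
    for word in summary_words:
        labels = cluster_words.get(word)
        if labels is not None and labels not in matched:
            matched.append(labels)
    return matched
```

An application whose summary contains several cluster words of the same third-level label still receives that label only once.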
As a simple example, suppose an application has been given the first-level classification label "amusement and leisure", which includes the second-level classification label "sports", which in turn includes the third-level classification label "ball games", and this third-level classification label covers cluster words such as "football", "volleyball" and "basketball". If the summary information of the application includes the keyword "football", the third-level classification label "ball games" and the second-level classification label "sports" are attached to the application.
With the method provided by the embodiment of the present invention, for an application submitted by an application developer, the first-level classification label to which the application belongs can be determined automatically from the application detail information of the application. Next, to build a hierarchical label system, summary information can be further screened from the application detail information, and keyword matching can be performed on the summary information of the application against the stored word clustering result, so that the sub-level classification labels to which the application belongs under the first-level classification label are attached to it. Because this label information generation process is essentially fully automated, it saves a great deal of manpower and time and is more intelligent. Moreover, because the application detail information characterizes the specific functions and purposes of an application in a more detailed and comprehensive way, the label information generated from it is more accurate and covers domain categories well.
As can be seen from Fig. 4B above, after the first-level classification label of an application has been corrected by step 401 of the above embodiment and sub-level classification labels have been attached to the application by steps 402 and 403 of the above embodiment, label information generation is complete. If the generated classification labels are then found to be inaccurate or the coverage of the domain classification is found to be insufficient, the label information can also be corrected by schemes such as the APP blacklist and whitelist shown in Fig. 4B, or label information can be supplemented for the application using OCR image text recognition technology. For the label information supplement process, see the embodiment described below.
Fig. 5 is a flowchart of a method for generating label information of an application provided by an embodiment of the present invention. Referring to Fig. 5, the method flow provided by the embodiment of the present invention includes:
501. Obtain the application detail information of a submitted application and, based on the application detail information, determine the specified first-level classification label to which the application belongs among at least two pre-stored first-level classification labels.
Step 501 is similar to step 401 above and is not described again here.
502. Perform screening on the application detail information of the application to obtain the summary information of the application.
Step 502 is similar to step 402 above and is not described again here.
503. Based on a pre-stored word clustering result, perform keyword matching on the summary information of the application and, based on the matching result, determine the specified sub-level classification label to which the application belongs under the specified first-level classification label.
Step 503 is similar to step 403 above and is not described again here.
504. Supplement label information for the application using OCR image text recognition technology.
The process of supplementing label information for the application using OCR image text recognition technology can be divided into the following steps:
(I) Obtain at least one application screenshot included in the application detail information of the application.
OCR technology is already widely applied in industry, for example in ID card recognition, bank card recognition, case recognition and business card recognition. In the embodiment of the present invention, OCR technology is applied to the application screenshots that a user sees when installing and opening an application for the first time. That is, the embodiment of the present invention attempts to use OCR technology to recognize the text in application screenshots and attach corresponding labels.
Unlike traditional OCR technology, text recognition for natural scene images such as application screenshots is called STR (Scene Text Recognition). To recognize text in application screenshots accurately, the embodiment of the present invention can annotate a large number of high-quality samples such as those shown in Fig. 7. These high-quality samples all come from application screenshot data, and text occupies a certain proportion of each application screenshot. For example, in Fig. 7, text occupies a large proportion of the application screenshots of the investment and financial management type, the electronic contract type and the face recognition type; such high-quality samples are more conducive to accurate text recognition in application screenshots in the subsequent process.
(II) For each application screenshot, decompose the application screenshot into at least one image channel and, in each of the at least one image channel, locate at least one text region that includes text.
For this step, when text recognition is performed on at least one application screenshot of the application based on STR technology, the text regions where the characters are located must first be found in the application screenshot. The embodiment of the present invention mainly uses two methods: MSER (Maximally Stable Extremal Regions) and SWT (Stroke Width Transform). Whichever method is used, after candidate text boxes have been obtained, a supervised classifier can be used to identify whether these candidate text boxes really belong to text.
In the embodiment of the present invention, an SVM (Support Vector Machine) can be used to classify the candidate text boxes and thereby determine at least one text region that includes text. In addition, to improve the recall of text recognition, when an application screenshot is processed it can be decomposed into at least one image channel, the above text detection algorithm can be run independently on the different image channels, and the results from the different channels can finally be deduplicated and merged, so that at least one text region including text is located in the at least one application screenshot. The locating result can be as shown in Fig. 8, where the areas outlined by bold black boxes are the located text regions that include text.
Referring to Fig. 8, for the first application screenshot in the first row, two text regions are located: "three-dimensional face retouching" and "have fair and tender girlish skin". For the second application screenshot in the first row, two text regions are located: "beauty filter" and "with beauty mode, broadcast whenever you want". For the third application screenshot in the first row, two text regions are also located: "live audio" and "fun audio live streaming". For the three screenshots in the second row, the text regions can be located accurately in the same way.
(III) Perform text recognition on the at least one text region and combine the recognized text to obtain the text recognition result of the at least one application screenshot; based on the text recognition result, generate for the application label information other than the specified first-level classification label and the specified sub-level classification label.
After the text regions in an application screenshot have been located, the embodiment of the present invention uses a recognition method based on character gradient statistics to perform text recognition on the at least one text region located in the application screenshot. In addition, when performing text recognition, the embodiment of the present invention can construct corresponding energy constraint functions from the characteristics of the characters (such as size, direction, density and position) to combine the text regions into independent text lines awaiting recognition. Each text line is automatically split or combined, and a recognition result is obtained for each primitive rectangular block produced by this processing; the recognition result includes the recognized text and the corresponding confidence. After text recognition has been performed on each text region in this way, a suitable algorithm is used to combine the recognized text, which yields the text recognition result of the at least one application screenshot.
Once the text recognition result for the application screenshots of an application has been obtained, labels can be supplemented for the application according to it. The label supplement process is described below with a concrete example. Referring to Fig. 9, the application screenshots of the XX Map application include multiple screenshots such as real-time traffic, real-scene intersections, location information rankings and dynamic driving navigation. The text in these screenshots essentially introduces the application and clearly helps with its domain classification, so the text in these application screenshots is recognized using OCR image text recognition technology. For example, after the text "nationwide real-time traffic, avoid congestion, worry-free travel" has been recognized, the label "real-time traffic" is attached to the XX Map application. At present, the label information mining method based on OCR image text recognition technology contributes 5% of the coverage of the full set of label information, with an accuracy of up to 99%.
It should also be noted that the text recognition result recognized from the application screenshots can also be applied to the summary screening in steps 402 and 502 above. That is, summary information can be screened from both the application introduction information and the text recognition result of the at least one application screenshot; the embodiment of the present invention does not specifically limit this.
With the method provided by the embodiment of the present invention, for an application submitted by an application developer, the first-level classification label to which the application belongs can be determined automatically from the application detail information of the application. Next, to build a hierarchical label system, summary information can be further screened from the application detail information, and keyword matching can be performed on the summary information of the application against the stored word clustering result, so that the sub-level classification labels to which the application belongs under the first-level classification label are attached to it. Because this label information generation process is essentially fully automated, it saves a great deal of manpower and time and is more intelligent. Moreover, because the application detail information characterizes the specific functions and purposes of an application in a more detailed and comprehensive way, the label information generated from it is more accurate and covers domain categories well.
In summary, the label information generation method provided by the embodiment of the present invention relies on natural language processing technology and OCR image text recognition technology, can mine label information of high reference value for each submitted application, and enables the label information of an application to cover its function points in every respect. This makes it easy for users to find the applications they need quickly and conveniently in category listings such as software categories and game categories. In addition, the hierarchical label system provided by the embodiment of the present invention allows users to browse the application lists under different categories and labels on pages such as the software category page and the game category page, and to find and download applications that meet their needs.
In other words, the label information generation method provided by the embodiment of the present invention ensures that each application is accurately exposed under the relevant classification labels, which in turn ensures that users can quickly find applications suited to their functional needs when browsing, and download and use them. To summarize, the method mainly has the following function points:
First, the label system is rich and detailed, and helps users quickly find, from a massive number of applications, the applications that suit their functional needs by following the established hierarchical label system.
In addition, the degree of matching between an application and its label information can be updated in time. In the embodiment of the present invention, the release of a new application or the creation of a new label (such as the "two-dimensional" (anime-style) label in Fig. 3, created from scratch), as well as updates to an application's name, application introduction information, download count and user ratings, can all be processed within a certain time window. Because these factors affect the degree of matching between an application and its label information, a large update to any of them may considerably change the degree of matching between the application and its existing label information; in that case the embodiment of the present invention can attach new labels to the application again by following the steps above. The degree of matching between an application and its label information is determined when the classification labels at each level are generated for the application. For example, when a first-level classification label is generated for an application, the domain classification model gives the probability that the application belongs to that first-level classification label, and this probability is the degree of correlation between the two.
In another embodiment, with reference to Fig. 4 B, the embodiment of the present invention can also according to the application and label information of application it
Between matching degree auto-sequencing is carried out to the application under same tag along sort list so that help user be quickly found out quality it is high and
And the application of functional requirement matching.Certainly, in the sequencer procedure of application, except being outside one's consideration with reference to matching therebetween, may be used also
The scoring of download and user to application based on application, to be weighted accordingly, the deviation of weighting is that download is got over
High or the higher application of scoring sequence is more forward.
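The sorting described in this embodiment could be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the weights `w_match`, `w_downloads`, and `w_rating`, the logarithmic scaling of download counts, and the 0 to 5 rating scale are all hypothetical, because the text only states that a higher download count or rating should move an application forward.

```python
import math
from dataclasses import dataclass


@dataclass
class App:
    name: str
    match: float     # matching degree between the app and the label, in [0, 1]
    downloads: int   # download count
    rating: float    # user rating, assumed to lie in [0, 5]


def rank_score(app, max_downloads, w_match=0.6, w_downloads=0.2, w_rating=0.2):
    """Combine matching degree, downloads, and rating into one score.

    The weights and the log scaling are illustrative assumptions.
    """
    if max_downloads > 0:
        dl = math.log1p(app.downloads) / math.log1p(max_downloads)
    else:
        dl = 0.0
    return w_match * app.match + w_downloads * dl + w_rating * (app.rating / 5.0)


def rank_apps(apps):
    """Return the apps under one classification label, best first."""
    max_dl = max((a.downloads for a in apps), default=0)
    return sorted(apps, key=lambda a: rank_score(a, max_dl), reverse=True)
```

With equal matching degrees, the application with more downloads and a higher rating sorts first, while an application with a low matching degree stays behind both.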
In summary, the embodiments of the present invention construct a label system with complete coverage, including 19 first-level classifications, 118 second-level classifications, and 923 third-level classifications. The word2vector algorithm is used to perform word clustering on the summary information screened out of the application recommended information of apps, yielding high-frequency, highly correlated cluster words under the classification label names at each level; the number of word-segmentation-level cluster words finally obtained under the labels at all levels can exceed 5,000. In addition, the embodiments of the present invention use a text classification technique to correct the domain classification that application developers submit for their apps, which prevents a developer from cheating by exposing an app under an unrelated domain classification. In addition, the textrank algorithm is used to screen a summary from the application recommended information, filtering out statements that are unclear or do not highlight key points, which improves the accuracy of the label information. Furthermore, OCR image text recognition is used to recognize the text in application screenshots, and corresponding labels are attached to the application based on the recognized text.
Fig. 9 B are a kind of holistic approach flow charts of the label information generation of application provided in an embodiment of the present invention.Referring to figure
9B, method flow provided in an embodiment of the present invention includes:
901. Periodically pull the full information of all submitted applications from the entire network, extract structured data from the obtained full information, and perform data preprocessing on the extracted structured data to obtain the application detail information of each application.
902. Train the domain classification model, obtain the word clustering result for all submitted applications using the word2vector-based word clustering method, and store the word clustering result.
903. For each remaining application, obtain a domain classification result for the application based on the pre-established domain classification model and the application detail information of the application, where the domain classification result includes a probability score that the application belongs to each first-level classification label.
904. Among all first-level classification labels, screen out at least one first-level classification label with the highest probability score, and determine that at least one first-level classification label as the specified first-level classification label to which the application belongs.
905. Split the application recommended information included in the application detail information of the application into at least two short sentences, calculate the similarity between any two of the at least two short sentences, and calculate the importance value of each short sentence according to the similarity between any two short sentences.
906. Sort the importance values of the short sentences in descending order; based on the sorting result, screen out a specified number of top-ranked short sentences from the at least two short sentences; combine the specified number of short sentences in the order in which they appear in the application recommended information to obtain the summary information of the application.
907. Match the summary information of the application against the cluster words included in the pre-stored word clustering result to obtain a matching result; if the matching result indicates that the summary information of the application contains any cluster word in the word clustering result, take the sub-level classification label matching that cluster word as the specified sub-level classification label to which the application belongs.
908. Obtain at least one application screenshot included in the application detail information of the application; for each application screenshot, decompose the screenshot into at least one image channel, and in each image channel locate at least one text region containing text.
909. Perform text recognition on the at least one text region and combine the recognized text to obtain the text recognition result of the at least one application screenshot; based on the text recognition result, generate for the application label information other than the specified first-level classification label and the specified sub-level classification label.
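Steps 905 and 906 can be illustrated with a small TextRank-style sketch. Several details here are assumptions rather than part of the source: sentences are split on common punctuation, words are obtained by whitespace splitting (Chinese text would need a word segmenter instead), the pairwise similarity is a Jaccard overlap of word sets swapped in for whatever similarity the embodiment uses, and `damping`, `iterations`, and `top_k` are hypothetical parameters.

```python
import re


def split_sentences(text):
    """Split text into short sentences on common punctuation."""
    parts = re.split(r"[。！？!?.;；\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def similarity(a, b):
    """Jaccard overlap of word sets (an assumed similarity measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def importance(sentences, damping=0.85, iterations=50):
    """PageRank-style importance value for each sentence."""
    n = len(sentences)
    sim = [[0.0 if i == j else similarity(sentences[i], sentences[j])
            for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                sim[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, top_k=2):
    """Keep the top_k most important sentences, in their original order."""
    sentences = split_sentences(text)
    scores = importance(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return ". ".join(sentences[i] for i in sorted(top))
```

A sentence that shares no words with the others receives only the base score, so it is the first to be dropped from the summary.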
With the method provided by the embodiments of the present invention, for an application submitted by an application developer, the first-level classification label to which the application belongs can be determined automatically based on the application detail information of the application. Next, to establish a hierarchical label system, summary information can be further screened from the application detail information, and keyword matching can be performed on the summary information of the application based on the stored word clustering result, so that the application is given the sub-level classification label to which it belongs under the first-level classification label. Because the generation of the label information is largely automated, a large amount of manpower and time can be saved, and the process is more intelligent. In addition, because the application detail information characterizes the specific functions and uses of an application in detail and comprehensively, the label information generated from it is more accurate and covers domain categories well. Furthermore, text recognition can be performed on application screenshots using OCR image text recognition, and the label information can be supplemented based on the recognized text, which further ensures the accuracy of the label information generated for each application.
Figure 10 is a schematic structural diagram of a device for generating label information of an application according to an embodiment of the present invention. Referring to Figure 10, the device includes:
An acquisition module 1001, configured to obtain application detail information, where the application detail information describes the functional characteristics of a submitted application;
A first processing module 1002, configured to determine, based on the application detail information, the specified first-level classification label to which the application belongs among at least two pre-stored first-level classification labels;
A screening module 1003, configured to perform information screening on the application detail information to obtain the summary information of the application;
A second processing module 1004, configured to perform keyword matching on the summary information of the application based on a pre-stored word clustering result, and, based on the obtained matching result, determine under the specified first-level classification label the specified sub-level classification label to which the application belongs, where the word clustering result is obtained by performing word clustering on the summary information of a preset number of submitted applications.
In another embodiment, the first processing module 1002 is configured to obtain a domain classification result for the application based on the pre-established domain classification model and the application detail information of the application, where the domain classification result includes a probability score that the application belongs to each of the at least two first-level classification labels; and to screen out, among the at least two first-level classification labels, at least one first-level classification label with the highest probability score and determine that at least one first-level classification label as the specified first-level classification label.
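The selection of the highest-scoring first-level classification labels can be sketched as below. The sketch assumes the domain classification model returns a mapping from label name to probability score; the function name, the `top_n` and `min_score` parameters, and the example label names are hypothetical and not taken from the source.

```python
def select_first_level_labels(probabilities, top_n=1, min_score=0.0):
    """Pick the top_n first-level labels with the highest probability score.

    probabilities: dict mapping label name -> probability score.
    min_score is an assumed optional cutoff for weak predictions.
    """
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, score in ranked[:top_n] if score >= min_score]
```

For example, with scores `{"Photography": 0.72, "Social": 0.2, "Tools": 0.08}` and `top_n=1`, the specified first-level classification label would be `"Photography"`.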
In another embodiment, the acquisition module 1001 is further configured to obtain, for each of the at least two first-level classification labels, at least one application preliminarily assigned to that first-level classification label, and to obtain manual classification annotation results for the at least one application;
The first processing module 1002 is further configured to screen out, based on the manual classification annotation results, training samples for model training from the at least one application, where a training sample is an application that is again determined, after manual classification, to belong to the first-level classification label;
The first processing module 1002 is further configured to perform word segmentation on the application detail information of the training samples belonging to each first-level classification label, and to store the obtained word segmentation results and the first-level classification labels corresponding to the word segmentation results in a specific training text in a specified format;
The first processing module 1002 is further configured to perform model training based on a text classification tool function and the specific training text to obtain a trained model, and to perform cross-testing on the trained model until the classification accuracy of the obtained trained model meets a preset condition, thereby obtaining the domain classification model.
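As a rough illustration of storing word segmentation results together with their first-level classification labels, the sketch below writes one training line per sample. The source does not disclose the specified format, so the tab-separated layout (label, then space-joined tokens) and the function name are purely assumptions.

```python
def to_training_lines(samples):
    """Format (first_level_label, tokens) pairs as training-text lines.

    Assumed layout: "<label>\t<token> <token> ...", one sample per line.
    Samples with no usable tokens are skipped.
    """
    lines = []
    for label, tokens in samples:
        cleaned = [t.strip() for t in tokens if t.strip()]
        if cleaned:
            lines.append(label + "\t" + " ".join(cleaned))
    return lines
```

The resulting lines could then be written to a file and passed to whichever text classification tool is used for training.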
In another embodiment, the screening module 1003 is configured to split the application recommended information included in the application detail information into at least two short sentences; calculate the similarity between any two of the at least two short sentences; calculate the importance value of each of the at least two short sentences according to the similarity between any two short sentences; sort the importance values of the short sentences in descending order and, based on the sorting result, screen out a specified number of top-ranked short sentences from the at least two short sentences; and combine the specified number of short sentences in the order in which they appear in the application recommended information to obtain the summary information of the application.
In another embodiment, the second processing module 1004 is further configured to obtain, for each first-level classification label, the sub-level classification labels included under the first-level classification label; obtain, based on the first-level classification label, the sub-level classification labels, and the word vector model matching the first-level classification label and the sub-level classification labels, the cluster words matching the first-level classification label and the sub-level classification labels; and combine all the obtained cluster words to obtain the word clustering result.
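One way to picture how cluster words might be derived from a word vector model is a nearest-neighbor lookup around each sub-level label name. The sketch below is an assumption-laden illustration: the word vectors are a plain dictionary standing in for a trained model, cosine similarity is the assumed closeness measure, and `top_n` and `min_sim` are hypothetical parameters. The source does not state how the embodiment queries its model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_words_for(label, vectors, top_n=2, min_sim=0.5):
    """Words closest to the label name in the (assumed) vector space."""
    if label not in vectors:
        return []
    target = vectors[label]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != label]
    scored = [(w, s) for w, s in scored if s >= min_sim]
    scored.sort(key=lambda ws: ws[1], reverse=True)
    return [w for w, _ in scored[:top_n]]


def build_cluster_result(sub_labels, vectors, top_n=2, min_sim=0.5):
    """Map each cluster word to the sub-level label it was found under."""
    result = {}
    for sub in sub_labels:
        for word in [sub] + cluster_words_for(sub, vectors, top_n, min_sim):
            result.setdefault(word, sub)  # first label to claim a word keeps it
    return result
```

The combined mapping plays the role of the stored word clustering result: every cluster word points back to one sub-level classification label.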
In another embodiment, the second processing module 1004 is further configured to divide the summary information of the preset number of submitted applications according to the first-level classification labels to which they belong, to obtain the training summary information belonging to each first-level classification label; and to perform model training based on the training summary information belonging to each first-level classification label, to obtain the word vector model matching each first-level classification label.
In another embodiment, the second processing module 1004 is configured to match the summary information of the application against the cluster words included in the word clustering result to obtain a matching result; and, if the matching result indicates that the summary information of the application contains any cluster word in the word clustering result, to take the sub-level classification label matching that cluster word as the specified sub-level classification label.
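The keyword matching performed by the second processing module can be sketched as a dictionary lookup. The sketch assumes the summary information has already been segmented into words and that the word clustering result is stored as a mapping from cluster word to sub-level classification label; both the data layout and the function name are hypothetical.

```python
def match_sub_labels(summary_tokens, cluster_result):
    """Return sub-level labels whose cluster words occur in the summary.

    summary_tokens: list of words from the application's summary information.
    cluster_result: dict mapping cluster word -> sub-level classification label.
    Labels are returned once each, in order of first appearance.
    """
    labels = []
    for token in summary_tokens:
        sub = cluster_result.get(token)
        if sub is not None and sub not in labels:
            labels.append(sub)
    return labels
```

An empty result means no cluster word occurred in the summary, in which case no sub-level classification label is assigned by this step.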
In another embodiment, the device further includes:
The acquisition module 1001, further configured to obtain at least one application screenshot included in the application detail information;
A third processing module, configured to decompose, for each application screenshot, the application screenshot into at least one image channel, and to locate, in each image channel included in the at least one image channel, at least one text region containing text;
The third processing module, further configured to perform text recognition on the at least one text region and combine the recognized text to obtain the text recognition result of the at least one application screenshot; and to generate for the application, based on the text recognition result, label information other than the specified first-level classification label and the specified sub-level classification label.
With the device provided by the embodiments of the present invention, for an application submitted by an application developer, the first-level classification label to which the application belongs can be determined automatically based on the application detail information of the application. Next, to establish a hierarchical label system, summary information can be further screened from the application detail information, and keyword matching can be performed on the summary information of the application based on the pre-stored word clustering result, so that the application is given the sub-level classification label to which it belongs under the first-level classification label. Because the generation of the label information is largely automated, a large amount of manpower and time can be saved, and the process is more intelligent. In addition, because the application detail information characterizes the specific functions and uses of an application in detail and comprehensively, the label information generated from it is more accurate and covers domain categories well.
It should be noted that when the label information generating device provided by the above embodiment generates label information, the division into the functional modules described above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or some of the functions described above. In addition, the label information generating device provided by the above embodiment and the embodiments of the label information generation method belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
Figure 11 shows a server according to an exemplary embodiment. The server can be used to implement the label information generation method of an application shown in any of the above exemplary embodiments. Specifically, referring to Figure 11, the server 1100 may vary considerably depending on configuration or performance, and may include one or more central processing units (Central Processing Unit, CPU) 1122 (for example, one or more processors), a memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) storing application programs 1142 or data 1144. The memory 1132 and the storage medium 1130 may provide transient or persistent storage. The program stored in the storage medium 1130 may include one or more modules (not shown in the figure).
The server 1100 may also include one or more power supplies 1128, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on. One or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the label information generation of an application.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (15)
1. A method for generating label information of an application, characterized in that the method includes:
obtaining application detail information, where the application detail information describes the functional characteristics of a submitted application;
determining, based on the application detail information, the specified first-level classification label to which the application belongs among at least two pre-stored first-level classification labels;
performing information screening on the application detail information to obtain the summary information of the application;
performing keyword matching on the summary information of the application based on a pre-stored word clustering result, and, based on the obtained matching result, determining under the specified first-level classification label the specified sub-level classification label to which the application belongs, where the word clustering result is obtained by performing word clustering on the summary information of a preset number of submitted applications.
2. The method according to claim 1, characterized in that the determining, based on the application detail information, the specified first-level classification label to which the application belongs among at least two pre-stored first-level classification labels includes:
obtaining a domain classification result for the application based on a pre-established domain classification model and the application detail information of the application, where the domain classification result includes a probability score that the application belongs to each of the at least two first-level classification labels;
screening out, among the at least two first-level classification labels, at least one first-level classification label with the highest probability score, and determining the at least one first-level classification label with the highest probability score as the specified first-level classification label.
3. The method according to claim 2, characterized in that the method further includes:
for each of the at least two first-level classification labels, obtaining at least one application preliminarily assigned to the first-level classification label;
obtaining manual classification annotation results for the at least one application;
screening out, based on the manual classification annotation results, training samples for model training from the at least one application, where a training sample is an application that is again determined, after manual classification, to belong to the first-level classification label;
performing word segmentation on the application detail information of the training samples belonging to each first-level classification label;
storing the obtained word segmentation results and the first-level classification labels corresponding to the word segmentation results in a specific training text in a specified format;
performing model training based on a text classification tool function and the specific training text to obtain a trained model;
performing cross-testing on the trained model until the classification accuracy of the obtained trained model meets a preset condition, to obtain the domain classification model.
4. The method according to claim 1, characterized in that the performing information screening on the application detail information to obtain the summary information of the application includes:
splitting the application recommended information included in the application detail information into at least two short sentences;
calculating the similarity between any two of the at least two short sentences;
calculating the importance value of each of the at least two short sentences according to the similarity between any two short sentences;
sorting the importance values of the short sentences in descending order, and, based on the sorting result, screening out a specified number of top-ranked short sentences from the at least two short sentences;
combining the specified number of short sentences in the order in which they appear in the application recommended information to obtain the summary information of the application.
5. The method according to any one of claims 1 to 4, characterized in that the method further includes:
for each first-level classification label, obtaining the sub-level classification labels included under the first-level classification label;
obtaining, based on the first-level classification label, the sub-level classification labels, and the word vector model matching the first-level classification label and the sub-level classification labels, the cluster words matching the first-level classification label and the sub-level classification labels;
combining all the obtained cluster words to obtain the word clustering result.
6. The method according to claim 5, characterized in that the method further includes:
dividing the summary information of the preset number of submitted applications according to the first-level classification labels to which they belong, to obtain the training summary information belonging to each first-level classification label;
performing model training based on the training summary information belonging to each first-level classification label, to obtain the word vector model matching each first-level classification label.
7. The method according to claim 5, characterized in that the performing keyword matching on the summary information of the application based on the pre-stored word clustering result, and, based on the obtained matching result, determining under the specified first-level classification label the specified sub-level classification label to which the application belongs includes:
matching the summary information of the application against the cluster words included in the word clustering result to obtain the matching result;
if the matching result indicates that the summary information of the application contains any cluster word in the word clustering result, taking the sub-level classification label matching that cluster word as the specified sub-level classification label.
8. The method according to any one of claims 1 to 4, characterized in that the method further includes:
obtaining at least one application screenshot included in the application detail information;
for each application screenshot, decomposing the application screenshot into at least one image channel;
locating, in each image channel included in the at least one image channel, at least one text region containing text;
performing text recognition on the at least one text region and combining the recognized text to obtain the text recognition result of the at least one application screenshot;
generating for the application, based on the text recognition result, label information other than the specified first-level classification label and the specified sub-level classification label.
9. A device for generating label information of an application, characterized in that the device includes:
an acquisition module, configured to obtain application detail information, where the application detail information describes the functional characteristics of a submitted application;
a first processing module, configured to determine, based on the application detail information, the specified first-level classification label to which the application belongs among at least two pre-stored first-level classification labels;
a screening module, configured to perform information screening on the application detail information to obtain the summary information of the application;
a second processing module, configured to perform keyword matching on the summary information of the application based on a pre-stored word clustering result, and, based on the obtained matching result, determine under the specified first-level classification label the specified sub-level classification label to which the application belongs, where the word clustering result is obtained by performing word clustering on the summary information of a preset number of submitted applications.
10. The device according to claim 9, characterized in that the acquisition module is further configured to obtain, for each of the at least two first-level classification labels, at least one application preliminarily assigned to the first-level classification label, and to obtain manual classification annotation results for the at least one application;
the first processing module is further configured to screen out, based on the manual classification annotation results, training samples for model training from the at least one application, where a training sample is an application that is again determined, after manual classification, to belong to the first-level classification label;
the first processing module is further configured to perform word segmentation on the application detail information of the training samples belonging to each first-level classification label, and to store the obtained word segmentation results and the first-level classification labels corresponding to the word segmentation results in a specific training text in a specified format;
the first processing module is further configured to perform model training based on a text classification tool function and the specific training text to obtain a trained model, and to perform cross-testing on the trained model until the classification accuracy of the obtained trained model meets a preset condition, to obtain the domain classification model.
11. The device according to claim 9, characterized in that the screening module is configured to split the application recommended information included in the application detail information into at least two short sentences; calculate the similarity between any two of the at least two short sentences; calculate the importance value of each of the at least two short sentences according to the similarity between any two short sentences; sort the importance values of the short sentences in descending order and, based on the sorting result, screen out a specified number of top-ranked short sentences from the at least two short sentences; and combine the specified number of short sentences in the order in which they appear in the application recommended information to obtain the summary information of the application.
12. The device according to any one of claims 9 to 11, characterized in that the second processing module is further configured to obtain, for each first-level classification label, the sub-level classification labels included under the first-level classification label; obtain, based on the first-level classification label, the sub-level classification labels, and the word vector model matching the first-level classification label and the sub-level classification labels, the cluster words matching the first-level classification label and the sub-level classification labels; and combine all the obtained cluster words to obtain the word clustering result.
13. The device according to claim 12, characterized in that the second processing module is further configured to divide the summary information of the preset number of submitted applications according to the first-level classification labels to which they belong, to obtain the training summary information belonging to each first-level classification label; and to perform model training based on the training summary information belonging to each first-level classification label, to obtain the word vector model matching each first-level classification label.
14. The device according to claim 12, characterized in that the second processing module is configured to match the summary information of the application against the cluster words included in the word clustering result to obtain the matching result; and, if the matching result indicates that the summary information of the application contains any cluster word in the word clustering result, to take the sub-level classification label matching that cluster word as the specified sub-level classification label.
15. The device according to any one of claims 9 to 11, characterized in that the device further includes:
the acquisition module, further configured to obtain at least one application screenshot included in the application detail information;
a third processing module, configured to decompose, for each application screenshot, the application screenshot into at least one image channel, and to locate, in each image channel included in the at least one image channel, at least one text region containing text;
the third processing module, further configured to perform text recognition on the at least one text region and combine the recognized text to obtain the text recognition result of the at least one application screenshot; and to generate for the application, based on the text recognition result, label information other than the specified first-level classification label and the specified sub-level classification label.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710279297.3A CN107169049B (en) | 2017-04-25 | 2017-04-25 | Application tag information generation method and device |
PCT/CN2018/081559 WO2018196561A1 (en) | 2017-04-25 | 2018-04-02 | Label information generating method and device for application and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710279297.3A CN107169049B (en) | 2017-04-25 | 2017-04-25 | Application tag information generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107169049A (en) | 2017-09-15 |
CN107169049B CN107169049B (en) | 2023-04-28 |
Family ID=59813423
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710279297.3A Active CN107169049B (en) | 2017-04-25 | 2017-04-25 | Application tag information generation method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107169049B (en) |
WO (1) | WO2018196561A1 (en) |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108205674A (en) * | 2017-12-22 | 2018-06-26 | 广州爱美互动网络科技有限公司 | Content identification method, electronic equipment, storage medium and the system of social APP |
CN108280202A (en) * | 2018-01-30 | 2018-07-13 | 湖南蚁坊软件股份有限公司 | A kind of real-time streams label frame of dynamic scalable |
CN108563722A (en) * | 2018-04-03 | 2018-09-21 | 有米科技股份有限公司 | Trade classification method, system, computer equipment and the storage medium of text message |
CN108595660A (en) * | 2018-04-28 | 2018-09-28 | 腾讯科技(深圳)有限公司 | Label information generation method, device, storage medium and the equipment of multimedia resource |
WO2018196561A1 (en) * | 2017-04-25 | 2018-11-01 | 腾讯科技(深圳)有限公司 | Label information generating method and device for application and storage medium |
CN108764141A (en) * | 2018-05-25 | 2018-11-06 | 广州虎牙信息科技有限公司 | A kind of scene of game describes method, apparatus, equipment and its storage medium |
CN108764007A (en) * | 2018-02-10 | 2018-11-06 | 集智学园(北京)科技有限公司 | Based on OCR with text analysis technique to the measurement method of attention |
CN108763194A (en) * | 2018-04-27 | 2018-11-06 | 广州优视网络科技有限公司 | Using mark stamp methods, device, storage medium and computer equipment |
CN109657574A (en) * | 2018-12-05 | 2019-04-19 | 深圳市子瑜杰恩科技有限公司 | The stage property classification method and Related product of short-sighted frequency |
CN109784368A (en) * | 2018-12-11 | 2019-05-21 | 同盾控股有限公司 | A kind of determination method and apparatus of application program classification |
CN109795942A (en) * | 2019-01-17 | 2019-05-24 | 杭州海康睿和物联网技术有限公司 | Staircase control system, staircase monitoring device and its intelligent control method |
CN110019663A (en) * | 2017-09-30 | 2019-07-16 | 北京国双科技有限公司 | A kind of method for pushing, system, storage medium and the processor of case information |
CN110069769A (en) * | 2018-01-22 | 2019-07-30 | 腾讯科技(深圳)有限公司 | Using label generating method, device and storage equipment |
CN110427542A (en) * | 2018-04-26 | 2019-11-08 | 北京市商汤科技开发有限公司 | Sorter network training and data mask method and device, equipment, medium |
CN110532394A (en) * | 2019-09-11 | 2019-12-03 | 携程计算机技术(上海)有限公司 | The processing method and system of Order Remarks text |
CN110781292A (en) * | 2018-07-25 | 2020-02-11 | 百度在线网络技术(北京)有限公司 | Text data multi-level classification method and device, electronic equipment and storage medium |
CN110910175A (en) * | 2019-11-26 | 2020-03-24 | 上海景域文化传播股份有限公司 | Tourist ticket product portrait generation method |
CN110909157A (en) * | 2018-09-18 | 2020-03-24 | 阿里巴巴集团控股有限公司 | Text classification method and device, computing equipment and readable storage medium |
CN111079376A (en) * | 2019-11-14 | 2020-04-28 | 贝壳技术有限公司 | Data labeling method, device, medium and electronic equipment |
CN111694962A (en) * | 2019-03-15 | 2020-09-22 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN112506556A (en) * | 2020-11-19 | 2021-03-16 | 杭州云深科技有限公司 | Application program classification method and device, computer equipment and storage medium |
CN112565250A (en) * | 2020-12-04 | 2021-03-26 | 中国移动通信集团内蒙古有限公司 | Website identification method, device, equipment and storage medium |
WO2021092871A1 (en) * | 2019-11-13 | 2021-05-20 | 北京数字联盟网络科技有限公司 | Application preference text classification method based on textrank |
CN115688107A (en) * | 2022-12-28 | 2023-02-03 | 卓望数码技术(深圳)有限公司 | Fraud-related APP detection system and method |
CN117725515A (en) * | 2024-02-07 | 2024-03-19 | 北京肿瘤医院(北京大学肿瘤医院) | Quality classification method, system, storage medium and product for clinical test of medicine |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111858843B (en) * | 2019-04-30 | 2023-12-05 | 北京嘀嘀无限科技发展有限公司 | Text classification method and device |
CN112016582B (en) * | 2019-05-31 | 2023-11-24 | 口口相传(北京)网络技术有限公司 | Dish recommending method and device |
CN112528073A (en) * | 2019-09-03 | 2021-03-19 | 北京国双科技有限公司 | Video generation method and device |
CA3063243A1 (en) * | 2019-11-13 | 2021-05-13 | Beijing Digital Union Web Science And Technology Company Limited | An application preference text classification method based on textrank |
CN111026908B (en) * | 2019-12-10 | 2023-09-08 | 腾讯科技(深圳)有限公司 | Song label determining method, device, computer equipment and storage medium |
CN111353050A (en) * | 2019-12-27 | 2020-06-30 | 北京合力亿捷科技股份有限公司 | Word stock construction method and tool in vertical field of telecommunication customer service |
CN111753060B (en) * | 2020-07-29 | 2023-09-26 | 腾讯科技(深圳)有限公司 | Information retrieval method, apparatus, device and computer readable storage medium |
CN112015898B (en) * | 2020-08-28 | 2023-11-21 | 支付宝(杭州)信息技术有限公司 | Model training and text label determining method and device based on label tree |
CN112597295B (en) * | 2020-12-03 | 2024-02-02 | 京东科技控股股份有限公司 | Digest extraction method, digest extraction device, computer device, and storage medium |
CN112784911B (en) * | 2021-01-29 | 2024-01-19 | 北京百度网讯科技有限公司 | Training sample generation method and device, electronic equipment and storage medium |
CN112905743B (en) * | 2021-02-20 | 2023-08-01 | 北京百度网讯科技有限公司 | Text object detection method, device, electronic equipment and storage medium |
WO2023178205A1 (en) * | 2022-03-16 | 2023-09-21 | Aviagames, Inc. | Automated computer game application classification based on a mixed effects model |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101620596A (en) * | 2008-06-30 | 2010-01-06 | 东北大学 | Multi-document auto-abstracting method facing to inquiry |
US20110103682A1 (en) * | 2009-10-29 | 2011-05-05 | Xerox Corporation | Multi-modality classification for one-class classification in social networks |
CN102609539A (en) * | 2012-02-16 | 2012-07-25 | 北京搜狗信息服务有限公司 | Search method and search system |
CN103324628A (en) * | 2012-03-21 | 2013-09-25 | 腾讯科技(深圳)有限公司 | Industry classification method and system for text publishing |
CN104021185A (en) * | 2014-06-11 | 2014-09-03 | 北京奇虎科技有限公司 | Method and device for identifying information attributes of data in web pages |
CN104750754A (en) * | 2013-12-31 | 2015-07-01 | 北龙中网(北京)科技有限责任公司 | Website industry classification method and server |
CN104834735A (en) * | 2015-05-18 | 2015-08-12 | 大连理工大学 | Automatic document summarization extraction method based on term vectors |
CN105488021A (en) * | 2014-09-15 | 2016-04-13 | 华为技术有限公司 | Method and device generating multi-file summary |
CN105787025A (en) * | 2016-02-24 | 2016-07-20 | 腾讯科技(深圳)有限公司 | Network platform public account classifying method and device |
CN106227722A (en) * | 2016-09-12 | 2016-12-14 | 中山大学 | A kind of extraction method based on listed company's bulletin summary |
CN106453033A (en) * | 2016-08-31 | 2017-02-22 | 电子科技大学 | Multilevel Email classification method based on Email content |
CN106484266A (en) * | 2016-10-18 | 2017-03-08 | 北京锤子数码科技有限公司 | A kind of text handling method and device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101763367A (en) * | 2008-12-08 | 2010-06-30 | 新奥特硅谷视频技术有限责任公司 | Method and device for setting file labels |
CN101609450A (en) * | 2009-04-10 | 2009-12-23 | 南京邮电大学 | Web page classification method based on training set |
CN107169049B (en) * | 2017-04-25 | 2023-04-28 | 腾讯科技(深圳)有限公司 | Application tag information generation method and device |
- 2017-04-25: CN application CN201710279297.3A, patent CN107169049B, status Active
- 2018-04-02: WO application PCT/CN2018/081559, publication WO2018196561A1, status Application Filing
Non-Patent Citations (2)
Title |
---|
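He Weijun (何伟骏): "Research and Application of a Multi-Label Classification Algorithm Based on a Hierarchical-Exclusive Model" (基于层次—互斥模型的多标签分类算法的研究与应用) * |
Li Yang (李杨): "Application and Research of a Search Engine for Classified Academic Literature" (分类学术文献搜索引擎的应用和研究) * |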
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018196561A1 (en) * | 2017-04-25 | 2018-11-01 | 腾讯科技(深圳)有限公司 | Label information generating method and device for application and storage medium |
CN110019663A (en) * | 2017-09-30 | 2019-07-16 | 北京国双科技有限公司 | A kind of method for pushing, system, storage medium and the processor of case information |
CN108205674B (en) * | 2017-12-22 | 2022-04-15 | 广州爱美互动网络科技有限公司 | Social APP content identification method, electronic device, storage medium and system |
CN108205674A (en) * | 2017-12-22 | 2018-06-26 | 广州爱美互动网络科技有限公司 | Content identification method, electronic equipment, storage medium and the system of social APP |
CN110069769A (en) * | 2018-01-22 | 2019-07-30 | 腾讯科技(深圳)有限公司 | Using label generating method, device and storage equipment |
CN110069769B (en) * | 2018-01-22 | 2023-05-02 | 腾讯科技(深圳)有限公司 | Application label generation method and device and storage device |
CN108280202A (en) * | 2018-01-30 | 2018-07-13 | 湖南蚁坊软件股份有限公司 | A kind of real-time streams label frame of dynamic scalable |
CN108280202B (en) * | 2018-01-30 | 2020-10-30 | 湖南蚁坊软件股份有限公司 | Dynamic extensible real-time flow label system |
CN108764007A (en) * | 2018-02-10 | 2018-11-06 | 集智学园(北京)科技有限公司 | Based on OCR with text analysis technique to the measurement method of attention |
CN108563722A (en) * | 2018-04-03 | 2018-09-21 | 有米科技股份有限公司 | Trade classification method, system, computer equipment and the storage medium of text message |
CN110427542A (en) * | 2018-04-26 | 2019-11-08 | 北京市商汤科技开发有限公司 | Sorter network training and data mask method and device, equipment, medium |
CN108763194A (en) * | 2018-04-27 | 2018-11-06 | 广州优视网络科技有限公司 | Using mark stamp methods, device, storage medium and computer equipment |
CN108763194B (en) * | 2018-04-27 | 2022-09-27 | 阿里巴巴(中国)有限公司 | Method and device for applying label labeling, storage medium and computer equipment |
CN108595660A (en) * | 2018-04-28 | 2018-09-28 | 腾讯科技(深圳)有限公司 | Label information generation method, device, storage medium and the equipment of multimedia resource |
CN108764141A (en) * | 2018-05-25 | 2018-11-06 | 广州虎牙信息科技有限公司 | A kind of scene of game describes method, apparatus, equipment and its storage medium |
CN108764141B (en) * | 2018-05-25 | 2021-07-02 | 广州虎牙信息科技有限公司 | Game scene description method, device, equipment and storage medium thereof |
CN110781292A (en) * | 2018-07-25 | 2020-02-11 | 百度在线网络技术(北京)有限公司 | Text data multi-level classification method and device, electronic equipment and storage medium |
CN110909157B (en) * | 2018-09-18 | 2023-04-11 | 阿里巴巴集团控股有限公司 | Text classification method and device, computing equipment and readable storage medium |
CN110909157A (en) * | 2018-09-18 | 2020-03-24 | 阿里巴巴集团控股有限公司 | Text classification method and device, computing equipment and readable storage medium |
CN109657574A (en) * | 2018-12-05 | 2019-04-19 | 深圳市子瑜杰恩科技有限公司 | The stage property classification method and Related product of short-sighted frequency |
CN109784368A (en) * | 2018-12-11 | 2019-05-21 | 同盾控股有限公司 | A kind of determination method and apparatus of application program classification |
CN109795942A (en) * | 2019-01-17 | 2019-05-24 | 杭州海康睿和物联网技术有限公司 | Staircase control system, staircase monitoring device and its intelligent control method |
CN111694962A (en) * | 2019-03-15 | 2020-09-22 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN110532394A (en) * | 2019-09-11 | 2019-12-03 | 携程计算机技术(上海)有限公司 | The processing method and system of Order Remarks text |
CN110532394B (en) * | 2019-09-11 | 2023-04-07 | 携程计算机技术(上海)有限公司 | Order remark text processing method and system |
WO2021092871A1 (en) * | 2019-11-13 | 2021-05-20 | 北京数字联盟网络科技有限公司 | Application preference text classification method based on textrank |
CN111079376A (en) * | 2019-11-14 | 2020-04-28 | 贝壳技术有限公司 | Data labeling method, device, medium and electronic equipment |
CN111079376B (en) * | 2019-11-14 | 2021-04-16 | 北京房江湖科技有限公司 | Data labeling method, device, medium and electronic equipment |
CN110910175A (en) * | 2019-11-26 | 2020-03-24 | 上海景域文化传播股份有限公司 | Tourist ticket product portrait generation method |
CN110910175B (en) * | 2019-11-26 | 2023-07-28 | 上海景域文化传播股份有限公司 | Image generation method for travel ticket product |
CN112506556A (en) * | 2020-11-19 | 2021-03-16 | 杭州云深科技有限公司 | Application program classification method and device, computer equipment and storage medium |
CN112506556B (en) * | 2020-11-19 | 2023-08-25 | 杭州云深科技有限公司 | Application program classification method, device, computer equipment and storage medium |
CN112565250B (en) * | 2020-12-04 | 2022-12-06 | 中国移动通信集团内蒙古有限公司 | Website identification method, device, equipment and storage medium |
CN112565250A (en) * | 2020-12-04 | 2021-03-26 | 中国移动通信集团内蒙古有限公司 | Website identification method, device, equipment and storage medium |
CN115688107A (en) * | 2022-12-28 | 2023-02-03 | 卓望数码技术(深圳)有限公司 | Fraud-related APP detection system and method |
CN117725515A (en) * | 2024-02-07 | 2024-03-19 | 北京肿瘤医院(北京大学肿瘤医院) | Quality classification method, system, storage medium and product for clinical test of medicine |
Also Published As
Publication number | Publication date |
---|---|
CN107169049B (en) | 2023-04-28 |
WO2018196561A1 (en) | 2018-11-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107169049A (en) | The label information generation method and device of application | |
EP3866026A1 (en) | Theme classification method and apparatus based on multimodality, and storage medium | |
CN111026842B (en) | Natural language processing method, natural language processing device and intelligent question-answering system | |
Li et al. | Localizing and quantifying damage in social media images | |
CN109960800A (en) | Weakly supervised file classification method and device based on Active Learning | |
Hoque et al. | Real time bangladeshi sign language detection using faster r-cnn | |
CN106886580B (en) | Image emotion polarity analysis method based on deep learning | |
CN112287157A (en) | Automatic detection of user-requested objects in an image | |
CN112270196A (en) | Entity relationship identification method and device and electronic equipment | |
CN109599187A (en) | A kind of online interrogation point examines method, server, terminal, equipment and medium | |
CN110232112A (en) | Keyword extracting method and device in article | |
CN107436916B (en) | Intelligent answer prompting method and device | |
CN113051914A (en) | Enterprise hidden label extraction method and device based on multi-feature dynamic portrait | |
CN109582788A (en) | Comment spam training, recognition methods, device, equipment and readable storage medium storing program for executing | |
CN106227836B (en) | Unsupervised joint visual concept learning system and unsupervised joint visual concept learning method based on images and characters | |
CN113157859A (en) | Event detection method based on upper concept information | |
CN110008365A (en) | A kind of image processing method, device, equipment and readable storage medium storing program for executing | |
CN110516259A (en) | A kind of recognition methods, device, computer equipment and the storage medium of key problem in technology word | |
CN109657096A (en) | A kind of ancillary statistics report-generating method based on teaching of low school age audio-video | |
CN114661951A (en) | Video processing method and device, computer equipment and storage medium | |
CN113806574A (en) | Software and hardware integrated artificial intelligent image recognition data processing method | |
CN116341519A (en) | Event causal relation extraction method, device and storage medium based on background knowledge | |
Shaharabany et al. | Similarity maps for self-training weakly-supervised phrase grounding | |
CN112800259B (en) | Image generation method and system based on edge closure and commonality detection | |
CN114708462A (en) | Method, system, device and storage medium for generating detection model for multi-data training |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||