CN107169049A - The label information generation method and device of application - Google Patents

Method and device for generating label information for an application

Info

Publication number
CN107169049A
Authority
CN
China
Prior art keywords
application
level
class label
level class
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710279297.3A
Other languages
Chinese (zh)
Other versions
CN107169049B (en)
Inventor
何泉昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201710279297.3A (patent CN107169049B)
Publication of CN107169049A
Priority to PCT/CN2018/081559 (patent WO2018196561A1)
Application granted
Publication of CN107169049B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31: Indexing; Data structures therefor; Storage structures
    • G06F16/313: Selection or weighting of terms for indexing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/353: Clustering; Classification into predefined classes
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method and device for generating label information for an application, and belongs to the technical field of data processing. The method includes: obtaining application detail information, which describes the functional characteristics of a submitted application; determining, based on the application detail information, the specified first-level classification label to which the application belongs from among at least two pre-stored first-level classification labels; performing information screening on the application detail information to obtain summary information of the application; and, based on a pre-stored word clustering result, performing keyword matching on the summary information of the application and, based on the matching result, determining the specified sub-level classification label to which the application belongs under the specified first-level classification label. Because the label generation process is almost entirely automated, it saves a great deal of labor and time and is more intelligent. In addition, because application detail information characterizes an application's specific functions and effects in more detail and more comprehensively, the generated label information is more accurate.

Description

Method and device for generating label information for an application
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and device for generating label information for an application.
Background
As society continues to progress, smart electronic products such as smartphones have gradually become indispensable tools that people carry with them in daily life. To improve the user experience and make it easy for users to quickly download all kinds of applications on smart electronic products, software developers have built application resource management platforms that offer a rich selection of applications for download. An application resource management platform is essentially application management software: it classifies a massive number of applications of different types according to each application's label information, for example into travel, social communication, financial management, and shopping, so that users can quickly find and download a particular application among them. The label information identifies an application and indicates its functions, effects, and so on.
As described above, an application's label information is essential to subsequent steps such as application classification and is a core part of developing an application resource management platform, so how to generate label information for each application has long been a focus for those skilled in the art. In the prior art, label information is typically generated manually. For example, when an application developer submits an application to the application resource management platform, the developer adds a remark giving label information for the application, and a developer on the platform side then generates label information for the application according to that remark. If the label system includes multiple levels of label information, the generated label information includes not only the first-level classification label to which the application belongs but also the sub-level classification label to which it belongs under that first-level classification label. Label information is generated in this way for every submitted application, and the applications are then classified according to the generated label information.
In the course of implementing the present invention, the inventor found that the prior art has at least the following problems:
Because label information is generated manually and the application resource management platform hosts a massive number of applications, this way of generating label information consumes a great deal of labor and time and is not intelligent enough. In addition, because the label information relies mainly on the application developer's remarks, which are usually not very accurate, the generated label information may be inaccurate and may cover domain categories poorly.
Summary of the invention
To solve the problems of the prior art, embodiments of the present invention provide a method and device for generating label information for an application. The technical solution is as follows:
In a first aspect, a method for generating label information for an application is provided, the method including:
obtaining application detail information, the application detail information describing the functional characteristics of a submitted application;
determining, based on the application detail information, the specified first-level classification label to which the application belongs from among at least two pre-stored first-level classification labels;
performing information screening on the application detail information to obtain summary information of the application; and
performing, based on a pre-stored word clustering result, keyword matching on the summary information of the application, and determining, based on the matching result, the specified sub-level classification label to which the application belongs under the specified first-level classification label, the word clustering result being obtained by performing word clustering on the summary information of a preset number of submitted applications.
In a second aspect, a device for generating label information for an application is provided, the device including:
an acquisition module, configured to obtain application detail information, the application detail information describing the functional characteristics of a submitted application;
a first processing module, configured to determine, based on the application detail information, the specified first-level classification label to which the application belongs from among at least two pre-stored first-level classification labels;
a screening module, configured to perform information screening on the application detail information to obtain summary information of the application; and
a second processing module, configured to perform, based on a pre-stored word clustering result, keyword matching on the summary information of the application, and to determine, based on the matching result, the specified sub-level classification label to which the application belongs under the specified first-level classification label, the word clustering result being obtained by performing word clustering on the summary information of a preset number of submitted applications.
The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:
For an application submitted by an application developer, the embodiments can automatically determine the first-level classification label to which the application belongs based on its application detail information. Then, to build a hierarchical label system, summary information is further screened from the application detail information and, based on the pre-stored word clustering result, keyword matching is performed on the summary information, so that the application is given the sub-level classification label to which it belongs under the first-level classification label. Because this label generation process is almost entirely automated, it saves a great deal of labor and time and is more intelligent. In addition, because application detail information characterizes an application's specific functional characteristics and effects in more detail and more comprehensively, label information generated from it is more accurate and covers domain categories well.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is an architecture diagram of a method for generating label information for an application according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a hierarchy of classification labels according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a hierarchy of classification labels according to an embodiment of the present invention;
Fig. 4A is a flowchart of a method for generating label information for an application according to an embodiment of the present invention;
Fig. 4B is a schematic diagram of a label information generation process for an application according to an embodiment of the present invention;
Fig. 5 is a flowchart of a method for generating label information for an application according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the composition of a label system according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of sample data of application screenshots according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the text regions of an application screenshot according to an embodiment of the present invention;
Fig. 9A is a schematic diagram of an application screenshot according to an embodiment of the present invention;
Fig. 9B is a schematic diagram of the overall flow of label information generation according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a device for generating label information for an application according to an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the drawings.
Before the embodiments are explained in detail, some terms involved in the embodiments, as well as the application scenario and system architecture of the embodiments, are briefly introduced.
Application: a software program installed on a terminal device such as a smartphone or tablet computer. It can usually complete one or more specific tasks, runs in user mode, can interact with the user, and generally has a visual user interface.
Domain classification: automatically classifying an application (APP) by domain using its application detail information. Domain categories may include, for example, life services, employment, home improvement, automotive life, and booking services. Application detail information may include the application name, the application introduction information, and so on. The embodiments of the present invention do not specifically limit the domain categories or the contents of the application detail information.
Summary information screening: selecting the important sentences from an application's detail information (mainly the application introduction information). That is, sentences that matter for letting users know the application's specific functions are screened out of the application detail information (mainly the application introduction information) and used as the application's summary information.
Word clustering: using the word2vector (word vector) method to group the word features under the same classification label.
OCR (Optical Character Recognition) image text recognition: using OCR technology to recognize text, that is, to extract text information from images (in the embodiments of the present invention, specifically application screenshots).
As is well known, after developing an application, an application developer usually submits it to an application resource management platform so that it can spread quickly and be used by more users. When submitting the application, the developer is usually required to also submit a remark giving label information on the domain category to which the application belongs, application profile information, and so on.
This commonly leads to problems such as the following. First, to get higher exposure for their application, some application developers deliberately assign it to multiple domain categories, or even to entirely unrelated ones, so that a single application covers too many domain categories and the purity of the categories is seriously compromised. Second, developers on the platform side manually label the application according to the submitted label remark, which is time-consuming and laborious, and most of the resulting label information is inaccurate and covers domain categories either too broadly or too narrowly. To solve these problems, the embodiments of the present invention propose a method for automatically labeling submitted applications.
Referring to Fig. 1, the method for generating label information for an application provided by the embodiments of the present invention is divided into two main parts: one is the domain category discrimination process for the application, and the other is the label resolution process for the application.
The domain category discrimination part mainly uses natural language processing to determine the domain category of an application. It has three steps. First, select some applications that have been submitted but have not been labeled using the method provided by the embodiments of the present invention, use them as training samples, and obtain their application detail information, such as application names and application introduction information. Next, train a domain classification model on the obtained application detail information. Finally, use the domain classification model to perform a preliminary classification of each submitted application and attach a first-level classification label to each.
The other part performs fine-grained domain classification of each submitted application. This mainly involves summary information screening, word2vector word clustering, OCR image text recognition, and so on. That is, after domain classification has determined an application's first-level classification label, the sub-level classification label to which the application belongs must also be determined. In the embodiments of the present invention, sub-level classification labels mainly include second-level and third-level classification labels. Sub-level classification labels may of course also include finer fourth-level or fifth-level classification labels; the embodiments of the present invention do not specifically limit this, and the description uses only second-level and third-level classification labels as an example.
There are 19 first-level classification labels, 118 second-level classification labels, and 923 third-level classification labels. Together, the labels at these levels form a complete and accurate tree-structured label system that can cover all industries and the needs of different groups of users.
The relationship among first-level, second-level, and third-level classification labels is briefly illustrated with a small example. Referring to Fig. 2, the first-level classification label "life" may include second-level classification labels such as "booking service", and the second-level classification label "booking service" may include third-level classification labels such as "film ticket", "drama ticket", and "concert admission ticket". Similarly, referring to Fig. 3, the first-level classification label "game classification" may include second-level classification labels such as "online game", and the second-level classification label "online game" may include third-level classification labels such as "Three Kingdoms", "anime-style", "Journey to the West", and "xianxia".
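The three-level hierarchy in the Fig. 2 example can be pictured as a small tree. The following is only a minimal sketch: the nested-dictionary layout and the names `LABEL_TREE` and `label_path` are assumptions made for illustration and are not part of the patent, which does not say how the label system is stored.

```python
# Hypothetical encoding of the Fig. 2 example:
# first-level label -> second-level label -> list of third-level labels.
LABEL_TREE = {
    "life": {
        "booking service": [
            "film ticket",
            "drama ticket",
            "concert admission ticket",
        ],
    },
}


def label_path(tree, third_level):
    """Return (first, second, third) for a third-level label, or None if absent."""
    for first, seconds in tree.items():
        for second, thirds in seconds.items():
            if third_level in thirds:
                return (first, second, third_level)
    return None


print(label_path(LABEL_TREE, "drama ticket"))
```

Storing the labels as a tree keeps every sub-level label attached to exactly one parent, which matches the requirement that a sub-level label is determined under a specified first-level label.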
In summary, the embodiments of the present invention automatically label each submitted application based on its application detail information, which saves a great deal of labor and time compared with the manual label generation of the prior art. In addition, because application detail information characterizes an application's specific functions and effects in more detail and more comprehensively, label information generated from it is more accurate and covers domain categories well, without the coverage being either too broad or too narrow. Moreover, even if an application developer deliberately assigns an application to multiple domain categories, or even to entirely unrelated ones, the label information generation method provided by the embodiments can correct the classification, for example through the domain classification model, and attach the correct classification label.
Fig. 4A is a flowchart of a method for generating label information for an application according to an embodiment of the present invention. The method is described in detail below with reference to the label information generation flow shown in Fig. 4B. Referring to Fig. 4A, the method flow provided by the embodiment includes:
401a. Obtain application detail information, which describes the functional characteristics of a submitted application, and, based on the application detail information, determine the specified first-level classification label to which the application belongs from among at least two pre-stored first-level classification labels.
In the embodiments of the present invention, to synchronize data across the whole network in a timely manner, the full information of every application is usually pulled periodically each day. The full information of an application means all information related to the application. For example, the full information is pulled uniformly at midnight each day. Structured information is then extracted from the full information to obtain the application detail information of each submitted application. Application detail information may include the application name, download count, rating, application introduction information, application screenshots, and any other content related to the application; the embodiments do not specifically limit this. After the structured data is extracted, and to facilitate later processing, data preprocessing may first be performed on it, for example removing noise, filtering out garbled characters and punctuation, segmenting the text content into words, and filtering out stop words. Stop words may be, for example, modal particles.
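The preprocessing steps named above (filtering punctuation and stray symbols, word segmentation, stop-word filtering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the stop-word list is hypothetical because the source does not enumerate one, and whitespace splitting stands in for a real word segmenter.

```python
import re

# Hypothetical stop-word list; the source does not enumerate its stop words.
STOP_WORDS = {"the", "a", "of", "and"}


def preprocess(text, stop_words=STOP_WORDS):
    """Clean a piece of application text and return its filtered tokens."""
    # Replace anything that is neither a word character nor whitespace
    # (punctuation, stray symbols) with a space.
    cleaned = re.sub(r"[^\w\s]", " ", text)
    # Stand-in for word segmentation: lowercase and split on whitespace.
    tokens = cleaned.lower().split()
    # Drop stop words.
    return [t for t in tokens if t not in stop_words]


print(preprocess("Book film tickets, and the concert!"))
```

For Chinese text, the whitespace split would be replaced by a proper segmenter such as the jieba tool that the description mentions later.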
After the above processing, for an application that has been submitted but has not yet been labeled using the method provided by the embodiments of the present invention, the embodiments first determine the application's domain classification from the obtained application detail information, that is, first attach a first-level classification label to the application. It should be noted that, to prevent an application from being exposed under unrelated categories, the embodiments constrain the first-level classification labels so that an application can carry at most two of them, that is, an application can be exposed under at most two domain categories. For example, the "baby **" parenting application can belong to at most two domain categories at the same time, "social" and "health".
Specifically, to attach a first-level classification label to an application, the application's detail information is input into a pre-trained domain classification model, which outputs the specified first-level classification label to which the application belongs. With reference to Fig. 4B, training the domain classification model generally includes the following steps:
(a) For each pre-stored first-level classification label, obtain at least one application that preliminarily belongs to that first-level classification label.
The embodiments of the present invention pre-store 19 first-level classification labels in total; that is, the domain categories to which applications belong are divided into 19 classes. "Obtaining at least one application that preliminarily belongs to the first-level classification label" means the following: when submitting an application, the application developer provides its domain classification, that is, simply adds a label remark for the application. The applications preliminarily belonging to a first-level classification label are therefore obtained by first sorting the applications into the 19 domains according to the developers' label remarks, in order to construct the training sample set.
For example, for each of the 19 first-level classification labels, the application detail information of 1000 applications under that label is collected according to the developers' domain classification of their applications. In total, the application detail information of 19000 applications is collected.
In addition, training the domain classification model mainly uses the application introduction information contained in the application detail information. The application detail information referred to in steps (b) to (d) below therefore mainly means the application introduction information it contains.
(b) Obtain manual classification annotation results for the at least one application and, based on them, select from the at least one application the training samples for model training, a training sample being an application confirmed by manual classification to belong to the first-level classification label.
Because the applications obtained under each first-level classification label in step (a) may be classified inaccurately, the applications under each classification label must also be annotated manually. For example, if manual verification confirms that an application's domain classification is correct, that is, the application does belong to the first-level classification label, it is marked 1; if manual verification finds the domain classification wrong, that is, the application does not actually belong to the first-level classification label, it is marked 0.
Then, for each first-level classification label, the applications that the developers classified correctly are selected from the collected applications according to the manual annotation results and used as the training samples for that label. In other words, for a given first-level classification label, only correctly classified applications are placed in the training sample set, to improve the classification accuracy of the domain classification model. As a result, the number of training samples may differ across the 19 first-level classification labels.
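The selection in step (b) amounts to keeping only the candidates whose manual mark is 1 and grouping them by label. A minimal sketch, assuming each candidate is represented as a hypothetical `(app_id, first_level_label, manual_mark)` tuple; the record layout and the function name are illustrative and not taken from the patent.

```python
from collections import defaultdict


def select_training_samples(candidates):
    """Group manually confirmed applications by first-level label.

    candidates: iterable of (app_id, first_level_label, manual_mark),
    where manual_mark is 1 if manual verification confirmed the
    developer's classification and 0 otherwise.
    """
    samples = defaultdict(list)
    for app_id, label, mark in candidates:
        if mark == 1:  # keep only applications confirmed as correctly classified
            samples[label].append(app_id)
    return dict(samples)


candidates = [
    ("app-1", "life", 1),
    ("app-2", "life", 0),  # rejected by manual verification
    ("app-3", "game classification", 1),
]
print(select_training_samples(candidates))
```

Because rejected candidates are dropped per label, the resulting groups generally end up with different sizes, which is the effect the description notes.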
(c) Segment the application detail information of the training samples belonging to each first-level classification label into words, and store the segmentation results together with their corresponding first-level classification labels in a specific training text in a specified format.
After the training samples are obtained, to train the domain classification model, the embodiments of the present invention first segment the application detail information of the training samples belonging to each first-level classification label. The segmentation targets the application introduction information contained in the application detail information.
In the embodiments of the present invention, the open-source jieba word segmentation tool, a natural language processing tool, is mainly used to segment the application introduction information. Note that if an application name is long enough to need segmentation, it can be segmented as well, in which case the segmentation result covers both the application name and the application introduction information.
The open-source jieba word segmentation tool mainly supports three segmentation modes. Accurate mode tries to cut the sentence as precisely as possible and is mainly suited to text analysis. Full mode scans out every word in the sentence that can form a word; it is very fast but cannot resolve ambiguity. Search engine mode builds on accurate mode by cutting long words again to improve recall, and is suited to search engine segmentation. The embodiments of the present invention segment the application introduction information using the last mode.
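The three modes map onto jieba's public functions roughly as shown below. This is a hedged sketch rather than the patent's code: the wrapper name `segment` and its `mode` argument are assumptions for illustration, and the whitespace fallback exists only so the sketch still runs where the third-party jieba package is not installed.

```python
try:
    import jieba  # third-party package: pip install jieba
except ImportError:  # keep the sketch runnable without jieba
    jieba = None

MODES = ("accurate", "full", "search")


def segment(text, mode="search"):
    """Segment text with one of jieba's three modes and return a token list."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if jieba is None:
        # Fallback for environments without jieba: plain whitespace split.
        return text.split()
    if mode == "accurate":
        tokens = jieba.cut(text, cut_all=False)
    elif mode == "full":
        tokens = jieba.cut(text, cut_all=True)
    else:
        # Search engine mode, the mode the description says is used.
        tokens = jieba.cut_for_search(text)
    # Drop whitespace-only tokens emitted between words.
    return [t for t in tokens if t.strip()]


print(segment("film ticket booking", mode="search"))
```

Search engine mode emits both long words and their shorter sub-words for Chinese text, which is why the description associates it with higher recall.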
After word segmentation processing is finished, text classification next is carried out using TextGrocery Open-Source Tools, to train field Disaggregated model, details content is as follows:It is " affiliated with specified format for each training sample in above-mentioned training sample First-level class label title+t+ word segmentation results ", store it in specific training text train.txt.That is, will instruction The word segmentation result for practicing the application of each in sample set is stored in train.txt according to above-mentioned specified format.In addition, in this hair Why it is because its classifying quality quality to short text is high and use using TextGrocery Open-Source Tools in bright embodiment It is convenient.It is of course also possible to carry out text classification, this hair with the functionally similar instrument of TextGrocery Open-Source Tools using other Bright embodiment is to this without specific restriction.
If it should be noted that word segmentation processing has been carried out in data preprocessing phase, then the step can be straight Connect and skip, the word segmentation result directly obtained using data preprocessing phase just may be used.
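The training-text format described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and it assumes the separator is a tab character and that segmented words are joined by single spaces.

```python
def format_training_line(label_name, tokens):
    """Build one training line: label name, a tab, then the segmented words.

    Assumption: words are joined by single spaces; the source only says
    that the word segmentation result follows the label name.
    """
    return label_name + "\t" + " ".join(tokens)


def write_training_file(samples, path):
    """Write (label_name, tokens) pairs to `path`, one sample per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for label_name, tokens in samples:
            handle.write(format_training_line(label_name, tokens) + "\n")
```

Calling write_training_file(samples, "train.txt") would then produce a file in the shape the training step expects.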
(d) Model training is performed based on the text classification tool function and the specific training text to obtain a trained model; cross-testing is then performed on the trained model until its classification precision meets a preset condition, yielding the domain classification model.
For this step, once the specific training text train.txt is obtained, the text classification tool function grocery.train() can be called to train the model, with train.txt as its input. In another embodiment, to ensure the classification accuracy of the trained domain classification model, cross-testing can also be performed on the trained model until its classification precision meets the preset condition; the trained model obtained at that point is the domain classification model required by the embodiment of the present invention.
Cross-testing means that the training sample set and the test sample set exchange roles across rounds of model training and model testing. For example, suppose 4000 samples are used for model training and model testing, 3000 for training and the remaining 1000 for testing. Each 1000 of the 3000 training samples is labeled A, B and C respectively, and the remaining 1000 samples are labeled D. If one round of cross-testing trains on A+B+C, it tests on D; the next round might train on B+C+D and test on A, and so on.
The preset condition on classification precision can be that the classification accuracy of the trained model exceeds a preset threshold, or that its recall exceeds a preset threshold; the embodiment of the present invention does not specifically limit this.
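The role exchange in cross-testing can be sketched as below. The function names are hypothetical, and requiring every round to exceed the threshold is an assumption; the text only says that the precision must meet a preset condition.

```python
def cross_test_rounds(fold_names):
    """Return (training folds, held-out test fold) pairs, one per fold."""
    rounds = []
    for held_out in fold_names:
        training = [name for name in fold_names if name != held_out]
        rounds.append((training, held_out))
    return rounds


def meets_preset_condition(round_scores, threshold):
    """Assumption: the condition holds when every round beats the threshold."""
    return all(score > threshold for score in round_scores)
```

With folds A, B, C and D this yields the round that trains on A+B+C and tests on D, the round that trains on B+C+D and tests on A, and the two remaining rotations.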
In embodiments of the present invention, once the domain classification model is obtained, it can be used to classify all applications that application developers have submitted and that are awaiting classification. Specifically, the application detail information of each application is input into the domain classification model, and the domain classification result output by the model is obtained. The domain classification result includes a probability score for each of the at least two first-level class labels to which the application may belong. Finally, the at least one first-level class label with the highest probability score is selected from the at least two first-level class labels and determined to be the specified first-level class label of the application.
That is, when giving the domain classification result of an application, the domain classification model gives a probability score for each of the 19 first-level class labels, i.e. it scores the application per domain; the 1 to 2 first-level class labels with the highest probability scores are finally taken as the specified first-level class labels of the application. An application awaiting classification here means an application to be classified by the method provided by the embodiment of the present invention.
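A minimal sketch of picking the specified first-level class labels from the probability scores follows. The text does not say how the choice between one and two labels is made, so the cutoff `second_label_min_score` and the function name are hypothetical.

```python
def pick_first_level_labels(scores, max_labels=2, second_label_min_score=0.25):
    """Return the top-scoring label plus further labels that pass a cutoff.

    `scores` maps a first-level class label name to its probability score.
    The cutoff for keeping more than one label is an assumption.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    chosen = [ranked[0][0]]  # the best label is always kept
    for label, score in ranked[1:max_labels]:
        if score >= second_label_min_score:
            chosen.append(label)
    return chosen
```

Keeping the best label unconditionally guarantees that every application receives at least one first-level class label.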
In summary, after the domain classification of step 401, the classification of each application submitted by an application developer is corrected based on the classification result output by the domain classification model.
402. Information screening is performed on the application detail information of the application to obtain the summary info of the application.
In embodiments of the present invention, after step 401 determines the domain class of each submitted application awaiting classification, the sub-level class labels to which the application belongs, namely its second-level and third-level class labels, must further be determined under the first-level class label. The second-level and third-level class labels are generated mainly from the application recommended information and the application screenshots included in the application detail information. This step addresses only the application recommended information.
The application recommended information of some applications often contains a large amount of invalid and redundant information. For example, the application recommended information of "** video friend-making" includes content such as "we don't do text, we don't do short voice, we don't do short video". Such content is clearly unimportant, and no effective class label can be extracted from it. Therefore, to give an application accurate sub-level label information, its application recommended information must also be screened into a summary. The embodiment of the present invention mainly uses the TextRank algorithm to screen the summary info from the application recommended information, in the following steps:
(1) The application recommended information included in the application detail information of the application is cut into at least two short sentences, and the similarity between any two of the at least two short sentences is calculated.
For this step, the application recommended information T is first cut into at least two short sentences S_1 to S_m, giving T = [S_1, S_2, ..., S_m]. Then, for any two sentences S_i and S_j among S_1 to S_m, the similarity between them can be expressed by the following formula:
In the formula, the numerator is the number of words that occur in both sentences S_i and S_j, i.e. words that belong to both S_i and S_j; the symbol "∨" denotes disjunction and the symbol "∧" denotes conjunction. In the denominator, |S_i| and |S_j| are the numbers of words in S_i and S_j respectively. t_k denotes a word, and Similarity(S_i, S_j) denotes the similarity between the two sentences S_i and S_j.
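The similarity formula itself is absent from this text. A reconstruction consistent with the description of its numerator and denominator, assuming the standard TextRank sentence similarity (the logarithms in the denominator are part of that assumption and are not stated here), would be:

```latex
\mathrm{Similarity}(S_i, S_j) =
  \frac{\left|\{\, t_k \mid t_k \in S_i \wedge t_k \in S_j \,\}\right|}
       {\log |S_i| + \log |S_j|}
```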
(2) The importance value of each of the at least two short sentences is calculated from the similarity between any two short sentences.
In embodiments of the present invention, the importance value of each short sentence is calculated using the following formula.
Here WS(V_i) is the importance value of sentence V_i; d is the damping coefficient, usually 0.85; w_ji = Similarity(S_i, S_j) is the similarity between sentences S_i and S_j; WS(V_j) is the importance value of sentence V_j; and w_jk = Similarity(S_k, S_j) is the similarity between sentences S_k and S_j.
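The importance formula is likewise absent from this text. Assuming the standard weighted TextRank update, which uses exactly the symbols defined above, it would read:

```latex
WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)}
          \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)
```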
When screening summary info, the TextRank algorithm treats the at least two short sentences as a weighted graph of multiple nodes. Each sentence is a node in the graph; if two sentences are similar, a weighted edge is taken to exist between the two corresponding nodes, with the similarity as its weight. Sentence V_i in the above formula is a node in this graph. Accordingly, In(V_i) refers to the in-degree of a node, i.e. the number of edges entering the node, and Out(V_j) refers to the out-degree of a node, i.e. the number of edges leaving the node.
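The sentence graph and the iterative importance calculation can be sketched in a few lines. This is an illustrative sketch under the standard TextRank formulation (logarithmic denominator, fixed iteration count), not the patent's own code; the function names are hypothetical and sentences are assumed to be already segmented into word lists.

```python
import math


def sentence_similarity(words_a, words_b):
    """Shared-word count divided by log|S_i| + log|S_j| (standard TextRank)."""
    if not words_a or not words_b:
        return 0.0
    shared = len(set(words_a) & set(words_b))
    denominator = math.log(len(words_a)) + math.log(len(words_b))
    return shared / denominator if denominator > 0 else 0.0


def textrank_scores(sentences, d=0.85, iterations=50):
    """Iterate WS(V_i) = (1 - d) + d * sum_j(w_ji / sum_k(w_jk) * WS(V_j))."""
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] = sentence_similarity(sentences[i], sentences[j])
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if weights[j][i] > 0 and out_sums[j] > 0:
                    rank += weights[j][i] / out_sums[j] * scores[j]
            updated.append((1 - d) + d * rank)
        scores = updated
    return scores
```

A sentence that shares no words with any other sentence has no incoming edges, so its score settles at 1 - d.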
(3) The importance values of the short sentences are sorted in descending order, a specified number of top-ranked short sentences are screened from the at least two short sentences, and the specified number of short sentences are combined in the order in which they appear in the application recommended information to obtain the summary info of the application.
For this step, since the basic idea of the TextRank algorithm is to take the most important sentences as the summary, once the importance value of each short sentence is obtained, the embodiment of the present invention sorts the importance values, for example in descending order. Based on the ranking, the specified number of top-ranked short sentences are screened out and taken as the summary info of the application. The importance values can also be sorted in ascending order, in which case the specified number of short sentences at the end of the ranking are screened out. The specified number may be 4 or 5, etc.; the embodiment of the present invention does not specifically limit this.
In another embodiment, to keep the screened summary info coherent, the specified number of short sentences can also be combined in the order in which they appear in the application recommended information, i.e. combined in original text order to obtain the summary info of the application.
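Selecting the top-ranked short sentences and restoring their original order can be sketched as follows; the function name and the default of 4 sentences are illustrative choices.

```python
def build_summary(sentences, scores, top_n=4):
    """Keep the `top_n` highest-scoring sentences, in original text order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    kept_indexes = sorted(ranked[:top_n])  # restore the order of appearance
    return [sentences[i] for i in kept_indexes]
```

Sorting the kept indexes, rather than the sentences themselves, is what preserves the order in which the sentences appear in the source text.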
Note that besides screening summary info from the application recommended information with the TextRank algorithm, keywords can also be screened from the application recommended information and used in the subsequent sub-level label generation. The keyword screening process is as follows. The application recommended information is cut into at least two short sentences; stop words are filtered out of each sentence and only words of specified parts of speech are kept, giving a set of sentences and a set of words. Each word is a node in an unweighted graph. With the window size set to k, suppose a sentence consists of the words w1, w2, w3, w4, w5, ..., wn in order; then [w1, w2, ..., wk], [w2, w3, ..., wk+1], [w3, w4, ..., wk+2] and so on are each a window. An unweighted edge exists between the nodes of any two words in the same window. The importance of each word can be calculated on the unweighted graph built this way. Finally, the several most important words are taken as keywords.
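The window-based word graph can be sketched as below. The text does not say how word importance is calculated on the graph, so the PageRank-style iteration here is an assumption, and the function names are hypothetical. Sentences are assumed to be word lists with stop words already removed.

```python
def build_cooccurrence_graph(sentences, window=3):
    """Link every pair of distinct words that share a sliding window."""
    graph = {}
    for words in sentences:
        last_start = max(1, len(words) - window + 1)
        for start in range(last_start):
            span = words[start:start + window]
            for first in span:
                for second in span:
                    if first != second:
                        graph.setdefault(first, set()).add(second)
                        graph.setdefault(second, set()).add(first)
    return graph


def rank_keywords(graph, d=0.85, iterations=50):
    """Assumed scoring: unweighted PageRank-style iteration over the graph."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        scores = {
            word: (1 - d) + d * sum(
                scores[neighbor] / len(graph[neighbor])
                for neighbor in graph[word]
            )
            for word in graph
        }
    return sorted(scores, key=scores.get, reverse=True)
```

Taking the first few entries of the returned list gives the keywords; how many to keep is left open, as in the text.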
403. Based on a pre-stored word cluster result, keyword matching is performed on the summary info of the application, and based on the resulting matching result, the specified sub-level class label to which the application belongs under the specified first-level class label is determined.
In embodiments of the present invention, after the summary info of the application is screened out in step 402, the sub-level class labels to which the application belongs are determined from the summary info. Specifically, word clustering is first performed on the summary info of a preset number of submitted applications across the whole network; keyword matching is then performed on the summary info of the application using the resulting word cluster result, and sub-level class labels are assigned to the application according to the matching result. Word clustering is currently performed on the summary info of about 3,000,000 applications across the whole network, as follows:
403a. According to the first-level class labels to which they belong, the summary info of the preset number of submitted applications is divided to obtain the training summary info of each first-level class label.
For this step, taking word clustering on the summary info of the about 3,000,000 applications as an example: since the embodiment of the present invention has pre-stored 19 first-level class labels, this step divides the summary info of the about 3,000,000 applications into 19 parts according to the 19 first-level class labels. For a given first-level class label, the summary info of the applications assigned to that label serves as the training summary info for subsequently training that label's word vector model.
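The division in step 403a amounts to grouping summaries by first-level class label. A minimal sketch, with a hypothetical function name and input shape:

```python
from collections import defaultdict


def split_by_first_level_label(labeled_summaries):
    """Group (first_level_label, summary) pairs into one corpus per label."""
    corpora = defaultdict(list)
    for first_level_label, summary in labeled_summaries:
        corpora[first_level_label].append(summary)
    return dict(corpora)
```

Each value in the returned dictionary is the training summary info for one first-level class label.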
403b. Model training is performed on the training summary info of each first-level class label to obtain a word vector model matched to each first-level class label.
For this step, as many word vector models are trained as there are first-level class labels. In embodiments of the present invention, the word2vector tool is used to train 19 word vector (word2vector) models, one on each of the 19 parts of training summary info. A word vector is a technique that turns a word into a vector such that the relative similarity between vectors correlates with semantic similarity. In other words, word2vec is an efficient algorithm model that represents words as real-valued vectors; drawing on deep learning ideas, it reduces the processing of text content, through training, to vector operations in a K-dimensional vector space, where similarity in the vector space can represent semantic similarity of text.
In addition, when training a word vector model, the training summary info under the corresponding first-level class label can first be segmented, and the word segmentation result stored in an input text (in txt format); the word2vector source code is then run with this input text as input to train the model and obtain the word vector model. After the 19 word vector models are obtained, each is used to perform word clustering for the class labels; see step 403c below for details.
403c. For each pre-stored first-level class label, the sub-level class labels included under the first-level class label are obtained; the first-level class label and the sub-level class labels are input into the matched word vector model to obtain cluster words matching the first-level class label and the sub-level class labels; all cluster words obtained are combined to obtain the above word cluster result.
The label system is pre-stored and has three layers: 19 first-level class labels, 118 second-level class labels and 923 third-level class labels. In embodiments of the present invention, for each first-level class label, the second-level and third-level class labels included under it are obtained; this first-level class label and its second-level and third-level class labels are then input into the matching one of the 19 word vector models, yielding the cluster words that match them, i.e. a word list of words close to the names of these class labels.
Note that inputting a class label into the word vector model essentially means inputting the name of the class label. In addition, after a word list is obtained, the cluster words in it can be further checked by manual review so that the word list is purer; the embodiment of the present invention does not specifically limit this. Finally, the multiple word lists are combined; the cluster words covered under the three levels of class labels can reach more than 5000. That is, word clustering with the word2vector technique yields the hierarchical label system shown in Fig. 6.
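Looking up cluster words for a label name can be pictured as a nearest-neighbor search in the vector space. The sketch below assumes the word vectors are already available as a plain dictionary and uses cosine similarity; both the function names and the choice of cosine similarity are assumptions, and no particular word2vec library API is implied.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(label_name, vectors, top_n=3):
    """Return the `top_n` words whose vectors lie closest to the label name."""
    target = vectors[label_name]
    scored = [
        (cosine_similarity(target, vector), word)
        for word, vector in vectors.items()
        if word != label_name
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]
```

The returned list plays the role of the word list for one class label, which manual review could then prune.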
After word clustering is completed, keyword matching can be performed on the summary info of the application; if any keyword in the word cluster result is matched, the corresponding second-level and third-level class labels are assigned to the application. In detail, the summary info of the application is matched against the cluster words included in the word cluster result to obtain a matching result; if the matching result indicates that the summary info of the application contains any cluster word in the word cluster result, the sub-level class label matching that cluster word is taken as a specified sub-level class label of the application.
As a simple example, suppose an application has been given the first-level class label "leisure and entertainment", that this first-level class label includes the second-level class label "sports", and that the second-level class label "sports" in turn includes the third-level class label "ball games", which covers cluster words such as "football", "volleyball" and "basketball". If the summary info of the application includes the keyword "football", the application is given the third-level class label "ball games" and the second-level class label "sports".
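The keyword matching of step 403 can be sketched as a dictionary lookup from cluster word to sub-level class labels. The function name and the index layout are hypothetical, and the label names in the usage note are illustrative.

```python
def match_sub_level_labels(summary_words, cluster_index):
    """Return (second-level, third-level) label pairs hit by the summary.

    `cluster_index` maps a cluster word to the pair of sub-level labels
    it belongs to; each pair is reported once, in first-match order.
    """
    matched = []
    for word in summary_words:
        label_pair = cluster_index.get(word)
        if label_pair is not None and label_pair not in matched:
            matched.append(label_pair)
    return matched
```

For instance, with an index that maps "football", "volleyball" and "basketball" to the pair ("sports", "ball games"), a summary containing "football" yields that single pair.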
With the method provided by the embodiment of the present invention, for an application submitted by an application developer, the first-level class label to which the application belongs can be determined automatically from its application detail information. Then, to build a hierarchical label system, summary info can further be screened from the application detail information, and keyword matching performed on the summary info based on the stored word cluster result, so that the application is given the sub-level class labels to which it belongs under the first-level class label. Because this label information is generated essentially fully automatically, a great deal of labor and time is saved and the process is more intelligent. Moreover, since application detail information characterizes the specific functions and uses of an application in detail and comprehensively, label information generated from it is more accurate and covers domain classes well.
As can be seen from Fig. 4B, step 401 of the above embodiment corrects the first-level class label of an application, and steps 402 and 403 assign it sub-level class labels, completing label information generation. If the generated class labels are then found to be inaccurate, or the domain classification coverage insufficient, the label information can also be corrected through schemes such as the APP black and white lists shown in Fig. 4B, or supplemented for the application with OCR image text recognition. The label information supplement process is described in the following embodiment.
Fig. 5 is a flow chart of a label information generation method for an application provided by an embodiment of the present invention. Referring to Fig. 5, the method flow provided by the embodiment of the present invention includes:
501. The application detail information of a submitted application is obtained, and based on it, the specified first-level class label to which the application belongs is determined among at least two pre-stored first-level class labels.
Step 501 is similar to step 401 above and is not repeated here.
502. Screening is performed on the application detail information of the application to obtain the summary info of the application.
Step 502 is similar to step 402 above and is not repeated here.
503. Based on a pre-stored word cluster result, keyword matching is performed on the summary info of the application, and based on the resulting matching result, the specified sub-level class label to which the application belongs under the specified first-level class label is determined.
Step 503 is similar to step 403 above and is not repeated here.
504. Label information is supplemented for the application with OCR image text recognition.
The process of supplementing label information for the application with OCR image text recognition can be divided into the following steps:
(I) At least one application screenshot included in the application detail information of the application is obtained.
OCR technology is already widely used in industry, for example in ID card recognition, bank card recognition, case recognition and business card recognition. The embodiment of the present invention applies OCR to the application screenshots that a user sees when first installing and opening an application. That is, the embodiment of the present invention attempts to use OCR to recognize the text in application screenshots and assign corresponding labels.
Unlike traditional OCR, text recognition for natural scene images such as application screenshots is called STR (Scene Text Recognition). To recognize text in application screenshots accurately, the embodiment of the present invention can annotate a large number of high-quality samples such as those shown in Fig. 7. These high-quality samples all come from application screenshot data, and the text part takes up a certain proportion of each screenshot. For example, in Fig. 7 the text takes up a large share of the investment and finance, electronic contract and face recognition application screenshots; such high-quality samples make it easier to recognize text in application screenshots accurately in the subsequent process.
(II) For each application screenshot, the screenshot is split into at least one image channel, and in each of the at least one image channel, at least one text region containing text is located.
For this step, when recognizing text in at least one application screenshot of the application with STR, the text regions where characters are located must first be found in the screenshot. The embodiment of the present invention mainly uses two methods: MSER (Maximally Stable Extremal Regions) and SWT (Stroke Width Transform). Whichever is used, once candidate text boxes are obtained, a supervised classifier can be used to determine whether each candidate text box really is text.
In embodiments of the present invention, an SVM (Support Vector Machine) can be used to classify the candidate text boxes and so determine at least one text region containing text. In addition, to improve text recognition recall, when an application screenshot is processed it can be split into at least one image channel, the above text detection algorithm run independently on each channel, and the results of the different channels finally deduplicated and merged, locating at least one text region containing text in the at least one application screenshot. The locating effect is shown in Fig. 8, where the regions outlined by bold black boxes are the located text regions containing text.
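Deduplicating and merging the text boxes found in different image channels can be sketched as below. The text does not specify the merging rule, so the intersection-over-union test, the 0.5 threshold and the function names are all assumptions; boxes are (left, top, right, bottom) tuples.

```python
def box_area(box):
    """Area of a (left, top, right, bottom) box."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def intersection_over_union(a, b):
    """Overlap ratio of two boxes, 0.0 when they do not intersect."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    intersection = max(0, width) * max(0, height)
    union = box_area(a) + box_area(b) - intersection
    return intersection / union if union else 0.0


def merge_channel_boxes(boxes_per_channel, iou_threshold=0.5):
    """Fold boxes from all channels into one list without duplicates."""
    merged = []
    for boxes in boxes_per_channel:
        for box in boxes:
            for index, kept in enumerate(merged):
                if intersection_over_union(box, kept) >= iou_threshold:
                    # Same region seen in another channel: keep the hull.
                    merged[index] = (
                        min(kept[0], box[0]), min(kept[1], box[1]),
                        max(kept[2], box[2]), max(kept[3], box[3]),
                    )
                    break
            else:
                merged.append(box)
    return merged
```

Replacing a duplicate pair with its bounding hull keeps the full extent of a text region even when individual channels detect only part of it.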
Referring to Fig. 8, in the first application screenshot of the first row, two text regions are located: "three-dimensional face retouching" and "have fair and tender girlish skin". In the second application screenshot of the first row, two text regions are located: "beauty filter" and "with beautification, broadcast whenever you want". In the third application screenshot of the first row, two text regions are likewise located: "audio live streaming" and "fun audio live streaming". The same applies to the three screenshots in the second row, in which text regions can also be located accurately.
(III) Text recognition is performed on the at least one text region, and the recognized text is combined to obtain the text recognition result of the at least one application screenshot; based on the text recognition result, label information other than the specified first-level class label and the specified sub-level class label is generated for the application.
After the text regions in the application screenshot are located, the embodiment of the present invention recognizes text in the at least one located text region using a recognition method based on character gradient statistics. In addition, during text recognition, an energy constraint function can be constructed from character features (such as size, orientation, density and position) to combine text regions into independent text lines awaiting recognition. Each text line is automatically split or merged, and each primitive rectangular block obtained after this processing gets a recognition result comprising the recognized text and a corresponding confidence. After every text region has been processed in this way, a suitable algorithm combines the recognized text to obtain the text recognition result of the at least one application screenshot.
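One way to picture the final combination of recognized blocks is sketched below. The combining algorithm is not described in the text, so filtering by a confidence cutoff and joining in top-to-bottom, left-to-right order are assumptions, as are the function name and the block fields.

```python
def combine_recognized_text(blocks, min_confidence=0.8, separator=" "):
    """Join confident blocks in reading order (top to bottom, then left).

    Each block is a dict with `text`, `confidence`, `top` and `left` keys.
    """
    confident = [block for block in blocks if block["confidence"] >= min_confidence]
    confident.sort(key=lambda block: (block["top"], block["left"]))
    return separator.join(block["text"] for block in confident)
```

For Chinese text an empty separator would usually be the more natural choice.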
And after the Text region result to the application sectional drawing of an application is obtained, just can be according to the word recognition result pair Row label supplement is entered in the application.Label supplement process is described with a specific example below.Referring to Fig. 9, for For XX map applications, it drives to lead using real-time road, crossing actual scene, location information seniority among brothers and sisters and dynamic is included in sectional drawing Multiple sectional drawings such as boat, the content of text included in these sectional drawings is substantially for introducing this application and to this using leading Domain classification has obvious help, therefore recognizes that these apply the word content in sectional drawing based on OCR picture characters identification technology.Than Such as, stamped after " national real-time road avoid congestion save worry trip " this content of text for the XX map applications " real recognizing Shi Lukuang " label.5% full dose can be contributed by being currently based on the label information method for digging of OCR picture character identification technologies The overlay capacity of label information, and the degree of accuracy is up to 99%.
Furthermore, it is necessary to which explanation, the above-mentioned Text region result recognized in application sectional drawing is also applied to above-mentioned In the step of step 402 and 502 screening summary info.That is, can application recommended information and it is above-mentioned at least one should With the screening that summary info is carried out in the Text region result of sectional drawing, the embodiment of the present invention is to this without specific restriction.
Method provided in an embodiment of the present invention, the application submitted for application developer, the embodiment of the present invention can be certainly The application detail information based on the application is moved to determine that this applies affiliated first-level class label;Next, in order to set up level The label system of change, can also further carry out the screening of summary info, and the word based on storage in this applies detail information Cluster result, carries out Keywords matching, so as to stamp the institute under first-level class label for the application to the summary info of the application The sub- level tag along sort of category, because the generating process of above-mentioned label information is substantially completely automated, therefore can save substantial amounts of people It is power and time, more intelligent.Further, since more can detailed and comprehensively characterize the specific work(of an application using detail information Can and it act on, therefore the label information based on application detail information generation is more accurate and the other coverage of domain class is good.
To sum up, the generation method of label information provided in an embodiment of the present invention relies on natural language processing technique, OCR pictures Character recognition technology, can go out the label information being worth with high reference for each usage mining submitted, and realize that one is answered Label information can cover the function point that this applies various aspects.User can so be facilitated in such as software classification, trip The application of itself needs is quickly and easily found in the field classifications such as play classification.In addition, being provided by the embodiment of the present invention Hierarchically structured label system, can cause user such as software classification page, game classification the page in browse difference List of application under classification and label, finds the application completion download for meeting oneself demand.
In other words, by the generation method of label information provided in an embodiment of the present invention, ensured that each applies energy It is enough accurately to be exposed under relevant classification label, and then ensure itself suitable functional requirement can be quickly found when user browses Application, and be downloaded and use.For conclusion, mainly with following function point:
Label system is abundant detailed first, user can be helped from the application of magnanimity, according to the hierarchical structure mark of foundation Label system is quickly found out the application of suitable itself functional requirement point.
In addition, can be upgraded in time using the matching degree between label information.In embodiments of the present invention, new opplication Issue or the generation of new label (such as in Fig. 3 " Quadratic Finite Element " label from scratch), the title of application, using recommended information, The download of application and user can be handled scoring renewal of application etc. in completion in regular hour window.Due to these because Son can be influenceed using the matching degree between label information, therefore, may when occurring larger renewal in the above-mentioned factor Considerable influence is produced with the matching degree between existing label information to application, now the embodiment of the present invention can also be according to above-mentioned several Individual step stamps new label for the application again.Wherein, using the matching degree between label information generated for application What tag along sorts at different levels were to determine, such as when for application generation first-level class label, domain classification model can provide the application The probability of a belonging first-level class label, this probability is the degree of correlation therebetween.
In another embodiment, referring to Fig. 4B, the embodiments of the present invention can also automatically sort the applications under the same class label list according to the matching degree between each application and its label information, helping users quickly find applications that are of high quality and match their functional needs. Of course, in addition to the matching degree, the sorting can also be weighted by each application's download count and user ratings, with the weighting biased so that applications with higher download counts or higher ratings rank higher.
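The weighted sorting described above can be illustrated with a minimal sketch. The weights, the normalization by the largest download count, the 5-point rating scale, and the names `AppEntry` and `rank_apps` are all assumptions made for illustration; the text does not specify a scoring formula.

```python
from dataclasses import dataclass


@dataclass
class AppEntry:
    name: str
    match_degree: float  # classifier probability for the label, in [0, 1]
    downloads: int
    rating: float        # assumed 0-to-5 scale


def rank_apps(apps, w_match=0.6, w_downloads=0.2, w_rating=0.2, max_rating=5.0):
    """Sort apps under one class label by a weighted score (illustrative weights)."""
    # Normalize downloads by the largest value so all terms share a [0, 1] range.
    max_downloads = max((a.downloads for a in apps), default=0) or 1

    def score(a):
        return (w_match * a.match_degree
                + w_downloads * a.downloads / max_downloads
                + w_rating * a.rating / max_rating)

    return sorted(apps, key=score, reverse=True)


apps = [
    AppEntry("A", 0.90, 1_000, 3.0),
    AppEntry("B", 0.85, 50_000, 4.5),
    AppEntry("C", 0.40, 80_000, 4.8),
]
ranked_names = [a.name for a in rank_apps(apps)]
```

In this toy data, application B overtakes A despite a slightly lower matching degree, because its download count and rating are much higher, while C stays last because its matching degree is low.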
In summary, the embodiments of the present invention build a label system with complete coverage, containing 19 first-level classes, 118 second-level classes, and 923 third-level classes. A word2vector algorithm is used to perform word clustering on the summary information screened out of the application introduction information of apps, yielding high-frequency, highly relevant cluster words under the class label names at each level; the resulting cluster words at the word-segmentation level under the labels at each level can number more than 5,000. In addition, the embodiments of the present invention use a text classification technique to correct the domain category that an application developer submits for an app, which prevents developers from cheating for exposure by listing their apps under unrelated domain categories. Furthermore, a textrank algorithm is used to screen the application introduction information for a summary, filtering out statements that are unclear or that do not highlight key points, which improves the accuracy of the label information. Finally, OCR image text recognition is used to recognize the text in application screenshots, and the application is given corresponding labels based on the recognized text.
Fig. 9B is an overall flowchart of a method for generating label information of an application according to an embodiment of the present invention. Referring to Fig. 9B, the method flow provided by the embodiment of the present invention includes:
901. Periodically pull the full information of all submitted applications from the whole network, extract structured data from the obtained full information, and perform data preprocessing on the extracted structured data to obtain the application detail information of each application.
902. Train the domain classification model, obtain a word clustering result for all submitted applications based on the word2vector word clustering method, and store the word clustering result.
903. For each remaining application, obtain a domain classification result for the application based on the pre-established domain classification model and the application detail information of the application; the domain classification result includes the probability score of the application belonging to each first-level class label.
904. Among all first-level class labels, select at least one first-level class label with the highest probability score, and determine the selected first-level class label or labels as the specified first-level class label to which the application belongs.
905. Cut the application introduction information included in the application detail information of the application into at least two short sentences, calculate the similarity between any two of the at least two short sentences, and calculate the importance value of each short sentence from these similarities.
906. Sort the importance values of the short sentences in descending order; based on the obtained ranking, select a specified number of top-ranked short sentences from the at least two short sentences; combine the selected short sentences in the order in which they appear in the application introduction information to obtain the summary information of the application.
907. Match the summary information of the application against the cluster words included in the pre-stored word clustering result to obtain a matching result; if the matching result indicates that the summary information of the application contains any cluster word in the word clustering result, take the sub-level class label that matches that cluster word as the specified sub-level class label to which the application belongs.
908. Obtain at least one application screenshot included in the application detail information of the application; for each application screenshot, split the screenshot into at least one image channel, and in each image channel locate at least one text region containing text.
909. Perform text recognition on the at least one text region and combine the recognized text to obtain the text recognition result of the at least one application screenshot; based on the text recognition result, generate for the application label information other than the specified first-level class label and the specified sub-level class label.
With the method provided by the embodiments of the present invention, for an application submitted by an application developer, the first-level class label to which the application belongs can be determined automatically based on the application detail information of the application. Next, to build a hierarchical label system, summary information can be further screened out of the application detail information, and keyword matching is performed on the summary information based on the stored word clustering result, so that the application is given the sub-level class label to which it belongs under the first-level class label. Because the above label generation process is essentially fully automated, it saves a large amount of manpower and time and is more intelligent. Moreover, because application detail information characterizes the specific functions and purposes of an application in detail and comprehensively, label information generated from it is more accurate and covers domain categories well. In addition, text recognition can be performed on application screenshots based on OCR image text recognition, and the label information can be further supplemented based on the recognized text, which further ensures the accuracy of the label information generated for each application.
Figure 10 is a schematic structural diagram of a label information generating device for an application according to an embodiment of the present invention. Referring to Figure 10, the device includes:
an acquisition module 1001, configured to obtain application detail information, the application detail information being used to describe the functional characteristics of a submitted application;
a first processing module 1002, configured to determine, based on the application detail information, the specified first-level class label to which the application belongs among at least two pre-stored first-level class labels;
a screening module 1003, configured to perform information screening on the application detail information to obtain the summary information of the application;
a second processing module 1004, configured to perform keyword matching on the summary information of the application based on a pre-stored word clustering result and, based on the obtained matching result, determine the specified sub-level class label to which the application belongs under the specified first-level class label, the word clustering result being obtained by performing word clustering on the summary information of a preset number of submitted applications.
In another embodiment, the first processing module 1002 is configured to obtain a domain classification result for the application based on the pre-established domain classification model and the application detail information of the application, the domain classification result including the probability score of the application belonging to each of the at least two first-level class labels; and to select, among the at least two first-level class labels, at least one first-level class label with the highest probability score and determine the selected first-level class label or labels as the specified first-level class label.
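A minimal sketch of this selection step follows, assuming the domain classification result is available as a mapping from label name to probability score. The function name `select_first_level_labels`, the `top_k` and `min_prob` parameters, and the example labels are hypothetical.

```python
def select_first_level_labels(prob_scores, top_k=1, min_prob=0.0):
    """Return up to top_k labels with the highest probability scores.

    prob_scores: dict mapping first-level class label -> probability score.
    min_prob is an assumed optional cutoff; the text only requires "highest".
    """
    ranked = sorted(prob_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, p in ranked[:top_k] if p >= min_prob]


scores = {"Games": 0.08, "Tools": 0.61, "Social": 0.27, "Finance": 0.04}
selected = select_first_level_labels(scores, top_k=2, min_prob=0.1)
```

Here `selected` holds the two most probable labels, and the retained probability can double as the matching degree between the application and each label.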
In another embodiment, the acquisition module 1001 is further configured to obtain, for each of the at least two first-level class labels, at least one application preliminarily assigned to the first-level class label, and to obtain the manual classification annotation results for the at least one application;
the first processing module 1002 is further configured to select, based on the manual classification annotation results, training samples for model training from the at least one application, a training sample being an application that is confirmed by manual classification as belonging to the first-level class label;
the first processing module 1002 is further configured to perform word segmentation on the application detail information of the training samples belonging to each first-level class label, and to store the obtained word segmentation results and the first-level class labels corresponding to the word segmentation results in a specific training text in a specified format;
the first processing module 1002 is further configured to perform model training based on a text classification tool function and the specific training text to obtain a training model, and to cross-test the training model until the classification accuracy of the obtained training model meets a preset condition, thereby obtaining the domain classification model.
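The training flow above can be sketched as follows. The text names neither the specified format nor the text classification tool, so the `__label__` line prefix, the token-voting classifier in `train`, and the fold layout are all stand-in assumptions chosen only to make the cross-testing loop concrete.

```python
from collections import Counter, defaultdict


def to_training_line(label, tokens):
    """One sample per line: a label prefix, then space-separated tokens (assumed format)."""
    return "__label__" + label.replace(" ", "_") + " " + " ".join(tokens)


def k_fold_splits(samples, k):
    """Yield (train, test) pairs; fold i tests every k-th sample starting at index i."""
    for i in range(k):
        test = samples[i::k]
        train = [s for j, s in enumerate(samples) if j % k != i]
        yield train, test


def train(lines):
    """Toy stand-in for the text classification tool: tokens vote for labels."""
    counts = defaultdict(Counter)  # token -> label -> count
    for line in lines:
        label, _, text = line.partition(" ")
        for tok in text.split():
            counts[tok][label] += 1

    def predict(text):
        votes = Counter()
        for tok in text.split():
            votes.update(counts.get(tok, {}))
        return votes.most_common(1)[0][0] if votes else None

    return predict


def accuracy(predict, test_lines):
    """Fraction of test lines whose predicted label equals the stored label."""
    hits = 0
    for line in test_lines:
        label, _, text = line.partition(" ")
        hits += predict(text) == label
    return hits / len(test_lines) if test_lines else 0.0


# Manually confirmed samples: (first-level class label, segmented words).
samples = [
    ("Games", ["racing", "game", "cars"]),
    ("Games", ["puzzle", "game", "levels"]),
    ("Tools", ["file", "manager", "cleaner"]),
    ("Tools", ["battery", "cleaner", "saver"]),
]
lines = [to_training_line(label, tokens) for label, tokens in samples]
fold_scores = [accuracy(train(tr), te) for tr, te in k_fold_splits(lines, k=2)]
mean_accuracy = sum(fold_scores) / len(fold_scores)
```

In practice the loop would repeat training with adjusted data or parameters until `mean_accuracy` reaches a preset threshold (for example 0.9, a hypothetical value), at which point the model is kept as the domain classification model.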
In another embodiment, the screening module 1003 is configured to cut the application introduction information included in the application detail information into at least two short sentences; calculate the similarity between any two of the at least two short sentences; calculate the importance value of each of the at least two short sentences from the similarity between any two short sentences; sort the importance values of the short sentences in descending order and, based on the obtained ranking, select a specified number of top-ranked short sentences from the at least two short sentences; and combine the selected short sentences in the order in which they appear in the application introduction information to obtain the summary information of the application.
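The screening steps can be illustrated with a small TextRank-style sketch. It assumes whitespace-separated tokens (Chinese text would need a word segmenter first), the overlap-based similarity from the original TextRank formulation, and a fixed number of power iterations; the function names and parameters are illustrative, not taken from the text.

```python
import math
import re


def split_sentences(text):
    """Cut text into short sentences at common Chinese and English punctuation."""
    parts = re.split(r"[。！？；!?;.\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def similarity(a, b):
    """Shared-word count normalized by the log lengths of both token lists."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(set(a) & set(b)) / (math.log(len(a)) + math.log(len(b)))


def summarize(text, top_n=2, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    tokens = [s.lower().split() for s in sentences]
    n = len(sentences)
    sim = [[0.0 if i == j else similarity(tokens[i], tokens[j])
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence collects importance from the sentences similar to it.
        scores = [
            (1 - damping) + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    # Keep the top_n most important sentences, then restore original order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return ". ".join(sentences[i] for i in sorted(top))


intro = ("Edit photos with filters. Share photos with friends. "
         "Totally unrelated line here. Apply filters and share photos quickly.")
summary = summarize(intro, top_n=3)
```

The sentence that shares no words with the others receives only the base score and is dropped, while the remaining sentences are joined in the order they appeared in the introduction.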
In another embodiment, the second processing module 1004 is further configured to obtain, for each first-level class label, the sub-level class labels included under the first-level class label; obtain, based on the first-level class label, the sub-level class label, and the word vector model matching the first-level class label and the sub-level class label, the cluster words that match the first-level class label and the sub-level class label; and combine all of the obtained cluster words to obtain the word clustering result.
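One way to picture this step is a nearest-neighbor lookup in a word vector space. The sketch below assumes each first-level class label already has a trained word vector model, represented here by a plain dictionary of toy three-dimensional vectors, and that the sub-level label name itself is used as the query word. The text does not say how cluster words are derived from the model, so the cosine-similarity lookup, the threshold, and all names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(sub_label, vectors, top_n=2, min_sim=0.5):
    """Words closest to the sub-level label name in one word vector model."""
    anchor = vectors[sub_label]
    scored = [(cosine(anchor, vec), w) for w, vec in vectors.items() if w != sub_label]
    scored.sort(reverse=True)
    return [w for s, w in scored[:top_n] if s >= min_sim]


# Hypothetical model for the first-level label "Games" (toy vectors).
models = {
    "Games": {
        "racing": [0.9, 0.1, 0.0],
        "drift": [0.8, 0.2, 0.1],
        "kart": [0.7, 0.1, 0.2],
        "chess": [0.1, 0.9, 0.0],
        "recipe": [0.0, 0.1, 0.9],
    }
}
taxonomy = {"Games": ["racing"]}  # first-level label -> sub-level labels

cluster_result = {
    (first, sub): cluster_words(sub, models[first])
    for first, subs in taxonomy.items()
    for sub in subs
}
```

The combined dictionary, keyed by (first-level label, sub-level label), plays the role of the stored word clustering result.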
In another embodiment, the second processing module 1004 is further configured to divide the summary information of the preset number of submitted applications according to the first-level class labels to which the applications belong, obtaining the training summary information belonging to each first-level class label; and to perform model training based on the training summary information belonging to each first-level class label, obtaining the word vector model matching each first-level class label.
In another embodiment, the second processing module 1004 is configured to match the summary information of the application against the cluster words included in the word clustering result to obtain a matching result; and, if the matching result indicates that the summary information of the application contains any cluster word in the word clustering result, to take the sub-level class label matching that cluster word as the specified sub-level class label.
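The matching rule can be sketched in a few lines, assuming the word clustering result is stored as a mapping from (first-level label, sub-level label) to a list of cluster words and that the summary has already been segmented into tokens. The data layout and the name `match_sub_labels` are hypothetical.

```python
def match_sub_labels(summary_tokens, cluster_result, first_level_label):
    """Sub-level labels under first_level_label whose cluster words occur in the summary."""
    tokens = set(summary_tokens)
    matched = []
    for (first, sub), words in cluster_result.items():
        # Only labels under the already determined first-level label are considered.
        if first == first_level_label and tokens & set(words):
            matched.append(sub)
    return matched


cluster_result = {
    ("Games", "racing"): ["drift", "kart", "track"],
    ("Games", "puzzle"): ["match-3", "sudoku"],
    ("Tools", "cleaner"): ["cache", "junk"],
}
summary_tokens = "drift through city track and clear the cache".split()
labels = match_sub_labels(summary_tokens, cluster_result, "Games")
```

Although the summary contains "cache", the "cleaner" label is not returned, because matching is restricted to sub-level labels under the specified first-level class label.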
In another embodiment, the device further includes a third processing module:
the acquisition module 1001 is further configured to obtain at least one application screenshot included in the application detail information;
the third processing module is configured to split, for each application screenshot, the application screenshot into at least one image channel, and to locate, in each image channel of the at least one image channel, at least one text region containing text;
the third processing module is further configured to perform text recognition on the at least one text region and combine the recognized text to obtain the text recognition result of the at least one application screenshot; and to generate for the application, based on the text recognition result, label information other than the specified first-level class label and the specified sub-level class label.
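The channel splitting and text region locating can be pictured with a pure-Python sketch on a tiny synthetic image. The text does not specify the region detector, so a simple darkness threshold stands in for it here, and the character recognition step itself is not shown. The image representation (rows of RGB tuples) and all function names are assumptions.

```python
def split_channels(image):
    """image: rows of (r, g, b) tuples -> dict of three 2-D channel grids."""
    return {
        name: [[pixel[idx] for pixel in row] for row in image]
        for idx, name in enumerate("rgb")
    }


def locate_text_region(channel, threshold=128):
    """Bounding box (top, left, bottom, right) of pixels darker than threshold, or None."""
    coords = [(y, x)
              for y, row in enumerate(channel)
              for x, value in enumerate(row)
              if value < threshold]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(ys), min(xs), max(ys), max(xs))


# Synthetic 4x6 "screenshot": white background with a dark block standing in for text.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
image = [[WHITE] * 6 for _ in range(4)]
for x in range(2, 5):
    image[1][x] = BLACK
    image[2][x] = BLACK

regions = {name: locate_text_region(ch) for name, ch in split_channels(image).items()}
```

Each located region would then be cropped and passed to a text recognizer, and the recognized words combined into the text recognition result used for supplementary labels.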
With the device provided by the embodiments of the present invention, for an application submitted by an application developer, the first-level class label to which the application belongs can be determined automatically based on the application detail information of the application. Next, to build a hierarchical label system, summary information can be further screened out of the application detail information, and keyword matching is performed on the summary information based on the pre-stored word clustering result, so that the application is given the sub-level class label to which it belongs under the first-level class label. Because the above label generation process is essentially fully automated, it saves a large amount of manpower and time and is more intelligent. Moreover, because application detail information characterizes the specific functions and purposes of an application in detail and comprehensively, label information generated from it is more accurate and covers domain categories well.
It should be noted that when the label information generating device provided by the above embodiment generates label information, the division into the functional modules described above is only an example. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the label information generating device and the label information generation method provided by the above embodiments belong to the same concept; for the specific implementation process of the device, refer to the method embodiments, which is not repeated here.
Figure 11 shows a server according to an exemplary embodiment. The server can be used to implement the label information generation method of an application shown in any of the above exemplary embodiments. Specifically, referring to Figure 11, the server 1100 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 1122 (for example, one or more processors), a memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) storing application programs 1142 or data 1144. The memory 1132 and the storage medium 1130 may provide transient or persistent storage. The programs stored in the storage medium 1130 may include one or more modules (not shown in the figure).
The server 1100 may also include one or more power supplies 1128, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on. One or more programs are stored in the memory and configured to be executed by one or more processors; the one or more programs contain instructions for performing the label information generation of an application.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (15)

1. A method for generating label information of an application, characterized in that the method comprises:
obtaining application detail information, the application detail information being used to describe the functional characteristics of a submitted application;
determining, based on the application detail information, a specified first-level class label to which the application belongs among at least two pre-stored first-level class labels;
performing information screening on the application detail information to obtain summary information of the application;
performing keyword matching on the summary information of the application based on a pre-stored word clustering result and, based on the obtained matching result, determining a specified sub-level class label to which the application belongs under the specified first-level class label, the word clustering result being obtained by performing word clustering on the summary information of a preset number of submitted applications.
2. The method according to claim 1, characterized in that determining, based on the application detail information, the specified first-level class label to which the application belongs among the at least two pre-stored first-level class labels comprises:
obtaining a domain classification result for the application based on a pre-established domain classification model and the application detail information of the application, the domain classification result including the probability score of the application belonging to each of the at least two first-level class labels;
selecting, among the at least two first-level class labels, at least one first-level class label with the highest probability score, and determining the at least one first-level class label with the highest probability score as the specified first-level class label.
3. The method according to claim 2, characterized in that the method further comprises:
for each of the at least two first-level class labels, obtaining at least one application preliminarily assigned to the first-level class label;
obtaining manual classification annotation results for the at least one application;
selecting, based on the manual classification annotation results, training samples for model training from the at least one application, a training sample being an application that is confirmed by manual classification as belonging to the first-level class label;
performing word segmentation on the application detail information of the training samples belonging to each first-level class label;
storing the obtained word segmentation results and the first-level class labels corresponding to the word segmentation results in a specific training text in a specified format;
performing model training based on a text classification tool function and the specific training text to obtain a training model;
cross-testing the training model until the classification accuracy of the obtained training model meets a preset condition, thereby obtaining the domain classification model.
4. The method according to claim 1, characterized in that performing information screening on the application detail information to obtain the summary information of the application comprises:
cutting the application introduction information included in the application detail information into at least two short sentences;
calculating the similarity between any two of the at least two short sentences;
calculating the importance value of each of the at least two short sentences according to the similarity between any two short sentences;
sorting the importance values of the short sentences in descending order and, based on the obtained ranking, selecting a specified number of top-ranked short sentences from the at least two short sentences;
combining the specified number of short sentences in the order in which they appear in the application introduction information to obtain the summary information of the application.
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
for each first-level class label, obtaining the sub-level class labels included under the first-level class label;
obtaining, based on the first-level class label, the sub-level class label, and the word vector model matching the first-level class label and the sub-level class label, the cluster words that match the first-level class label and the sub-level class label;
combining all of the obtained cluster words to obtain the word clustering result.
6. The method according to claim 5, characterized in that the method further comprises:
dividing the summary information of the preset number of submitted applications according to the first-level class labels to which the applications belong, to obtain the training summary information belonging to each first-level class label;
performing model training based on the training summary information belonging to each first-level class label, to obtain the word vector model matching each first-level class label.
7. The method according to claim 5, characterized in that performing keyword matching on the summary information of the application based on the pre-stored word clustering result and, based on the obtained matching result, determining the specified sub-level class label to which the application belongs under the specified first-level class label comprises:
matching the summary information of the application against the cluster words included in the word clustering result to obtain the matching result;
if the matching result indicates that the summary information of the application contains any cluster word in the word clustering result, taking the sub-level class label matching that cluster word as the specified sub-level class label.
8. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
obtaining at least one application screenshot included in the application detail information;
for each application screenshot, splitting the application screenshot into at least one image channel;
locating, in each image channel of the at least one image channel, at least one text region containing text;
performing text recognition on the at least one text region and combining the recognized text to obtain the text recognition result of the at least one application screenshot;
generating for the application, based on the text recognition result, label information other than the specified first-level class label and the specified sub-level class label.
9. A device for generating label information of an application, characterized in that the device comprises:
an acquisition module, configured to obtain application detail information, the application detail information being used to describe the functional characteristics of a submitted application;
a first processing module, configured to determine, based on the application detail information, a specified first-level class label to which the application belongs among at least two pre-stored first-level class labels;
a screening module, configured to perform information screening on the application detail information to obtain summary information of the application;
a second processing module, configured to perform keyword matching on the summary information of the application based on a pre-stored word clustering result and, based on the obtained matching result, determine a specified sub-level class label to which the application belongs under the specified first-level class label, the word clustering result being obtained by performing word clustering on the summary information of a preset number of submitted applications.
10. The device according to claim 9, characterized in that the acquisition module is further configured to obtain, for each of the at least two first-level class labels, at least one application preliminarily assigned to the first-level class label, and to obtain manual classification annotation results for the at least one application;
the first processing module is further configured to select, based on the manual classification annotation results, training samples for model training from the at least one application, a training sample being an application that is confirmed by manual classification as belonging to the first-level class label;
the first processing module is further configured to perform word segmentation on the application detail information of the training samples belonging to each first-level class label, and to store the obtained word segmentation results and the first-level class labels corresponding to the word segmentation results in a specific training text in a specified format;
the first processing module is further configured to perform model training based on a text classification tool function and the specific training text to obtain a training model, and to cross-test the training model until the classification accuracy of the obtained training model meets a preset condition, thereby obtaining the domain classification model.
11. The device according to claim 9, characterized in that the screening module is configured to cut the application introduction information included in the application detail information into at least two short sentences; calculate the similarity between any two of the at least two short sentences; calculate the importance value of each of the at least two short sentences according to the similarity between any two short sentences; sort the importance values of the short sentences in descending order and, based on the obtained ranking, select a specified number of top-ranked short sentences from the at least two short sentences; and combine the specified number of short sentences in the order in which they appear in the application introduction information to obtain the summary information of the application.
12. The device according to any one of claims 9 to 11, characterized in that the second processing module is further configured to obtain, for each first-level class label, the sub-level class labels included under the first-level class label; obtain, based on the first-level class label, the sub-level class label, and the word vector model matching the first-level class label and the sub-level class label, the cluster words that match the first-level class label and the sub-level class label; and combine all of the obtained cluster words to obtain the word clustering result.
13. The device according to claim 12, characterized in that the second processing module is further configured to divide the summary information of the preset number of submitted applications according to the first-level class labels to which the applications belong, to obtain the training summary information belonging to each first-level class label; and to perform model training based on the training summary information belonging to each first-level class label, to obtain the word vector model matching each first-level class label.
14. The device according to claim 12, characterized in that the second processing module is configured to match the summary information of the application against the cluster words included in the word clustering result to obtain the matching result; and, if the matching result indicates that the summary information of the application contains any cluster word in the word clustering result, to take the sub-level class label matching that cluster word as the specified sub-level class label.
15. The device according to any one of claims 9 to 11, characterized in that the device further comprises a third processing module:
the acquisition module is further configured to obtain at least one application screenshot included in the application detail information;
the third processing module is configured to split, for each application screenshot, the application screenshot into at least one image channel, and to locate, in each image channel of the at least one image channel, at least one text region containing text;
the third processing module is further configured to perform text recognition on the at least one text region and combine the recognized text to obtain the text recognition result of the at least one application screenshot; and to generate for the application, based on the text recognition result, label information other than the specified first-level class label and the specified sub-level class label.
CN201710279297.3A 2017-04-25 2017-04-25 Application tag information generation method and device Active CN107169049B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710279297.3A CN107169049B (en) 2017-04-25 2017-04-25 Application tag information generation method and device
PCT/CN2018/081559 WO2018196561A1 (en) 2017-04-25 2018-04-02 Label information generating method and device for application and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710279297.3A CN107169049B (en) 2017-04-25 2017-04-25 Application tag information generation method and device

Publications (2)

Publication Number Publication Date
CN107169049A true CN107169049A (en) 2017-09-15
CN107169049B CN107169049B (en) 2023-04-28

Family

ID=59813423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710279297.3A Active CN107169049B (en) 2017-04-25 2017-04-25 Application tag information generation method and device

Country Status (2)

Country Link
CN (1) CN107169049B (en)
WO (1) WO2018196561A1 (en)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205674A (en) * 2017-12-22 2018-06-26 广州爱美互动网络科技有限公司 Social APP content identification method, electronic device, storage medium and system
CN108280202A (en) * 2018-01-30 2018-07-13 湖南蚁坊软件股份有限公司 Dynamic extensible real-time flow label system
CN108563722A (en) * 2018-04-03 2018-09-21 有米科技股份有限公司 Industry classification method, system, computer equipment and storage medium for text information
CN108595660A (en) * 2018-04-28 2018-09-28 腾讯科技(深圳)有限公司 Label information generation method, device, storage medium and equipment for multimedia resources
WO2018196561A1 (en) * 2017-04-25 2018-11-01 腾讯科技(深圳)有限公司 Label information generating method and device for application and storage medium
CN108764141A (en) * 2018-05-25 2018-11-06 广州虎牙信息科技有限公司 Game scene description method, device, equipment and storage medium thereof
CN108764007A (en) * 2018-02-10 2018-11-06 集智学园(北京)科技有限公司 Attention measurement method based on OCR and text analysis technology
CN108763194A (en) * 2018-04-27 2018-11-06 广州优视网络科技有限公司 Using mark stamp methods, device, storage medium and computer equipment
CN109657574A (en) * 2018-12-05 2019-04-19 深圳市子瑜杰恩科技有限公司 Prop classification method for short videos and related product
CN109784368A (en) * 2018-12-11 2019-05-21 同盾控股有限公司 Method and apparatus for determining application program classification
CN109795942A (en) * 2019-01-17 2019-05-24 杭州海康睿和物联网技术有限公司 Staircase control system, staircase monitoring device and its intelligent control method
CN110019663A (en) * 2017-09-30 2019-07-16 北京国双科技有限公司 Case information pushing method, system, storage medium and processor
CN110069769A (en) * 2018-01-22 2019-07-30 腾讯科技(深圳)有限公司 Application label generation method and device and storage device
CN110427542A (en) * 2018-04-26 2019-11-08 北京市商汤科技开发有限公司 Classification network training and data labeling method and device, equipment, medium
CN110532394A (en) * 2019-09-11 2019-12-03 携程计算机技术(上海)有限公司 Order remark text processing method and system
CN110781292A (en) * 2018-07-25 2020-02-11 百度在线网络技术(北京)有限公司 Text data multi-level classification method and device, electronic equipment and storage medium
CN110910175A (en) * 2019-11-26 2020-03-24 上海景域文化传播股份有限公司 Tourist ticket product portrait generation method
CN110909157A (en) * 2018-09-18 2020-03-24 阿里巴巴集团控股有限公司 Text classification method and device, computing equipment and readable storage medium
CN111079376A (en) * 2019-11-14 2020-04-28 贝壳技术有限公司 Data labeling method, device, medium and electronic equipment
CN111694962A (en) * 2019-03-15 2020-09-22 阿里巴巴集团控股有限公司 Data processing method and device
CN112506556A (en) * 2020-11-19 2021-03-16 杭州云深科技有限公司 Application program classification method and device, computer equipment and storage medium
CN112565250A (en) * 2020-12-04 2021-03-26 中国移动通信集团内蒙古有限公司 Website identification method, device, equipment and storage medium
WO2021092871A1 (en) * 2019-11-13 2021-05-20 北京数字联盟网络科技有限公司 Application preference text classification method based on textrank
CN115688107A (en) * 2022-12-28 2023-02-03 卓望数码技术(深圳)有限公司 Fraud-related APP detection system and method
CN117725515A (en) * 2024-02-07 2024-03-19 北京肿瘤医院(北京大学肿瘤医院) Quality classification method, system, storage medium and product for clinical test of medicine

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858843B (en) * 2019-04-30 2023-12-05 北京嘀嘀无限科技发展有限公司 Text classification method and device
CN112016582B (en) * 2019-05-31 2023-11-24 口口相传(北京)网络技术有限公司 Dish recommending method and device
CN112528073A (en) * 2019-09-03 2021-03-19 北京国双科技有限公司 Video generation method and device
CA3063243A1 (en) * 2019-11-13 2021-05-13 Beijing Digital Union Web Science And Technology Company Limited An application preference text classification method based on textrank
CN111026908B (en) * 2019-12-10 2023-09-08 腾讯科技(深圳)有限公司 Song label determining method, device, computer equipment and storage medium
CN111353050A (en) * 2019-12-27 2020-06-30 北京合力亿捷科技股份有限公司 Word stock construction method and tool in vertical field of telecommunication customer service
CN111753060B (en) * 2020-07-29 2023-09-26 腾讯科技(深圳)有限公司 Information retrieval method, apparatus, device and computer readable storage medium
CN112015898B (en) * 2020-08-28 2023-11-21 支付宝(杭州)信息技术有限公司 Model training and text label determining method and device based on label tree
CN112597295B (en) * 2020-12-03 2024-02-02 京东科技控股股份有限公司 Digest extraction method, digest extraction device, computer device, and storage medium
CN112784911B (en) * 2021-01-29 2024-01-19 北京百度网讯科技有限公司 Training sample generation method and device, electronic equipment and storage medium
CN112905743B (en) * 2021-02-20 2023-08-01 北京百度网讯科技有限公司 Text object detection method, device, electronic equipment and storage medium
WO2023178205A1 (en) * 2022-03-16 2023-09-21 Aviagames, Inc. Automated computer game application classification based on a mixed effects model

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620596A (en) * 2008-06-30 2010-01-06 东北大学 Query-oriented multi-document automatic summarization method
US20110103682A1 (en) * 2009-10-29 2011-05-05 Xerox Corporation Multi-modality classification for one-class classification in social networks
CN102609539A (en) * 2012-02-16 2012-07-25 北京搜狗信息服务有限公司 Search method and search system
CN103324628A (en) * 2012-03-21 2013-09-25 腾讯科技(深圳)有限公司 Industry classification method and system for text publishing
CN104021185A (en) * 2014-06-11 2014-09-03 北京奇虎科技有限公司 Method and device for identifying information attributes of data in web pages
CN104750754A (en) * 2013-12-31 2015-07-01 北龙中网(北京)科技有限责任公司 Website industry classification method and server
CN104834735A (en) * 2015-05-18 2015-08-12 大连理工大学 Automatic document summarization extraction method based on term vectors
CN105488021A (en) * 2014-09-15 2016-04-13 华为技术有限公司 Method and device generating multi-file summary
CN105787025A (en) * 2016-02-24 2016-07-20 腾讯科技(深圳)有限公司 Network platform public account classifying method and device
CN106227722A (en) * 2016-09-12 2016-12-14 中山大学 Extraction method based on listed company announcement summaries
CN106453033A (en) * 2016-08-31 2017-02-22 电子科技大学 Multilevel Email classification method based on Email content
CN106484266A (en) * 2016-10-18 2017-03-08 北京锤子数码科技有限公司 Text processing method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763367A (en) * 2008-12-08 2010-06-30 新奥特硅谷视频技术有限责任公司 Method and device for setting file labels
CN101609450A (en) * 2009-04-10 2009-12-23 南京邮电大学 Web page classification method based on training set
CN107169049B (en) * 2017-04-25 2023-04-28 腾讯科技(深圳)有限公司 Application tag information generation method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
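何伟骏: "Research and Application of a Multi-Label Classification Algorithm Based on a Hierarchical-Exclusive Model" (基于层次—互斥模型的多标签分类算法的研究与应用) *
李杨: "Application and Research of a Classified Academic Literature Search Engine" (分类学术文献搜索引擎的应用和研究) *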

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018196561A1 (en) * 2017-04-25 2018-11-01 腾讯科技(深圳)有限公司 Label information generating method and device for application and storage medium
CN110019663A (en) * 2017-09-30 2019-07-16 北京国双科技有限公司 Case information pushing method, system, storage medium and processor
CN108205674B (en) * 2017-12-22 2022-04-15 广州爱美互动网络科技有限公司 Social APP content identification method, electronic device, storage medium and system
CN108205674A (en) * 2017-12-22 2018-06-26 广州爱美互动网络科技有限公司 Social APP content identification method, electronic device, storage medium and system
CN110069769A (en) * 2018-01-22 2019-07-30 腾讯科技(深圳)有限公司 Application label generation method and device and storage device
CN110069769B (en) * 2018-01-22 2023-05-02 腾讯科技(深圳)有限公司 Application label generation method and device and storage device
CN108280202A (en) * 2018-01-30 2018-07-13 湖南蚁坊软件股份有限公司 Dynamic extensible real-time flow label system
CN108280202B (en) * 2018-01-30 2020-10-30 湖南蚁坊软件股份有限公司 Dynamic extensible real-time flow label system
CN108764007A (en) * 2018-02-10 2018-11-06 集智学园(北京)科技有限公司 Attention measurement method based on OCR and text analysis technology
CN108563722A (en) * 2018-04-03 2018-09-21 有米科技股份有限公司 Industry classification method, system, computer equipment and storage medium for text information
CN110427542A (en) * 2018-04-26 2019-11-08 北京市商汤科技开发有限公司 Classification network training and data labeling method and device, equipment, medium
CN108763194A (en) * 2018-04-27 2018-11-06 广州优视网络科技有限公司 Using mark stamp methods, device, storage medium and computer equipment
CN108763194B (en) * 2018-04-27 2022-09-27 阿里巴巴(中国)有限公司 Method and device for applying label labeling, storage medium and computer equipment
CN108595660A (en) * 2018-04-28 2018-09-28 腾讯科技(深圳)有限公司 Label information generation method, device, storage medium and equipment for multimedia resources
CN108764141A (en) * 2018-05-25 2018-11-06 广州虎牙信息科技有限公司 Game scene description method, device, equipment and storage medium thereof
CN108764141B (en) * 2018-05-25 2021-07-02 广州虎牙信息科技有限公司 Game scene description method, device, equipment and storage medium thereof
CN110781292A (en) * 2018-07-25 2020-02-11 百度在线网络技术(北京)有限公司 Text data multi-level classification method and device, electronic equipment and storage medium
CN110909157B (en) * 2018-09-18 2023-04-11 阿里巴巴集团控股有限公司 Text classification method and device, computing equipment and readable storage medium
CN110909157A (en) * 2018-09-18 2020-03-24 阿里巴巴集团控股有限公司 Text classification method and device, computing equipment and readable storage medium
CN109657574A (en) * 2018-12-05 2019-04-19 深圳市子瑜杰恩科技有限公司 Prop classification method for short videos and related product
CN109784368A (en) * 2018-12-11 2019-05-21 同盾控股有限公司 Method and apparatus for determining application program classification
CN109795942A (en) * 2019-01-17 2019-05-24 杭州海康睿和物联网技术有限公司 Staircase control system, staircase monitoring device and its intelligent control method
CN111694962A (en) * 2019-03-15 2020-09-22 阿里巴巴集团控股有限公司 Data processing method and device
CN110532394A (en) * 2019-09-11 2019-12-03 携程计算机技术(上海)有限公司 Order remark text processing method and system
CN110532394B (en) * 2019-09-11 2023-04-07 携程计算机技术(上海)有限公司 Order remark text processing method and system
WO2021092871A1 (en) * 2019-11-13 2021-05-20 北京数字联盟网络科技有限公司 Application preference text classification method based on textrank
CN111079376A (en) * 2019-11-14 2020-04-28 贝壳技术有限公司 Data labeling method, device, medium and electronic equipment
CN111079376B (en) * 2019-11-14 2021-04-16 北京房江湖科技有限公司 Data labeling method, device, medium and electronic equipment
CN110910175A (en) * 2019-11-26 2020-03-24 上海景域文化传播股份有限公司 Tourist ticket product portrait generation method
CN110910175B (en) * 2019-11-26 2023-07-28 上海景域文化传播股份有限公司 Image generation method for travel ticket product
CN112506556A (en) * 2020-11-19 2021-03-16 杭州云深科技有限公司 Application program classification method and device, computer equipment and storage medium
CN112506556B (en) * 2020-11-19 2023-08-25 杭州云深科技有限公司 Application program classification method, device, computer equipment and storage medium
CN112565250B (en) * 2020-12-04 2022-12-06 中国移动通信集团内蒙古有限公司 Website identification method, device, equipment and storage medium
CN112565250A (en) * 2020-12-04 2021-03-26 中国移动通信集团内蒙古有限公司 Website identification method, device, equipment and storage medium
CN115688107A (en) * 2022-12-28 2023-02-03 卓望数码技术(深圳)有限公司 Fraud-related APP detection system and method
CN117725515A (en) * 2024-02-07 2024-03-19 北京肿瘤医院(北京大学肿瘤医院) Quality classification method, system, storage medium and product for clinical test of medicine

Also Published As

Publication number Publication date
CN107169049B (en) 2023-04-28
WO2018196561A1 (en) 2018-11-01

Similar Documents

Publication Publication Date Title
CN107169049A (en) The label information generation method and device of application
EP3866026A1 (en) Theme classification method and apparatus based on multimodality, and storage medium
CN111026842B (en) Natural language processing method, natural language processing device and intelligent question-answering system
Li et al. Localizing and quantifying damage in social media images
CN109960800A (en) Weakly supervised file classification method and device based on Active Learning
Hoque et al. Real time bangladeshi sign language detection using faster r-cnn
CN106886580B (en) Image emotion polarity analysis method based on deep learning
CN112287157A (en) Automatic detection of user-requested objects in an image
CN112270196A (en) Entity relationship identification method and device and electronic equipment
CN109599187A (en) Online consultation triage method, server, terminal, equipment and medium
CN110232112A (en) Keyword extracting method and device in article
CN107436916B (en) Intelligent answer prompting method and device
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
CN109582788A (en) Spam comment training and recognition method, device, equipment and readable storage medium
CN106227836B (en) Unsupervised joint visual concept learning system and unsupervised joint visual concept learning method based on images and characters
CN113157859A (en) Event detection method based on upper concept information
CN110008365A (en) Image processing method, device, equipment and readable storage medium
CN110516259A (en) Technical keyword recognition method, device, computer equipment and storage medium
CN109657096A (en) A kind of ancillary statistics report-generating method based on teaching of low school age audio-video
CN114661951A (en) Video processing method and device, computer equipment and storage medium
CN113806574A (en) Software and hardware integrated artificial intelligent image recognition data processing method
CN116341519A (en) Event causal relation extraction method, device and storage medium based on background knowledge
Shaharabany et al. Similarity maps for self-training weakly-supervised phrase grounding
CN112800259B (en) Image generation method and system based on edge closure and commonality detection
CN114708462A (en) Method, system, device and storage medium for generating detection model for multi-data training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant