CN106997340A - Dictionary generation, and document classification method and device using the dictionary - Google Patents

Dictionary generation, and document classification method and device using the dictionary

Info

Publication number
CN106997340A
CN106997340A CN201610048630.5A CN201610048630A CN106997340A CN 106997340 A CN106997340 A CN 106997340A CN 201610048630 A CN201610048630 A CN 201610048630A CN 106997340 A CN106997340 A CN 106997340A
Authority
CN
China
Prior art keywords
series
document
keyword
word
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610048630.5A
Other languages
Chinese (zh)
Other versions
CN106997340B (en)
Inventor
肖汉平
刘健鹏
沈栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610048630.5A
Publication of CN106997340A
Application granted
Publication of CN106997340B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a method and device for generating a dictionary, and a document classification method and device that use the dictionary. The dictionary generation method includes: obtaining the title information of each document under each classification category of a specified level in a designated network interaction platform; determining, from the title information of each document, the keywords corresponding to each category and the weight score of each keyword under its corresponding category; and generating the dictionary from the categories, the keywords corresponding to each category, and the weight score of each keyword. The dictionary, generated from the documents under each category of the designated network interaction platform, is used to classify documents from a target network interaction platform into the designated network interaction platform, which lays a foundation for resource integration between network interaction platforms of the same type.

Description

Dictionary generation, and document classification method and device using the dictionary
Technical field
This application relates to the field of computer technology, and in particular to dictionary generation and to a document classification method and device that use the dictionary.
Background
With the continued development of Internet technology, more and more network interaction platforms have appeared, and they bring great convenience to daily life. For example, online e-commerce platforms let users buy goods online with ease; online game platforms let users play video games over the network; and e-learning platforms let users study conveniently over the network.
However, while the large number of network interaction platforms makes life more convenient, it also makes resource integration between platforms of the same type more difficult. For example, suppose the goods of e-commerce platform B need to be integrated into e-commerce platform A. The two platforms usually classify goods by different standards, and their classification categories may differ as well, so it is generally difficult to integrate the goods of platform B into platform A. How to integrate resources between network interaction platforms of the same type, despite differing classification standards and categories, is therefore a problem the industry urgently needs to solve.
Summary of the invention
The embodiments of this application provide dictionary generation and a document classification method and device that use the dictionary, to solve the prior-art problem that resource integration between network interaction platforms of the same type is generally difficult because their classification standards, and even their classification categories, differ.
An embodiment of this application provides a method for generating a dictionary, where the dictionary is used to classify documents from a target network interaction platform into a designated network interaction platform. The method includes:
obtaining the title information of each document under each classification category of a specified level in the designated network interaction platform;
determining, from the title information of each document, the keywords corresponding to each category and the weight score of each keyword under its corresponding category, where the weight score assesses how well the keyword distinguishes its category from the other categories, and a keyword is a word designated under its corresponding category according to a predetermined rule;
generating the dictionary from the categories, the keywords corresponding to each category, and the weight score of each keyword.
Preferably, the weight score of a keyword under its corresponding category is determined as follows:
determining the term frequency of the keyword in the corresponding category;
determining the inverse document frequency of the keyword from the number of documents under all categories whose titles contain the keyword and the total number of documents under all categories;
determining the weight score of the keyword under its corresponding category from the term frequency of the keyword in that category and the inverse document frequency of the keyword.
Preferably, determining the keywords corresponding to each category from the title information of each document specifically includes:
normalizing the title of each document;
performing word segmentation and entity word recognition on each normalized title to obtain entity words, that is, words that describe concrete things;
determining the term frequency of each entity word corresponding to each category, where the term frequency of an entity word is the number of times, or the frequency with which, the entity word occurs in the corresponding category;
taking each entity word in turn as the current entity word, judging whether its term frequency is greater than a preset threshold, and if so, determining the current entity word to be a keyword of the corresponding category.
Preferably, the method further includes: when the term frequencies of all entity words corresponding to a current category are less than the preset threshold, determining the entity word with the highest term frequency in the current category to be the keyword of that category, where the current category is any one of the categories.
Preferably, generating the dictionary from the categories, the keywords corresponding to each category, and the weight score of each keyword specifically includes:
taking each keyword in turn as the current keyword, and generating an association group from the current keyword, the category corresponding to the current keyword, and the weight score of the current keyword under that category;
generating the dictionary from the association groups.
Preferably, the categories of the specified level are specifically first-level categories and/or leaf categories.
An embodiment of this application also provides a document classification method, which includes:
obtaining the title information of a specified document in a target network interaction platform and determining the entity words corresponding to the title of the specified document, where an entity word describes a concrete thing;
determining, by querying a dictionary, the weight score of each entity word under each category in the dictionary, where the dictionary is generated by any of the dictionary generation methods described above;
determining the total score of the specified document under each category from the weight scores of the entity words under each category;
classifying the specified document according to its total score under each category.
Preferably, classifying the specified document according to its total score under each category specifically includes:
arranging the categories into a queue in descending order of the specified document's total score, and selecting a predetermined number of categories in queue order starting from the head of the queue, where the predetermined number is at least two;
obtaining the initial category of the specified document in the target network interaction platform;
determining a first similarity between each selected category and the initial category, where the first similarity describes how similar the initial category is to each category;
determining a second similarity between each selected category and the title of the specified document, where the second similarity describes how similar the title of the specified document is to each category;
taking each selected category as a preselected category, computing a weighted sum of the total score, the first similarity, and the second similarity corresponding to each preselected category, and classifying the specified document according to the result of the weighted sum.
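The re-ranking steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the text does not say how the two similarities are computed or which weights are used, so the `token_overlap` similarity, the default weights, and all function names are hypothetical. The total scores are also used unnormalized, which is an assumption.

```python
from typing import Callable, Dict, Tuple


def token_overlap(text: str, category: str) -> float:
    """Hypothetical similarity: share of the category's tokens found in the text."""
    text_tokens = set(text.lower().split())
    category_tokens = set(category.lower().split())
    if not category_tokens:
        return 0.0
    return len(text_tokens & category_tokens) / len(category_tokens)


def rerank(
    total_scores: Dict[str, float],
    initial_category: str,
    title: str,
    top_k: int = 2,
    weights: Tuple[float, float, float] = (0.2, 0.6, 0.2),  # assumed values
    similarity: Callable[[str, str], float] = token_overlap,
) -> str:
    """Pick a category by weighted sum over the top_k preselected categories."""
    if top_k < 2:
        raise ValueError("at least two categories must be preselected")
    if not total_scores:
        raise ValueError("no candidate categories")
    # Queue of categories in descending order of total score.
    queue = sorted(total_scores, key=total_scores.get, reverse=True)
    preselected = queue[:top_k]
    w_total, w_first, w_second = weights

    def combined(category: str) -> float:
        first = similarity(initial_category, category)   # first similarity
        second = similarity(title, category)             # second similarity
        return w_total * total_scores[category] + w_first * first + w_second * second

    return max(preselected, key=combined)


scores = {"Computer": 7.0, "Stationery": 6.5, "Mobile phone": 4.0}  # made-up totals
print(rerank(scores, initial_category="Office Stationery", title="A5 notebook"))
```

With these made-up inputs, "Computer" leads on total score alone, but the initial category on the target platform tips the weighted sum toward "Stationery". In a real system the total scores would likely need scaling so that they are comparable with similarities in the range 0 to 1.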
Preferably, classifying the specified document according to its total score under each category specifically includes:
determining, from the total scores of the specified document under each category, the category with the maximum total score, and classifying the specified document into that category.
Preferably, determining the total score of the specified document under each category from the weight scores of the entity words under each category specifically includes:
taking each category in turn as the current category, and summing the weight scores of the entity words under the current category to obtain the total score of the specified document under the current category.
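As an illustration of the summation and of the maximum-total-score option, here is a minimal Python sketch. The dictionary layout (keyword mapped to per-category scores), the sample values, and the function names are assumptions made for the example; entity words missing from the dictionary are assumed to contribute nothing.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional

# Hypothetical dictionary layout: keyword -> {category: weight score}.
Dictionary = Dict[str, Dict[str, float]]


def total_scores(entity_words: Iterable[str], dictionary: Dictionary) -> Dict[str, float]:
    """Sum, per category, the weight scores of the title's entity words."""
    totals: Dict[str, float] = defaultdict(float)
    for word in entity_words:
        # Words absent from the dictionary add nothing (assumption).
        for category, score in dictionary.get(word, {}).items():
            totals[category] += score
    return dict(totals)


def classify(entity_words: Iterable[str], dictionary: Dictionary) -> Optional[str]:
    """Return the category with the maximum total score, or None if nothing matched."""
    totals = total_scores(entity_words, dictionary)
    if not totals:
        return None
    return max(totals, key=totals.get)


sample = {  # made-up weight scores
    "guitar": {"Musical instruments": 3.0, "Toys": 1.0},
    "strings": {"Musical instruments": 2.0},
    "kids": {"Toys": 2.5},
}
print(classify(["guitar", "strings"], sample))
print(classify(["guitar", "kids"], sample))
```

In the second call the "kids" score pushes the total for "Toys" (3.5) above "Musical instruments" (3.0), which shows how several weak signals can outweigh one strong keyword.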
Preferably, determining the entity words corresponding to the title of the specified document specifically includes:
normalizing the title of the specified document;
performing word segmentation and entity word recognition on the normalized title to determine the entity words corresponding to the title of the specified document.
An embodiment of this application also provides a device for generating a dictionary, where the dictionary is used to classify documents from a target network interaction platform into a designated network interaction platform. The device includes:
an obtaining unit, a determining unit, and a generating unit, where:
the obtaining unit is configured to obtain the title information of each document under each classification category of a specified level in the designated network interaction platform;
the determining unit is configured to determine, from the title information of each document, the keywords corresponding to each category and the weight score of each keyword under its corresponding category, where the weight score assesses how well the keyword distinguishes its category from the other categories, and a keyword is a word designated under its corresponding category according to a predetermined rule;
the generating unit is configured to generate the dictionary from the categories, the keywords corresponding to each category, and the weight score of each keyword.
An embodiment of this application also provides a document classification device, which includes:
an obtaining unit, a query unit, a determining unit, and a classification unit, where:
the obtaining unit is configured to obtain the title information of a specified document in a target network interaction platform and to determine the entity words corresponding to the title of the specified document, where an entity word describes a concrete thing;
the query unit is configured to determine, by querying a dictionary, the weight score of each entity word under each category in the dictionary, where the dictionary is generated by the dictionary generation device described above;
the determining unit is configured to determine the total score of the specified document under each category from the weight scores of the entity words under each category;
the classification unit is configured to classify the specified document according to its total score under each category.
At least one of the above technical solutions adopted by the embodiments of this application can achieve the following beneficial effects:
The title information of a specified document is obtained and entity words are determined from it; the dictionary is then queried to determine the weight score of each entity word under each category; the total score of the specified document under each category is determined from the weight scores; and the specified document is classified by total score. This solves the existing problem that resource integration between network interaction platforms of the same type is difficult because of differing classification standards.
Brief description of the drawings
The drawings described here provide a further understanding of this application and form a part of it. The illustrative embodiments of this application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic flowchart of an implementation of the dictionary generation method provided by Embodiment 1 of this application;
Fig. 2 is a schematic flowchart of a specific implementation, in a practical application scenario, of the dictionary generation method provided by Embodiment 1 of this application;
Fig. 3 is a schematic flowchart of an implementation of the document classification method provided by Embodiment 2 of this application;
Fig. 4 is a schematic flowchart of an implementation of classifying a specified document by total score in the document classification method provided by Embodiment 2 of this application;
Fig. 5 is a schematic structural diagram of the dictionary generation device provided by Embodiment 3 of this application;
Fig. 6 is a schematic structural diagram of the document classification device provided by Embodiment 4 of this application.
Detailed description
To make the purpose, technical solutions, and advantages of this application clearer, the technical solutions of this application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
The technical solutions provided by the embodiments of this application are described in detail below with reference to the drawings.
Embodiment 1
Embodiment 1 provides a method for generating a dictionary. The generated dictionary is used to classify documents from a target network interaction platform into a designated network interaction platform, which lays a foundation for resource integration between network interaction platforms of the same type. A flowchart of the method is shown in Fig. 1, and the method includes the following steps:
Step S11: Obtain the title information of each document under each classification category of a specified level in the designated network interaction platform.
The designated network interaction platform is defined relative to the target network interaction platform: the target network interaction platform is the platform whose resources are being integrated, and the designated network interaction platform is the platform that receives the integrated resources. For example, when the resources of a first online game platform are integrated into a second online game platform, the first online game platform is the target network interaction platform and the second online game platform is the designated network interaction platform. The types of the designated and target platforms are not limited here; they may be e-commerce platforms, online game platforms, and so on.
In a network interaction platform, resources are usually divided into multiple classification categories according to some standard so that users can find the resources they need, and these categories may span several levels. For example, on an e-commerce platform the first-level categories include books and audio-visual products, electronic appliances, and home building materials; the first-level category of books and audio-visual products in turn includes second-level categories such as children's books and musical instruments. The categories of the specified level may be first-level categories, second-level categories, leaf categories, or categories of any other specified level. In practice, the preferred choice is first-level categories and/or leaf categories, where a leaf category is a subcategory at the very end of the category hierarchy of the designated network interaction platform, that is, a category that the platform does not subdivide any further.
In a network interaction platform, resources are usually stored on the server side in the form of documents. For example, on an e-commerce platform each item usually has its own document on the server, and the document for an item includes the document title, a description of the item's category, and the network address of the document.
This step obtains the title information of each document under each category of the specified level in the designated network interaction platform.
Step S12: From the title information of each document, determine the keywords corresponding to each category and the weight score of each keyword under its corresponding category.
Here, the weight score assesses how well a keyword distinguishes its category from the other categories, and a keyword is a word designated under its corresponding category according to a predetermined rule.
In this step, the titles of the documents are processed to determine the keywords corresponding to each category and the weight scores of those keywords under their categories. In practice there are many ways to process the titles and determine the keywords for each category; one preferred scheme is given in Step 121.
Step 121: Normalize the title of each document; perform word segmentation and entity word recognition on each normalized title to obtain entity words, which describe concrete things; determine the term frequency of each entity word corresponding to each category, where the term frequency of an entity word is the number of times, or the frequency with which, it occurs in the corresponding category; take each entity word in turn as the current entity word and judge whether its term frequency is greater than a preset threshold; if so, determine the current entity word to be a keyword of the corresponding category.
In practice, normalization may include letter case conversion, conversion between traditional and simplified Chinese characters, tense conversion, and so on. Word segmentation of document titles is usually done with a tokenizer; a commonly used tool is word2vec. Entity word recognition can usually be done with an entity dictionary. Entity words are words that describe concrete things, such as product names, trademarks, titles, and game names. After entity word recognition, the term frequency of each entity word under each category is counted; the term frequencies are then compared with the preset threshold, and entity words whose term frequency exceeds the threshold are selected as keywords of the corresponding category. For example, suppose three entity words under the electronic appliances category are "computer", "mobile phone", and "charger", with term frequencies of 300, 350, and 100 respectively. The preset threshold can be set as needed; if it is set to 200, then "computer" and "mobile phone" are determined to be keywords of the electronic appliances category.
Note that in Step 121 a category may have no entity word whose term frequency exceeds the preset threshold, not even its highest-frequency entity word, in which case the category would be left without keywords and effectively excluded. One remedy is to lower the preset threshold appropriately. A preferred scheme in practice is: when the term frequencies of all entity words corresponding to the current category are less than the preset threshold, determine the entity word with the highest term frequency in the current category to be the keyword of that category, where the current category is any one of the categories. For example, suppose the highest-frequency entity word under the musical instruments category is "guitar", with a term frequency of 86 under that category. If the preset threshold is 100, "guitar" can be determined to be the keyword of the musical instruments category so that the category is not excluded.
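The threshold rule and its fallback can be sketched as one small Python function. The function name is hypothetical, the "violin" entry is made up to give the fallback something to choose between, and treating a frequency exactly equal to the threshold as "not a keyword" is an assumption, since the text only covers the strictly greater and strictly smaller cases.

```python
from typing import Dict, List


def select_keywords(term_frequencies: Dict[str, int], threshold: int) -> List[str]:
    """Return the keywords of one category from its entity-word frequencies."""
    # Rule from the text: an entity word is a keyword if its frequency exceeds the threshold.
    keywords = [word for word, freq in term_frequencies.items() if freq > threshold]
    # Fallback from the text: if no word qualifies, keep the most frequent one
    # so the category is not left without a keyword.
    if not keywords and term_frequencies:
        keywords = [max(term_frequencies, key=term_frequencies.get)]
    return keywords


electronics = {"computer": 300, "mobile phone": 350, "charger": 100}
instruments = {"guitar": 86, "violin": 40}  # "violin" and its count are made up

print(select_keywords(electronics, threshold=200))  # ['computer', 'mobile phone']
print(select_keywords(instruments, threshold=100))  # ['guitar']
```

The two calls reproduce the electronic appliances and musical instruments examples from the text, with thresholds of 200 and 100 respectively.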
Note also that in Step 121 the titles may instead be segmented first, with normalization and entity word recognition then applied to the segmentation results, to obtain the entity words that describe concrete things.
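Both orderings can be sketched in Python as below. This is a deliberately simplified stand-in: normalization is reduced to lower-casing, segmentation to whitespace splitting (which would not work for unsegmented Chinese titles or multi-word entities such as "mobile phone"), and the entity dictionary is a plain set. All names are hypothetical.

```python
from typing import List, Set


def normalize(text: str) -> str:
    # Only letter-case conversion is sketched; script and tense conversion are omitted.
    return text.lower()


def segment(text: str) -> List[str]:
    # Stand-in segmenter that splits on whitespace (an assumption for the example).
    return text.split()


def entity_words_normalize_first(title: str, entity_dictionary: Set[str]) -> List[str]:
    """Normalize the title, segment it, then keep tokens found in the entity dictionary."""
    return [token for token in segment(normalize(title)) if token in entity_dictionary]


def entity_words_segment_first(title: str, entity_dictionary: Set[str]) -> List[str]:
    """Segment the title, normalize each token, then keep tokens found in the entity dictionary."""
    tokens = (normalize(token) for token in segment(title))
    return [token for token in tokens if token in entity_dictionary]


entities = {"guitar", "charger"}  # made-up entity dictionary
title = "Acoustic GUITAR with Charger"
print(entity_words_normalize_first(title, entities))
print(entity_words_segment_first(title, entities))
```

With these simple stand-ins the two orderings give the same result; with a real segmenter the order can matter, because normalization may change where word boundaries are detected.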
Step S12 also determines the weight score of each keyword under its corresponding category. The term frequency of a keyword under its category can be used directly as the weight score, or the weight score can be determined in other ways. In practice, one preferred scheme for determining the weight score of a keyword under its category is given in Step 122.
Step 122: Determine the term frequency of the keyword in the corresponding category; determine the inverse document frequency of the keyword from the number of documents under all categories whose titles contain the keyword and the total number of documents under all categories; determine the weight score of the keyword under its corresponding category from the term frequency of the keyword in that category and the inverse document frequency of the keyword.
In Step 122, to determine the weight score of a keyword under its corresponding category, first determine the term frequency (TF) of the keyword under that category. Then determine n, the number of documents under all categories whose titles contain the keyword, and N, the total number of documents under all categories, and determine the inverse document frequency (IDF) of the keyword from n and N. Finally, determine the weight score of the keyword under its category from its term frequency under that category and its inverse document frequency.
In practice, the inverse document frequency of the keyword can be determined by the following formula:
IDF = log(N/n)
where N is the total number of documents under all categories and n is the number of documents under all categories whose titles contain the keyword. log(N/n) is the logarithm of N/n; the base can be set as needed, and in practice the natural logarithm ln(N/n) is typically used.
In practice, the weight score of a keyword under its corresponding category can be obtained by multiplying the term frequency of the keyword under that category by the inverse document frequency of the keyword.
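The calculation can be sketched in Python as follows, using the natural logarithm and a raw occurrence count for the term frequency. The data layout (each title already reduced to a list of entity words), the sample titles, and the function names are assumptions made for the example.

```python
import math
from typing import Dict, List

# Hypothetical layout: category -> list of titles, each title a list of entity words.
Titles = Dict[str, List[List[str]]]


def term_frequency(keyword: str, category: str, titles: Titles) -> int:
    """TF: number of occurrences of the keyword in the titles of one category."""
    return sum(title.count(keyword) for title in titles[category])


def inverse_document_frequency(keyword: str, titles: Titles) -> float:
    """IDF = ln(N / n), with N and n counted over all categories."""
    total_docs = sum(len(docs) for docs in titles.values())  # N
    docs_with_keyword = sum(                                  # n
        1 for docs in titles.values() for title in docs if keyword in title
    )
    if docs_with_keyword == 0:
        raise ValueError(f"keyword {keyword!r} does not occur in any title")
    return math.log(total_docs / docs_with_keyword)


def weight_score(keyword: str, category: str, titles: Titles) -> float:
    """Weight score = TF under the category multiplied by IDF."""
    return term_frequency(keyword, category, titles) * inverse_document_frequency(keyword, titles)


sample_titles: Titles = {  # made-up titles
    "Computer": [["notebook"], ["notebook", "charger"]],
    "Stationery": [["notebook"], ["pen"]],
}
print(weight_score("notebook", "Computer", sample_titles))
print(weight_score("pen", "Stationery", sample_titles))
```

In this made-up sample, "notebook" appears in 3 of the 4 titles, so its IDF is ln(4/3) and its weight score under "Computer" is 2 × ln(4/3), about 0.575. "pen" appears only once but in a single title, so its weight score under "Stationery" is ln(4), about 1.386: the rarer word separates categories better.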
Step S13: Generate the dictionary from the categories, the keywords corresponding to each category, and the weight score of each keyword.
The dictionary is generated from the categories, the keywords corresponding to each category, and the weight score of each keyword under its corresponding category. The dictionary can contain three attributes: keyword, category, and weight score.
One preferred scheme for generating the dictionary is to take each keyword in turn as the current keyword; generate an association group from the current keyword, the category corresponding to the current keyword, and the weight score of the current keyword under that category; and generate the dictionary from the association groups.
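One way to picture an association group is as a three-field record. The sketch below is only an illustration: the `AssociationGroup` type, the input layout, and the sample values are assumptions, and the text does not prescribe any storage format.

```python
from typing import Dict, List, NamedTuple


class AssociationGroup(NamedTuple):
    """One dictionary entry: a keyword, its category, and its weight score."""
    keyword: str
    category: str
    weight_score: float


def build_dictionary(weights_by_category: Dict[str, Dict[str, float]]) -> List[AssociationGroup]:
    """Turn category -> {keyword: weight score} into a flat list of association groups."""
    return [
        AssociationGroup(keyword, category, score)
        for category, keyword_scores in weights_by_category.items()
        for keyword, score in keyword_scores.items()
    ]


weights = {  # made-up weight scores
    "Musical instruments": {"guitar": 3.0},
    "Toys": {"guitar": 1.0, "kite": 2.0},
}
for group in build_dictionary(weights):
    print(group)
```

Keeping one group per (keyword, category) pair lets the same keyword carry a different weight score under each category in which it was selected.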
The word stock generating method provided using embodiment 1, is obtained and specifies given level in network interaction platform Each series under each document heading message, by the heading message of each document, determine each classification class Mesh distinguishes the weight point of corresponding each keyword and each keyword under correspondence series, then according to each Series, each series distinguish corresponding each keyword and the weight of each keyword is mitogenetic into word Storehouse.This method is used for by the dictionary of each document structure tree under each series in specified network interaction platform By the document classification in objective network interaction platform to specified network interaction platform, so as to be homogeneous network interaction Resource consolidation between platform is laid a good foundation.
In order to make it easy to understand, the embodiment of the present application is additionally provided under a kind of practical application scene, implement described The schematic diagram of word stock generating method, as shown in Figure 2.Under the application scenarios, specified network interaction platform is Network electric business platform A, objective network interaction platform is network electric business platform B, the series of given level For leaf classification.Methods described specifically may comprise steps of:
Step S21: Obtain the title information of each document under each leaf category in e-commerce platform A.
Step S22: Normalize the title of each document.
Step S23: Perform word segmentation and entity word recognition on the normalized title of each document to obtain entity words, which describe physical things.
Step S24: Determine the word frequency of each entity word corresponding to each leaf category.
Step S25: Taking each entity word in turn as the current entity word, judge whether the word frequency of the current entity word is greater than a preset threshold; if so, perform step S26.
Step S26: Determine the current entity word to be a keyword of the corresponding leaf category.
Step S27: Determine the inverse document frequency of the keyword from the number of documents under the leaf categories whose titles contain the keyword and the total number of documents under the leaf categories.
Step S28: Determine the weight score of the keyword under the corresponding leaf category from the word frequency of the keyword in that leaf category and the inverse document frequency of the keyword.
Step S29: Generate the dictionary from the leaf categories, the keywords corresponding to each leaf category, and the weight score of each keyword.
In practice, one dictionary generated by this method is shown in Table 1.
Table 1. A dictionary in practical application

Keyword     Leaf category    Weight score
Notebook    Computer         7
Apple       Mobile phone     4
Notebook    Stationery       5
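Steps S21 to S29 can be sketched in Python as follows. This is an illustrative sketch only, not the claimed implementation. It assumes that titles can be split on whitespace, that a hypothetical set `entity_words` stands in for entity word recognition, that the inverse document frequency is log(N / df), and that the weight score is the product of word frequency and inverse document frequency. The description states only that the weight score is determined from these two quantities, so the formula, the function name `build_dictionary`, and the parameter `tf_threshold` are all assumptions.

```python
import math
from collections import Counter, defaultdict


def build_dictionary(docs, entity_words, tf_threshold=0):
    """Return (keyword, leaf category, weight score) triples.

    docs: iterable of (leaf_category, title) pairs.
    entity_words: hypothetical lexicon standing in for entity word recognition.
    tf_threshold: hypothetical preset threshold on word frequency.
    """
    tf = defaultdict(Counter)  # leaf category -> entity word -> word frequency
    df = Counter()             # entity word -> number of titles containing it
    total_docs = 0
    for category, title in docs:
        # S22-S23 (simplified): lowercase, split on whitespace, keep entity words.
        words = [w for w in title.lower().split() if w in entity_words]
        tf[category].update(words)  # S24: word frequency per leaf category
        df.update(set(words))       # count each title at most once per word
        total_docs += 1

    dictionary = []
    for category, counts in tf.items():
        for word, freq in counts.items():
            if freq > tf_threshold:  # S25-S26: keep words above the threshold
                idf = math.log(total_docs / df[word])  # S27 (assumed formula)
                dictionary.append((word, category, freq * idf))  # S28-S29
    return dictionary
```

Each returned triple corresponds to one row of a table such as Table 1. The scores produced here depend on the assumed formula and would not reproduce the values in Table 1.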
It should be noted that the steps of the method provided by Embodiment 1 may all be executed by the same device, or different steps may be executed by different devices. For example, step 11 and step 12 may both be executed by device 1; alternatively, step 11 may be executed by device 1 and step 12 by device 2; and so on.
Embodiment 2
Embodiment 2 provides a document classification method that uses the dictionary generated in Embodiment 1 to classify documents in a target network interaction platform into a specified network interaction platform. This solves the existing problem that resources are difficult to integrate between network interaction platforms of the same type because their classification standards differ. A schematic flowchart of the method is shown in Figure 3; the method includes the following steps:
Step S31: Obtain the title information of a specified document in the target network interaction platform and determine the entity words corresponding to the title of the specified document, where an entity word describes a physical thing.
The target network interaction platform here is the platform whose resources are being integrated. For example, when the resources of a first online game platform are integrated into a second online game platform, the first online game platform is the target network interaction platform. The type of the target network interaction platform is not limited here; it may be an e-commerce platform, an online game platform, and so on.
In a network interaction platform, resources are generally stored on the server side in the form of documents. For example, on an e-commerce platform, each commodity generally has a corresponding document on the server side. The document for a commodity includes the document title, description information of the category to which the commodity belongs, and the network address of the document. The specified document is the document corresponding to a resource in the target network interaction platform that needs to be integrated into the specified network interaction platform.
From the title information of the specified document, the entity words corresponding to its title are determined. Entity words describe physical things, such as commodity names, trademarks, titles, and game names.
In practice, there are many ways to determine the entity words corresponding to the title of the specified document; one preferred scheme is shown in step S311.
Step S311: Normalize the title of the specified document; perform word segmentation and entity word recognition on the normalized title to determine the entity words corresponding to the title of the specified document.
In practice, normalization may include letter case conversion, conversion between traditional and simplified Chinese characters, tense conversion, and so on. Word segmentation of the title of the specified document can generally be performed with a word segmentation tool; word2vec is cited as a commonly used one. Entity word recognition can generally be done with an entity dictionary. After entity word recognition, the word frequency of each entity word under each classification category is counted; these word frequencies are then compared with a preset threshold, and entity words whose word frequency exceeds the threshold are selected as keywords under their corresponding classification categories.
The entity words determined by step S311 are more accurate and standardized, because the title has been normalized and entity word recognition has been applied.
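A minimal sketch of the title processing in step S311 is given below, under stated assumptions. Normalization is reduced to Unicode NFKC folding and case folding; conversion between traditional and simplified characters and tense conversion are not covered. Word segmentation is approximated by whitespace splitting, and entity word recognition by lookup in a hypothetical `entity_lexicon`. The function names are illustrative and do not come from the source.

```python
import unicodedata


def normalize_title(title):
    # NFKC folds full-width letters and spaces to their ASCII forms;
    # casefold() removes letter case differences.
    return unicodedata.normalize("NFKC", title).casefold().strip()


def extract_entity_words(title, entity_lexicon):
    # Whitespace splitting stands in for a real word segmentation tool.
    tokens = normalize_title(title).split()
    # Keep only tokens found in the (hypothetical) entity lexicon.
    return [t for t in tokens if t in entity_lexicon]
```

For Chinese titles, whitespace splitting would have to be replaced by a real word segmenter, since words are not delimited by spaces.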
Step S32: By querying the dictionary, determine the weight score of each entity word under each classification category in the dictionary.
The dictionary is one generated by the dictionary generation method provided in Embodiment 1. It generally includes three attributes: keyword, classification category, and weight score. The weight score measures how well the keyword distinguishes the classification category it belongs to from the other classification categories, and a keyword is a word selected under its corresponding classification category according to a predetermined rule.
By querying the dictionary, the weight score of each entity word under each classification category of the dictionary is determined. In practice, an entity word is often not found under some classification category; in that case its weight score under that classification category can be set to a preset constant, which is usually 0. For example, the entity word "rice" does not appear under the classification category "computer" in the dictionary, so the weight score of "rice" under that classification category can be set to the preset constant.
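The lookup in step S32, including the preset constant for missing entries, can be sketched as follows. The sketch assumes the dictionary is held as a Python mapping from (keyword, classification category) pairs to weight scores; this data layout and the names `DEFAULT_WEIGHT` and `weights_by_category` are assumptions for illustration.

```python
DEFAULT_WEIGHT = 0.0  # preset constant used when an entity word is not found


def weights_by_category(entity_word, dictionary, categories):
    """Return the weight score of entity_word under every category.

    dictionary: mapping (keyword, category) -> weight score (assumed layout).
    """
    return {
        category: dictionary.get((entity_word, category), DEFAULT_WEIGHT)
        for category in categories
    }
```

With the entries of Table 1, "rice" receives the preset constant under every classification category, while "notebook" receives 7 under "computer" and 5 under "stationery".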
Step S33: From the weight scores of the entity words under each classification category, determine the total score of the specified document under each classification category.
This step determines the total score of the specified document under each classification category of the dictionary. The process is illustrated for one classification category, referred to here as the current classification category: the total score of the specified document under the current classification category is generally determined from the weight scores, under the current classification category, of the entity words in the title of the specified document.
In practice, there are several ways to do this. For example, the weight scores of the entity words can be combined in a weighted sum to give the total score of the specified document under the current classification category, where the weight of each entity word can be set according to its type, such as name, place name, or brand word. Alternatively, each classification category can be taken in turn as the current classification category, and the weight scores of the entity words under it simply summed to give the total score of the specified document under the current classification category.
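Both variants of step S33 can be sketched with one function. The sketch assumes the same (keyword, classification category) mapping as above; the optional arguments `type_of` (entity word to type) and `type_weight` (type to coefficient) are hypothetical and only illustrate the type-dependent weighting. When they are omitted, every coefficient is 1 and the result is the plain sum.

```python
def total_score(entity_words, category, dictionary, type_of=None, type_weight=None):
    """Total score of a document under one classification category.

    dictionary: mapping (keyword, category) -> weight score (assumed layout).
    type_of / type_weight: hypothetical per-type coefficients; default is 1.0.
    """
    type_of = type_of or {}
    type_weight = type_weight or {}
    score = 0.0
    for word in entity_words:
        coefficient = type_weight.get(type_of.get(word), 1.0)
        # Entity words missing from the dictionary contribute the constant 0.
        score += coefficient * dictionary.get((word, category), 0.0)
    return score
```

Calling the function once per classification category gives the per-category total scores used in step S34.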
Step S34: Classify the specified document according to its total score under each classification category.
After the total score of the specified document under each classification category has been determined, the specified document is classified according to those total scores.
There are several ways to classify the specified document by its total scores under the classification categories. For example, the classification categories can be ranked by total score and the specified document classified into the one with the largest total score. That is, from the total scores of the specified document under each classification category, the maximum-total-score classification category corresponding to the largest total score is determined, and the specified document is classified into it.
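Classification by largest total score amounts to an argmax over the per-category scores. A minimal sketch, assuming the scores are held in a mapping from classification category to total score (the function name is illustrative):

```python
def classify_by_max_score(scores):
    """Return the classification category with the largest total score.

    scores: mapping category -> total score. On ties, the category that
    appears first in the mapping is returned.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return max(scores, key=scores.get)
```

How ties should be broken is not specified in the source; returning the first category encountered is simply the behaviour of Python's `max`.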
Further, various problems arise in practice, so classification by total score alone may not be accurate enough. For example, for the document title "apple iphone4s data cable", the entity words after segmentation include "apple", "iphone4s", and "data cable". Because "apple" and "iphone4s" both have high weight scores under the classification category "mobile phone", the document would usually be classified under "mobile phone". Therefore, as shown in Figure 4, a preferred scheme for classifying the specified document by its total scores under the classification categories is given in steps S341 to S345.
Step S341: Arrange the classification categories into a queue in descending order of the specified document's total score, and select a predetermined number of classification categories in queue order, starting from the head of the queue.
The predetermined number of classification categories includes at least two classification categories. In practice, more can be selected as needed, for example the 100 classification categories with the highest total scores.
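Step S341 can be sketched as a sort followed by a slice. The function name `preselect` and the parameter `n` are illustrative; the check that at least two classification categories are requested follows the constraint stated above.

```python
def preselect(scores, n=2):
    """Return the n categories with the highest total scores, best first.

    scores: mapping category -> total score.
    """
    if n < 2:
        raise ValueError("at least two classification categories must be selected")
    # The sorted list plays the role of the queue ordered from high to low.
    queue = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [category for category, _ in queue[:n]]
```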
Step S342: Obtain the initial category of the specified document in the target network interaction platform.
The initial category to which the specified document belongs in the target network interaction platform usually also carries information useful for classifying the document, so the information of the specified document's initial category in the target network interaction platform can be obtained.
In addition, document categories in a network interaction platform generally have multiple levels, so the specified document may have more than one initial category. A preferred way to choose the initial category is to select one at the same level as the classification categories. For example, when the classification categories are first-level categories, the selected initial category is also a first-level category; when the classification categories are leaf categories, the selected initial category is also a leaf category.
Step S343: Determine the first similarity between each of the selected predetermined number of classification categories and the initial category, where the first similarity describes the degree of similarity between the initial category and each classification category.
There are several ways to determine the first similarity between a classification category and the initial category; for example, it can be determined by word matching or by other means. A preferred scheme is to determine the Jaccard similarity between each of the selected predetermined number of classification categories and the initial category.
Taking the words in the classification category as set A and the words in the initial category as set B, the Jaccard similarity of A and B is computed as follows:
Jaccard(A, B) = |A ∩ B| / |A ∪ B|
where |A ∩ B| is the number of words in the intersection of A and B, and |A ∪ B| is the number of words in the union of A and B. The value of Jaccard(A, B) lies in [0, 1]; when A = B, that is, when A and B are identical, Jaccard(A, B) = 1.
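The Jaccard similarity defined above can be computed directly with Python sets. In this sketch, returning 0 when both sets are empty is an assumption, since the formula is undefined in that case.

```python
def jaccard(words_a, words_b):
    """Jaccard similarity of two collections of words, in [0, 1]."""
    a, b = set(words_a), set(words_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are treated as dissimilar
    return len(a & b) / len(union)
```

The same function can serve for both the first similarity (classification category against initial category) and the second similarity (classification category against document title), provided both inputs are first split into words.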
Step S344: Determine the second similarity between each of the selected predetermined number of classification categories and the title of the specified document, where the second similarity describes the degree of similarity between the title of the specified document and each classification category.
The second similarity between a classification category and the title of the specified document may be determined in the same way as the first similarity or in a different way. In practice, a preferred approach is to determine the Jaccard similarity between the classification category and the title of the specified document.
Step S345: Taking each of the selected predetermined number of classification categories as a preselected classification category, compute a weighted sum of the total score, the first similarity, and the second similarity corresponding to the preselected classification category, and classify the specified document according to the result of the weighted sum.
That is, each of the selected classification categories is taken as a preselected classification category; its total score, first similarity, and second similarity are combined in a weighted sum; and the specified document is classified according to the result. The weights for the total score, the first similarity, and the second similarity can be determined according to the actual situation; typically, they can all be set to 1.
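Step S345 can be sketched as follows. The sketch assumes that the specified document is assigned to the preselected classification category with the largest weighted sum; the source says only that classification follows the result of the weighted sum, so this selection rule, the function name, and the input layout are assumptions.

```python
def classify_by_weighted_sum(candidates, weights=(1.0, 1.0, 1.0)):
    """Pick the preselected category with the largest weighted sum.

    candidates: mapping category -> (total score, first similarity,
                second similarity); assumed layout.
    weights: coefficients for the three terms; all 1 by default.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    w_score, w_first, w_second = weights

    def combined(category):
        score, first, second = candidates[category]
        return w_score * score + w_first * first + w_second * second

    return max(candidates, key=combined)
```

Because both similarities lie in [0, 1] while total scores are unbounded, the coefficients may need adjusting so that the similarity terms can actually change the ranking.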
In practice, to check the accuracy of the classification results, the results can also be spot-checked by sampling, for example by sampling per first-level category and verifying the prediction hit rate.
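One possible form of such a sampling check is sketched below. It assumes that each classification result has been paired with a manually reviewed category, which the source does not describe; the triple layout, the function name, and the fixed random seed are all hypothetical.

```python
import random
from collections import defaultdict


def sampled_hit_rate(results, sample_size, seed=0):
    """Estimate the hit rate per first-level category from a random sample.

    results: list of (first_level_category, predicted, reviewed) triples,
             where `reviewed` is a hypothetical manually checked category.
    """
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    hits_by_category = defaultdict(list)
    for first_level, predicted, reviewed in results:
        hits_by_category[first_level].append(predicted == reviewed)

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    rates = {}
    for category, hits in hits_by_category.items():
        sample = rng.sample(hits, min(sample_size, len(hits)))
        rates[category] = sum(sample) / len(sample)
    return rates
```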
With the document classification method provided by Embodiment 2, the dictionary generated in Embodiment 1 is used to classify documents in the target network interaction platform into the specified network interaction platform. This solves the existing problem that resources are difficult to integrate between network interaction platforms of the same type because their classification standards differ.
Embodiment 3
Based on the same inventive concept as Embodiment 1, Embodiment 3 provides a dictionary generation apparatus. The dictionary it generates is used to classify documents in a target network interaction platform into a specified network interaction platform, which lays a foundation for resource integration between network interaction platforms of the same type. As shown in Figure 5, the apparatus 40 includes:
an acquiring unit 401, a determining unit 402, and a generating unit 403, where:
the acquiring unit 401 is configured to obtain the title information of each document under each classification category of a specified level in the specified network interaction platform;
the determining unit 402 is configured to determine, from the title information of each document, the keywords corresponding to each classification category and the weight score of each keyword under its corresponding classification category, where the weight score measures how well the keyword distinguishes the classification category it belongs to from the other classification categories, and a keyword is a word selected under its corresponding classification category according to a predetermined rule;
the generating unit 403 is configured to generate the dictionary from the classification categories, the keywords corresponding to each classification category, and the weight score of each keyword.
With the dictionary generation apparatus provided by Embodiment 3, the acquiring unit obtains the title information of each document under each classification category of a specified level in the specified network interaction platform; the determining unit determines, from the title information of each document, the keywords corresponding to each classification category and the weight score of each keyword under its corresponding classification category; and the generating unit generates the dictionary from the classification categories, the keywords corresponding to each classification category, and the weight score of each keyword. The dictionary generated from the documents under each classification category in the specified network interaction platform is used to classify documents in the target network interaction platform into the specified network interaction platform, which lays a foundation for resource integration between network interaction platforms of the same type.
Embodiment 4
Based on the same inventive concept as Embodiment 2, Embodiment 4 provides a document classification apparatus that uses the dictionary generated in Embodiment 3 to classify documents in a target network interaction platform into a specified network interaction platform. This solves the existing problem that resources are difficult to integrate between network interaction platforms of the same type because their classification standards differ. As shown in Figure 6, the apparatus 50 includes:
an acquiring unit 501, a query unit 502, a determining unit 503, and a classification unit 504, where:
the acquiring unit 501 is configured to obtain the title information of a specified document in the target network interaction platform and determine the entity words corresponding to the title of the specified document, where an entity word describes a physical thing;
the query unit 502 is configured to determine, by querying a dictionary, the weight score of each entity word under each classification category in the dictionary, the dictionary being one generated by the dictionary generation apparatus according to claim 12;
the determining unit 503 is configured to determine, from the weight scores of the entity words under each classification category, the total score of the specified document under each classification category;
the classification unit 504 is configured to classify the specified document according to its total score under each classification category.
With the document classification apparatus provided by Embodiment 4, the acquiring unit obtains the title information of the specified document and determines the entity words from the title information; the query unit determines the weight score of each entity word under each classification category by querying the dictionary generated in Embodiment 3; the determining unit determines the total score of the specified document under each classification category from the weight scores; and the classification unit classifies the document by total score. This solves the existing problem that resources are difficult to integrate between network interaction platforms of the same type because their classification standards differ.
Those skilled in the art should understand that embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to its embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that includes the element.
The above are merely embodiments of this application and are not intended to limit it. Those skilled in the art may make various modifications and variations to this application. Any modification, equivalent substitution, or improvement made within the spirit and principle of this application shall fall within the scope of its claims.

Claims (13)

1. A method for generating a dictionary, characterized in that the dictionary is used to classify documents in a target network interaction platform into a specified network interaction platform, the method comprising:
obtaining the title information of each document under each classification category of a specified level in the specified network interaction platform;
determining, from the title information of each document, the keywords corresponding to each classification category and the weight score of each keyword under its corresponding classification category, wherein the weight score is used to assess the degree of discrimination when distinguishing the classification category to which the keyword belongs from the other classification categories, and a keyword is a word specified under its corresponding classification category according to a predetermined rule;
generating the dictionary from the classification categories, the keywords corresponding to each classification category, and the weight score of each keyword.
2. The method of claim 1, characterized in that the weight score of a keyword under its corresponding classification category is determined as follows:
determining the word frequency of the keyword in the corresponding classification category;
determining the inverse document frequency of the keyword from the number of documents under the classification categories whose titles contain the keyword and the total number of documents under the classification categories;
determining the weight score of the keyword under the corresponding classification category from the word frequency of the keyword in that classification category and the inverse document frequency of the keyword.
3. The method of claim 1, characterized in that determining, from the title information of each document, the keywords corresponding to each classification category specifically comprises:
normalizing the title of each document;
performing word segmentation and entity word recognition on the normalized title of each document to obtain entity words, which describe physical things;
determining the word frequency of each entity word corresponding to each classification category, wherein the word frequency of an entity word describes the number of times or the frequency with which the entity word occurs in the corresponding classification category;
taking each entity word in turn as the current entity word, judging whether the word frequency of the current entity word is greater than a preset threshold, and if so, determining the current entity word to be a keyword of the corresponding classification category.
4. The method of claim 2, characterized in that the method further comprises: when the word frequencies of all entity words corresponding to a current classification category are less than the preset threshold, determining the entity word with the largest word frequency in the current classification category to be the keyword of the current classification category, wherein the current classification category is any one of the classification categories.
5. The method of claim 1, characterized in that generating the dictionary from the classification categories, the keywords corresponding to each classification category, and the weight score of each keyword specifically comprises:
taking each keyword in turn as the current keyword, and generating an association group from the current keyword, the classification category corresponding to the current keyword, and the weight score of the current keyword under the corresponding classification category;
generating the dictionary from the association groups.
6. The method of any one of claims 1 to 5, characterized in that the classification categories of the specified level specifically include first-level categories and/or leaf categories.
7. A document classification method, characterized by comprising:
obtaining the title information of a specified document in a target network interaction platform and determining the entity words corresponding to the title of the specified document, wherein an entity word describes a physical thing;
determining, by querying a dictionary, the weight score of each entity word under each classification category in the dictionary, the dictionary being one generated by the method of any one of claims 1 to 6;
determining, from the weight scores of the entity words under each classification category, the total score of the specified document under each classification category;
classifying the specified document according to its total score under each classification category.
8. The method of claim 7, characterized in that classifying the specified document according to its total score under each classification category specifically comprises:
arranging the classification categories into a queue in descending order of the total score of the specified document, and selecting a predetermined number of classification categories in queue order starting from the head of the queue, the predetermined number of classification categories including at least two classification categories;
obtaining the initial category of the specified document in the target network interaction platform;
determining the first similarity between each of the selected predetermined number of classification categories and the initial category, wherein the first similarity describes the degree of similarity between the initial category and each classification category;
determining the second similarity between each of the selected predetermined number of classification categories and the title of the specified document, wherein the second similarity describes the degree of similarity between the title of the specified document and each classification category;
taking each of the selected predetermined number of classification categories as a preselected classification category, computing a weighted sum of the total score, the first similarity, and the second similarity corresponding to the preselected classification category, and classifying the specified document according to the result of the weighted sum.
9. The method of claim 7, characterized in that classifying the specified document according to its total score under each classification category specifically comprises:
determining, from the total scores of the specified document under each classification category, the maximum-total-score classification category corresponding to the largest total score of the specified document, and classifying the specified document into the maximum-total-score classification category.
10. The method of claim 7, characterized in that determining, from the weight scores of the entity words under each classification category, the total score of the specified document under each classification category specifically comprises:
taking each classification category in turn as the current classification category, and summing the weight scores of the entity words under the current classification category to determine the total score of the specified document under the current classification category.
11. The method of claim 7, characterized in that determining the entity words corresponding to the title of the specified document specifically comprises:
normalizing the title of the specified document;
performing word segmentation and entity word recognition on the normalized title of the specified document to determine the entity words corresponding to the title of the specified document.
12. A dictionary generating device, characterized in that the dictionary is used to classify documents in an objective network interaction platform into a specified network interaction platform, the device including:
An acquiring unit, a determining unit, and a generation unit, wherein:
The acquiring unit is configured to obtain the title information of each document under each series of a given level in the specified network interaction platform;
The determining unit is configured to determine, from the title information of each document, each keyword corresponding to each series and the weight point of each keyword under its corresponding series, where the weight point is used to assess how well the keyword distinguishes the series it belongs to from the other series, and a keyword is a word specified according to a predetermined rule under the series corresponding to that keyword;
The generation unit is configured to generate the dictionary from each series, the keywords corresponding to each series, and the weight point of each keyword.
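As a rough illustration of how such a dictionary could be assembled from titles, the Python sketch below uses a stand-in weight point: the fraction of titles containing a word that belong to the given series, so a word unique to one series scores 1.0. This formula, the `min_count` keyword-selection rule, whitespace tokenization, and the name `build_dictionary` are all assumptions chosen for the example; the claim itself only requires that the weight point measure discrimination between series.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_dictionary(titles_by_series: Dict[str, List[str]],
                     min_count: int = 1) -> Dict[str, Dict[str, float]]:
    """Build series -> keyword -> weight point from document titles.

    Assumed weight point: the share of titles containing the word that
    fall under this series.
    """
    per_series: Dict[str, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for series, titles in titles_by_series.items():
        for title in titles:
            # Count each word once per title (document frequency).
            for word in set(title.lower().split()):
                per_series[series][word] += 1
                overall[word] += 1

    dictionary: Dict[str, Dict[str, float]] = {}
    for series, counts in per_series.items():
        dictionary[series] = {
            word: n / overall[word]
            for word, n in counts.items()
            if n >= min_count  # assumed keyword-selection rule
        }
    return dictionary


# Hypothetical titles grouped by series.
sample = {
    "phones": ["phone charger", "phone case"],
    "laptops": ["laptop charger"],
}
word_stock = build_dictionary(sample)
```

In this toy data "charger" appears under both series and therefore receives a lower weight point than "phone" or "laptop", which each appear under only one.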
13. A document classification device, characterized by including:
An acquiring unit, a query unit, a determining unit, and a classification unit, wherein:
The acquiring unit is configured to obtain the title information of a specified document in an objective network interaction platform and to determine each entity word corresponding to the title of the specified document, an entity word being a word that describes an entity;
The query unit is configured to determine, by querying a dictionary, the weight point of each entity word under each series in the dictionary, the dictionary being one generated by the dictionary generating device according to claim 12;
The determining unit is configured to determine, from the weight point of each entity word under each series, the specified document's respective total score under each series;
The classification unit is configured to classify the specified document according to its respective total score under each series.
CN201610048630.5A 2016-01-25 2016-01-25 Word stock generation method and device and document classification method and device using word stock Active CN106997340B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610048630.5A CN106997340B (en) 2016-01-25 2016-01-25 Word stock generation method and device and document classification method and device using word stock

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610048630.5A CN106997340B (en) 2016-01-25 2016-01-25 Word stock generation method and device and document classification method and device using word stock

Publications (2)

Publication Number Publication Date
CN106997340A true CN106997340A (en) 2017-08-01
CN106997340B CN106997340B (en) 2020-07-31

Family

ID=59428279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610048630.5A Active CN106997340B (en) 2016-01-25 2016-01-25 Word stock generation method and device and document classification method and device using word stock

Country Status (1)

Country Link
CN (1) CN106997340B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7275052B2 (en) * 2004-08-20 2007-09-25 Sap Ag Combined classification based on examples, queries, and keywords
CN102141978A (en) * 2010-02-02 2011-08-03 阿里巴巴集团控股有限公司 Method and system for classifying texts
CN102411592A (en) * 2010-09-21 2012-04-11 阿里巴巴集团控股有限公司 Text classification method and device
CN103123636A (en) * 2011-11-21 2013-05-29 北京百度网讯科技有限公司 Method to build vocabulary entry classification models, method of vocabulary entry automatic classification and device
CN103605815A (en) * 2013-12-11 2014-02-26 焦点科技股份有限公司 Automatic commodity information classifying and recommending method applicable to B2B (Business to Business) e-commerce platform

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390094A (en) * 2018-04-20 2019-10-29 伊姆西Ip控股有限责任公司 Method, electronic equipment and the computer program product classified to document
CN110390094B (en) * 2018-04-20 2023-05-23 伊姆西Ip控股有限责任公司 Method, electronic device and computer program product for classifying documents
CN109933692A (en) * 2019-04-01 2019-06-25 北京百度网讯科技有限公司 Establish the method and apparatus of mapping relations, the method and apparatus of information recommendation
CN109933692B (en) * 2019-04-01 2022-04-08 北京百度网讯科技有限公司 Method and device for establishing mapping relation and method and device for recommending information
CN110135264A (en) * 2019-04-16 2019-08-16 深圳壹账通智能科技有限公司 Data entry method, device, computer equipment and storage medium
CN112307210A (en) * 2020-11-06 2021-02-02 中冶赛迪工程技术股份有限公司 Document tag prediction method, system, medium and electronic device

Also Published As

Publication number Publication date
CN106997340B (en) 2020-07-31

Similar Documents

Publication Publication Date Title
US7949643B2 (en) Method and apparatus for rating user generated content in search results
US9106698B2 (en) Method and server for intelligent categorization of bookmarks
US20120203584A1 (en) System and method for identifying potential customers
US20140172415A1 (en) Apparatus, system, and method of providing sentiment analysis result based on text
CN107958014B (en) Search engine
CN106598949B (en) A kind of determination method and device of word to text contribution degree
CN104978332B (en) User-generated content label data generation method, device and correlation technique and device
CN107133238A (en) A kind of text message clustering method and text message clustering system
CN116911312B (en) Task type dialogue system and implementation method thereof
CN104462399B (en) The processing method and processing device of search result
CN106997340A (en) The generation of dictionary and the Document Classification Method and device using dictionary
CN108228612B (en) Method and device for extracting network event keywords and emotional tendency
US9002832B1 (en) Classifying sites as low quality sites
CN108681564A (en) The determination method, apparatus and computer readable storage medium of keyword and answer
CN113918806A (en) Method for automatically recommending training courses and related equipment
CN114330329A (en) Service content searching method and device, electronic equipment and storage medium
CN106407316A (en) Topic model-based software question and answer recommendation method and device
CN107133835A (en) A kind of method and device for analyzing commercial quality
CN107908649B (en) Text classification control method
Srisopha et al. Learning features that predict developer responses for ios app store reviews
CN105787004A (en) Text classification method and device
Roszkowska et al. Can the holistic preference elicitation be used to determine an accurate negotiation offer scoring system? A comparison of direct rating and UTASTAR techniques
CN108550019A (en) A kind of resume selection method and device
JP2013174988A (en) Similar document retrieval support apparatus and similar document retrieval support program
CN111882224A (en) Method and device for classifying consumption scenes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1241056

Country of ref document: HK

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: P.O. Box 847, 4th Floor, Capital Building, Grand Cayman, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.