CN105868193A - Device and method used to detect product relevant information in electronic text - Google Patents


Info

Publication number
CN105868193A
CN105868193A CN201510025848.4A CN201510025848A CN105868193A CN 105868193 A CN105868193 A CN 105868193A CN 201510025848 A CN201510025848 A CN 201510025848A CN 105868193 A CN105868193 A CN 105868193A
Authority
CN
China
Prior art keywords
related information
product
product related
text
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510025848.4A
Other languages
Chinese (zh)
Inventor
宋双永
孟遥
郑仲光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201510025848.4A
Publication of CN105868193A
Pending (current legal status)


Landscapes

  • Character Discrimination (AREA)

Abstract

The invention discloses a device and a method for detecting product-related information in electronic text. The device comprises an acquisition unit, a first labeling unit, a second labeling unit, a training unit, an identification unit, and a combining unit. The acquisition unit acquires a list of products, each entry of which records brand information for a product. Based on user input, the first labeling unit tags each entry in the list as ambiguous or unambiguous, yielding a tagged list. The second labeling unit uses the tagged list to automatically label product mentions in the electronic text, yielding automatically labeled first product-related information. The training unit trains a product information identification model on the first product-related information, thereby generating a trained model. The identification unit uses the trained model to identify product-related information in the electronic text, yielding identified second product-related information. The combining unit merges the first and second product-related information to obtain the final product-related information.

Description

Device and method for detecting product-related information in electronic text
Technical field
The present invention relates to the Internet and data mining, and more particularly to a device and method for detecting product-related information in electronic text.
Background
This section provides background information related to the present disclosure, which is not necessarily prior art.
With the growing popularity of microblogs, more and more people choose to publish their own opinions and comments there. Analyzing microblog posts makes it possible to understand, conveniently and quickly, users' views and sentiments about trending events, dairy products, film and television stars, and similar topics. Cars are among the topics users discuss most on microblogs: according to statistics, more than 0.5% of microblog posts relate to a specific car brand or model, that is, at least one in every 200 posts concerns cars. Microblogs have therefore become an important platform for car brand marketing and for feedback on users' opinions of vehicles.
Accurate detection of car-related information in microblogs can play an important role in applications such as analyzing users' purchase intentions and evaluating word-of-mouth reputation. Previous work on detecting car-related information relied mainly on direct matching against a list of car brand names. For example, every post containing the two characters for "Toyota" was taken to be discussing Toyota cars. However, the matching results obtained this way have several problems. First, many car brands and models are expressed by ambiguous words: "Chang'an" refers not only to a car brand but also to the city of Xi'an, and "Golf" is not only a Volkswagen model but also the sport of golf. Identification by direct matching therefore introduces many errors into the results. Second, direct matching cannot discover new ways of mentioning cars, such as new models or new nicknames, because a car brand list is likely to be incomplete and cannot be updated frequently. For example, the car list we downloaded from a well-known car website does not contain the "Porsche GT9CS" model, yet in real identification, mentions of this model must be recognized. Furthermore, because microblog text is informally formatted, users express the same car in different ways, for example "Nissan all-new Teana" and "Nissan new Teana", "Benz S-Class" and "Benz S series", "Chery E5" and "Chery E5", and "Subaru 9th-generation WRX" and "Subaru WRX 9th generation". The final identification method must therefore recognize these different kinds of mentions, and direct matching clearly cannot solve this problem.
Summary of the invention
This section provides a general summary of the disclosure and is not a comprehensive disclosure of its full scope or of all its features.
An object of the present disclosure is to provide a device and method for detecting product-related information in electronic text that can identify such information more accurately.
According to one aspect of the present disclosure, there is provided a device for detecting product-related information in electronic text, comprising: an acquisition unit that acquires a list of products, each entry of which records brand information of a product; a first labeling unit that, based on user input, labels each entry in the list with an ambiguity tag or a non-ambiguity tag to obtain a tagged list, where the ambiguity tag indicates that the word sequence in the entry has an ambiguous meaning and the non-ambiguity tag indicates that it does not; a second labeling unit that uses the tagged list to automatically label the products in the electronic text, obtaining automatically labeled first product-related information; a training unit that trains a product information identification model based on the first product-related information, thereby generating a trained model; an identification unit that uses the trained model to identify product-related information in the electronic text, obtaining identified second product-related information; and a combining unit that merges the first product-related information and the second product-related information to obtain final product-related information.
According to another aspect of the present disclosure, there is provided a method for detecting product-related information in electronic text, comprising: acquiring a list of products, each entry of which records brand information of a product; based on user input, labeling each entry in the list with an ambiguity tag or a non-ambiguity tag to obtain a tagged list, where the ambiguity tag indicates that the word sequence in the entry has an ambiguous meaning and the non-ambiguity tag indicates that it does not; using the tagged list to automatically label the products in the electronic text, obtaining automatically labeled first product-related information; training a product information identification model based on the first product-related information, thereby generating a trained model; using the trained model to identify product-related information in the electronic text, obtaining identified second product-related information; and merging the first product-related information and the second product-related information to obtain final product-related information.
According to another aspect of the present disclosure, there is provided a program product comprising machine-readable instruction code stored therein which, when read and executed by a computer, causes the computer to perform the method for detecting product-related information in electronic text according to the present disclosure.
According to another aspect of the present disclosure, there is provided a machine-readable storage medium carrying the program product according to the present disclosure.
With the device and method for detecting product-related information in electronic text according to the present disclosure, both automatically labeled first product-related information and identified second product-related information are obtained and then merged into the final product-related information. Product-related information in electronic text can therefore be identified more accurately, improving the recognition effect.
The description and specific examples in this summary are for illustrative purposes only and are not intended to limit the scope of the present disclosure.
Brief description of the drawings
The drawings described here illustrate selected embodiments only, not all possible implementations, and are not intended to limit the scope of the present disclosure. In the drawings:
Fig. 1 shows the overall system flowchart of the technical solution according to the present disclosure;
Fig. 2 is a block diagram of a device for detecting product-related information in electronic text according to an embodiment of the present disclosure;
Fig. 3 shows the character label symbols and their meanings;
Fig. 4 shows the conversion between a car information recognition result and a symbol annotation result;
Figs. 5A, 5B and 5C show examples of the initial probability distribution, the state transition probability distribution, and the observation probability distribution, respectively;
Fig. 5D shows the optimal path solving process according to an embodiment of the present disclosure;
Fig. 5E shows the optimal path solving result of Fig. 5D;
Fig. 6 is a block diagram of a device for detecting product-related information in electronic text according to another embodiment of the present disclosure;
Fig. 7 is a flowchart of a method for detecting product-related information in electronic text according to an embodiment of the present disclosure; and
Fig. 8 is a block diagram of an example structure of a general-purpose personal computer in which the method for detecting product-related information in electronic text according to an embodiment of the present disclosure can be implemented.
While the present disclosure is susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and described in detail here. It should be understood, however, that the description of specific embodiments is not intended to limit the disclosure to the particular forms disclosed; on the contrary, the disclosure is intended to cover all modifications, equivalents, and alternatives falling within its spirit and scope. Note that corresponding reference numerals indicate corresponding parts throughout the several drawings.
Embodiments
Examples of the present disclosure will now be described more fully with reference to the accompanying drawings. The following description is merely exemplary in nature and is not intended to limit the present disclosure, its application, or its uses.
Example embodiments are provided so that this disclosure will be thorough and will fully convey its scope to those skilled in the art. Numerous specific details, such as examples of specific units, devices, and methods, are set forth to provide a thorough understanding of the embodiments. It will be apparent to those skilled in the art that these specific details need not be employed, that example embodiments may be embodied in many different forms, and that neither should be construed to limit the scope of the disclosure. In some example embodiments, well-known processes, structures, and technologies are not described in detail.
The device and method for detecting product-related information in electronic text according to the present disclosure combine the recognition result obtained by direct matching with the recognition result obtained by a machine learning model, yielding a more accurate final recognition result. Specifically, the present disclosure relates to automatically detecting mentions of products (such as cars, computers, or cameras) in electronic text (such as microblogs). To address the shortcomings of direct matching, the disclosure designs a two-stage method that combines rule-based matching with model-based matching. First, the electronic text is labeled automatically using a product brand list with "ambiguous/unambiguous" tags and a few simple matching rules, producing automatically labeled product mentions. Second, this automatic labeling result is used to train a model such as a hidden Markov model (HMM). During training, the electronic text is treated as a string of Chinese characters, letters, punctuation marks, and so on; according to the role each character plays in product mention identification, it is labeled with a type such as "first character of a product mention", "middle character of a product mention", or "last character of a product mention", and the HMM is trained on these data. Third, the model is used to identify product mentions in the raw electronic text, producing model-based recognition results. Finally, the model-based results are merged with the automatic labeling results. Optionally, the merged result may be post-processed to obtain the final recognition result. Note that although the disclosure uses an HMM as the trained model for product mention identification, other models such as conditional random field (CRF) models or maximum entropy models can also be applied to the present disclosure.
The device and method according to the present disclosure suit product categories that are widely discussed in electronic text such as microblogs and that have a limited number of brands (hundreds to thousands), such as cars, computers, and digital cameras. They can identify relevant content automatically using only the ambiguous/unambiguous information of brand-related words. The disclosure requires neither a large set of matching rules nor manually labeled training data, and it has some ability to recognize new product names. Furthermore, when building the training corpus, the product mention identification method uses the ambiguous/unambiguous distinction of product-related words, which increases the accuracy of automatic data labeling. In addition, the statistical model trained in the disclosure can use the automatically built corpus to analyze the context in which products are mentioned, which helps disambiguate ambiguous words and recognize new ways of mentioning products. Finally, combining the automatic labeling result with the model recognition result, and post-processing the combined result, further improves recognition.
Fig. 1 shows the overall system flowchart of the technical solution according to the present disclosure. As shown in Fig. 1, the device according to the present disclosure for automatically detecting product-related information in electronic text, for products with a limited, predetermined number of brands, may generally include three parts (not all of which are necessarily required by the disclosure): an automatic labeling process based on a brand list of products (such as cars) with "ambiguous/unambiguous" tags; a training process for a product mention identification model (such as an HMM), together with a product mention identification process that uses this model; and a process that merges the recognition result of the automatic labeling method with the model-based recognition result and then post-processes the merged result (shown in the dashed box in Fig. 1). Specifically, starting from the raw corpus (electronic text without any processing), the text is first labeled automatically using the product brand list with "ambiguous/unambiguous" tags (for example, the car-related word ambiguity list in Fig. 1) and a few simple matching rules, producing automatically labeled product mentions (the automatically labeled corpus in Fig. 1). In Fig. 1, the car-related word ambiguity list is a word list that records whether each car brand or model name is ambiguous. For example, "Chang'an" is an ambiguous car brand, because Chang'an can also refer to the city of Chang'an, whereas "Porsche" is an unambiguous car brand. The automatically labeled corpus is the car mention recognition result obtained by labeling the raw corpus using only the car-related word ambiguity list. Second, this automatic labeling result is used to train a model such as an HMM, generating a trained model (the car mention identification model in Fig. 1). An HMM is a model capable of sequence labeling; it can be trained on the automatically labeled corpus and then used to identify car mentions in the raw corpus automatically. Third, this model is used to identify product mentions in the raw electronic text, producing model-based recognition results. Finally, the automatic labeling result and the model recognition result are merged. Optionally, the merged result may then be post-processed to obtain the final recognition result (the post-processing result in Fig. 1). Note that the post-processing shown in the dashed box in Fig. 1 is not essential to the present disclosure; it is described in detail in the following embodiments.
The overall system flow of the technical solution according to the present disclosure has been briefly described above. The technical solution is described in further detail below with reference to the drawings.
Fig. 2 shows a device 200 according to an embodiment of the present disclosure for detecting information related to a product (such as a car) in electronic text (such as microblogs). As shown in Fig. 2, the device may include an acquisition unit 210, a labeling unit 220, a labeling unit 230, a training unit 240, an identification unit 250, and a combining unit 260.
The acquisition unit 210 may acquire a list of products, each entry of which records brand information of a product. According to an embodiment of the present disclosure, the acquisition unit 210 may acquire a brand list covering car brands, models, and so on, for example a word list of car brands and models obtained from a car website.
Next, the labeling unit 220, serving as the first labeling unit, may label each entry in the list acquired by the acquisition unit 210 with an ambiguity tag or a non-ambiguity tag based on user input, to obtain a tagged list. The ambiguity tag indicates that the word sequence in the entry has an ambiguous meaning, and the non-ambiguity tag indicates that it does not. According to an embodiment of the present disclosure, the labeling unit 220 may tag each entry of the brand list of car brands, models, and so on as ambiguous or unambiguous based on user input, that is, build a car brand/model list with "ambiguous/unambiguous" tags from the user's input. Here, "ambiguous" means that a car brand or model name has meanings other than the car, as with "Chang'an" and "Golf" above. Alternatively, the labeling unit 220 may, based on user input, split the brand list into two lists, one ambiguous and one unambiguous, while retaining the correspondence between product brand information and product model information.
Then, the labeling unit 230, serving as the second labeling unit, may use the tagged list obtained by the labeling unit 220 to automatically label products in the electronic text, obtaining automatically labeled first product-related information as the automatic labeling result. According to an embodiment of the present disclosure, the labeling unit 230 may automatically label cars in the electronic text based on the car brand/model list with "ambiguous/unambiguous" tags, obtaining automatically labeled car mentions as the automatic labeling result. Here, the labeling unit 230 may use either the tagged list or the two separate ambiguous and unambiguous lists, applying a matching-based labeling process well known to those skilled in the art.
Although the automatic labeling result still contains some errors, the subsequent statistical model can exploit a large amount of statistical information to learn the labeling that is most likely correct, and can thereby correct errors introduced during automatic labeling when it performs model-based identification.
Next, the training unit 240 may train, based on the first product-related information, a product information identification model for identifying product-related information, thereby generating a trained model. This training makes it possible to identify new product models, new product nicknames, and new ways of describing products. According to an embodiment of the present disclosure, before a model such as an HMM is trained, the automatically labeled car information recognition result may be converted into the symbol labeling format shown in Fig. 3, in which each character receives one of seven symbols. Taking the sentence 我买了一辆中华骏捷。 ("I bought a Zhonghua Junjie.") as an example: s denotes the first character of a car mention (中); m denotes a middle character of a car mention (华 and 骏); l denotes the last character of a car mention (捷); b denotes the character immediately before a car mention (辆); and a denotes the character immediately after a car mention (the full stop 。). In addition, k denotes a character lying between two car mentions, for example the connective joining Ferrari and Porsche in a sentence comparing the two brands; and e denotes every other Chinese character, letter, or punctuation mark. The specific conversion is shown in Fig. 4: for example, the car information recognition result "Do you think [Land Rover] or [Cherokee] is the better brand?" is converted into the symbol annotation result shown in Fig. 4.
Alternatively, the model for identifying product-related information may be a conditional random field model or a maximum entropy model.
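The Fig. 3 symbol scheme can be sketched as a pair of conversions between mention spans and per-character symbols. This is a minimal Python sketch, not the patent's implementation: spans are assumed to be half-open `[start, end)` character offsets, a one-character mention (a case the scheme does not address) is assumed to be tagged `s`, and the function names `spans_to_tags` and `tags_to_spans` are hypothetical. The example sentence is illustrative.

```python
def spans_to_tags(text: str, spans: list[tuple[int, int]]) -> list[str]:
    """Encode half-open mention spans as one Fig. 3-style symbol per character."""
    inside = {"s", "m", "l"}
    tags = ["e"] * len(text)
    for start, end in spans:
        # Assumption: a one-character mention is tagged "s" (not covered by the scheme).
        tags[start] = "s"
        for i in range(start + 1, end - 1):
            tags[i] = "m"
        if end - start > 1:
            tags[end - 1] = "l"
    # Second pass: tag characters adjacent to mentions as b (before), a (after), k (between).
    for i in range(len(tags)):
        if tags[i] in inside:
            continue
        after = i > 0 and tags[i - 1] in inside            # previous character ends a mention
        before = i + 1 < len(tags) and tags[i + 1] == "s"  # next character starts a mention
        if after and before:
            tags[i] = "k"
        elif before:
            tags[i] = "b"
        elif after:
            tags[i] = "a"
    return tags


def tags_to_spans(tags: list[str]) -> list[tuple[int, int]]:
    """Decode symbols back into half-open mention spans (s, then m*, then l)."""
    spans, i, n = [], 0, len(tags)
    while i < n:
        if tags[i] != "s":
            i += 1
            continue
        j = i + 1
        while j < n and tags[j] == "m":
            j += 1
        if j < n and tags[j] == "l":
            spans.append((i, j + 1))
            i = j + 1
        else:
            # Hypothetical choice: an "s" with no closing "l" is kept as a one-character mention.
            spans.append((i, i + 1))
            i += 1
    return spans


sentence = "你觉得路虎和切诺基哪个牌子好？"  # "Land Rover" = 路虎, "Cherokee" = 切诺基
tags = spans_to_tags(sentence, [(3, 5), (6, 9)])
print("".join(tags))        # eebslksmlaeeeee
print(tags_to_spans(tags))  # [(3, 5), (6, 9)]
```

Decoding only needs `s`, `m`, and `l`; the context symbols `b`, `a`, `k`, and `e` exist so that the model can learn which characters tend to surround a mention.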
In general, an HMM is determined by its initial probability distribution, its state transition probability distribution, and its observation probability distribution, all of which can be estimated from statistics of the training data. The initial probability distribution is the probability distribution over labels for the first character of a sentence. In the example of Fig. 5A, training data statistics show that the first character of a sentence is labeled s, b, or e with probabilities 0.1, 0.1, and 0.8, respectively, and any other symbol with probability 0. The state transition probability distribution is the probability distribution over the label of the next character, given the label of the current character. In the example of Fig. 5B, the row header gives the label of the current character and the column header gives the possible label of the next character. Taking the first row as an example, training data statistics show that if the current character is labeled s, the next character is labeled m with probability 0.8, l with probability 0.2, and any other symbol with probability 0. The observation probability distribution is the probability distribution over characters at the current position, given the current label. As shown in Fig. 5C, the row header is the current label symbol and the column header lists the possible characters. Taking the first row as an example, training data statistics show that if the current label is s, the current character is 长 ("Chang") with probability 0.8, 安 ("An") with probability 0.1, 城 ("city") with probability 0.1, and any other character with probability 0.
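Because the three distributions are relative frequencies, they can be estimated by counting over the automatically labeled corpus. The sketch below assumes training data given as `(text, tags)` pairs with one symbol per character; `estimate_hmm` and the two-sentence corpus are hypothetical. No smoothing is applied, so any event unseen in training gets probability 0; in practice some form of smoothing is commonly added.

```python
from collections import Counter, defaultdict


def _normalize(counts: Counter) -> dict:
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def estimate_hmm(corpus):
    """Maximum-likelihood HMM parameters from (text, tags) pairs of equal length."""
    start_counts = Counter()
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    for text, tags in corpus:
        if not tags:
            continue
        start_counts[tags[0]] += 1               # label of the sentence's first character
        for prev, cur in zip(tags, tags[1:]):
            trans_counts[prev][cur] += 1         # current label -> next label
        for char, tag in zip(text, tags):
            emit_counts[tag][char] += 1          # label -> observed character
    start = _normalize(start_counts)
    trans = {tag: _normalize(c) for tag, c in trans_counts.items()}
    emit = {tag: _normalize(c) for tag, c in emit_counts.items()}
    return start, trans, emit


# Hypothetical labeled corpus: "I drive a Chang'an car." and "Great Wall is good."
corpus = [("我开长安车。", "ebslae"), ("长城好。", "slae")]
start, trans, emit = estimate_hmm(corpus)
print(start)       # {'e': 0.5, 's': 0.5}
print(trans["s"])  # {'l': 1.0}
```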
Then, the identification unit 250 may use the trained model generated by the training unit 240 to identify product-related information in the electronic text, obtaining identified second product-related information as the recognition result. According to an embodiment of the present disclosure, sequence labeling of the original electronic text (such as microblogs) with the model may be performed with the Viterbi algorithm. The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states (the "Viterbi path") that produces an observed sequence of events, particularly in the context of hidden Markov models. When labeling car mentions with the Viterbi algorithm, two two-dimensional matrices can be built: one records the maximum probability reached so far for each case, and the other records the path that produced that maximum probability. Referring to the drawings, Fig. 5D shows the optimal path solving process and Fig. 5E shows the optimal path solving result. In Fig. 5D, each line segment is determined as follows. First, at the starting point, the probability that 我 ("I") is labeled with a given symbol equals the probability of that symbol as the initial symbol multiplied by the probability that the character emitted under that symbol is 我. Next, for each subsequent point, the segment drawn is the path from the preceding points that maximizes the probability of reaching that point. Take 车(a)。(e) ("car", then the full stop) as an example: the segment 车(e)。(e) is also possible, but along that path the probability of reaching 。(e) is only 0.000001048576, far less than the probability 0.0000580608 of reaching 。(e) via 车(a). Therefore 车(a)。(e) is kept and 车(e)。(e) is discarded. Finally, once the result in Fig. 5D is obtained, the optimal path in Fig. 5E follows directly: backtracking along the maximum-probability path, each point has exactly one predecessor node, which yields the optimal path. After this symbol annotation result is obtained, it can be converted back into a recognition result following the conversion shown in Fig. 4. According to an embodiment of the present disclosure, the symbol annotation result for 我开长安车。 ("I drive a Chang'an car.") is shown in Fig. 5E.
Converting this symbol annotation result back yields the recognition result "I drive a [Chang'an] car."
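The Viterbi search described above can be sketched in log space, which avoids numerical underflow on long posts. This is a hypothetical sketch, not the patent's code. In the toy model, the start probabilities, the `s` transition row, and the `s` emission row reuse the first-row values quoted for Figs. 5A to 5C; every other number is invented for illustration. The function name `viterbi` and the constants `STATES`, `START`, `TRANS`, and `EMIT` are assumptions.

```python
import math

STATES = ["s", "m", "l", "b", "a", "k", "e"]
# Toy parameters: START, TRANS["s"] and EMIT["s"] echo the Fig. 5A-5C examples;
# all other probabilities are invented for illustration only.
START = {"s": 0.1, "b": 0.1, "e": 0.8}
TRANS = {
    "s": {"m": 0.8, "l": 0.2},
    "m": {"m": 0.3, "l": 0.7},
    "l": {"a": 0.6, "k": 0.4},
    "b": {"s": 1.0},
    "a": {"e": 0.7, "b": 0.3},
    "k": {"s": 1.0},
    "e": {"e": 0.7, "b": 0.3},
}
EMIT = {
    "s": {"长": 0.8, "安": 0.1, "城": 0.1},
    "m": {"安": 0.5, "城": 0.5},
    "l": {"安": 0.6, "车": 0.2, "城": 0.2},
    "b": {"开": 0.9, "我": 0.1},
    "a": {"车": 0.7, "。": 0.3},
    "k": {"和": 1.0},
    "e": {"我": 0.5, "开": 0.1, "长": 0.1, "。": 0.3},
}


def _log(p: float) -> float:
    # math.log(0) raises ValueError; map zero to -inf so impossible paths never win a max().
    return math.log(p) if p > 0 else float("-inf")


def viterbi(chars, states, start, trans, emit):
    """Return (best tag path, its log-probability) for a character sequence."""
    if not chars:
        return [], 0.0
    # score[i][s]: best log-probability of a path over chars[:i + 1] that ends in state s.
    score = [{s: _log(start.get(s, 0)) + _log(emit[s].get(chars[0], 0)) for s in states}]
    back = [{}]  # back[i][s]: predecessor of s on that best path
    for i in range(1, len(chars)):
        score.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: score[i - 1][p] + _log(trans[p].get(s, 0)))
            score[i][s] = (score[i - 1][prev] + _log(trans[prev].get(s, 0))
                           + _log(emit[s].get(chars[i], 0)))
            back[i][s] = prev
    # Backtrack from the best final state; each node has exactly one stored predecessor.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for i in range(len(chars) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, score[-1][last]


path, logp = viterbi("我开长安车。", STATES, START, TRANS, EMIT)
print(path)  # ['e', 'b', 's', 'l', 'a', 'e']
```

Under these toy numbers only two tag paths for 我开长安车。 have nonzero probability: e b s l a e (about 9.14 × 10⁻⁴) and e b s m l a (about 8.71 × 10⁻⁴). The search keeps the first, which decodes to the mention 长安 ("Chang'an").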
Next, the combining unit 260 may merge the first product-related information and the second product-related information to obtain the final product-related information as the merged result. According to an embodiment of the present disclosure, the combining unit 260 may perform this merge using merging techniques well known in the art.
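One way to merge the two results at the span level is to apply the cases given in the merging embodiment later in this section. Identical spans are kept. A span found by only one method is kept. A containing span wins over the span it contains. On partial overlap, the automatically labeled span wins. The sketch below is hypothetical: `merge_mentions` and the half-open span representation are assumptions, and it does not try to resolve conflicts among three or more mutually overlapping spans.

```python
Span = tuple[int, int]  # half-open [start, end) character offsets


def _overlaps(x: Span, y: Span) -> bool:
    return x[0] < y[1] and y[0] < x[1]


def _contains(x: Span, y: Span) -> bool:
    return x[0] <= y[0] and y[1] <= x[1]


def merge_mentions(auto: list[Span], model: list[Span]) -> list[Span]:
    """Merge automatic-labeling spans with model-identified spans."""
    merged: set[Span] = set()
    for a in auto:
        hits = [m for m in model if _overlaps(a, m)]
        if not hits:
            merged.add(a)              # only the automatic labeling found this mention
        for m in hits:
            if _contains(a, m):
                merged.add(a)          # identical spans, or the automatic span is larger
            elif _contains(m, a):
                merged.add(m)          # the model span contains the automatic span
            else:
                merged.add(a)          # partial overlap: keep the automatic span
    for m in model:
        if not any(_overlaps(m, a) for a in auto):
            merged.add(m)              # only the model found this mention
    return sorted(merged)


print(merge_mentions([(0, 2), (5, 7), (10, 12)], [(0, 2), (4, 8), (11, 14), (30, 33)]))
# [(0, 2), (4, 8), (10, 12), (30, 33)]
```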
According to the embodiment of Fig. 2, for product categories such as cars, computers, and digital cameras that are mentioned in electronic text such as microblogs and that have a limited number of brands (hundreds to thousands), the present disclosure can automatically identify relevant content using only the ambiguous/unambiguous information of brand-related words. It requires neither a large set of matching rules nor manually labeled training data, and it has some ability to recognize new product names. In addition, when building the training corpus, the product mention identification method of the present disclosure uses the ambiguous/unambiguous distinction of product-related words, which increases the accuracy of automatic data labeling. Moreover, the statistical model trained in the present disclosure can use the automatically built corpus to analyze the context in which products are mentioned, helping to disambiguate ambiguous words and to recognize new ways of mentioning products.
According to an embodiment of the present disclosure, when a word sequence identical to the word sequence of an entry carrying a non-ambiguity tag in the tagged list appears in the electronic text, the labeling unit 230 may label that word sequence as first product-related information. According to another embodiment, when a portion of the electronic text contains both a word sequence identical to the word sequence of an entry carrying an ambiguity tag and the name of the product, the labeling unit 230 may label that portion as first product-related information. For example, based on the car brand/model list with "ambiguous/unambiguous" tags, the labeling unit 230 may automatically label cars in electronic text such as microblogs according to the following rules. A brand or model word that carries a non-ambiguity tag (or appears in the unambiguous list) is labeled as a car mention wherever it appears in the electronic text. A brand or model word that carries an ambiguity tag (or appears in the ambiguous list) is labeled as a car mention only if the word "car" also appears in the electronic text, or if the brand and the model appear together. Optionally, the entries in the tagged brand list are ordered according to the containment relation between their word sequences; for example, "ChangAn Automobile" is placed before "Chang'an", so that if "ChangAn Automobile" matches at a position, matching "Chang'an" is no longer attempted at that position.
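The labeling rules above can be sketched as follows. All of these are hypothetical assumptions: each list entry is an `Entry` holding its word sequence, its brand, and the user's ambiguity tag; the context cue for ambiguous words is the word 汽车 ("car"); and "the brand and the model appear together" is approximated as "another entry of the same brand also occurs in the text". Longer entries are tried first, so a containing entry such as 长安汽车 ("ChangAn Automobile") takes priority over 长安 ("Chang'an") at the same position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    surface: str     # word sequence as listed, e.g. a brand or model name
    brand: str       # brand the entry belongs to (keeps the brand/model correspondence)
    ambiguous: bool  # user-supplied ambiguity tag


def _brand_cooccurs(entry: Entry, entries: list[Entry], text: str) -> bool:
    """Assumed stand-in for 'brand and model appear together'."""
    return any(
        other.brand == entry.brand and other.surface != entry.surface and other.surface in text
        for other in entries
    )


def auto_annotate(text: str, entries: list[Entry], cue_words=("汽车",)) -> list[tuple[int, int]]:
    """Return half-open spans of automatically labeled product mentions."""
    has_cue = any(word in text for word in cue_words)
    covered = [False] * len(text)
    spans = []
    # Longer entries first, so "长安汽车" claims its characters before "长安" is tried.
    for entry in sorted(entries, key=lambda e: len(e.surface), reverse=True):
        if not entry.surface:
            continue
        if entry.ambiguous and not (has_cue or _brand_cooccurs(entry, entries, text)):
            continue  # ambiguous word without supporting context: leave it unlabeled
        start = text.find(entry.surface)
        while start != -1:
            end = start + len(entry.surface)
            if not any(covered[start:end]):  # skip positions already claimed by a longer entry
                spans.append((start, end))
                covered[start:end] = [True] * (end - start)
            start = text.find(entry.surface, start + 1)
    return sorted(spans)


ENTRIES = [
    Entry("长安汽车", "长安", False),  # "ChangAn Automobile"
    Entry("长安", "长安", True),       # "Chang'an": also a city name
    Entry("保时捷", "保时捷", False),  # "Porsche"
    Entry("高尔夫", "大众", True),     # "Golf" (Volkswagen): also the sport
]
print(auto_annotate("我买了长安汽车", ENTRIES))    # [(3, 7)]
print(auto_annotate("我在长安城打高尔夫", ENTRIES))  # []
```

In the second call neither ambiguous word has supporting context ("I play golf in Chang'an city"), so nothing is labeled, which is exactly the error that plain direct matching would make.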
The above labeling scheme may still produce errors, but its accuracy is much higher than that of direct matching. On this basis, the labeled corpus is used to train a hidden Markov model, which statistically learns product mentions together with the regularities of their preceding and following context. When the resulting statistical model is then used to identify product mention information in the original electronic text again, contextual information can be exploited more effectively.
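The training and re-identification steps can be illustrated with a small character-level hidden Markov model. This is a minimal sketch, not the disclosure's implementation. The BIO tag set, the add-alpha smoothing, and the names `CharHMM`, `spans_to_bio` and `bio_to_spans` are assumptions introduced here. The model is estimated by counting over the automatically labeled corpus and is decoded with the Viterbi algorithm.

```python
import math
from collections import defaultdict

STATES = ("B", "I", "O")  # begin / inside / outside of a product mention


def spans_to_bio(text, spans):
    """Turn (start, end) character spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for s, e in spans:
        tags[s] = "B"
        for k in range(s + 1, e):
            tags[k] = "I"
    return tags


def bio_to_spans(tags):
    """Inverse of spans_to_bio: collect maximal B/I runs as (start, end)."""
    spans, start = [], None
    for k, t in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if t != "I" and start is not None:
            spans.append((start, k))
            start = None
        if t == "B" or (t == "I" and start is None):
            start = k
    return spans


class CharHMM:
    """Character-level HMM estimated by counting, decoded with Viterbi."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # add-alpha smoothing for unseen events

    def fit(self, corpus):
        """corpus: iterable of (text, tags) pairs from the automatic labeling."""
        a, n = self.alpha, len(STATES)
        start = defaultdict(float)
        trans = {p: defaultdict(float) for p in STATES}
        self.emit = {s: defaultdict(float) for s in STATES}
        vocab = set()
        for text, tags in corpus:
            start[tags[0]] += 1
            for k, (ch, t) in enumerate(zip(text, tags)):
                self.emit[t][ch] += 1
                vocab.add(ch)
                if k:
                    trans[tags[k - 1]][t] += 1
        self.vocab_size = len(vocab) + 1  # reserve probability mass for unseen characters
        total = sum(start.values())
        self.log_start = {s: math.log((start[s] + a) / (total + a * n)) for s in STATES}
        self.log_trans = {
            p: {s: math.log((trans[p][s] + a) / (sum(trans[p].values()) + a * n)) for s in STATES}
            for p in STATES
        }
        self.emit_total = {s: sum(self.emit[s].values()) for s in STATES}
        return self

    def _log_emit(self, s, ch):
        count = self.emit[s].get(ch, 0.0)
        return math.log((count + self.alpha) / (self.emit_total[s] + self.alpha * self.vocab_size))

    def predict(self, text):
        """Most probable BIO tag sequence for text (Viterbi decoding)."""
        if not text:
            return []
        score = {s: self.log_start[s] + self._log_emit(s, text[0]) for s in STATES}
        backptrs = []
        for ch in text[1:]:
            new_score, ptr = {}, {}
            for s in STATES:
                best = max(STATES, key=lambda p: score[p] + self.log_trans[p][s])
                new_score[s] = score[best] + self.log_trans[best][s] + self._log_emit(s, ch)
                ptr[s] = best
            score = new_score
            backptrs.append(ptr)
        path = [max(STATES, key=lambda s: score[s])]
        for ptr in reversed(backptrs):
            path.append(ptr[path[-1]])
        return path[::-1]
```

Usage follows the text. Spans from the automatic labeling are converted with `spans_to_bio`, the model is fitted on them, and `bio_to_spans(model.predict(text))` returns the second product related information for each text. A CRF or maximum-entropy tagger, the other model types mentioned in Note 11, could take the place of `CharHMM`.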
According to an embodiment of the disclosure, when the same position of the electronic text is labeled both as first product related information and as second product related information, the combining unit 260 may label that position of the electronic text as final product related information. Further, when a position of the electronic text is labeled as only one of the first product related information and the second product related information, the combining unit 260 may label that position as final product related information. Further, when a first position of the electronic text is labeled as one of the first and second product related information, and a second position of the electronic text that contains the first position is labeled as the other, the combining unit 260 may label the second position as final product related information. Furthermore, when a third position of the electronic text is labeled as one of the first and second product related information, and a fourth position of the electronic text that partially overlaps the third position is labeled as the other, the combining unit 260 may label whichever of the third and fourth positions is labeled as first product related information as final product related information. Specifically, the two results are merged as follows. Case 1: an identified part that is identical in both results is retained. Case 2: if an automobile mention identified in one result is not identified at all in the other result, it is also retained. Case 3: if an automobile mention identified in one result contains the automobile mention identified at that position in the other result, the longer result is retained. Case 4: if the two results identify overlapping text at the same location (the spans cross, but neither contains the other), which is a conflict other than those of cases 2 and 3, the result obtained by the automatic labeling method is retained. For example, automatic labeling result: "I gave up the [Toyota] car and went to buy a [Nissan] A2, found that this [Nissan] car was sold out, and then switched to [Honda]." Model recognition result: "I gave up the Toy[ota ca]r and went to buy a [Nissan A2], found that this Nissan car was sold out, and then switched to [Honda]." Merged result: "I gave up the [Toyota] car and went to buy a [Nissan A2], found that this [Nissan] car was sold out, and then switched to [Honda]." The four labeled parts of the merged result correspond to cases 4, 3, 2 and 1, respectively. In the first part the two results conflict, so the automatic labeling result is retained. In the second part the model recognition result contains the automatic labeling result, so the longer one, namely the model recognition result, is retained. In the third part the mention was identified by only one of the two results (here, the automatic labeling), so it is retained. In the fourth part the two results agree, so that result is retained.
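The four merging cases can be written as a small function over character spans. The sketch below is hypothetical: the names `merge_results`, `overlaps` and `contains` are illustrative, and each result is assumed to be a list of `(start, end)` offsets. It resolves pairwise conflicts only and does not try to untangle chains of three or more mutually overlapping spans.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if the half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def contains(a: Span, b: Span) -> bool:
    """True if span a covers span b entirely."""
    return a[0] <= b[0] and b[1] <= a[1]


def merge_results(auto: List[Span], model: List[Span]) -> List[Span]:
    """Combine automatic-labeling spans and model spans following cases 1-4."""
    final = set()
    for a in auto:
        rivals = [m for m in model if overlaps(a, m)]
        if not rivals:
            final.add(a)          # case 2: found only by automatic labeling
        for m in rivals:
            if a == m:
                final.add(a)      # case 1: both results agree
            elif contains(m, a):
                final.add(m)      # case 3: keep the longer (model) span
            elif contains(a, m):
                final.add(a)      # case 3: keep the longer (automatic) span
            else:
                final.add(a)      # case 4: crossing spans, trust automatic labeling
    for m in model:
        if not any(overlaps(m, a) for a in auto):
            final.add(m)          # case 2: found only by the model
    return sorted(final)
```

On an English rendering of the Toyota/Nissan/Honda example, the function keeps "Toyota", "Nissan A2", the second "Nissan" and "Honda", which matches the merged result described above.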
Fig. 6 shows a device 600 for detecting product (e.g., automobile) related information in electronic text (e.g., microblogs) according to another embodiment of the disclosure. As shown in Fig. 6, in addition to the acquisition unit 210, the first labeling unit 220, the second labeling unit 230, the training unit 240, the identification unit 250 and the combining unit 260, the device according to this embodiment may further include a post-processing unit 610. The device is described in detail below with reference to a specific embodiment.
As shown in Fig. 6, after the acquisition unit 210 obtains the brand list for the product, the first labeling unit 220 may tag each entry of the brand list as ambiguous or unambiguous. Next, the second labeling unit 230 may perform automatic labeling based on the brand list carrying the "ambiguous/unambiguous" tags. Then, the training unit 240 may train the product information identification model used to identify product related information. Next, the identification unit 250 may perform identification using the trained model. Finally, the combining unit 260 may merge the automatic labeling result and the identification result.
According to this embodiment, once the merged recognition result has been obtained, it may be post-processed. Specifically, the post-processing unit 610 may label two word sequences as a single item of product related information when all of the following hold: each of them has been labeled as final product related information; nothing, or only a space, separates them; and they satisfy one of two conditions. The first condition is that the former word sequence contains product brand information and the latter does not. The second condition is that the former word sequence contains product brand information and the latter contains product type information corresponding to that brand. According to an embodiment of the disclosure, this handles the case in which one automobile mention is split into several units that are identified separately. For example, in the microblog "Limited edition of 10: Lamborghini Aventador LP760-4 Dragon Edition, equipped with a 6.5-liter naturally aspirated V12 engine, rated power 760 horsepower~~", "Lamborghini Aventador LP760-4" and "Dragon Edition" are identified as two separate automobile mentions. This happens because of how the hidden Markov model judges the surrounding context.
Although "Dragon Edition" is very likely to be automobile mention information at that position, it should in fact form a single whole with the preceding "Lamborghini Aventador LP760-4", because it further describes the preceding part. For such cases, the post-processing rule can therefore be set as follows. Suppose two consecutive word sequences are each identified as automobile mention information and are separated by nothing or only a space. If they also satisfy one of the following conditions, they are merged into one whole and labeled as one complete automobile mention: (1) the former word sequence contains content from the automobile brand/model list and the latter does not; (2) the former word sequence contains automobile brand information and the latter contains model information corresponding to that brand. The rule is restricted in this way because electronic text often lists several automobile descriptions side by side. For example, in "Video commentary: four German and Japanese luxury cars compared, Infiniti BMW Audi Lexus", "Infiniti", "BMW", "Audi" and "Lexus" should each be identified as a separate automobile mention and must not be merged.
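A sketch of the post-processing rule follows. It assumes a hypothetical `brand_models` dictionary that maps each brand in the list to its model names, which corresponds to the brand/type correspondence recorded in the list entries. It uses substring tests to decide whether a word sequence "contains" a brand or a model. The function names are illustrative only.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def _joinable(left: str, right: str, brand_models: Dict[str, List[str]]) -> bool:
    """Check the two merge conditions for adjacent word sequences."""
    left_brands = {b for b in brand_models if b in left}
    right_brands = {b for b in brand_models if b in right}
    # Condition 1: the former contains a brand, the latter contains none.
    if left_brands and not right_brands:
        return True
    # Condition 2: the latter contains a model belonging to a brand in the former.
    return any(m in right for b in left_brands for m in brand_models[b])


def post_process(text: str, spans: List[Span], brand_models: Dict[str, List[str]]) -> List[Span]:
    """Merge adjacent final spans separated by nothing or only spaces."""
    merged: List[Span] = []
    for s, e in sorted(spans):
        if merged:
            ps, pe = merged[-1]
            gap = text[pe:s]
            # pe <= s guards against overlapping spans, whose gap slice would be empty.
            if pe <= s and set(gap) <= {" "} and _joinable(text[ps:pe], text[s:e], brand_models):
                merged[-1] = (ps, e)
                continue
        merged.append((s, e))
    return merged
```

With the Lamborghini example, the two spans are merged into one. The four adjacent brand names in the comparison post stay separate, because each later span contains a brand of its own and is not a model of the preceding brand.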
In the device 600 of Fig. 6, which detects product (e.g., automobile) related information in electronic text (e.g., microblogs), the automatic labeling result is combined with the model recognition result and the combined result is then post-processed. This further improves recognition performance.
Optionally, the device according to the disclosure is also applicable to name recognition for other products of a similar type, such as computers and digital cameras. Because such products have a limited number of brands, manually compiling the "ambiguous/unambiguous" information of the related words is relatively easy.
A method for detecting product (e.g., automobile) related information in electronic text (e.g., microblogs) according to an embodiment of the disclosure is described below with reference to Fig. 7. As shown in Fig. 7, the method starts at step S710, in which a list of products is obtained, each entry of the list recording brand information of a product.
Next, in step S720, each entry of the list is tagged with an ambiguity tag or a non-ambiguity tag based on user input, to obtain a tagged list. The ambiguity tag indicates that the meaning of the word sequence in the entry is ambiguous, and the non-ambiguity tag indicates that the meaning of the word sequence in the entry is unambiguous.
Then, in step S730, the tagged list is used to automatically label the product in the electronic text, to obtain automatically labeled first product related information.
Next, in step S740, a product information identification model is trained based on the first product related information, thereby generating a trained model.
Then, in step S750, the trained model is used to identify product related information in the electronic text, to obtain identified second product related information.
Finally, in step S760, the first product related information and the second product related information are merged to obtain final product related information.
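Steps S710 to S760 form a straight pipeline in which each stage consumes the previous stage's output. The skeleton below only illustrates that data flow. The function name `detect_product_info` and its callable parameters are hypothetical, and any concrete labeling, training, identification and merging routines could be plugged in.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Span = Tuple[int, int]


def detect_product_info(
    texts: Sequence[str],
    get_list: Callable[[], List[str]],
    tag_by_user: Callable[[List[str]], Dict[str, bool]],
    auto_label: Callable[[str, Dict[str, bool]], List[Span]],
    train: Callable[[List[Tuple[str, List[Span]]]], object],
    identify: Callable[[object, str], List[Span]],
    merge: Callable[[List[Span], List[Span]], List[Span]],
) -> List[List[Span]]:
    """Run steps S710-S760 over a collection of electronic texts."""
    product_list = get_list()                            # S710: brand list
    tagged = tag_by_user(product_list)                   # S720: word -> ambiguous?
    first = [auto_label(t, tagged) for t in texts]       # S730: first product related info
    model = train(list(zip(texts, first)))               # S740: trained model
    second = [identify(model, t) for t in texts]         # S750: second product related info
    return [merge(a, b) for a, b in zip(first, second)]  # S760: final product related info
```

Note that the model is trained on the same texts it later re-labels. This is the self-training loop the description relies on, so no manually labeled corpus is needed.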
Preferably, automatically labeling the product in the electronic text using the tagged list may include: when an electronic word sequence identical to the word sequence of an entry carrying the non-ambiguity tag in the tagged list appears in the electronic text, labeling the electronic word sequence as first product related information.
Preferably, automatically labeling the product in the electronic text using the tagged list may include: when a part of the electronic text contains both an electronic word sequence identical to the word sequence of an entry carrying the ambiguity tag in the tagged list and the name of the product, labeling that part of the electronic text as first product related information.
According to yet another embodiment of the disclosure, the entries of the tagged list are ordered according to the inclusion relation of the word sequences in the entries.
According to yet another embodiment of the disclosure, merging the first product related information and the second product related information may include: when the same position of the electronic text is labeled both as first product related information and as second product related information, labeling that position of the electronic text as final product related information.
Preferably, merging the first product related information and the second product related information may include: when a position of the electronic text is labeled as only one of the first product related information and the second product related information, labeling that position of the electronic text as final product related information.
Preferably, merging the first product related information and the second product related information may include: when a first position of the electronic text is labeled as one of the first product related information and the second product related information, and a second position of the electronic text that contains the first position is labeled as the other, labeling the second position of the electronic text as final product related information.
Preferably, merging the first product related information and the second product related information may include: when a third position of the electronic text is labeled as one of the first product related information and the second product related information, and a fourth position of the electronic text that partially overlaps the third position is labeled as the other, labeling whichever of the third and fourth positions is labeled as first product related information as final product related information.
According to yet another embodiment of the disclosure, each entry of the tagged list may record the brand information and type information of the product, as well as the correspondence between the brand information and the type information.
According to yet another embodiment of the disclosure, the method for detecting product related information in electronic text may further include a post-processing step. When two word sequences are each labeled as final product related information, nothing or only a space separates them, and they satisfy one of the following conditions, the two word sequences are labeled as a single item of product related information: the former word sequence contains brand information and the latter does not; or the former word sequence contains brand information and the latter contains type information corresponding to that brand information.
The specific implementations of the above steps of the method for detecting product related information in electronic text according to the embodiments of the disclosure have been described in detail above and are not repeated here.
Clearly, each operation of the method for detecting product related information in electronic text according to the disclosure can be implemented as a computer-executable program stored in any of various machine-readable storage media.
Moreover, the object of the disclosure can also be achieved as follows: a storage medium storing the above executable program code is supplied, directly or indirectly, to a system or device, and a computer or central processing unit (CPU) in that system or device reads and executes the program code. In this case, as long as the system or device is capable of executing programs, the embodiments of the disclosure are not limited to a particular kind of program, and the program may take any form, for example an object program, a program executed by an interpreter, or a script supplied to an operating system.
The machine-readable storage media mentioned above include, but are not limited to: various memories and storage units, semiconductor devices, disk units such as optical, magnetic and magneto-optical disks, and other media suitable for storing information.
In addition, the technical solution of the disclosure can also be implemented by a computer that connects to a corresponding website on the Internet, downloads and installs the computer program code according to the disclosure, and then executes the program.
Fig. 8 is a block diagram of an example structure of a general-purpose personal computer in which the method for detecting product related information in electronic text according to an embodiment of the disclosure can be implemented.
As shown in Fig. 8, a CPU 1301 performs various processes according to a program stored in a read-only memory (ROM) 1302 or a program loaded from a storage section 1308 into a random access memory (RAM) 1303. The RAM 1303 also stores, as needed, the data required when the CPU 1301 performs the various processes. The CPU 1301, the ROM 1302 and the RAM 1303 are connected to one another via a bus 1304. An input/output interface 1305 is also connected to the bus 1304.
The following components are connected to the input/output interface 1305: an input section 1306 (including a keyboard, a mouse, etc.), an output section 1307 (including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, etc.), the storage section 1308 (including a hard disk, etc.), and a communication section 1309 (including a network interface card such as a LAN card, a modem, etc.). The communication section 1309 performs communication processing via a network such as the Internet. A drive 1310 may also be connected to the input/output interface 1305 as needed. A removable medium 1311, such as a magnetic disk, optical disk, magneto-optical disk or semiconductor memory, is mounted on the drive 1310 as needed, so that a computer program read from it is installed into the storage section 1308 as needed.
When the above series of processes is implemented in software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1311.
Those skilled in the art will understand that such a storage medium is not limited to the removable medium 1311 shown in Fig. 8, which stores the program and is distributed separately from the device to provide the program to the user. Examples of the removable medium 1311 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 1302, a hard disk included in the storage section 1308, or the like, which stores the program and is distributed to the user together with the device containing it.
In the apparatus and method of the disclosure, the units or steps can clearly be decomposed and/or recombined, and such decomposition and/or recombination should be regarded as equivalent solutions of the disclosure. Moreover, the steps of the above series of processes may be performed in the chronological order described, but they need not be; some steps may be performed in parallel or independently of one another.
Although the embodiments of the disclosure have been described in detail above with reference to the accompanying drawings, it should be understood that the embodiments described above only illustrate the disclosure and do not limit it. Those skilled in the art can make various changes and modifications to the above embodiments without departing from the spirit and scope of the disclosure. The scope of the disclosure is therefore defined only by the appended claims and their equivalents.
With regard to implementations including the above embodiments, the following notes are also disclosed:
Note 1. A device for detecting product related information in electronic text, comprising:
an acquisition unit that obtains a list of products, each entry of the list recording brand information of the product;
a first labeling unit that, based on user input, tags each entry of the list with an ambiguity tag or a non-ambiguity tag to obtain a tagged list, wherein the ambiguity tag indicates that the meaning of the word sequence in the entry is ambiguous, and the non-ambiguity tag indicates that the meaning of the word sequence in the entry is unambiguous;
a second labeling unit that uses the tagged list to automatically label the product in the electronic text, to obtain automatically labeled first product related information;
a training unit that trains a product information identification model based on the first product related information, thereby generating a trained model;
an identification unit that uses the trained model to identify the product related information in the electronic text, to obtain identified second product related information; and
a combining unit that merges the first product related information and the second product related information to obtain final product related information.
Note 2. The device according to Note 1, wherein, when an electronic word sequence identical to the word sequence of an entry carrying the non-ambiguity tag in the tagged list appears in the electronic text, the second labeling unit labels the electronic word sequence as first product related information.
Note 3. The device according to Note 1, wherein, when a part of the electronic text contains both an electronic word sequence identical to the word sequence of an entry carrying the ambiguity tag in the tagged list and the name of the product, the second labeling unit labels said part of the electronic text as first product related information.
Note 4. The device according to Note 2 or 3, wherein the entries of the tagged list are ordered according to the inclusion relation of the word sequences in the entries.
Note 5. The device according to Note 1, wherein, when the same position of the electronic text is labeled both as first product related information and as second product related information, the combining unit labels that position of the electronic text as final product related information.
Note 6. The device according to Note 1, wherein, when a position of the electronic text is labeled as only one of the first product related information and the second product related information, the combining unit labels that position of the electronic text as final product related information.
Note 7. The device according to Note 1, wherein, when a first position of the electronic text is labeled as one of the first product related information and the second product related information, and a second position of the electronic text containing the first position is labeled as the other, the combining unit labels the second position of the electronic text as final product related information.
Note 8. The device according to Note 1, wherein, when a third position of the electronic text is labeled as one of the first product related information and the second product related information, and a fourth position of the electronic text partially overlapping the third position is labeled as the other, the combining unit labels whichever of the third position and the fourth position is labeled as first product related information as final product related information.
Note 9. The device according to Note 1, wherein each entry of the tagged list records the brand information and type information of the product, as well as the correspondence between the brand information and the type information.
Note 10. The device according to Note 9, further comprising a post-processing unit, wherein, when two word sequences are each labeled as final product related information, nothing or only a space separates the two word sequences, and the two word sequences satisfy one of the following conditions, the post-processing unit labels the two word sequences as a single item of product related information:
the former of the two word sequences contains the brand information, and the latter does not contain the brand information; and
the former of the two word sequences contains the brand information, and the latter contains the type information corresponding to that brand information.
Note 11. The device according to Note 1, wherein the training unit trains a hidden Markov model, a conditional random field model or a maximum entropy model based on the first product related information to generate the trained model.
Note 12. The device according to Note 1, wherein the product is an automobile, a computer or a camera.
Note 13. The device according to Note 1, wherein the product is a product having hundreds to thousands of brands.
Note 14. A method for detecting product related information in electronic text, comprising:
obtaining a list of products, each entry of the list recording brand information of the product;
based on user input, tagging each entry of the list with an ambiguity tag or a non-ambiguity tag to obtain a tagged list, wherein the ambiguity tag indicates that the meaning of the word sequence in the entry is ambiguous, and the non-ambiguity tag indicates that the meaning of the word sequence in the entry is unambiguous;
automatically labeling the product in the electronic text using the tagged list, to obtain automatically labeled first product related information;
training a product information identification model based on the first product related information, thereby generating a trained model;
identifying the product related information in the electronic text using the trained model, to obtain identified second product related information; and
merging the first product related information and the second product related information to obtain final product related information.
Note 15. The method according to Note 14, wherein automatically labeling the product in the electronic text using the tagged list includes: when an electronic word sequence identical to the word sequence of an entry carrying the non-ambiguity tag in the tagged list appears in the electronic text, labeling the electronic word sequence as first product related information.
Note 16. The method according to Note 14, wherein automatically labeling the product in the electronic text using the tagged list includes: when a part of the electronic text contains both an electronic word sequence identical to the word sequence of an entry carrying the ambiguity tag in the tagged list and the name of the product, labeling said part of the electronic text as first product related information.
Note 17. The method according to Note 14, wherein merging the first product related information and the second product related information includes: when the same position of the electronic text is labeled both as first product related information and as second product related information, labeling that position of the electronic text as final product related information.
Note 18. The method according to Note 14, wherein merging the first product related information and the second product related information includes: when a position of the electronic text is labeled as only one of the first product related information and the second product related information, labeling that position of the electronic text as final product related information.
Note 19. The method according to Note 14, wherein merging the first product related information and the second product related information includes: when a first position of the electronic text is labeled as one of the first product related information and the second product related information, and a second position of the electronic text containing the first position is labeled as the other, labeling the second position of the electronic text as final product related information.
Note 20. A machine-readable storage medium carrying a program product that comprises machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, causes the computer to perform the method according to any one of Notes 14 to 19.

Claims (10)

1. A device for detecting product related information in electronic text, comprising:
an acquisition unit that obtains a list of products, each entry of the list recording brand information of the product;
a first labeling unit that, based on user input, tags each entry of the list with an ambiguity tag or a non-ambiguity tag to obtain a tagged list, wherein the ambiguity tag indicates that the meaning of the word sequence in the entry is ambiguous, and the non-ambiguity tag indicates that the meaning of the word sequence in the entry is unambiguous;
a second labeling unit that uses the tagged list to automatically label the product in the electronic text, to obtain automatically labeled first product related information;
a training unit that trains a product information identification model based on the first product related information, thereby generating a trained model;
an identification unit that uses the trained model to identify the product related information in the electronic text, to obtain identified second product related information; and
a combining unit that merges the first product related information and the second product related information to obtain final product related information.
2. The device according to claim 1, wherein, when an electronic word sequence identical to the word sequence of an entry carrying the non-ambiguity tag in the tagged list appears in the electronic text, the second labeling unit labels the electronic word sequence as first product related information.
3. The device according to claim 1, wherein, when a part of the electronic text contains both an electronic word sequence identical to the word sequence of an entry carrying the ambiguity tag in the tagged list and the name of the product, the second labeling unit labels said part of the electronic text as first product related information.
4. The device according to claim 1, wherein, when the same position in the electronic text is labeled both as first product-related information and as second product-related information, the combining unit labels that position of the electronic text as final product-related information.
5. The device according to claim 1, wherein, when a position in the electronic text is labeled as only one of first product-related information and second product-related information, the combining unit labels that position of the electronic text as final product-related information.
6. The device according to claim 1, wherein, when a first position in the electronic text is labeled as one of first product-related information and second product-related information, and a second position in the electronic text that contains the first position is labeled as the other of the two, the combining unit labels the second position of the electronic text as final product-related information.
7. The device according to claim 1, wherein, when a third position in the electronic text is labeled as one of first product-related information and second product-related information, and a fourth position in the electronic text that partially overlaps the third position is labeled as the other of the two, the combining unit labels whichever of the third and fourth positions was labeled as first product-related information as final product-related information.
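Claims 4 to 7 together define how the combining unit reconciles the two label sets. A minimal sketch, assuming each label is a half-open character span `(start, end)` and using hypothetical function names; the final cleanup of nested spans is an added assumption for conflicts involving more than two spans, which the claims do not address.

```python
from typing import List, Tuple

# Assumption: labels are half-open character spans [start, end).
Span = Tuple[int, int]


def _overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def _contains(outer: Span, inner: Span) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def combine(first: List[Span], second: List[Span]) -> List[Span]:
    """first: dictionary labels; second: model labels (hypothetical names)."""
    final = set()
    for a in first:
        rivals = [b for b in second if _overlaps(a, b)]
        if not rivals:
            final.add(a)              # claim 5: labeled by one source only
        for b in rivals:
            if a == b:
                final.add(a)          # claim 4: both sources agree
            elif _contains(a, b):
                final.add(a)          # claim 6: the containing span wins
            elif _contains(b, a):
                final.add(b)
            else:
                final.add(a)          # claim 7: partial overlap, first wins
    for b in second:
        if not any(_overlaps(a, b) for a in first):
            final.add(b)              # claim 5: labeled by one source only
    # Assumption: multi-way conflicts are not covered by the claims; here we
    # simply drop any final span nested inside a longer final span.
    kept = [s for s in final
            if not any(o != s and _contains(o, s) for o in final)]
    return sorted(kept)
```

Because partial overlaps fall back to the first product-related information, the dictionary labels act as the tie-breaker whenever the two sources disagree on boundaries without one span containing the other.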
8. The device according to claim 1, wherein each entry in the list records the brand information and type information of the product, and the brand information and the type information correspond to each other.
9. The device according to claim 8, further comprising a post-processing unit, wherein, when two word sequences are each labeled as final product-related information, no character or only a space lies between the two word sequences, and the two word sequences satisfy one of the following conditions, the post-processing unit labels the two word sequences as a single piece of overall product-related information:
The former of the two word sequences contains the brand information, and the latter does not contain the brand information; and
The former of the two word sequences contains the brand information, and the latter contains the type information corresponding to that brand information.
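The joining rule of claim 9 can be sketched as below. This assumes, hypothetically, that `catalog` maps each brand to the set of type strings recorded for it in the list (claim 8), that spans are half-open character offsets, and that "contains" means plain substring matching; "only a space" is read as a gap made up solely of space characters.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # half-open character offsets [start, end)


def _brands_in(seq: str, catalog: Dict[str, Set[str]]) -> List[str]:
    return [brand for brand in catalog if brand in seq]


def join_adjacent(text: str, spans: List[Span],
                  catalog: Dict[str, Set[str]]) -> List[Span]:
    """Hypothetical catalog format: brand -> type strings recorded for it."""
    result: List[Span] = []
    for span in sorted(spans):
        if result:
            prev = result[-1]
            gap = text[prev[1]:span[0]]
            # No character, or only spaces, between the two sequences.
            if prev[1] <= span[0] and set(gap) <= {" "}:
                former = text[prev[0]:prev[1]]
                latter = text[span[0]:span[1]]
                brands = _brands_in(former, catalog)
                # Condition 1: former has a brand, latter has none.
                cond1 = bool(brands) and not _brands_in(latter, catalog)
                # Condition 2: latter has a type recorded for that brand.
                cond2 = any(t in latter for b in brands for t in catalog[b])
                if cond1 or cond2:
                    result[-1] = (prev[0], span[1])  # one overall span
                    continue
        result.append(span)
    return result
```

A joined span is compared again with the next span, so a brand followed by several adjacent labeled sequences can grow into a single overall span.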
10. A method for detecting product-related information in electronic text, comprising:
Obtaining a list of products, each entry in the list recording brand information of a product;
Labeling, based on user input, each entry in the list with an ambiguity tag or a non-ambiguity tag to obtain a tagged list, wherein the ambiguity tag indicates that the meaning expressed by the word sequence in the entry is ambiguous, and the non-ambiguity tag indicates that the meaning expressed by the word sequence in the entry is not ambiguous;
Automatically labeling the products in the electronic text using the tagged list, to obtain automatically labeled first product-related information;
Training a product information identification model based on the first product-related information, thereby generating a trained model;
Identifying product-related information in the electronic text using the trained model, to obtain identified second product-related information; and
Combining the first product-related information and the second product-related information to obtain final product-related information.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510025848.4A CN105868193A (en) 2015-01-19 2015-01-19 Device and method used to detect product relevant information in electronic text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510025848.4A CN105868193A (en) 2015-01-19 2015-01-19 Device and method used to detect product relevant information in electronic text

Publications (1)

Publication Number Publication Date
CN105868193A true CN105868193A (en) 2016-08-17

Family

ID=56623142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510025848.4A Pending CN105868193A (en) 2015-01-19 2015-01-19 Device and method used to detect product relevant information in electronic text

Country Status (1)

Country Link
CN (1) CN105868193A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060161537A1 (en) * 2005-01-19 2006-07-20 International Business Machines Corporation Detecting content-rich text
CN1871597A (en) * 2003-08-21 2006-11-29 伊迪利亚公司 System and method for associating documents with contextual advertisements
CN101295294A (en) * 2008-06-12 2008-10-29 昆明理工大学 Improved Bayesian word sense disambiguation method based on information gain
CN101576910A (en) * 2009-05-31 2009-11-11 北京学之途网络科技有限公司 Method and device for automatically identifying product named entities
CN102033950A (en) * 2010-12-23 2011-04-27 哈尔滨工业大学 Construction method and identification method of automatic electronic product named entity identification system
CN102819576A (en) * 2012-07-23 2012-12-12 无锡雅座在线科技发展有限公司 Data mining method and system based on microblog

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN108241631A (en) * 2016-12-23 2018-07-03 百度在线网络技术(北京)有限公司 Method and apparatus for pushing information
CN108241631B (en) * 2016-12-23 2022-09-30 百度在线网络技术(北京)有限公司 Method and device for pushing information
CN108255806A (en) * 2017-12-22 2018-07-06 北京奇艺世纪科技有限公司 Name recognition method and device
CN108255806B (en) * 2017-12-22 2021-12-17 北京奇艺世纪科技有限公司 Name recognition method and device

Similar Documents

Publication Publication Date Title
CN109348275B (en) Video processing method and device
Liu et al. On the hidden mystery of ocr in large multimodal models
CN102930048B (en) Data enrichment using automatically found reference semantics and visual data
CN112100384B (en) Data viewpoint extraction method, device, equipment and storage medium
Ye et al. Interpreting the rhetoric of visual advertisements
CN106557463A (en) Sentiment analysis method and device
CN110390110B (en) Method and apparatus for pre-training generation of sentence vectors for semantic matching
CN112328800A (en) System and method for automatically generating programming specification question answers
CN108959643A (en) Method, apparatus, server and storage medium for generating labels
FR3015073A1 (en) METHOD AND DEVICE FOR AUTOMATICALLY RECOMMENDING COMPLEX OBJECTS
US20200143159A1 (en) Search device, search method, search program, and recording medium
CN109983473A (en) Flexible integrated identification and semantic processes
CN110276633A (en) Advertisement placement method, system, equipment and storage medium based on online education
Riquelme et al. Explaining VQA predictions using visual grounding and a knowledge base
CN106897274B (en) Cross-language comment replying method
Chen et al. Chain-of-thought prompt distillation for multimodal named entity and multimodal relation extraction
CN105868193A (en) Device and method used to detect product relevant information in electronic text
Intasuwan et al. Text and object detection on billboards
CN109242020A (en) Music-domain command understanding method based on fastText and CRF
KR101794547B1 (en) System and Method for Automatically generating of personal wordlist and learning-training word
CN112837466A (en) Bill recognition method, device, equipment and storage medium
Shaharabany et al. Similarity maps for self-training weakly-supervised phrase grounding
EP3731108A1 (en) Search system, search method, and program
CN109192201A (en) Voice-domain command understanding method based on dual-model recognition
Kim et al. Developing a system for searching a shop name on a mobile device using voice recognition and GPS information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 2016-08-17)