CN104331395B - Method and apparatus for recognizing Chinese product names from text - Google Patents

Method and apparatus for recognizing Chinese product names from text

Publication number
CN104331395B
Authority
CN
China
Prior art keywords
word
text
attribute information
sentence
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410586116.8A
Other languages
Chinese (zh)
Other versions
CN104331395A (en
Inventor
岳兴明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201410586116.8A
Publication of CN104331395A
Application granted
Publication of CN104331395B
Legal status: Active
Anticipated expiration

Abstract

The present invention provides a method and apparatus for recognizing Chinese product names from text, which helps to accurately identify Chinese product names in text. The method includes: storing a product information word dictionary; determining the sentence attribute information sequence of the text; and performing a coarse judgment and then a fine judgment on the sentence attribute information sequence. The coarse criterion is that the sentence attribute information sequence contains a center word, a brand and an attribute, or contains a center word and more than one attribute. The fine criterion is that the length of the sentence attribute information sequence is greater than a first preset value and the proportion of words in it that do not belong to the product information word dictionary is less than a second preset value. A sentence attribute information sequence that satisfies both the coarse judgment and the fine judgment is confirmed as a Chinese product name.

Description

Method and apparatus for recognizing Chinese product names from text
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for recognizing Chinese product names from text.
Background
Automatic answering robots are becoming increasingly important in e-commerce retail. To guarantee service quality, many merchants on e-commerce platforms must stay at their computers to answer customer inquiries, which is a very large labor cost for ordinary individual online sellers. Automatic answering robots that can handle most common inquiries emerged in response; for many online shops they solve the problem that customer inquiries cannot be answered while the shop is unattended.
In actual inquiries, customers often copy a product name and append the question they want to ask. Recognizing such questions is difficult for an automatic answering robot. Product names on websites are generally long (for example in categories such as mobile phones and clothing: "Inman 2013 winter wear new-style pure-cotton plaid contrast-fabric patchwork hooded mid-length padded coat orange-red"), and this product information is entered and maintained by the merchants themselves, so it changes frequently with marketing promotions, seasons and other factors. Current automatic answering robots, whether based on classification or on keyword matching, have difficulty identifying the central service question in a question that contains a long product name (for example, the central service question of "Inman 2013 winter wear new-style pure-cotton plaid contrast-fabric patchwork hooded mid-length padded coat orange-red, what fabric is the outside of this garment" is "what fabric is the outside of this garment"). If the product name can be extracted from the sentence entered by the customer and the remainder of the sentence is then analyzed, this kind of inquiry can be handled effectively. A method for recognizing Chinese product names from text is therefore needed, so that inquiries containing Chinese product names can be recognized and then processed by the automatic answering robot.
Summary of the invention
In view of this, the present invention provides a method and apparatus for recognizing Chinese product names from text, which helps to accurately identify Chinese product names in text.
To achieve the above object, according to one aspect of the present invention, a method for recognizing Chinese product names from text is provided.
In the method of the present invention for recognizing Chinese product names from text, the text is the text of a question from an e-commerce customer's online inquiry, and the method includes: Step A: storing a product information word dictionary, the product information words including center words, brands, attributes and separator words; Step B: determining the sentence attribute information sequence of the text, which is the sequence before the first separator word in the text, reading from left to right, or the sequence before the first word in the text, reading from left to right, that does not belong to the product information word dictionary; Step C: performing a coarse judgment and then a fine judgment on the sentence attribute information sequence, where the coarse criterion is that the sequence contains a center word, a brand and an attribute, or contains a center word and more than one attribute, and the fine criterion is that the length of the sequence is greater than a first preset value and the proportion of words in it that do not belong to the product information word dictionary is less than a second preset value; Step D: confirming a sentence attribute information sequence that satisfies both the coarse judgment and the fine judgment as a Chinese product name.
Optionally, before step B, it is judged whether the length of the text exceeds a set value, and step B is carried out only if the result is yes.
Optionally, the value range of the set value is [25,30].
Optionally, the set value is determined as follows: compute the average length, in characters, of the product names of a specified product category, or compute the average length of texts containing product names of the specified category; then multiply the statistical result by a preset coefficient to obtain the set value.
Optionally, step B includes: segmenting the text into words and, during segmentation, assigning each word one of a first to fifth mark according to whether the word is a center word, a brand, an attribute or a separator word, or belongs to none of these; traversing the segmented text from left to right and, if a word assigned the fourth mark is encountered, taking the sequence before that word as the sentence attribute sequence of the text; otherwise taking the last word in the text assigned a mark other than the fifth mark, together with the sequence before that word, as the sentence attribute sequence of the text.
Optionally, in the fine criterion, the range of the first preset value is [6,10] and the range of the second preset value is [0.25,0.45].
Optionally, the text is the text of a question raised by an e-commerce customer in an online inquiry.
According to another aspect of the present invention, an apparatus for recognizing Chinese product names from text is provided.
In the apparatus of the present invention for recognizing Chinese product names from text, the text is the text of a question from an e-commerce customer's online inquiry, and the apparatus includes: a product information word dictionary module for storing a product information word dictionary, the product information words including center words, brands, attributes and separator words; a sentence attribute information module for determining the sentence attribute information sequence of the text, which is the sequence before the first separator word in the text, reading from left to right, or the sequence before the first word in the text, reading from left to right, that does not belong to the product information word dictionary; a judgment module for performing a coarse judgment and then a fine judgment on the sentence attribute information sequence, where the coarse criterion is that the sequence contains a center word, a brand and an attribute, or contains a center word and more than one attribute, and the fine criterion is that the length of the sequence is greater than a first preset value and the proportion of words in it that do not belong to the product information word dictionary is less than a second preset value; and an output module for outputting a sentence attribute information sequence that satisfies both the coarse judgment and the fine judgment.
Optionally, the apparatus further includes a pre-judgment module for judging, before the sentence attribute information module determines the sentence attribute information sequence of the text, whether the length of the text exceeds a set value, and for triggering the sentence attribute information module only if the result is yes.
Optionally, the pre-judgment module is further used for: computing the average length, in characters, of the product names of a specified product category, or computing the average length of texts containing product names of the specified category; then multiplying the statistical result by a preset coefficient to obtain the set value.
Optionally, the sentence attribute information module is further used for: segmenting the text into words and, during segmentation, assigning each word one of a first to fifth mark according to whether the word is a center word, a brand, an attribute or a separator word, or belongs to none of these; traversing the segmented text from left to right and, if a word assigned the fourth mark is encountered, taking the sequence before that word as the sentence attribute sequence of the text; otherwise taking the last word in the text assigned a mark other than the fifth mark, together with the sequence before that word, as the sentence attribute sequence of the text.
Optionally, the text is the text of a question raised by an e-commerce customer in an online inquiry.
According to the technical scheme of the present invention, the words in the text are analyzed on the basis of the product information word dictionary, which helps to accurately identify Chinese product names in text. This has been confirmed by tests on a real system.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not constitute an undue limitation of it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of the method for recognizing Chinese product names from text according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main modules of the apparatus for recognizing Chinese product names from text according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention are described below with reference to the drawings. The description includes various details of the embodiments to aid understanding, and these should be regarded as merely exemplary. A person of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
The technical scheme of this embodiment can recognize Chinese product names from text. The text may be a question raised by an e-commerce customer during an online inquiry, or text from other settings. In the technical scheme of the embodiments of the present invention, whether a text contains a Chinese product name is judged mainly by analyzing how many words that belong to product names occur in the text. The technical scheme is described in detail below, mainly using as an example the text of questions raised by customers during online inquiries.
Fig. 1 is a schematic diagram of the main steps of the method for recognizing Chinese product names from text according to an embodiment of the present invention. As shown in Fig. 1, the method mainly includes steps S11 to S14.
Step S11: Store the product information word dictionary. Product information words include center words, brands, attributes and separator words. A center word is information indicating the product category, such as mobile phone, computer, trousers or shirt. An attribute is information describing a property of a product in the e-commerce field, such as its material, workmanship, size or author. Analysis of a large number of Chinese product names shows that center word, brand and attribute are the three main features that a product name generally contains. The specific words in the product information word dictionary are obtained from e-commerce websites by technical means such as web crawlers. Brand and attribute words are preferably crawled from pages for a given product category, for example brands, sizes and materials from pages containing clothing products. Note that some products may have no brand information.
Separator words are generally adjacent to the product name in the text. Analysis of 1,500 questions containing Chinese product names found that most questions follow the pattern "Chinese product name + separator word (such as: may I ask, this, have, is) + central question", for example "Inman 2013 winter wear new-style pure-cotton plaid contrast-fabric patchwork hooded mid-length padded coat orange-red, what fabric is the outside of this garment". Here "what fabric is the outside of this garment" is the central question. The central question is usually the substantive content of the customer's inquiry. There are not many separator words; in this embodiment only six were extracted after counting over the questions: please, still, have, not have, this, is. These separator words are also added to the product information word dictionary. The number of separator words can be increased as needed in an implementation.
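As an illustration of step S11, the dictionary can be viewed as a mapping from each word to one numeric mark. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the `Tag` names, the `build_dictionary` helper, the priority order between lists and all sample Chinese words (including the renderings of the six separator words) are hypothetical. The numeric values follow the marks 1 to 4, with 0 for unmatched characters, that the embodiment uses in its segmentation step.

```python
from enum import IntEnum


class Tag(IntEnum):
    """Numeric marks; the values mirror those used in the embodiment."""
    UNKNOWN = 0
    CENTER = 1
    BRAND = 2
    ATTRIBUTE = 3
    SEPARATOR = 4


def build_dictionary(center_words, brands, attributes, separators):
    """Merge the four word lists into one word -> Tag mapping.

    If a word appears in several lists, the first list wins. This
    priority order is an assumption; the source does not specify one.
    """
    dictionary = {}
    groups = (
        (center_words, Tag.CENTER),
        (brands, Tag.BRAND),
        (attributes, Tag.ATTRIBUTE),
        (separators, Tag.SEPARATOR),
    )
    for words, tag in groups:
        for word in words:
            dictionary.setdefault(word, tag)
    return dictionary


# Hypothetical sample entries, for illustration only.
dictionary = build_dictionary(
    center_words=["手机", "衬衫"],   # mobile phone, shirt
    brands=["品牌甲"],               # placeholder brand name
    attributes=["纯棉", "长袖"],     # pure cotton, long-sleeved
    # Assumed Chinese renderings of the six separator words.
    separators=["请", "还", "有", "没有", "这", "是"],
)
```

A flat mapping keeps the later lookup during segmentation to a single dictionary access per candidate word.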
Once the product information word dictionary has been stored, target text can be processed, for example questions from users' online inquiries. If needed, the target text can first be preprocessed, including conversion between traditional and simplified Chinese, removal of whitespace and punctuation, and replacement of website links. The method then proceeds to step S12.
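The preprocessing described above might look like the following Python sketch. It is only an illustration: the `preprocess` function, its `url_placeholder` parameter and the URL pattern are assumptions, and conversion between traditional and simplified Chinese is deliberately left out because it would need an external conversion table or library.

```python
import re
import unicodedata

# ASCII-only URL pattern so that Chinese text directly after a link is kept.
URL_RE = re.compile(r"https?://[A-Za-z0-9\-._~:/?#@!$&'()*+,;=%]+")


def preprocess(text, url_placeholder=""):
    """Replace website links, then drop whitespace and punctuation."""
    text = URL_RE.sub(url_placeholder, text)
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        # "P*" are punctuation categories, "Z*" are separator categories.
        if ch.isspace() or category.startswith(("P", "Z")):
            continue
        kept.append(ch)
    return "".join(kept)


cleaned = preprocess("【包邮】 男士 衬衫, 有货吗? http://example.com/item?id=1")
```

Links are replaced before punctuation is removed, since stripping the punctuation first would break the link apart.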
Step S12: For the current target text, determine the sentence attribute information sequence of the text. The sentence attribute information sequence is the sequence before the first separator word in the text, reading from left to right, or the sequence before the first word in the text, reading from left to right, that does not belong to the product information word dictionary. It can therefore be determined by traversing the text from left to right according to this definition. In this embodiment the processing uses marks and is carried out in two steps.
First step: sentence segmentation and lexical feature marking. Forward maximum matching is used to segment the target text or the preprocessed text. The maximum matching length is 6 characters (this value was set by counting the word lengths in the product information dictionary; other values can also be used). During segmentation, a word that matches a center word, brand, attribute or separator word in the product information word dictionary is given the numeric mark 1, 2, 3 or 4 respectively, and a single character that matches nothing in the dictionary is marked 0. For example, for the text "Inman 2013 winter wear new-style pure-cotton plaid contrast-fabric patchwork hooded mid-length padded coat orange-red, in stock?", the result after segmentation and marking is: Inman 2# 2013 3# winter wear 1# new-style 3# pure-cotton 3# plaid 3# contrast-fabric 3# patchwork 3# hooded 3# mid-length 3# padded coat 1# orange-red 3# have 4# stock 0# (question particle) 0#.
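The first step can be sketched in Python as follows. This is a minimal illustration of forward maximum matching with marking, assuming a dictionary that maps words to the marks 1 to 4; the function name `segment_and_tag` and the sample dictionary entries are hypothetical.

```python
def segment_and_tag(text, dictionary, max_len=6):
    """Forward maximum matching: at each position take the longest
    dictionary word (up to max_len characters); a character that starts
    no dictionary word becomes a single-character token marked 0."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if word in dictionary:
                tokens.append((word, dictionary[word]))
                i += size
                break
        else:
            # No dictionary word starts here: emit one unmatched character.
            tokens.append((text[i], 0))
            i += 1
    return tokens


# Hypothetical entries: 1 = center word, 3 = attribute, 4 = separator word.
demo_dictionary = {"纯棉": 3, "衬衫": 1, "长袖衬衫": 1, "有": 4}
tokens = segment_and_tag("纯棉长袖衬衫有货吗", demo_dictionary)
marks = "".join(str(mark) for _, mark in tokens)
```

In this example the longer entry "长袖衬衫" is preferred over "衬衫", which is the defining behaviour of maximum matching.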
Second step: extract the sentence attribute information sequence. If the sequence contains the separator word mark 4, the sequence before the 4 is extracted as the attribute information sequence of the sentence. For the example above, the sentence information sequence is 231333333313400, so the sentence attribute information sequence is 231333333313. If the sequence contains no separator word mark, the subsequence up to and including the last non-zero mark in the whole sentence sequence is extracted as the attribute information sequence. For example, if the sentence information sequence is 2313333300130000, the sentence attribute information sequence is 231333330013.
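The extraction rule of the second step is small enough to check directly against the two sequences given above. The sketch below assumes the marks are held as a string of digits; the function name `extract_attribute_sequence` is hypothetical.

```python
def extract_attribute_sequence(marks):
    """Return the part of the mark string that is treated as the
    sentence attribute information sequence.

    With a separator mark ("4"): everything before the first "4".
    Without one: everything up to and including the last non-zero mark,
    i.e. the string with its trailing "0" marks removed.
    """
    if "4" in marks:
        return marks[:marks.index("4")]
    return marks.rstrip("0")


with_separator = extract_attribute_sequence("231333333313400")
without_separator = extract_attribute_sequence("2313333300130000")
```

Note that zeros inside the sequence are kept; only the unmatched tail is dropped, which is what lets the later fine judgment measure the proportion of 0 marks.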
Note that because Chinese product names are long, texts that contain them are correspondingly long. Before step S12, the length of the text can therefore be pre-judged first; if the text is short, it can be directly deemed not to contain a Chinese product name. Statistics over 3.63 million Chinese product names from websites give an average length of 36.03 characters. Statistics over 140,000 customer inquiry questions that do not contain a Chinese product name give an average length of 7.8 characters. With reference to these data, the length threshold for sentences containing a Chinese product name is set to 25 to 30 characters, and a sentence longer than the threshold is considered to possibly contain a Chinese product name. The threshold is lower than the average length of Chinese product names because a considerable number of Chinese product names are shorter than the average, and in practical applications as many customer inquiries as possible should be handled by machine, so texts containing Chinese product names should be brought into the processing scope as far as possible. To obtain a better pre-judgment, the average product name length can be computed for the products of a specific category, or the average length of texts containing product names of that category can be computed; the statistical result is then multiplied by a preset coefficient, such as 0.7 or 0.75, to give the threshold for the pre-judgment. For example, if a text for that category, such as a customer question in a forum discussing the category, is shorter than the threshold, the text is considered not to contain a Chinese product name.
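The relation between the average length, the coefficient and the threshold can be checked with a few lines of Python. This is a sketch only: the function names are hypothetical, and applying the coefficients 0.7 and 0.75 to the overall average of 36.03 characters (rather than to a per-category average) is done here purely to show that the results fall inside the 25 to 30 range mentioned above.

```python
def average_length(texts):
    """Average length in characters of a non-empty list of texts."""
    return sum(len(text) for text in texts) / len(texts)


def prejudge_threshold(avg_len, coefficient):
    """Threshold = statistical average length times a preset coefficient."""
    return avg_len * coefficient


def passes_prejudgment(text, threshold):
    """Only texts longer than the threshold go on to step S12."""
    return len(text) > threshold


low = prejudge_threshold(36.03, 0.7)     # about 25.2 characters
high = prejudge_threshold(36.03, 0.75)   # about 27.0 characters
```

In a per-category setup, `average_length` would be computed over the product names (or the inquiry texts) of that category before applying the coefficient.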
Step S13: Perform a coarse judgment and then a fine judgment on the sentence attribute information sequence. Because some Chinese product names do not contain a brand, there are two coarse criteria, and satisfying either one passes the coarse judgment: 1. The attribute information sequence of the sentence contains a center word, an attribute and a brand at the same time; with the marks used above, the sequence must contain 1, 2 and 3 at the same time, in which case the coarse judgment result is true. 2. The attribute information sequence of the sentence contains a center word and more than one attribute word, i.e. the sequence contains 1 and more than one 3, in which case the coarse judgment result is true. Texts whose coarse judgment result is true then undergo the fine judgment below.
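The two coarse criteria can be written as a single boolean expression. The Python sketch below assumes the marks are held as a string of digits; the function name and the `min_attrs_without_brand` parameter are hypothetical. The parameter defaults to 2 to match the wording "more than one attribute" in this translation; if the original is read as "one or more", it can be set to 1.

```python
def coarse_judgment(seq, min_attrs_without_brand=2):
    """Coarse criteria on a mark string such as "231333333313".

    Criterion 1: contains a center word (1), a brand (2) and an attribute (3).
    Criterion 2: contains a center word (1) and several attributes (3).
    """
    has_center = "1" in seq
    has_brand = "2" in seq
    attr_count = seq.count("3")
    criterion_1 = has_center and has_brand and attr_count >= 1
    criterion_2 = has_center and attr_count >= min_attrs_without_brand
    return criterion_1 or criterion_2


branded = coarse_judgment("231333333313")   # center word, brand, attributes
unbranded = coarse_judgment("00333331")     # no brand, but several attributes
```

Both criteria require a center word, so a sequence of brand and attribute words alone never passes.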
Many Chinese product names contain words that are not in the product information dictionary. Take, for example, "[Free shipping] men's slim-fit pure-cotton Korean-style stand-collar casual long-sleeved shirt": because "free shipping" is business vocabulary found in customer questions, the result after segmentation and marking is: free 0# shipping 0# men's 3# slim-fit 3# pure-cotton Korean 3# style 3# stand-collar 3# casual 3# long-sleeved shirt 1#, and the attribute information sequence of this sentence is 00333331. Statistical analysis of the attribute information sequences of 1.1 million Chinese product names gives an average sequence length of 9.3 and an average proportion of 0 marks in the sequence of 0.26. Based on these data, the range of the attribute information sequence length threshold N is set to [6,10] and the range of the threshold P for the proportion of 0 marks is set to [0.25,0.45]. If the attribute information sequence length is greater than N and the proportion of 0 marks is less than P, the fine judgment result is true.
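The fine judgment is a length check plus a ratio check. In the Python sketch below the function and parameter names are hypothetical; the default values N = 6 and P = 0.42 are the ones this document reports in its test settings, chosen from the ranges [6,10] and [0.25,0.45].

```python
def fine_judgment(seq, min_len=6, max_unknown_ratio=0.42):
    """Fine criterion on a mark string: the sequence must be longer than
    min_len (N) and the share of "0" marks must be below
    max_unknown_ratio (P)."""
    if not seq:
        return False
    unknown_ratio = seq.count("0") / len(seq)
    return len(seq) > min_len and unknown_ratio < max_unknown_ratio


# The sequence from the "free shipping" example: length 8, two 0 marks.
example_passes = fine_judgment("00333331")
```

For the sequence 00333331 the proportion of 0 marks is 2/8 = 0.25, so the example passes with P = 0.42 but would fail at the lower end of the range, P = 0.25, because the comparison is strict.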
Step S14: Output the text that has passed the coarse judgment and then the fine judgment. This step confirms that the current target text contains a Chinese product name. The other parts of the current target text, such as the central question of the customer's inquiry, can then be analyzed easily and processed further by the automatic answering robot.
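Steps S12 to S14 can be chained into one function that returns the recognized product name together with the remaining text. The Python sketch below is a self-contained illustration under several assumptions: all names, the sample dictionary (including the placeholder brand "品牌甲") and the sample sentence are hypothetical, the pre-judgment on text length is omitted for brevity, and keeping the separator word in the remainder is a design choice made here, not something the source prescribes.

```python
def segment_and_tag(text, dictionary, max_len=6):
    """Forward maximum matching; unmatched single characters get mark 0."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in dictionary:
                tokens.append((text[i:i + size], dictionary[text[i:i + size]]))
                i += size
                break
        else:
            tokens.append((text[i], 0))
            i += 1
    return tokens


def recognize_product_name(text, dictionary, min_len=6, max_unknown_ratio=0.42):
    """Return (product_name, remainder) or None if no name is recognized."""
    tokens = segment_and_tag(text, dictionary)
    marks = [mark for _, mark in tokens]
    if 4 in marks:
        cut = marks.index(4)              # stop before the first separator word
    else:
        cut = len(marks)
        while cut > 0 and marks[cut - 1] == 0:
            cut -= 1                      # drop the unmatched tail
    seq = marks[:cut]
    if not seq:
        return None
    coarse = 1 in seq and ((2 in seq and 3 in seq) or seq.count(3) > 1)
    fine = len(seq) > min_len and seq.count(0) / len(seq) < max_unknown_ratio
    if not (coarse and fine):
        return None
    name = "".join(word for word, _ in tokens[:cut])
    remainder = "".join(word for word, _ in tokens[cut:])
    return name, remainder


# Hypothetical dictionary: 1 center word, 2 brand, 3 attribute, 4 separator.
DEMO_DICT = {"品牌甲": 2, "2013": 3, "新款": 3, "纯棉": 3, "格子": 3,
             "连帽": 3, "棉衣": 1, "橘红": 3, "这": 4, "是": 4, "有": 4}
result = recognize_product_name(
    "品牌甲2013新款纯棉格子连帽棉衣橘红这件衣服是什么面料", DEMO_DICT)
```

The remainder ("这件衣服是什么面料", roughly "what fabric is this garment") is the part that would be handed to the question-answering logic.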
Fig. 2 is a schematic diagram of the main modules of the apparatus for recognizing Chinese product names from text according to an embodiment of the present invention. As shown in Fig. 2, the apparatus 20 for recognizing Chinese product names from text mainly includes a product information word dictionary module 21, a sentence attribute information module 22, a judgment module 23 and an output module 24.
The product information word dictionary module 21 is used for storing the product information word dictionary. The sentence attribute information module 22 is used for determining the sentence attribute information sequence of the text. The judgment module 23 is used for performing a coarse judgment and then a fine judgment on the sentence attribute information sequence. The output module 24 is used for outputting a sentence attribute information sequence that satisfies both the coarse judgment and the fine judgment.
The apparatus 20 for recognizing Chinese product names from text may also include a pre-judgment module for judging, before the sentence attribute information module determines the sentence attribute information sequence of the text, whether the length of the text exceeds a set value, and for triggering the sentence attribute information module 22 only if the result is yes. The pre-judgment module can also be used for: computing the average length, in characters, of the product names of a specified product category, or computing the average length of texts containing product names of the specified category; then multiplying the statistical result by a preset coefficient to obtain the set value.
The sentence attribute information module 22 can also be used for: segmenting the text into words and, during segmentation, assigning each word one of a first to fifth mark according to whether the word is a center word, a brand, an attribute or a separator word, or belongs to none of these; traversing the segmented text from left to right and, if a word assigned the fourth mark is encountered, taking the sequence before that word as the sentence attribute sequence of the text; otherwise taking the last word in the text assigned a mark other than the fifth mark, together with the sequence before that word, as the sentence attribute sequence of the text.
According to the technical scheme of the embodiments of the present invention, the words in the text are analyzed on the basis of the product information word dictionary, which helps to accurately identify Chinese product names in text. The technical scheme of the embodiment was tested as follows:
Test environment: Lenovo ThinkPad 430C, Windows 7 operating system, 4 GB memory, Intel Core i5 processor; the code was written in Python.
Test corpus: 5,000 long-sentence questions (sentence length greater than 25) about the clothing category from actual customer inquiries on a website, of which 3,000 contain a Chinese product name.
Product information dictionary vocabulary: 11,940 clothing brand feature words, 2,142 clothing attribute feature words, 1,070 clothing center-word feature words and 6 separator words.
Key parameter settings in the program: the length threshold for treating a sentence as a long sentence is 25, the sentence attribute information sequence length threshold is 6, and the threshold for the proportion of 0 marks in the attribute information sequence is 0.42.
Test result: 2,862 long sentences were identified as containing a Chinese product name. The precision is 97.62%, i.e. 2,794 were identified correctly; the recall is 93.13%, i.e. 206 were not recalled (if, in the pre-judgment of step S12, a text that actually contains a Chinese product name is deemed not to contain one, it is not recalled).
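The reported figures are consistent with the usual definitions of precision (correct identifications divided by all identifications) and recall (correct identifications divided by all sentences that actually contain a product name). The short Python check below uses only the counts given above; the variable names are illustrative.

```python
identified = 2862        # long sentences flagged as containing a product name
correct = 2794           # of those, correctly identified
actual_positive = 3000   # sentences in the corpus that contain a product name

precision = correct / identified        # 2794 / 2862
recall = correct / actual_positive      # 2794 / 3000
missed = actual_positive - correct      # sentences not recalled

precision_pct = round(precision * 100, 2)
recall_pct = round(recall * 100, 2)
```

The difference between `identified` and `correct`, 68 sentences, corresponds to false identifications, which the test report does not discuss further.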
Test conclusion: the technical scheme of the embodiments of the present invention can recognize and extract Chinese product names in long sentences well and can be used in automatic answering robots for real e-commerce retail, which will be very helpful for improving the question recognition rate and customer satisfaction.
The basic principles of the present invention have been described above with reference to specific embodiments. It should be noted, however, that a person of ordinary skill in the art will understand that all or any of the steps or components of the method and apparatus of the present invention can be implemented in hardware, firmware, software or a combination of these, in any computing device (including processors, storage media and the like) or network of computing devices. A person of ordinary skill in the art can achieve this with basic programming skills after reading the description of the present invention.
The object of the present invention can therefore also be achieved by running a program or a set of programs on any computing device. The computing device can be a well-known general-purpose device. The object of the present invention can thus also be achieved merely by providing a program product containing program code that implements the method or apparatus. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. Obviously, the storage medium can be any well-known storage medium or any storage medium developed in the future.
It should also be pointed out that in the apparatus and method of the present invention, each component or each step can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present invention. The steps of the series of processes described above can naturally be performed in chronological order in the order described, but they do not necessarily have to be performed in chronological order. Some steps can be performed in parallel or independently of one another.
The above embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations and substitutions can occur. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (12)

1. A method for recognizing Chinese product names from text, characterized in that the text is the text of a question from an e-commerce customer's online inquiry, the method comprising:
Step A: storing a product information word dictionary, the product information words including center words, brands, attributes and separator words;
Step B: determining the sentence attribute information sequence of the text, the sentence attribute information sequence being the sequence before the first separator word in the text, reading from left to right, or the sequence before the first word in the text, reading from left to right, that does not belong to the product information word dictionary;
Step C: performing a coarse judgment and then a fine judgment on the sentence attribute information sequence; wherein the coarse criterion is that the sentence attribute information sequence contains a center word, a brand and an attribute, or contains a center word and more than one attribute; and the fine criterion is that the length of the sentence attribute information sequence is greater than a first preset value and the proportion of words in it that do not belong to the product information word dictionary is less than a second preset value;
Step D: confirming a sentence attribute information sequence that satisfies both the coarse judgment and the fine judgment as a Chinese product name.
2. The method according to claim 1, characterized in that, before step B, it is judged whether the length of the text exceeds a set value, and step B is carried out only if the result is yes.
3. The method according to claim 2, characterized in that the value range of the set value is [25,30].
4. The method according to claim 2, characterized in that the set value is determined as follows: computing the average length, in characters, of the product names of a specified product category, or computing the average length of texts containing product names of the specified category; then multiplying the statistical result by a preset coefficient to obtain the set value.
5. The method according to claim 1, characterized in that step B comprises:
performing word segmentation on the text and, during segmentation, assigning each resulting word a first, second, third, fourth, or fifth mark according to whether the word is, respectively, a centre word, a brand, an attribute, or a separation word, or is none of these;
traversing the segmented text from left to right and, if a word assigned the fourth mark is reached, taking the sequence before that word as the sentence attribute sequence in the text; otherwise, taking the last word in the text that is assigned a mark other than the fifth mark, together with the sequence before that word, as the sentence attribute sequence in the text.
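The marking and traversal described in claim 5 can be sketched as below. The sketch assumes the text has already been segmented into words by some external segmenter; the mark numbering (1 to 5), the dictionary layout, and the function names are assumptions for illustration, and the traversal follows one reading of the rule above.

```python
from typing import Dict, List, Tuple

# Assumed numbering: first to fourth marks for dictionary categories, fifth for anything else.
MARK_OF_CATEGORY: Dict[str, int] = {"centre": 1, "brand": 2, "attribute": 3, "separation": 4}
OTHER_MARK = 5


def assign_marks(words: List[str], dictionary: Dict[str, str]) -> List[Tuple[str, int]]:
    """Pair every segmented word with its mark; unknown words get the fifth mark."""
    return [(word, MARK_OF_CATEGORY.get(dictionary.get(word, ""), OTHER_MARK)) for word in words]


def sentence_attribute_sequence(marked: List[Tuple[str, int]]) -> List[str]:
    """Return the words that form the sentence attribute sequence.

    Left-to-right traversal: stop at the first word with the fourth mark and keep
    what precedes it; if there is none, keep everything up to and including the
    last word whose mark is not the fifth mark.
    """
    last_known_index = -1
    for index, (_, mark) in enumerate(marked):
        if mark == 4:
            return [word for word, _ in marked[:index]]
        if mark != OTHER_MARK:
            last_known_index = index
    return [word for word, _ in marked[: last_known_index + 1]]
```

With a hypothetical dictionary in which "和" is a separation word, the segmented text `["白色", "手机", "和", "黑色", "手机"]` yields `["白色", "手机"]`; a text without any separation word is cut after its last dictionary word.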
6. The method according to claim 1, characterized in that, in the fine-judgment criterion, the range of the first preset value is [6, 10] and the range of the second preset value is [0.25, 0.45].
7. The method according to any one of claims 1 to 6, characterized in that the text is the text of a question raised by an e-commerce customer in online consultation.
8. A device for recognizing Chinese trade names from text, characterized in that the text is the text of a question raised by an e-commerce customer in online consultation, the device comprising:
a merchandise information word dictionary module, for storing a merchandise information word dictionary, the merchandise information words including centre words, brands, attributes, and separation words;
a sentence attribute information sequence module, for determining the sentence attribute information sequence in the text, where the sentence attribute information sequence is the sequence preceding the first separation word encountered from left to right in the text, or the sequence preceding the first word, from left to right in the text, that does not belong to the merchandise information word dictionary;
a judgment module, for performing a coarse judgment and then a fine judgment on the sentence attribute information sequence, where the coarse-judgment criterion is: the sentence attribute information sequence contains a centre word, a brand, and an attribute, or contains a centre word and more than one attribute; and the fine-judgment criterion is: the length of the sentence attribute information sequence is greater than a first preset value, and the proportion of words in it that do not belong to the merchandise information word dictionary is less than a second preset value;
an output module, for outputting a sentence attribute information sequence that satisfies both the coarse judgment and the fine judgment.
9. The device according to claim 8, characterized in that it further comprises a pre-judgment module, for judging, before the sentence attribute information sequence module determines the sentence attribute information sequence in the text, whether the length of the text exceeds a set value, and for triggering the sentence attribute information sequence module only if the judgment result is yes.
10. The device according to claim 9, characterized in that the pre-judgment module is further used for: computing the average length, in characters, of the trade names of commodities of a specified category, or computing the average length of texts that contain trade names of the specified category; and then multiplying the statistical result by a predetermined coefficient to obtain the set value.
11. The device according to claim 8, characterized in that the sentence attribute information sequence module is further used for:
performing word segmentation on the text and, during segmentation, assigning each resulting word a first, second, third, fourth, or fifth mark according to whether the word is, respectively, a centre word, a brand, an attribute, or a separation word, or is none of these;
traversing the segmented text from left to right and, if a word assigned the fourth mark is reached, taking the sequence before that word as the sentence attribute sequence in the text; otherwise, taking the last word in the text that is assigned a mark other than the fifth mark, together with the sequence before that word, as the sentence attribute sequence in the text.
12. The device according to any one of claims 8 to 11, characterized in that the text is the text of a question raised by an e-commerce customer in online consultation.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410586116.8A CN104331395B (en) 2014-10-28 2014-10-28 The method and apparatus that Chinese trade name is recognized from text

Publications (2)

Publication Number Publication Date
CN104331395A CN104331395A (en) 2015-02-04
CN104331395B true CN104331395B (en) 2017-11-03

Family

ID=52406124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410586116.8A Active CN104331395B (en) 2014-10-28 2014-10-28 The method and apparatus that Chinese trade name is recognized from text

Country Status (1)

Country Link
CN (1) CN104331395B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045909B (en) * 2015-08-11 2018-04-03 北京京东尚科信息技术有限公司 The method and apparatus that trade name is identified from text
CN108345620B (en) * 2017-01-24 2021-05-25 北京京东尚科信息技术有限公司 Brand information processing method, brand information processing device, storage medium and electronic equipment
CN109190122B (en) * 2018-09-03 2023-04-18 上海腾道信息技术有限公司 Commodity naming identification method applied to international trade field
CN111651984A (en) * 2019-02-19 2020-09-11 北京京东尚科信息技术有限公司 Method and device for processing article description text and computer readable storage medium
CN111339253A (en) * 2020-02-25 2020-06-26 中国建设银行股份有限公司 Method and device for extracting article information

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102033950A (en) * 2010-12-23 2011-04-27 哈尔滨工业大学 Construction method and identification method of automatic electronic product named entity identification system
CN103631948A (en) * 2013-12-11 2014-03-12 北京京东尚科信息技术有限公司 Identifying method of named entities

Non-Patent Citations (2)

Title
在篇章中面向产品类的命名实体识别研究;李治国等;《第三届学生计算语言学研讨会论文集》;20071105;第3页第3.2节、第4.1节,第5页第4.3节 *
面向商务信息抽取的产品命名实体识别研究;刘非凡等;《中文信息学报》;20060130;第20卷(第1期);全文 *

Also Published As

Publication number Publication date
CN104331395A (en) 2015-02-04

Similar Documents

Publication Title
CN104331395B (en) The method and apparatus that Chinese trade name is recognized from text
CN102866990B (en) A kind of theme dialogue method and device
CN112035742B (en) User portrait generation method, device, equipment and storage medium
CN105808526B (en) Commodity short text core word extracting method and device
US9514202B2 (en) Information processing apparatus, information processing method, program for information processing apparatus and recording medium
CN110298245B (en) Interest collection method, interest collection device, computer equipment and storage medium
CN105447186B (en) A kind of user behavior analysis system based on big data platform
CN105069654A (en) User identification based website real-time/non-real-time marketing investment method and system
CN109783632A (en) Customer service information-pushing method, device, computer equipment and storage medium
CN109635117A (en) A kind of knowledge based spectrum recognition user intention method and device
CN104077407B (en) A kind of intelligent data search system and method
CN108491388B (en) Data set acquisition method, classification method, device, equipment and storage medium
CN106056407A (en) Online banking user portrait drawing method and equipment based on user behavior analysis
Siqueira et al. A feature extraction process for sentiment analysis of opinions on services
CN105045909B (en) The method and apparatus that trade name is identified from text
KR20210036184A (en) Item recommendation module based on user taste information and method for identifying user taste information
CN109872162A (en) A kind of air control classifying identification method and system handling customer complaint information
CN112291423B (en) Communication call intelligent response processing method and device, electronic equipment and storage medium
CN108090216A (en) A kind of Tag Estimation method, apparatus and storage medium
CN103886869B (en) A kind of information feedback method based on speech emotion recognition and system
KR20180064324A (en) Appartus for recommendating of product and method using the same
CN103389981B (en) Network label automatic identification method and its system
CN107688594B (en) The identifying system and method for risk case based on social information
CN106022860A (en) Matching method and apparatus
CN107609921A (en) A kind of data processing method and server

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant