CN104331395A - Method and device for identifying Chinese product name from text - Google Patents

Publication number
CN104331395A
Authority
CN
China
Legal status
Granted
Application number
CN201410586116.8A
Other languages
Chinese (zh)
Other versions
CN104331395B (en)
Inventor
岳兴明
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201410586116.8A
Publication of CN104331395A
Application granted
Publication of CN104331395B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for identifying a Chinese product name in text, which help identify Chinese product names in text accurately. The method comprises the following steps: storing a product information dictionary; determining the sentence attribute information sequence of the text; and performing a coarse judgment and then a fine judgment on the sentence attribute information sequence. The criterion of the coarse judgment is that the sequence contains a head word, a brand, and an attribute, or contains a head word and more than one attribute; the criterion of the fine judgment is that the length of the sequence is greater than a first preset value and the proportion of words not belonging to the product information dictionary is smaller than a second preset value. A sentence attribute information sequence that satisfies both the coarse and the fine judgment is determined to be a Chinese product name.

Description

Method and device for identifying a Chinese product name in text
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for identifying a Chinese product name in text.
Background
Automatic answering robots are becoming increasingly important in e-commerce retail. To guarantee service quality, many merchants selling on e-commerce platforms must keep staff at their computers to answer customer inquiries, which is a considerable labor cost for a typical individual online store operator. Automatic answering robots that can handle most common inquiries have emerged in response; they allow the merchants of many online stores to answer customer questions when no one is available.
In practice, customers often copy a product name and append the question they want to ask, and such questions are difficult for an automatic answering robot to recognize. Product names on websites are generally long, especially in categories such as mobile phones and clothing (for example, "Inman 2013 winter new-style woven plaid contrast-fabric patchwork hooded mid-length cotton-padded coat, orange-red"). They are entered and maintained by the merchants themselves and change frequently with marketing promotions, seasons, and other factors. Current automatic answering robots based on category recognition or keyword matching therefore struggle to find the core question when a long product name is included. For example, in the question "Inman 2013 winter new-style woven plaid contrast-fabric patchwork hooded mid-length cotton-padded coat, orange-red: what fabric is the outside of this garment made of?", the core question is "what fabric is the outside of this garment made of?". If the product name can be extracted from the sentence entered by the customer and the remainder of the sentence analyzed separately, such inquiries can be handled effectively. A method for identifying Chinese product names in text is therefore needed, so that inquiries containing a Chinese product name can be recognized and processed by an automatic answering robot.
Summary of the invention
In view of this, the present invention provides a method and device for identifying a Chinese product name in text, which help identify Chinese product names in text accurately.
To achieve the above object, according to one aspect of the present invention, a method for identifying a Chinese product name in text is provided.
The method comprises: step A: storing a product information dictionary, where product information words include head words, brands, attributes, and separator words; step B: determining the sentence attribute information sequence of the text, which is the sequence before the first separator word from left to right in the text, or the sequence before the first word from left to right in the text that does not belong to the product information dictionary; step C: performing a coarse judgment and then a fine judgment on the sentence attribute information sequence, where the criterion of the coarse judgment is that the sequence contains a head word, a brand, and an attribute, or contains a head word and more than one attribute, and the criterion of the fine judgment is that the length of the sequence is greater than a first preset value and the proportion of words in it not belonging to the product information dictionary is less than a second preset value; step D: confirming a sentence attribute information sequence that satisfies both the coarse and the fine judgment as a Chinese product name.
Optionally, before step B, it is determined whether the length of the text exceeds a set value, and step B is performed only if it does.
Optionally, the set value lies in the range [25, 30].
Optionally, the set value is determined as follows: compute the average length, in characters, of the product names of a specified category, or the average length of texts containing product names of the specified category; then multiply this statistic by a predetermined coefficient to obtain the set value.
Optionally, step B comprises: segmenting the text into words and, during segmentation, assigning each word one of a first to a fifth tag according to whether it is a head word, a brand, an attribute, or a separator word, or belongs to none of these; then traversing the segmented text from left to right and, when a word assigned the fourth tag is reached, taking the sequence before that word as the sentence attribute information sequence of the text; otherwise, taking the last word in the text assigned a tag other than the fifth tag, together with the sequence before it, as the sentence attribute information sequence of the text.
Optionally, in the fine judgment, the first preset value lies in the range [6, 10] and the second preset value lies in the range [0.25, 0.45].
Optionally, the text is the text of a question asked by an e-commerce customer during an online inquiry.
According to another aspect of the present invention, a device for identifying a Chinese product name in text is provided.
The device comprises: a product information dictionary module, configured to store a product information dictionary, where product information words include head words, brands, attributes, and separator words; a sentence attribute information module, configured to determine the sentence attribute information sequence of a text, which is the sequence before the first separator word from left to right in the text, or the sequence before the first word from left to right in the text that does not belong to the product information dictionary; a judgment module, configured to perform a coarse judgment and then a fine judgment on the sentence attribute information sequence, where the criterion of the coarse judgment is that the sequence contains a head word, a brand, and an attribute, or contains a head word and more than one attribute, and the criterion of the fine judgment is that the length of the sequence is greater than a first preset value and the proportion of words in it not belonging to the product information dictionary is less than a second preset value; and an output module, configured to output a sentence attribute information sequence that satisfies both the coarse and the fine judgment.
Optionally, the device further comprises a pre-judgment module, configured to determine, before the sentence attribute information module determines the sentence attribute information sequence of the text, whether the length of the text exceeds a set value, and to trigger the sentence attribute information module only if it does.
Optionally, the pre-judgment module is further configured to compute the average length, in characters, of the product names of a specified category, or the average length of texts containing product names of the specified category, and then multiply this statistic by a predetermined coefficient to obtain the set value.
Optionally, the sentence attribute information module is further configured to segment the text into words and, during segmentation, assign each word one of a first to a fifth tag according to whether it is a head word, a brand, an attribute, or a separator word, or belongs to none of these; and to traverse the segmented text from left to right and, when a word assigned the fourth tag is reached, take the sequence before that word as the sentence attribute information sequence of the text, or otherwise take the last word in the text assigned a tag other than the fifth tag, together with the sequence before it, as the sentence attribute information sequence of the text.
Optionally, the text is the text of a question asked by an e-commerce customer during an online inquiry.
According to the technical solution of the present invention, the words in a text are analyzed on the basis of a product information dictionary, which helps identify Chinese product names in text accurately. This has been confirmed by tests on a real system.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of a method for identifying a Chinese product name in text according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main modules of a device for identifying a Chinese product name in text according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
The technical solution of this embodiment identifies Chinese product names in text. The text may be a question asked by an e-commerce customer during an online inquiry, or text from other contexts. The solution determines whether a text contains a Chinese product name mainly by analyzing how many words belonging to product names appear in it. The solution is described in detail below, mainly with reference to questions asked during online customer inquiries.
Fig. 1 is a schematic diagram of the main steps of the method for identifying a Chinese product name in text according to an embodiment of the present invention. As shown in Fig. 1, the method mainly comprises steps S11 to S14 below.
Step S11: store a product information dictionary. Product information words include head words, brands, attributes, and separator words. A head word indicates the product category, such as mobile phone, computer, trousers, or shirt. An attribute describes a property of the product in the e-commerce domain, such as its material, workmanship, size, or author. Analysis of a large number of Chinese product names shows that head words, brands, and attributes are the three main features a product name generally contains. The words in the dictionary are collected from e-commerce websites using techniques such as web crawlers. Brand and attribute words are preferably collected from pages of the relevant product category; for example, brands, sizes, and materials are collected from pages listing clothing products. Note that some products have no brand information.
Separator words usually appear next to the product name in the text. An analysis of 1,500 questions containing Chinese product names found that most questions follow the pattern "Chinese product name + separator word (e.g. "may I ask", "this", "have", "is") + core question", for example "Inman 2013 winter new-style woven plaid contrast-fabric patchwork hooded mid-length cotton-padded coat, orange-red: what fabric is the outside of this garment made of?", in which "what fabric is the outside of this garment made of?" is the core question. The core question is generally the substantive content of the customer's inquiry. Such separator words are few: in this embodiment, only six were extracted from the question statistics ("may I ask", "still", "have", "not have", "this", and "is"), and they are also added to the product information dictionary. More separator words can be added in practice as needed.
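A minimal Python sketch of the dictionary in step S11, assuming it is held in memory as one mapping from word to numeric tag (1 head word, 2 brand, 3 attribute, 4 separator, the tags used in step S12). The function name `build_lexicon`, the sample words, and the precedence rule for words listed in more than one category are illustrative assumptions; the Chinese separator words are our reconstruction of the machine-translated list and should be checked against the original.

```python
from typing import Dict, Iterable, Tuple

# Numeric tags used in step S12 of the embodiment; 0 marks unmatched characters.
HEAD, BRAND, ATTRIBUTE, SEPARATOR = 1, 2, 3, 4

# Assumed Chinese originals of the six separator words
# ("may I ask", "still", "have", "not have", "this", "is").
SEPARATORS = ["请问", "还", "有", "没有", "这", "是"]


def build_lexicon(
    heads: Iterable[str],
    brands: Iterable[str],
    attributes: Iterable[str],
    separators: Iterable[str] = SEPARATORS,
) -> Tuple[Dict[str, int], int]:
    """Merge the four word lists into a single word -> tag mapping.

    If a word appears in several lists, the category processed later in
    the loop wins (an assumption; the patent does not specify precedence).
    Also returns the longest word length, a natural upper bound for the
    maximum-matching window used during segmentation.
    """
    lexicon: Dict[str, int] = {}
    for tag, words in (
        (ATTRIBUTE, attributes),
        (BRAND, brands),
        (HEAD, heads),
        (SEPARATOR, separators),
    ):
        for word in words:
            lexicon[word] = tag
    max_len = max((len(word) for word in lexicon), default=0)
    return lexicon, max_len


lexicon, max_len = build_lexicon(["冬装", "棉衣"], ["ABC"], ["新款", "橘红色"])
```

`ABC` stands for a hypothetical brand. The embodiment fixes the matching window at 6 characters instead of deriving it from the word list; returning `max_len` simply shows where such a statistic could come from.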
Once the product information dictionary is stored, target texts can be processed, for example questions asked by users during online inquiries. If needed, the target text can first be preprocessed, for example by converting traditional Chinese characters to simplified ones, removing whitespace and punctuation, and replacing website links. The method then proceeds to step S12.
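The preprocessing described above might look like the following sketch. The link pattern and the `URL` placeholder token are assumptions (the patent does not say what links are replaced with), and traditional-to-simplified conversion is left out because it needs a conversion table or a third-party library such as OpenCC.

```python
import re
import unicodedata

# Assumed link pattern and placeholder; the patent specifies neither.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
URL_PLACEHOLDER = "URL"


def preprocess(text: str) -> str:
    """Replace links with a placeholder, then drop whitespace and punctuation."""
    # Replace links first, because removing punctuation would break them apart.
    text = URL_PATTERN.sub(URL_PLACEHOLDER, text)
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        # P* = punctuation, Z* = space separators; isspace() also catches tabs/newlines.
        if category.startswith(("P", "Z")) or ch.isspace():
            continue
        kept.append(ch)
    return "".join(kept)


print(preprocess("请问 这件, 有货吗? http://example.com/a"))  # → 请问这件有货吗URL
```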
Step S12: determine the sentence attribute information sequence of the current target text. The sentence attribute information sequence is the sequence before the first separator word from left to right in the text, or the sequence before the first word from left to right in the text that does not belong to the product information dictionary. It can therefore be determined by traversing the text from left to right according to this definition. In this embodiment, this is done with tags, in two steps.
Step 1: word segmentation and lexical-feature tagging. The target text (or the preprocessed text) is segmented by forward maximum matching with a maximum match length of 6 characters (set from statistics on the word lengths in the product information dictionary; other values may also be used). During segmentation, words that match a head word, brand, attribute, or separator word in the product information dictionary are given the numeric tags 1, 2, 3, and 4 respectively, and single characters that match nothing in the dictionary are tagged 0. For example, the text "Inman 2013 winter new-style woven plaid contrast-fabric patchwork hooded mid-length cotton-padded coat, orange-red, is it in stock?" becomes, after segmentation and tagging: [Inman]2 [2013]3 [winter wear]1 [new-style]3 [woven]3 [plaid]3 [contrast-fabric]3 [patchwork]3 [hooded]3 [mid-length]3 [cotton-padded coat]1 [orange-red]3 [have]4 [stock]0 [question particle]0.
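Forward maximum matching with tagging, as in step 1, can be sketched as follows: at each position the longest dictionary word of at most `max_len` characters is taken, and a character that starts no dictionary word becomes a single-character token tagged 0. The function name and the small sample dictionary (with the hypothetical brand `ABC`) are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def fmm_tag(text: str, lexicon: Dict[str, int], max_len: int = 6) -> List[Tuple[str, int]]:
    """Segment `text` by forward maximum matching and tag each token.

    Dictionary words get their tag (1 head word, 2 brand, 3 attribute,
    4 separator); unmatched single characters get tag 0.
    """
    tokens: List[Tuple[str, int]] = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon:
                tokens.append((piece, lexicon[piece]))
                i += size
                break
        else:
            # No dictionary word starts here: emit one character tagged 0.
            tokens.append((text[i], 0))
            i += 1
    return tokens


lexicon = {"ABC": 2, "冬装": 1, "新款": 3, "格子": 3, "棉衣": 1, "有": 4}
tokens = fmm_tag("ABC冬装新款格子棉衣有货吗", lexicon)
print("".join(str(tag) for _, tag in tokens))  # → 21331400
```

The greedy longest-first choice is what makes this "maximum" matching; it is fast but can mis-split ambiguous strings, which the embodiment tolerates because later checks only count tags.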
Step 2: extract the sentence attribute information sequence. If the tag sequence contains the separator tag 4, the part before the first 4 is taken as the attribute information sequence of the sentence; for example, the tag sequence of the example above is 231333333313400, so the sentence attribute information sequence is 231333333313. If the tag sequence contains no separator tag, the subsequence of the whole sentence up to and including the last non-zero tag is taken; for example, for the tag sequence 2313333300130000, the sentence attribute information sequence is 231333330013.
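Working on the tag string from step 1, the extraction rule reduces to two string operations. The no-separator branch keeps everything up to and including the last non-zero tag, as the worked example shows, and stripping trailing zeros gives exactly that. The function name is our own.

```python
def extract_attribute_sequence(tags: str) -> str:
    """Return the sentence attribute information sequence for a tag string."""
    cut = tags.find("4")
    if cut != -1:
        # A separator word is present: keep everything before the first one.
        return tags[:cut]
    # No separator: keep everything up to and including the last non-zero tag.
    return tags.rstrip("0")


print(extract_attribute_sequence("231333333313400"))   # → 231333333313
print(extract_attribute_sequence("2313333300130000"))  # → 231333330013
```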
Note that because Chinese product names are long, texts containing them are correspondingly long. Therefore, before step S12, the length of the text can be checked first: if the text is short, it can be assumed directly not to contain a Chinese product name. Statistics on 3,630,000 Chinese product names from a website give an average length of 36.03 characters; statistics on 140,000 customer inquiries that contain no Chinese product name give an average length of 7.8 characters. Based on these data, the threshold length for sentences that may contain a Chinese product name can be set to 25 to 30 characters, and sentences longer than the threshold are treated as possibly containing a Chinese product name. The threshold is lower than the average product name length because a considerable number of Chinese product names are shorter than average; in practice, as many customer inquiries as possible should be handled automatically, so texts containing Chinese product names should be kept within the processing range wherever possible. For a better pre-judgment, the average length of product names in a specific category, or the average length of texts containing product names of that category, can be computed and multiplied by a predetermined coefficient, such as 0.7 or 0.75, to obtain the pre-judgment threshold. For texts of that category, such as customer inquiries in a forum discussing that category, a text shorter than the threshold is considered not to contain a Chinese product name.
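The category-specific pre-judgment threshold is an average multiplied by a coefficient. As a worked example with the figures above, 36.03 × 0.7 ≈ 25.2 and 36.03 × 0.75 ≈ 27.0, both inside the 25 to 30 range. The sketch assumes lengths are counted in characters and that a text must be strictly longer than the threshold to proceed to step S12; the function names are illustrative.

```python
from statistics import mean
from typing import Iterable


def prejudge_threshold(lengths: Iterable[float], coefficient: float = 0.7) -> float:
    """Average length of product names (or of texts containing them) times a coefficient."""
    return mean(lengths) * coefficient


def passes_prejudgment(text: str, threshold: float) -> bool:
    """Only texts longer than the threshold go on to step S12."""
    return len(text) > threshold


threshold = prejudge_threshold([36.03], coefficient=0.75)
print(round(threshold, 2))  # → 27.02
```

In practice `lengths` would be the per-item lengths collected for one category; a single pre-computed average is passed here only to reproduce the arithmetic.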
Step S13: perform the coarse judgment and then the fine judgment on the sentence attribute information sequence. Because some Chinese product names contain no brand, there are two coarse criteria, and satisfying either one passes the coarse judgment: (1) the attribute information sequence of the sentence contains a head word, an attribute, and a brand; with the tagging described above, the sequence must contain 1, 2, and 3, in which case the coarse judgment result is true; (2) the attribute information sequence of the sentence contains a head word and more than one attribute word, that is, the sequence contains a 1 and more than one 3, in which case the coarse judgment result is true. Texts whose coarse judgment result is true are then subjected to the fine judgment below.
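The two coarse criteria translate directly into checks on the tag string. This sketch reads "more than one attribute" as at least two 3s, which keeps criterion (1) meaningful: with "one or more", criterion (2) would already cover every sequence that satisfies (1). That reading is an assumption.

```python
def coarse_check(seq: str) -> bool:
    """Coarse judgment on a sentence attribute information sequence (tag string)."""
    has_head = "1" in seq
    # Criterion (1): head word, brand and attribute all present.
    with_brand = has_head and "2" in seq and "3" in seq
    # Criterion (2): head word plus more than one attribute (read as >= 2 here).
    without_brand = has_head and seq.count("3") > 1
    return with_brand or without_brand


for seq in ("231333333313", "123", "133", "13", "2333"):
    print(seq, coarse_check(seq))
```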
Many Chinese product names contain words that are not in the product information dictionary, for example "[Free shipping] men's slim-fit woven Korean-style stand-up-collar casual long-sleeve shirt". Because "free shipping" belongs to the business vocabulary of customer questions rather than to the product information dictionary, each of its characters is tagged 0, and after segmentation and tagging the attribute information sequence is 00333331. A statistical analysis of the attribute information sequences of 1,100,000 Chinese product names found an average sequence length of 9.3, with 0 tags making up 0.26 of each sequence on average. Based on these data, the length threshold N of the attribute information sequence is set in the range [6, 10] and the threshold P for the proportion of 0s in the range [0.25, 0.45]. If the length of the attribute information sequence is greater than N and the proportion of 0s is lower than P, the fine judgment result is true.
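A sketch of the fine judgment, with defaults N = 6 and P = 0.42 taken from the test settings reported later in this description. For the free-shipping example, the sequence 00333331 has length 8 > 6 and a zero proportion of 2/8 = 0.25 < 0.42, so it passes. The function name is illustrative.

```python
def fine_check(seq: str, min_len: int = 6, max_zero_ratio: float = 0.42) -> bool:
    """Fine judgment: sequence longer than N and share of 0 tags below P."""
    if len(seq) <= min_len:
        return False
    zero_ratio = seq.count("0") / len(seq)
    return zero_ratio < max_zero_ratio


print(fine_check("00333331"))  # → True  (length 8, zero ratio 0.25)
print(fine_check("0000333"))   # → False (zero ratio 4/7, about 0.57)
print(fine_check("231333"))    # → False (length 6 is not greater than 6)
```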
Step S14: output texts that have passed both the coarse judgment and the fine judgment. This step confirms that the current target text contains a Chinese product name. The other parts of the current target text, such as the core question of the customer's inquiry, can then be analyzed easily and processed further by the automatic answering robot.
Fig. 2 is a schematic diagram of the main modules of the device for identifying a Chinese product name in text according to an embodiment of the present invention. As shown in Fig. 2, the device 20 for identifying a Chinese product name in text mainly comprises a product information dictionary module 21, a sentence attribute information module 22, a judgment module 23, and an output module 24.
The product information dictionary module 21 is configured to store the product information dictionary. The sentence attribute information module 22 is configured to determine the sentence attribute information sequence of the text. The judgment module 23 is configured to perform the coarse judgment and then the fine judgment on the sentence attribute information sequence. The output module 24 is configured to output any sentence attribute information sequence that satisfies both the coarse and the fine judgment.
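One way to arrange the modules of device 20 in code is a single class whose methods mirror them. This is a self-contained sketch, not the patent's implementation: the class and method names, the sample dictionary with the hypothetical brand `ABC`, and the choice to return the matched text span rather than the bare tag sequence are assumptions. The thresholds default to the test settings (25, 6, 0.42), and "more than one attribute" is read as at least two.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, int]


@dataclass
class ProductNameRecognizer:
    lexicon: Dict[str, int]         # module 21: word -> tag (1 head, 2 brand, 3 attribute, 4 separator)
    max_word_len: int = 6           # forward-maximum-matching window
    length_threshold: float = 25.0  # optional pre-judgment module
    min_seq_len: int = 6            # fine judgment: N
    max_zero_ratio: float = 0.42    # fine judgment: P

    def segment(self, text: str) -> List[Token]:
        """Module 22, step 1: forward maximum matching; unmatched characters get tag 0."""
        tokens: List[Token] = []
        i = 0
        while i < len(text):
            for size in range(min(self.max_word_len, len(text) - i), 0, -1):
                word = text[i:i + size]
                if word in self.lexicon:
                    tokens.append((word, self.lexicon[word]))
                    i += size
                    break
            else:
                tokens.append((text[i], 0))
                i += 1
        return tokens

    @staticmethod
    def attribute_tokens(tokens: List[Token]) -> List[Token]:
        """Module 22, step 2: cut before the first separator, else after the last non-zero tag."""
        tags = [tag for _, tag in tokens]
        if 4 in tags:
            return tokens[:tags.index(4)]
        last = max((k for k, tag in enumerate(tags) if tag != 0), default=-1)
        return tokens[:last + 1]

    def judge(self, seq: str) -> bool:
        """Module 23: coarse judgment, then fine judgment, on a tag string."""
        has_head = "1" in seq
        coarse = has_head and (("2" in seq and "3" in seq) or seq.count("3") > 1)
        if not coarse:
            return False
        return len(seq) > self.min_seq_len and seq.count("0") / len(seq) < self.max_zero_ratio

    def recognize(self, text: str) -> Optional[str]:
        """Module 24: return the text span judged to be a product name, or None."""
        if len(text) <= self.length_threshold:
            return None
        span = self.attribute_tokens(self.segment(text))
        seq = "".join(str(tag) for _, tag in span)
        return "".join(word for word, _ in span) if self.judge(seq) else None


lexicon = {"ABC": 2, "2013": 3, "冬装": 1, "新款": 3, "格子": 3, "拼接": 3, "连帽": 3,
           "中长款": 3, "棉衣": 1, "橘红色": 3, "有": 4}
recognizer = ProductNameRecognizer(lexicon)
print(recognizer.recognize("ABC2013冬装新款格子拼接连帽中长款棉衣橘红色有货吗"))
```

Returning the span makes it easy to hand the remainder of the question (here the in-stock query) to the answering logic, which is the use case described in step S14.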
The device 20 for identifying a Chinese product name in text may further comprise a pre-judgment module, configured to determine, before the sentence attribute information module 22 determines the sentence attribute information sequence of the text, whether the length of the text exceeds a set value, and to trigger the sentence attribute information module 22 only if it does. The pre-judgment module may also be configured to compute the average length, in characters, of product names of a specified category, or the average length of texts containing product names of the specified category, and then multiply this statistic by a predetermined coefficient to obtain the set value.
The sentence attribute information module 22 may also be configured to segment the text into words and, during segmentation, assign each word one of a first to a fifth tag according to whether it is a head word, a brand, an attribute, or a separator word, or belongs to none of these; and to traverse the segmented text from left to right and, when a word assigned the fourth tag is reached, take the sequence before that word as the sentence attribute information sequence of the text, or otherwise take the last word in the text assigned a tag other than the fifth tag, together with the sequence before it, as the sentence attribute information sequence of the text.
According to the technical solution of the embodiment of the present invention, the words in a text are analyzed on the basis of the product information dictionary, which helps identify Chinese product names in text accurately. The technical solution was tested as follows:
Test environment: Lenovo ThinkPad 430C, Windows 7, 4 GB RAM, Intel Core i5 processor; code written in Python.
Test corpus: 5,000 long questions (longer than 25 characters) about the clothing category, actually asked by customers of a website, of which 3,000 contain a Chinese product name.
Product information dictionary: 11,940 clothing brand words, 2,142 clothing attribute words, 1,070 clothing head words, and 6 separator words.
Key program parameters: length threshold for long sentences, 25; length threshold for the sentence attribute information sequence, 6; threshold for the proportion of 0s in the attribute information sequence, 0.42.
Test results: 2,862 long sentences were identified as containing a Chinese product name. Precision is 97.62%, that is, 2,794 were identified correctly; recall is 93.13%, that is, 206 were not recalled (a text that actually contains a Chinese product name but is judged not to contain one, for example in the pre-judgment before step S12, is counted as not recalled).
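The reported figures are consistent with the usual definitions of precision (correct identifications over all identifications) and recall (correct identifications over all texts that actually contain a product name), as this small check shows; the same counts also imply 2,862 - 2,794 = 68 false positives.

```python
def precision_recall(identified: int, correct: int, relevant: int):
    """Precision, recall and number of missed texts from raw counts."""
    precision = correct / identified
    recall = correct / relevant
    missed = relevant - correct
    return precision, recall, missed


p, r, missed = precision_recall(identified=2862, correct=2794, relevant=3000)
print(f"precision {p:.2%}, recall {r:.2%}, missed {missed}")
# → precision 97.62%, recall 93.13%, missed 206
```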
Conclusion: the technical solution of the embodiment of the present invention identifies and extracts Chinese product names from long sentences well and can be used in automatic answering robots in real e-commerce retail, where it will help considerably in improving the question recognition rate and customer satisfaction.
The basic principle of the present invention has been described above with reference to specific embodiments. It should be noted, however, that those of ordinary skill in the art will understand that all or any of the steps or components of the method and device of the present invention can be implemented in hardware, firmware, software, or a combination thereof, in any computing device (including processors, storage media, etc.) or network of computing devices, which those of ordinary skill in the art can achieve with basic programming skills after reading the description of the present invention.
Therefore, the object of the present invention can also be achieved by running a program or a set of programs on any computing device, which may be a well-known general-purpose device. The object of the present invention can thus also be achieved merely by providing a program product containing program code that implements the method or device. That is, such a program product also constitutes the present invention, as does a storage medium storing such a program product. Clearly, the storage medium may be any known storage medium or any storage medium developed in the future.
It should also be noted that, in the device and method of the present invention, the components or steps can clearly be decomposed and/or recombined; such decompositions and/or recombinations should be regarded as equivalents of the present invention. Moreover, the steps of the above series of processes may naturally be performed in chronological order in the order described, but need not be; some steps may be performed in parallel or independently of one another.
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. from text, identify the method for Chinese trade name, it is characterized in that, comprising:
Steps A: preserve merchandise news word dictionary, described merchandise news word comprises centre word, brand, attribute and separates word;
Step B: determine the sentence attribute information sequence in text, described sentence attribute information sequence is the sequence that in described text, from left to right first separates before word, or first sequence do not belonged to before the word of described merchandise news word dictionary from left to right in described text;
Step C: thick judgement and thin judgement are carried out successively to described sentence attribute information sequence; Wherein, the standard of described thick judgement is: described sentence attribute information sequence comprises centre word, brand and attribute, or comprises centre word and more than one attribute; The standard of described thin judgement is: sentence attribute information sequence is greater than the first preset value and the word proportion wherein not belonging to described merchandise news word dictionary is less than the second preset value;
Step D: for meeting described thick judgement and the thin sentence attribute information sequence judged simultaneously, confirm as Chinese trade name.
2. The method according to claim 1, characterized in that, before step B, it is determined whether the length of the text exceeds a set value, and step B is performed only if it does.
3. The method according to claim 2, characterized in that the set value lies in the range [25, 30].
4. The method according to claim 2, characterized in that the set value is determined as follows: computing the average length, in characters, of the product names of a specified category, or the average length of texts containing product names of the specified category; then multiplying this statistic by a predetermined coefficient to obtain the set value.
5. The method according to claim 1, characterized in that step B comprises:
segmenting the text into words and, during segmentation, assigning each word obtained one of a first to a fifth tag according to whether it is a head word, a brand, an attribute, or a separator word, or belongs to none of these;
traversing the segmented text from left to right and, when a word assigned the fourth tag is reached, taking the sequence before that word as the sentence attribute information sequence of the text; otherwise, taking the last word in the text assigned a tag other than the fifth tag, together with the sequence before it, as the sentence attribute information sequence of the text.
6. The method according to claim 1, characterized in that the first preset value in the criterion of the fine judgment lies in the range [6, 10] and the second preset value lies in the range [0.25, 0.45].
7. The method according to any one of claims 1 to 6, characterized in that the text is the text of a question asked by an e-commerce customer during an online inquiry.
8. A device for identifying Chinese product names from text, comprising:
a product information word dictionary module, configured to store a product information word dictionary, the product information words comprising head words, brands, attributes and separator words;
a sentence attribute information module, configured to determine the sentence attribute information sequence in a text, the sentence attribute information sequence being the sequence before the first separator word from the left in the text, or the sequence before the first word from the left in the text that does not belong to the product information word dictionary;
a judgment module, configured to perform a coarse judgment and then a fine judgment on the sentence attribute information sequence, wherein the criterion of the coarse judgment is that the sentence attribute information sequence contains a head word, a brand and an attribute, or contains a head word and more than one attribute; and the criterion of the fine judgment is that the length of the sentence attribute information sequence is greater than a first preset value and the proportion of its words that do not belong to the product information word dictionary is less than a second preset value; and
an output module, configured to output each sentence attribute information sequence that satisfies both the coarse judgment and the fine judgment.
9. The device according to claim 8, further comprising a pre-judgment module, configured to judge, before the sentence attribute information module determines the sentence attribute information sequence in the text, whether the length of the text exceeds a set value, and to trigger the sentence attribute information module only if it does.
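The device of claims 8 and 9 can be pictured as modules wired into one object, with the pre-judgment module gating the rest. In this hypothetical sketch, the class and method names, the default parameter values and the dictionary labels are assumptions. The extraction follows the wording of claim 8. It checks for a separator word first and only then stops at the first out-of-dictionary word, which is one possible reading of the claim's "or".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DictionaryModule:
    """Product information word dictionary: word -> "head", "brand", "attr" or "sep"."""
    words: Dict[str, str]

    def category(self, word: str) -> Optional[str]:
        return self.words.get(word)


@dataclass
class ProductNameDevice:
    dictionary: DictionaryModule
    set_value: int = 25          # pre-judgment threshold, within claim 3's [25, 30]
    first_preset: int = 8        # fine judgment, within [6, 10]
    second_preset: float = 0.35  # fine judgment, within [0.25, 0.45]

    def pre_judge(self, text: str) -> bool:
        # Pre-judgment module: only texts longer than the set value go further.
        return len(text) > self.set_value

    def extract(self, words: List[str]) -> List[str]:
        # Sentence attribute information module: words before the first
        # separator word or, if there is none, before the first unknown word.
        for i, w in enumerate(words):
            if self.dictionary.category(w) == "sep":
                return words[:i]
        for i, w in enumerate(words):
            if self.dictionary.category(w) is None:
                return words[:i]
        return list(words)

    def judge(self, seq: List[str]) -> bool:
        # Judgment module: coarse judgment, then fine judgment.
        cats = [self.dictionary.category(w) for w in seq]
        n_attr = cats.count("attr")
        coarse = "head" in cats and (("brand" in cats and n_attr >= 1) or n_attr >= 2)
        if not coarse or len(seq) <= self.first_preset:
            return False
        return cats.count(None) / len(seq) < self.second_preset

    def run(self, text: str, words: List[str]) -> Optional[str]:
        # Output module: emit the joined sequence only if every stage passes.
        if not self.pre_judge(text):
            return None
        seq = self.extract(words)
        return "".join(seq) if self.judge(seq) else None
```

Keeping each module as a separate method mirrors the claim structure. It also lets the thresholds be tuned per product category without touching the extraction logic.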
10. The device according to claim 9, wherein the pre-judgment module is further configured to: compute the average length, in characters, of the product names of products in a specified category, or the average length of texts that contain product names of the specified category; and multiply the resulting statistic by a predetermined coefficient to obtain the set value.
11. The device according to claim 8, wherein the sentence attribute information module is further configured to:
perform word segmentation on the text and, during segmentation, assign each resulting word a first, second, third, fourth or fifth tag according to whether the word is a head word, a brand, an attribute or a separator word, or is none of these, respectively; and
traverse the segmented text from left to right: when a word assigned the fourth tag is reached, take the sequence before that word as the sentence attribute information sequence of the text; otherwise, take the last word in the text that is assigned a tag other than the fifth tag, together with the sequence before it, as the sentence attribute information sequence of the text.
12. The device according to any one of claims 8 to 11, wherein the text is the text of a question asked by an e-commerce customer during online consultation.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410586116.8A CN104331395B (en) 2014-10-28 2014-10-28 Method and device for recognizing Chinese product names from text

Publications (2)

Publication Number Publication Date
CN104331395A 2015-02-04
CN104331395B 2017-11-03

Family

ID=52406124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410586116.8A Active CN104331395B (en) 2014-10-28 2014-10-28 Method and device for recognizing Chinese product names from text

Country Status (1)

Country Link
CN (1) CN104331395B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102033950A (en) * 2010-12-23 2011-04-27 Harbin Institute of Technology Construction method and identification method of automatic electronic product named entity identification system
CN103631948A (en) * 2013-12-11 2014-03-12 Beijing Jingdong Shangke Information Technology Co Ltd Identifying method of named entities

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
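Liu Feifan et al.: "Research on Product Named Entity Recognition for Business Information Extraction", Journal of Chinese Information Processing *
Li Zhiguo et al.: "Research on Product-Oriented Named Entity Recognition in Discourse", Proceedings of the 3rd Student Workshop on Computational Linguistics *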

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045909A (en) * 2015-08-11 2015-11-11 Beijing Jingdong Shangke Information Technology Co Ltd Method and device for recognizing commodity name from text
CN105045909B (en) * 2015-08-11 2018-04-03 Beijing Jingdong Shangke Information Technology Co Ltd Method and device for identifying product names from text
CN108345620A (en) * 2017-01-24 2018-07-31 Beijing Jingdong Shangke Information Technology Co Ltd Brand information processing method, device, storage medium and electronic equipment
CN108345620B (en) * 2017-01-24 2021-05-25 Beijing Jingdong Shangke Information Technology Co Ltd Brand information processing method, brand information processing device, storage medium and electronic equipment
CN109190122A (en) * 2018-09-03 2019-01-11 Shanghai Tengdao Information Technology Co Ltd Method for recognizing trade names in the field of international trade
CN111651984A (en) * 2019-02-19 2020-09-11 Beijing Jingdong Shangke Information Technology Co Ltd Method and device for processing article description text and computer readable storage medium
CN111339253A (en) * 2020-02-25 2020-06-26 China Construction Bank Corporation Method and device for extracting article information

Also Published As

Publication number Publication date
CN104331395B (en) 2017-11-03

Similar Documents

Publication Publication Date Title
CN112035742B (en) User portrait generation method, device, equipment and storage medium
CN107977798B (en) Risk assessment method for quality of electronic commerce product
CN104462333B (en) Shopping search recommendation and alert method and system
US9892384B2 (en) Extracting product purchase information from electronic messages
US10963912B2 (en) Method and system for filtering goods review information
CN108346075B (en) Information recommendation method and device
CN108985347A (en) Training method for a classification model, and shop classification method and device
CN104331395A (en) Method and device for identifying Chinese product name from text
CN110298245B (en) Interest collection method, interest collection device, computer equipment and storage medium
CN115002200B (en) Message pushing method, device, equipment and storage medium based on user portrait
CN106815192A (en) Model training method and device and sentence emotion identification method and device
US20160110763A1 (en) Extracting product purchase information from electronic messages
CN109241527B (en) Method for automatically generating a fake-review dataset for Chinese commodities
CN107733967A (en) Method, device, computer equipment and storage medium for processing pushed information
CN104834651A (en) Method and apparatus for providing answers to frequently asked questions
CN115391669A (en) Intelligent recommendation method and device and electronic equipment
CN107436916A (en) Method and device for intelligently prompting answers
CN108648005A (en) Data processing method and system
CN113420018A (en) User behavior data analysis method, device, equipment and storage medium
CN109615437A (en) Method for tracking and managing sales customer acquisition
CN109145187A (en) Cross-platform e-commerce fraud detection method and system based on review data
CN106997350A (en) Data processing method and device
CN107688594B (en) System and method for identifying risk events based on social information
US20240232200A9 (en) Order searching method, apparatus, computer device, and storage medium
CN103886869B (en) Information feedback method and system based on speech emotion recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant