CN106802887A - Word segmentation processing method and device, and electronic equipment

Info

Publication number
CN106802887A
Authority
CN
China
Prior art keywords
word
reflection
comment content
content
evaluating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611263885.XA
Other languages
Chinese (zh)
Inventor
焦增涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present disclosure provides a word segmentation processing method and device, and electronic equipment. The method includes: in a text on which word segmentation has been performed, determining a word that reflects comment content; and, when it is determined that a predetermined relationship is satisfied between the word that reflects comment content and a word neighboring it, merging the two words. The technical solution of this application achieves a larger segmentation granularity and thereby effectively improves the ability to analyze words that reflect comment content.

Description

Word segmentation processing method and device, and electronic equipment
Technical field
The present invention relates to the technical field of natural language processing, and in particular to a word segmentation processing method and device, and electronic equipment.
Background
In Chinese, the word is the smallest meaningful language unit that can be used independently. Because Chinese has no separators between words, and words themselves lack obvious morphological markers, Chinese word segmentation is a fundamental technique for analyzing Chinese text and the basis of all subsequent analysis. Different segmentation granularities have different strengths, so for any given analysis of Chinese text the segmentation granularity plays a key role in the accuracy of the result.
With the growth of e-commerce, e-commerce platforms hold more and more product comments. Analyzing these comments also requires word segmentation, and the segmentation granularity affects how well items such as comment attribute words and comment words can be analyzed.
Current word segmentation techniques rely heavily on manual work and are not sufficiently intelligent or flexible, while some automatic approaches have low accuracy and struggle to reach the expected segmentation granularity.
Summary of the invention
In view of this, the present invention provides a word segmentation processing method and device, and electronic equipment, suitable for analyzing comment information. They can achieve a larger segmentation granularity, effectively improve the ability to analyze words that reflect comment content (such as basic attribute words and comment words), and are intelligent and flexible.
Other features and advantages of the invention will become apparent from the following detailed description, or may be learned in part through practice of the invention.
According to one aspect of the invention, a word segmentation processing method is provided, including:
in a text on which word segmentation has been performed, determining a word that reflects comment content; and
when it is determined that a predetermined relationship is satisfied between the word that reflects comment content and a word neighboring it, merging the word that reflects comment content with the neighboring word.
In addition, the invention provides a word segmentation processing device, including:
a word determining module, configured to determine a word that reflects comment content in a text on which word segmentation has been performed; and
a merging module, configured to merge the word that reflects comment content with a word neighboring it when a predetermined relationship is satisfied between the two.
The invention further provides electronic equipment, including:
a processor; and
a memory storing a computer program that can run on the processor,
wherein the processor implements the steps of the method described above when executing the computer program.
The invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method described above.
The word segmentation processing method, device, and electronic equipment according to embodiments of the invention automatically determine a word that reflects comment content and then, by checking a predetermined relationship, automatically decide whether to merge that word with a neighboring word. The merged text reaches a larger segmentation granularity, and the approach is intelligent, flexible, and capable of relatively high accuracy.
It should be understood that the general description above and the detailed description below are only exemplary and do not limit the invention.
Brief description of the drawings
The above and other objects, features, and advantages of the invention will become more apparent from the detailed description of its example embodiments with reference to the accompanying drawings.
Fig. 1 is a flow chart of a word segmentation processing method according to an exemplary embodiment.
Fig. 2 is a flow chart of a word segmentation processing method according to an exemplary embodiment.
Fig. 3 is a flow chart of a word segmentation processing method according to an exemplary embodiment.
Fig. 4 is a schematic diagram of the principle of a word segmentation processing method according to an exemplary embodiment.
Fig. 5A is a flow chart of a word segmentation processing method according to an exemplary embodiment.
Fig. 5B is a flow chart of a word segmentation processing method according to an exemplary embodiment.
Fig. 6A is a flow chart of a word segmentation processing method according to an exemplary embodiment.
Fig. 6B is a flow chart of a word segmentation processing method according to an exemplary embodiment.
Fig. 6C is a schematic diagram of the principle of a word segmentation processing method according to an exemplary embodiment.
Fig. 7A is a flow chart of a word segmentation processing method according to an exemplary embodiment.
Fig. 7B is a flow chart of a word segmentation processing method according to an exemplary embodiment.
Fig. 7C and Fig. 7D are schematic diagrams of the principle of a word segmentation processing method according to an exemplary embodiment.
Fig. 8A is a flow chart of a word segmentation processing method according to an exemplary embodiment.
Fig. 8B is a flow chart of a word segmentation processing method according to an exemplary embodiment.
Fig. 9 is a block diagram of a word segmentation processing device according to an exemplary embodiment.
Fig. 10 is a block diagram of a word segmentation processing device according to an exemplary embodiment.
Fig. 11 is a block diagram of a word segmentation processing device according to an exemplary embodiment.
Fig. 12A is a block diagram of a word segmentation processing device according to an exemplary embodiment.
Fig. 12B is a block diagram of a word segmentation processing device according to an exemplary embodiment.
Fig. 13 is a block diagram of electronic equipment according to an exemplary embodiment.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the invention will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The drawings are only schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a thorough understanding of the embodiments of the invention. However, those skilled in the art will appreciate that the technical solution of the invention may be practiced while omitting one or more of the specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known structures, methods, devices, implementations, or operations are not shown or described in detail, so that they do not distract from and obscure aspects of the invention.
Fig. 1 is a flow chart of a word segmentation processing method according to an embodiment of the invention. In this embodiment, the word segmentation processing method may include:
Step S1: in a text on which word segmentation has been performed, determine a word that reflects comment content.
In embodiments of the invention, words that can reflect comment content are extracted first. A word that reflects comment content generally refers to a word that expresses the core content of a comment sentence. Take user comments on a takeout platform as an example. For the comment "the delivery speed of the takeout is worth trusting", the words that reflect comment content can be "speed" and "trusting": "speed" is the subject of the comment and "trusting" is the user's core opinion. "Takeout" and "delivery" merely modify "speed", and "worth" is a modal verb that forms a phrase with "trusting"; none of them reflects the core of the comment. On a takeout platform, user comments show a certain statistical regularity. For example, comment subjects such as "speed", "environment", "attitude", and "service" occur frequently. The words that reflect comment content can therefore form a predetermined set, which is used to identify such words in a text on which word segmentation has been performed. Of course, the method of determining words that reflect comment content is not limited to this.
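Step S1 with a predetermined set can be sketched as a simple membership test over the segmented text. This is only an illustration: the set contents, the name `find_content_words`, and the Chinese strings (back-translations of the translated examples) are assumptions and do not come from the source.

```python
# Hypothetical predetermined set for a takeout platform (assumed contents):
# speed, environment, attitude, service, trusting.
CONTENT_WORDS = {"速度", "环境", "态度", "服务", "信赖"}


def find_content_words(tokens, content_words=CONTENT_WORDS):
    """Step S1: return (index, word) pairs for words that reflect comment content."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok in content_words]


# "The delivery speed of the takeout is worth trusting", already segmented:
# takeout / (particle) / delivery / speed / worth / trusting
tokens = ["外卖", "的", "配送", "速度", "值得", "信赖"]
hits = find_content_words(tokens)
```

Returning the index alongside the word keeps the position available for the later check against neighboring words.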
Step S3: when it is determined that a predetermined relationship is satisfied between the word that reflects comment content and a word neighboring it, merge the two words.
In embodiments of the invention, after a word that can reflect comment content has been extracted, if the relationship between that word and a neighboring word is found to satisfy a predetermined relationship, for example a grammatical relationship or a part-of-speech collocation, the words can be merged, producing text with a larger segmentation granularity than before the merge. This raises the level of intelligent processing and the flexibility of the machine or system, and improves the accuracy of merging.
"Segmentation granularity" is a term from computational linguistics: the number of Chinese characters in a Chinese word. For example, the segmentation granularity of "speed" (速度, two characters) is 2, and that of "delivery speed" (配送速度, four characters) is 4. Starting from a given word, as words are merged and the granularity grows, the meaning expressed becomes more specific, which helps further analysis and processing of the comment content.
Fig. 2 is a flow chart of another word segmentation processing method according to an embodiment of the invention. Compared with the method shown in Fig. 1, this method also includes a judging step S2, as follows:
In step S1, a word that reflects comment content is determined.
In step S2, it is determined whether a predetermined relationship is satisfied between the word that reflects comment content and a word neighboring it.
If step S2 determines that the predetermined relationship is satisfied between the word that reflects comment content and the neighboring word, step S3 is performed. If step S2 determines that the predetermined relationship is not satisfied, the word that reflects comment content and the neighboring word are not suitable for merging, and no merge is performed.
In step S3, the word that reflects comment content is merged with the neighboring word.
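The S1, S2, S3 flow can be sketched as a single pass over the segmented text with a pluggable check. The function `merge_pass`, the hard-coded `ALLOWED` pairs standing in for the S2 check, and the Chinese back-translations are all illustrative assumptions; the sketch only looks at the preceding word.

```python
from typing import Callable, List, Set


def merge_pass(tokens: List[str], content_words: Set[str],
               relation_ok: Callable[[str, str], bool]) -> List[str]:
    """Run S1 -> S2 -> S3 over a segmented text, looking at the preceding word."""
    merged: List[str] = []
    for tok in tokens:
        is_content_word = tok in content_words                            # S1
        if is_content_word and merged and relation_ok(merged[-1], tok):   # S2
            merged[-1] = merged[-1] + tok                                 # S3: merge
        else:
            merged.append(tok)                                            # no merge
    return merged


# Toy stand-in for the S2 check: a hard-coded set of acceptable pairs.
ALLOWED = {("配送", "速度"), ("值得", "信赖")}  # delivery+speed, worth+trusting

tokens = ["外卖", "的", "配送", "速度", "值得", "信赖"]
result = merge_pass(tokens, {"速度", "信赖"}, lambda a, b: (a, b) in ALLOWED)

# Granularity, as the text defines it, is the number of Chinese characters.
granularity_before = len("速度")
granularity_after = len("配送速度")
```

Passing the check as a callable keeps the flow independent of how the predetermined relationship is actually verified.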
Referring to Fig. 3 and Fig. 4, in some embodiments the text may first undergo word segmentation before step S1. The word segmentation may include:
Step S5: cut the text into words. Cutting splits the text into multiple small-granularity words, each finer than the text before cutting.
Step S6: perform part-of-speech tagging and dependency syntax annotation on the words in the cut text. For example, the small-granularity words obtained from cutting can be tagged with parts of speech and annotated with dependency relations.
Referring to Fig. 4, the text is first split into small-granularity words, the most basic units, which are then given part-of-speech tags and dependency syntax annotations. These annotations are the basis of subsequent processing. For example, the check in step S2 may include checking whether a predetermined syntactic dependency is satisfied and/or whether a predetermined part-of-speech pattern is satisfied, and that check presupposes the part-of-speech tags and dependency annotations. For instance, in the comment text "the delivery speed makes people gasp in admiration", "delivery" and "speed" form a modification relation (ATT), "delivery" is a verb (v), and "speed" is a noun (n); all of these are annotated. Besides the modification relation, the annotations may include the pivot relation (DBL), the verb-object relation (VOB), and others, and the part-of-speech tags may include parts of speech other than noun and verb, which are too numerous to list here. Those skilled in the art can implement part-of-speech tagging and dependency annotation using related natural language processing techniques, and this application places no specific limitation on them.
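One way to hold the output of steps S5 and S6 is a list of annotated tokens. The sketch below is hand-annotated rather than produced by a parser; the `Token` class, the helper `relation_to_next`, the Chinese back-translations, and the `SBV` and `HED` labels are assumptions added for illustration (only ATT, DBL, and VOB appear in the text).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    text: str
    pos: str      # part-of-speech tag, e.g. "v" (verb) or "n" (noun)
    head: int     # index of the word this one depends on; -1 for the root
    deprel: str   # dependency label, e.g. "ATT", "VOB", "DBL"


# Hand-written annotation for "the delivery speed makes people gasp in admiration".
# ATT, DBL and VOB come from the text; SBV and HED are assumed labels for the
# subject and the root, added only so every token has a head.
sentence: List[Token] = [
    Token("配送", "v", 1, "ATT"),   # delivery -> modifies "speed"
    Token("速度", "n", 2, "SBV"),   # speed    -> subject of "make"
    Token("令", "v", -1, "HED"),    # make     -> root
    Token("人", "n", 2, "DBL"),     # people   -> pivot of "make"
    Token("赞叹", "v", 2, "VOB"),   # gasp in admiration -> object of "make"
]


def relation_to_next(tokens: List[Token], i: int) -> str:
    """Return the label if token i depends on token i + 1, else an empty string."""
    if i + 1 < len(tokens) and tokens[i].head == i + 1:
        return tokens[i].deprel
    return ""
```

Storing the head index with each token lets later checks ask both which relation holds and between which two words.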
In some embodiments, the neighboring word in step S2 may include a word adjacent to the word that reflects comment content.
"Adjacent" is a basic and common case of "neighboring". Taking the user comment "the delivery speed of the takeout is worth trusting" as an example, the words adjacent to "speed" include "delivery" and "worth", that is, the preceding adjacent word and the following adjacent word. "Adjacent" can be preset to mean either the preceding or the following word. For example, it can be preset to the preceding word, so that "speed" acts as the tail word and pairs with "delivery". The description below is based mainly on the "adjacent" case, but "neighboring" in the invention is not limited to it. In other examples, function words may stand between the word that reflects comment content and the word that would otherwise be adjacent to it. For example, in the sentence "the speed of the delivery of the takeout is worth trusting", a particle (rendered here as "of") stands between "delivery" and "speed". This particle is an auxiliary word, which is a kind of function word. In such a case, function words of this kind can first be detected and removed before the subsequent steps are carried out.
Referring to Fig. 5A, in some embodiments satisfying the predetermined relationship in step S2 may include satisfying a predetermined syntactic dependency. Specifically, step S2 may include:
S201: determine whether a predetermined syntactic dependency is satisfied between the word that reflects comment content and the word adjacent to it. If so, the predetermined relationship can be judged to be satisfied, and step S3 follows.
That is, it is judged whether the word that reflects comment content and its adjacent word satisfy the predetermined syntactic dependency; if they do, the merge can be performed.
Dependency syntax was first proposed by the French linguist L. Tesnière. It analyzes a sentence into a dependency syntax tree that describes the dependency relations between words, that is, the syntactic collocation relations between words, which are associated with semantics.
By analyzing the predetermined syntactic dependency, words with a predetermined collocation relation can be merged, which makes recognition and merging more flexible and intelligent. For example, once "speed" has been determined as a word that reflects comment content, many words can collocate with it, such as delivery speed, meal-delivery speed, preparation speed, and service speed. Users choose different collocating words according to their own speech habits, so the words that collocate with "speed" form an open set. The collocations nevertheless follow a certain rule: for example, they all satisfy the modification relation (ATT). The idea of this embodiment is to use that rule to recognize or screen suitable collocations: as long as the predetermined syntactic dependency is satisfied, it does not matter which specific word collocates with "speed". This gives the method of this application a good level of intelligence and flexibility and yields good merge results.
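The S201 check on the preceding word, including the case where a function word sits in between, might look like the following. Instead of deleting function words, this sketch skips over them so that head indices stay valid. The `Token` layout, the one-entry `FUNCTION_WORDS` set, the `HED`, `RAD`, and `ADV` labels, and the Chinese back-translations are assumptions for illustration.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    text: str
    head: int     # index of the governing word, -1 for the root
    deprel: str   # dependency label


FUNCTION_WORDS = {"的"}  # assumed particle list; extend as needed


def preceding_index(tokens: List[Token], i: int) -> Optional[int]:
    """Index of the nearest preceding word that is not a function word."""
    j = i - 1
    while j >= 0 and tokens[j].text in FUNCTION_WORDS:
        j -= 1
    return j if j >= 0 else None


def satisfies_dependency(tokens: List[Token], i: int, relation: str = "ATT") -> bool:
    """S201: does the preceding word depend on word i with the given relation?"""
    j = preceding_index(tokens, i)
    return j is not None and tokens[j].head == i and tokens[j].deprel == relation


# Different modifiers, same rule: delivery / meal delivery / preparation + speed.
open_set_results = [
    satisfies_dependency([Token(m, 1, "ATT"), Token("速度", -1, "HED")], 1)
    for m in ["配送", "送餐", "制作"]
]

# "speed of the delivery": a particle stands between the two words.
with_particle = [Token("配送", 2, "ATT"), Token("的", 0, "RAD"), Token("速度", -1, "HED")]
```

The check never inspects the modifier's text, which is what lets it cover an open set of collocating words.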
Further, referring to Fig. 5B, in some embodiments satisfying the predetermined relationship in step S2 may also include satisfying a predetermined part-of-speech pattern. That is, step S2 may also include:
S202: determine whether a predetermined part-of-speech pattern is satisfied between the word that reflects comment content and the word adjacent to it.
Combined with S201, if both the predetermined syntactic dependency and the predetermined part-of-speech pattern are satisfied between the word that reflects comment content and its adjacent word, the predetermined relationship is judged to be satisfied.
If the result of step S201 is no, it can be determined that the predetermined relationship is not satisfied between the word that reflects comment content and its adjacent word, and no merge is performed.
Likewise, if the result of step S202 is no, it can be determined that the predetermined relationship is not satisfied between the word that reflects comment content and its adjacent word, and no merge is performed.
On top of recognition and screening by syntactic dependency, the part-of-speech pattern can be used for further recognition and screening; that is, both the predetermined syntactic dependency and the predetermined part-of-speech pattern must be satisfied to pass the check. The predetermined part-of-speech pattern thus acts as an additional check that further improves the accuracy of recognition and screening.
In Fig. 5B, the judgment of step S202 follows the judgment of step S201. This double judgment can improve the accuracy of granularity customization. In one embodiment, only the judgment of step S202 is performed, without the judgment of step S201. In other words, judging whether the predetermined relationship is satisfied in this application may include judging whether the predetermined syntactic dependency is satisfied and/or judging whether the predetermined part-of-speech pattern is satisfied.
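The "and/or" combination of the two judgments can be sketched as a small dispatcher. The function name, the `mode` parameter, and its three values are hypothetical; the two checks are passed in as zero-argument callables so that S202 is evaluated only when it is needed.

```python
from typing import Callable

Check = Callable[[], bool]


def satisfies_predetermined_relation(dependency_check: Check, pos_check: Check,
                                     mode: str = "both") -> bool:
    """Combine S201 (dependency) and S202 (part-of-speech pattern)."""
    if mode == "dependency":
        return dependency_check()
    if mode == "pos":
        return pos_check()
    if mode == "both":
        # S201 first; S202 runs only if S201 passed (short-circuit `and`).
        return dependency_check() and pos_check()
    raise ValueError(f"unknown mode: {mode!r}")
```

In the `"both"` mode a failed dependency check ends the judgment immediately, matching the order of Fig. 5B as described.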
The syntactic dependency check and the part-of-speech pattern check are illustrated in more detail below.
In some embodiments, determining the word that reflects comment content in step S1 may include determining a basic attribute word that reflects comment content. A basic attribute word refers to the object of the comment. For example, in the text "the delivery speed of the takeout is worth trusting", "speed" is the basic attribute word, and in the text "the hall environment is very clean", "environment" is the basic attribute word. The check of the syntactic dependency for basic attribute words is described in detail below.
Referring to Fig. 6A, in this embodiment satisfying the predetermined relationship may include satisfying a predetermined syntactic dependency. The predetermined syntactic dependency for a basic attribute word may include: the basic attribute word has a modification relation with the word immediately preceding it.
That is, step S201 can be implemented as:
Step S201a: determine whether the basic attribute word and the word immediately preceding it satisfy the modification relation. If so, the judgment in step S2 is that the relationship is satisfied.
For example, referring to Fig. 6C, the text is "the delivery speed of AA Takeout is truly worth affirming", where AA may be the name of a takeout brand. Word segmentation is performed first, and the basic attribute word "speed" that reflects comment content is determined. Then it is determined whether "speed" satisfies the modification relation (ATT) with its preceding adjacent word, regardless of which specific word that is. For example, "delivery speed", "meal-delivery speed", "takeout speed", and "rider speed" all satisfy the modification relation (the example in Fig. 6C is "delivery speed"), so they pass the check and can be merged. The phrase formed by merging expresses a more precise meaning, which facilitates subsequent processing.
Referring to Fig. 6B, in this embodiment the part-of-speech pattern for a basic attribute word may include: the word immediately preceding the basic attribute word forms, together with the basic attribute word, a verb-plus-noun pattern or a noun-plus-noun pattern.
That is, S202 can be implemented as:
Step S202a: determine whether the basic attribute word and the word immediately preceding it have a predetermined part-of-speech pattern, where the predetermined part-of-speech pattern may include a verb-plus-noun pattern or a noun-plus-noun pattern. If both step S201a and step S202a are satisfied, it can be determined that the predetermined relationship is satisfied between the word that reflects comment content and the neighboring word.
Referring to Fig. 6C, the predetermined relationship between the word that reflects comment content and the neighboring word can be determined to be satisfied when the modification relation and the predetermined part-of-speech pattern are satisfied at the same time. In the verb-plus-noun pattern, the basic attribute word, as the tail word, is a noun and the word before it is a verb. For example, "delivery" and "meal delivery" are verbs, so "delivery speed" and "meal-delivery speed" satisfy the verb-plus-noun pattern. In the noun-plus-noun pattern, the basic attribute word, as the tail word, is a noun and the word before it is also a noun. For example, "takeout" and "rider" are nouns, so "takeout speed" and "rider speed" satisfy the noun-plus-noun pattern. If either pattern is satisfied and the modification relation is satisfied as well, the check is passed and the merge of step S3 can be performed. The example in Fig. 6C, "delivery speed", satisfies the verb-plus-noun pattern and therefore passes the check. Of course, the verb-plus-noun and noun-plus-noun patterns in this embodiment are only exemplary, and the predetermined part-of-speech pattern in the invention is not limited to these two part-of-speech check patterns.
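The double check for a basic attribute word (S201a plus S202a) followed by the merge of step S3 can be sketched as follows. The `Token` layout, the function names, the `HED` label, the adjective counter-example, and the Chinese back-translations are illustrative assumptions.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    pos: str      # "v" verb, "n" noun, "a" adjective
    head: int     # index of the governing word, -1 for the root
    deprel: str   # dependency label


ATTRIBUTE_PATTERNS = {("v", "n"), ("n", "n")}  # verb+noun, noun+noun


def can_merge_attribute(tokens: List[Token], i: int) -> bool:
    """Check S201a and S202a for the basic attribute word at index i."""
    if i == 0:
        return False
    prev, attr = tokens[i - 1], tokens[i]
    modifies = prev.head == i and prev.deprel == "ATT"          # S201a
    pattern_ok = (prev.pos, attr.pos) in ATTRIBUTE_PATTERNS     # S202a
    return modifies and pattern_ok


def merge_attribute(tokens: List[Token], i: int) -> List[str]:
    """Step S3: return the word list with tokens i-1 and i merged if allowed."""
    words = [t.text for t in tokens]
    if can_merge_attribute(tokens, i):
        words[i - 1:i + 1] = [words[i - 1] + words[i]]
    return words


verb_noun = [Token("配送", "v", 1, "ATT"), Token("速度", "n", -1, "HED")]   # delivery speed
noun_noun = [Token("骑手", "n", 1, "ATT"), Token("速度", "n", -1, "HED")]   # rider speed
adj_noun = [Token("快", "a", 1, "ATT"), Token("速度", "n", -1, "HED")]      # toy counter-example
```

Keeping the patterns in a set makes it easy to add further part-of-speech patterns, which the text explicitly allows.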
When judging the part of speech of a word, comparison against a predetermined dictionary can be used. For example, a verb dictionary is preset that records verbs common in review content, such as "deliver", "deliver meals", and "serve". The word to be judged is compared against the verb dictionary; if it belongs to the dictionary, it is judged to be a verb.
In addition, dictionaries for more specific parts of speech can be preset, such as a modal verb dictionary that records modal verbs common in review content, for example "make (someone)" and "need". The word to be judged is compared against the modal verb dictionary; if it belongs to the dictionary, it is judged to be a modal verb. In this way the part of speech of the word can be judged at a finer level. Because the modal verbs common in review content are limited in number, comparison against a predetermined dictionary is a convenient and suitable way to judge part of speech.
Of course, judging part of speech by comparison against a predetermined dictionary is not limited to the verbs and modal verbs in the examples above; it also applies to nouns, pivot nouns, causative verbs, and so on.
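Dictionary comparison amounts to a set lookup. In the sketch below the dictionary entries, the tag names, and the rule of consulting the more specific dictionary first are assumptions made for illustration; the Chinese strings are back-translations of the translated examples.

```python
VERB_DICT = {"配送", "送餐", "服务"}       # deliver, deliver meals, serve
MODAL_VERB_DICT = {"值得", "需要"}          # worth, need (assumed entries)

# More specific dictionaries are consulted first.
DICTIONARIES = [("modal_verb", MODAL_VERB_DICT), ("verb", VERB_DICT)]


def lookup_pos(word, dictionaries=DICTIONARIES):
    """Return the tag of the first dictionary that contains the word, else None."""
    for tag, entries in dictionaries:
        if word in entries:
            return tag
    return None
```

A word found in no dictionary yields `None`, leaving the caller free to fall back on a general part-of-speech tagger.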
It is available more accurately or more to meet expected amalgamation result by modified relationship and the twin check of part of speech pattern.
The word of reflection comment content in some embodiments, can also be reflection user in addition to base attribute word The evaluating word of viewpoint.Evaluating word, that is, embody the word of user's taste viewpoint, for example, through the text " dispatching that AA takes out of word segmentation processing Speed is really worth affirmative " in, it may be determined that " affirmative " is evaluating word.Carried out on syntax dependence below for evaluating word The detailed description of checking.
Fig. 7 A are referred to, in this embodiment, it is determined that the word of reflection comment content includes:It is determined that reflecting commenting for User Perspective Valency word.The corresponding predetermined syntax dependence of evaluating word may include:Evaluating word with before the evaluating word and adjacent word has V-O construction relation or simultaneous language add guest's relation.
That is step S201 can be realized:
Step S201b, determine base attribute word with before the base attribute word and whether adjacent word is met with dynamic Guest's structural relation or simultaneous language add guest's relation.If meeting, the word and the neighbouring reflection that can determine reflection comment content are commented Meet predetermined relationship between word by the word of content.
For example, referring to Fig. 7 C and Fig. 7 D, text is " the dispatching speed that AA takes out really is worth affirmative ", is carried out first Word segmentation processing, and determine the evaluating word " affirmative " of reflection comment content.Then, it is determined that whether " affirmative " meets with preceding adjacent word V-O construction (VOB) relation or and language (DBL) plus dynamic guest (VOB) relation.For example, " being worth affirmative " meets V-O construction relation, I.e. by verification.And for example, Fig. 7 D are referred to, " making us gasp in admiratioing " meets and language adds guest's relation, i.e., by verification.By after verification The merging of step S3 can be carried out.Phrase or phrase that merging is formed can express implication more precisely, so as to facilitate subsequent treatment.
Further, satisfying the predetermined relation in step S2 may also include satisfying a predetermined part-of-speech pattern. In this embodiment, step S202 may include step S202b and step S202c.
In step S202b, if the evaluation word and the adjacent word located before it have a verb-object relation, determining whether the part-of-speech pattern is satisfied means determining whether the adjacent preceding word and the evaluation word form a modal verb plus verb pattern; if so, the predetermined relation is judged to be satisfied.
In step S202c, if the evaluation word and the adjacent words located before it have a pivot plus verb-object relation, determining whether the part-of-speech pattern is satisfied means determining whether the adjacent preceding words and the evaluation word form a causative verb plus pivot noun plus verb pattern; if so, the predetermined relation is judged to be satisfied.
Referring also to Fig. 7C, each syntactic dependency relation in step S201b corresponds to its own part-of-speech pattern. For example, "worth affirming" satisfies the verb-object relation; in addition, "worth" is a modal verb (v) and "affirming" is a verb (v), so "worth affirming" also forms the modal verb plus verb pattern. Both conditions are met, so "worth affirming" is judged to satisfy the predetermined relation, that is, it passes verification. As another example, referring to Fig. 7D, "makes people marvel" satisfies the pivot plus verb-object relation; in addition, "makes" is a causative verb (v), "people" is a pivot noun (n), and "marvel" is a verb (v), so "makes people marvel" also forms the causative verb plus pivot noun plus verb pattern. Both conditions are met, so "makes people marvel" is judged to satisfy the predetermined relation, that is, it passes verification. Of course, the modal verb plus verb pattern and the causative verb plus pivot noun plus verb pattern in this embodiment are merely exemplary; the predetermined part-of-speech patterns for evaluation words in the present invention are not limited to these two.
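The two-condition check for evaluation words described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the `Token` structure, the function names, the `MODAL_VERBS` and `CAUSATIVE_VERBS` sets, and the English stand-in words are all hypothetical, and the dependency arcs are written by hand instead of coming from a real parser. The labels `VOB` and `DBL` follow the abbreviations used in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str  # surface form (English stand-in for the segmented word)
    pos: str   # part-of-speech tag, e.g. "v" (verb) or "n" (noun)
    head: int  # index of the syntactic head, -1 for the root
    rel: str   # dependency label of the arc from the head to this token


# Hypothetical lexicons; a real system would load these from preset dictionaries.
MODAL_VERBS = {"worth"}
CAUSATIVE_VERBS = {"make"}


def merge_span_for_evaluation_word(tokens: List[Token], i: int) -> Optional[range]:
    """Return the token range to merge for the evaluation word at index i, or None."""
    word = tokens[i]
    if word.pos != "v" or word.rel != "VOB":
        return None
    # Case 1: verb-object relation with the adjacent preceding word,
    # plus the modal verb + verb part-of-speech pattern.
    if i >= 1 and word.head == i - 1:
        prev = tokens[i - 1]
        if prev.pos == "v" and prev.text in MODAL_VERBS:
            return range(i - 1, i + 1)
    # Case 2: pivot (DBL) plus verb-object relation,
    # plus the causative verb + pivot noun + verb part-of-speech pattern.
    if i >= 2 and word.head == i - 2:
        pivot, causative = tokens[i - 1], tokens[i - 2]
        if (pivot.rel == "DBL" and pivot.head == i - 2 and pivot.pos == "n"
                and causative.pos == "v" and causative.text in CAUSATIVE_VERBS):
            return range(i - 2, i + 1)
    return None


def merge_tokens(tokens: List[Token], span: range) -> str:
    """Join the tokens in the span into one phrase (the merging of step S3)."""
    return " ".join(tokens[j].text for j in span)


worth_example = [
    Token("really", "d", 1, "ADV"),
    Token("worth", "v", -1, "HED"),
    Token("affirming", "v", 1, "VOB"),
]
pivot_example = [
    Token("make", "v", -1, "HED"),
    Token("people", "n", 0, "DBL"),
    Token("marvel", "v", 0, "VOB"),
]

print(merge_tokens(worth_example, merge_span_for_evaluation_word(worth_example, 2)))
print(merge_tokens(pivot_example, merge_span_for_evaluation_word(pivot_example, 2)))
```

In this sketch a candidate is merged only when both the dependency condition and the part-of-speech condition hold, which mirrors the requirement that both checks pass before merging.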
The method of distinguishing ordinary verbs from modal verbs may follow the dictionary comparison method described above. For example, a modal verb dictionary is preset and populated with common modal verbs, and is used to judge whether a given word is a modal verb. Details are not repeated here.
In addition, the words reflecting comment content in the present invention are not limited to basic attribute words or evaluation words, and the predetermined relation for basic attribute words is likewise not limited to the syntactic dependency relation or the predetermined part-of-speech pattern.
Referring to Fig. 8A, the words reflecting comment content in step S1 may be determined as follows. In some embodiments, determining the word reflecting comment content in step S1 may include:
Step S103: establish an evaluation dictionary;
Step S104: compare the words in the text with the words in the evaluation dictionary to determine whether a word in the text is a word reflecting comment content.
For a given takeout platform, what users pay attention to and what they comment on follow certain statistical regularities. For example, "speed", "environment", and "service" are common basic attribute words with reference value, while "trust", "praise", "affirm", and "good" are common evaluation words with reference value. An evaluation dictionary can therefore be established in advance, containing these frequently occurring words that have reference value. When a specific text is analyzed, the words in the text are compared with the words in the evaluation dictionary to determine whether a word in the text is a word reflecting comment content.
This comparison can then be combined with analysis of the syntactic dependency relation, part of speech, and so on, to finally determine the words reflecting comment content. The reason is that a word that merely appears in the dictionary is not necessarily the comment object or the evaluation content of the sentence; it may also be part of some irrelevant content that the user posted casually. Combining the dictionary with analysis of the sentence therefore allows a comprehensive judgment of whether a word is a word reflecting comment content.
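A minimal sketch of the dictionary comparison of step S104, combined with a later sentence-level check, might look like the following. The dictionary contents are the example words mentioned above; the names `EVALUATION_DICTIONARY`, `find_candidates`, and `select_comment_words`, as well as the `syntax_check` callback, are assumptions introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical evaluation dictionary: word -> kind of comment word.
EVALUATION_DICTIONARY: Dict[str, str] = {
    "speed": "attribute",
    "environment": "attribute",
    "service": "attribute",
    "trust": "evaluation",
    "praise": "evaluation",
    "affirm": "evaluation",
    "good": "evaluation",
}

Candidate = Tuple[int, str, str]  # (index in the text, word, kind)


def find_candidates(words: List[str]) -> List[Candidate]:
    """Step S104 (sketch): keep the words that appear in the evaluation dictionary."""
    return [(i, w, EVALUATION_DICTIONARY[w])
            for i, w in enumerate(words) if w in EVALUATION_DICTIONARY]


def select_comment_words(words: List[str],
                         syntax_check: Callable[[int], bool]) -> List[Candidate]:
    """Keep only the dictionary hits that also pass a sentence-level check.

    syntax_check stands in for the dependency / part-of-speech analysis;
    it receives the index of a candidate word and returns True or False.
    """
    return [c for c in find_candidates(words) if syntax_check(c[0])]


segmented = ["AA Takeout", "delivery", "speed", "really", "worth", "affirm"]
print(find_candidates(segmented))
```

Keeping the dictionary lookup separate from the sentence-level check reflects the point made above: a dictionary hit is only a candidate until the analysis of the sentence confirms it.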
The dictionary may be established in the following two ways, although the invention is not limited to them.
The first way is manual establishment, that is, the evaluation dictionary is established from manual input. For example, past user comments are statistically analyzed by hand, and words that are common and have reference value are selected and added to the evaluation dictionary.
The other way is for the system to establish and refine the dictionary automatically. Referring to Fig. 8B, in some embodiments, establishing the evaluation dictionary in step S103 may include:
Step S1031: count the number of occurrences or the frequency of each word in multiple texts;
Step S1032: add a word to the evaluation dictionary when its number of occurrences or frequency is greater than a predetermined value.
The system can count the frequency of each word appearing in past comment texts, select the words with a high number of occurrences or frequency as target evaluation attribute words or target evaluation words, and enter them into the corresponding dictionary. For example, if the system detects that, among words whose part of speech is noun, "speed" occurs with a frequency higher than the preset frequency, it enters "speed" into the dictionary of target evaluation attribute words.
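Steps S1031 and S1032 can be sketched as a simple counting pass. The function name `build_frequency_dictionary`, its parameters, and the sample data are assumptions for illustration; the texts are assumed to be already segmented and tagged as `(word, part_of_speech)` pairs.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

TaggedText = List[Tuple[str, str]]  # [(word, part-of-speech tag), ...]


def build_frequency_dictionary(texts: Iterable[TaggedText],
                               min_count: int,
                               pos: str = "n") -> Set[str]:
    """Collect words with the given tag that occur more than min_count times."""
    # Step S1031: count occurrences of each word across all texts.
    counts = Counter(word for text in texts for word, tag in text if tag == pos)
    # Step S1032: keep the words whose count exceeds the predetermined value.
    return {word for word, count in counts.items() if count > min_count}


past_comments = [
    [("speed", "n"), ("fast", "a")],
    [("speed", "n"), ("packaging", "n")],
    [("speed", "n"), ("good", "a")],
]
print(build_frequency_dictionary(past_comments, min_count=2))
```

Here nouns are counted to obtain attribute words; calling the same function with a different `pos` value would give a separate dictionary for another word class.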
The two ways can also be used in combination, for example by establishing the dictionary manually first and then refining it automatically. After establishment, the dictionary can likewise be refined either manually or automatically. For example, entries may mainly be recognized and entered automatically by the system while still allowing manual modification, such as deleting words the system misidentified and/or adding words reflecting comment content that the system failed to identify.
It should be clearly understood that the present disclosure describes how to form and use particular examples, but the principles of the invention are not limited to any detail of these examples. Rather, based on the teaching of the present disclosure, these principles can be applied to many other embodiments.
Those skilled in the art will appreciate that all or part of the steps of the above embodiments may be implemented as a computer program executed by a CPU. When the computer program is executed by the CPU, it performs the functions defined by the method provided by the present invention. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
In addition, it should be noted that the above drawings are merely schematic illustrations of the processing included in the method according to exemplary embodiments of the invention and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of the processing. It is also easy to understand that the processing may be performed synchronously or asynchronously, for example in multiple modules.
The following are device embodiments of the present invention, which can be used to perform the method embodiments of the present invention. For details not disclosed in the device embodiments, refer to the method embodiments of the present invention.
Referring to Fig. 9, the present invention provides a word segmentation processing device 100, which may include:
A word determining module 110, configured to determine the word reflecting comment content in text on which word segmentation processing has been performed.
A merging module 120, configured to merge the word reflecting comment content with its neighboring word when it is determined that the predetermined relation is satisfied between the two.
Referring to Fig. 10, the device may further include a deleting module 130. The deleting module 130 may delete a function word when the function word exists between the word reflecting comment content and its neighboring word.
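The behavior of the deleting module can be illustrated with a short sketch. It assumes that tokens are `(word, part_of_speech)` pairs, that the tag `"u"` marks a function word such as a structural particle, and that the neighboring word sits two positions before the word reflecting comment content; the name `drop_function_word_before` and the sample tokens are hypothetical.

```python
from typing import List, Tuple

TaggedWord = Tuple[str, str]  # (word, part-of-speech tag)

# Assumption: the tag "u" marks function words (for example particles).
FUNCTION_WORD_TAGS = {"u"}


def drop_function_word_before(tokens: List[TaggedWord],
                              i: int) -> Tuple[List[TaggedWord], int]:
    """Remove a function word standing directly before the comment word at index i.

    Returns the new token list and the new index of the comment word.
    Nothing is removed unless a neighboring word exists before the function word.
    """
    if i >= 2 and tokens[i - 1][1] in FUNCTION_WORD_TAGS:
        return tokens[: i - 1] + tokens[i:], i - 1
    return list(tokens), i


# "de" is a romanized stand-in for a particle between "delivery" and "speed".
sample = [("delivery", "v"), ("de", "u"), ("speed", "n")]
print(drop_function_word_before(sample, 2))
```

After the function word is removed, the comment word and its neighbor become adjacent, so the same relation checks and merging used for adjacent words can be applied.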
Referring to Fig. 11, in some embodiments, the word segmentation processing device 100 may further include:
A segmentation module 150, configured to segment the text into words.
A labeling module 160, configured to perform part-of-speech tagging and dependency syntax annotation on the words in the segmented text.
The segmentation module 150 can be used to implement step S5, and the labeling module 160 can be used to implement step S6.
In some embodiments, the word neighboring the word reflecting comment content may include a word adjacent to the word reflecting comment content.
Further, in some embodiments, satisfying the predetermined relation includes satisfying a predetermined syntactic dependency relation and/or a predetermined part-of-speech pattern.
In one embodiment, determining the word reflecting comment content may include determining a basic attribute word reflecting comment content. The predetermined syntactic dependency relation corresponding to the basic attribute word includes: the basic attribute word and the adjacent word located before it have a modification relation. The predetermined part-of-speech pattern corresponding to the basic attribute word may include: the adjacent word located before the basic attribute word and the basic attribute word form a verb plus noun pattern or a noun plus noun pattern.
In one embodiment, determining the word reflecting comment content may include determining an evaluation word reflecting the user's viewpoint. The predetermined syntactic dependency relation corresponding to the evaluation word may include: the evaluation word and the adjacent word located before it have a verb-object relation or a pivot plus verb-object relation. The part-of-speech pattern corresponding to the evaluation word may include: the adjacent word located before the evaluation word and the evaluation word form a modal verb plus verb pattern; or the adjacent words located before the evaluation word and the evaluation word form a causative verb plus pivot noun plus verb pattern.
Referring to Fig. 12A, in some embodiments, the word determining module 110 may include:
A dictionary establishing unit 111, configured to establish an evaluation dictionary;
A comparing unit 113, configured to compare the words in the text with the words in the evaluation dictionary to determine whether a word in the text is a word reflecting comment content.
Further, in some embodiments, the dictionary establishing unit 111 is configured to establish the evaluation dictionary according to manual input.
Referring to Fig. 12B, in other embodiments, the dictionary establishing unit 111 may include:
A statistics subunit 1111, configured to count the number of occurrences or the frequency of each word in multiple texts; and
A storage subunit 1113, configured to add a word to the evaluation dictionary when its number of occurrences or frequency is greater than a predetermined value.
Referring to Fig. 13, the present application provides an electronic device 1300, which may include a memory 1301 and a processor 1302. The memory 1301 stores a computer program that can run on the processor 1302. By executing the computer program, the processor 1302 can implement the method described herein.
The memory 1301 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The electronic device 1300 may be any of various devices with computing and processing capability. In addition to the memory 1301 and the processor 1302, it may also include various input devices (such as a user interface or a keyboard), various output devices (such as a speaker), and a display device, which are not described further here.
The present application also provides a computer-readable storage medium storing a computer program; when executed by the processor 1302, the computer program implements the method described herein.
It should be noted that the block diagrams shown in the above drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Accordingly, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a mobile terminal, or a network device) to perform the method according to the embodiments of the present invention.
Exemplary embodiments of the present invention have been particularly shown and described above. It should be understood that the invention is not limited to the detailed structures, arrangements, or implementations described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (13)

1. A word segmentation processing method, characterized by comprising:
determining, in text on which word segmentation processing has been performed, a word reflecting comment content;
when it is determined that a predetermined relation is satisfied between the word reflecting comment content and a word neighboring the word reflecting comment content, merging the word reflecting comment content with the neighboring word.
2. The method according to claim 1, characterized in that the method further comprises:
after the word reflecting comment content is determined, if a function word exists between the word reflecting comment content and the neighboring word, deleting the function word.
3. The method according to claim 1 or 2, characterized in that satisfying the predetermined relation includes satisfying a predetermined syntactic dependency relation and/or a predetermined part-of-speech pattern.
4. The method according to claim 3, characterized in that determining the word reflecting comment content includes: determining a basic attribute word reflecting comment content.
5. The method according to claim 4, characterized in that the predetermined syntactic dependency relation corresponding to the basic attribute word includes:
the basic attribute word and the adjacent word located before the basic attribute word have a modification relation.
6. The method according to claim 4, characterized in that the predetermined part-of-speech pattern corresponding to the basic attribute word includes:
the adjacent word located before the basic attribute word and the basic attribute word form a verb plus noun pattern or a noun plus noun pattern.
7. The method according to claim 3, characterized in that determining the word reflecting comment content includes: determining an evaluation word reflecting the user's viewpoint.
8. The method according to claim 7, characterized in that the predetermined syntactic dependency relation corresponding to the evaluation word includes:
the evaluation word and the adjacent word located before the evaluation word have a verb-object relation or a pivot plus verb-object relation.
9. The method according to claim 7, characterized in that the part-of-speech pattern corresponding to the evaluation word includes: the adjacent word located before the evaluation word and the evaluation word form a modal verb plus verb pattern; or
the adjacent words located before the evaluation word and the evaluation word form a causative verb plus pivot noun plus verb pattern.
10. The method according to claim 1, characterized in that performing the word segmentation processing includes:
segmenting the text into words;
performing part-of-speech tagging and dependency syntax annotation on the words in the segmented text.
11. A word segmentation processing device, characterized by comprising:
a word determining module, configured to determine a word reflecting comment content in text on which word segmentation processing has been performed;
a merging module, configured to merge the word reflecting comment content with a word neighboring the word reflecting comment content when a predetermined relation is satisfied between the two.
12. An electronic device, comprising:
a processor; and
a memory storing a computer program that can run on the processor;
characterized in that the processor executes the computer program to implement the steps of the method according to any one of claims 1-10.
13. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-10.
CN201611263885.XA 2016-12-30 2016-12-30 Participle processing method and device, electronic equipment Pending CN106802887A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611263885.XA CN106802887A (en) 2016-12-30 2016-12-30 Participle processing method and device, electronic equipment

Publications (1)

Publication Number Publication Date
CN106802887A true CN106802887A (en) 2017-06-06

Family

ID=58985332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611263885.XA Pending CN106802887A (en) 2016-12-30 2016-12-30 Participle processing method and device, electronic equipment

Country Status (1)

Country Link
CN (1) CN106802887A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549631A (en) * 2018-03-30 2018-09-18 北京智慧正安科技有限公司 Noun dictionary extracting method, electronic device and computer readable storage medium
CN109582948A (en) * 2017-09-29 2019-04-05 北京国双科技有限公司 The method and device that evaluated views extract

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955451A (en) * 2014-05-15 2014-07-30 北京优捷信达信息科技有限公司 Method for judging emotional tendentiousness of short text
CN105224640A (en) * 2015-09-25 2016-01-06 杭州朗和科技有限公司 A kind of method and apparatus extracting viewpoint
CN105573980A (en) * 2015-12-10 2016-05-11 百度在线网络技术(北京)有限公司 Information segment generation method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582948A (en) * 2017-09-29 2019-04-05 北京国双科技有限公司 The method and device that evaluated views extract
CN109582948B (en) * 2017-09-29 2022-11-22 北京国双科技有限公司 Method and device for extracting evaluation viewpoints
CN108549631A (en) * 2018-03-30 2018-09-18 北京智慧正安科技有限公司 Noun dictionary extracting method, electronic device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20170606)