CN109977392B - Text feature analysis method and device - Google Patents


Info

Publication number
CN109977392B
Authority
CN
China
Prior art keywords
feature
evaluation object
words
target text
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711459613.1A
Other languages
Chinese (zh)
Other versions
CN109977392A (en
Inventor
王鑫
董浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201711459613.1A
Publication of CN109977392A
Application granted
Publication of CN109977392B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a text feature analysis method and device, and relates to the technical field of computers. One embodiment of the method comprises the following steps: acquiring feature words in a target text, and determining the grammar structures between the feature words and the other words in the target text; determining the evaluation object corresponding to each feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule; and calculating the feature values of the feature words and the evaluation objects to obtain the feature value of the target text, thereby determining the feature of the target text. By relying on the grammar structures between the feature words and the other words, the embodiment improves the coverage and extraction accuracy of evaluation objects, and thus the accuracy with which the feature of the target text is determined.

Description

Text feature analysis method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for text feature analysis.
Background
Emotion analysis mainly mines the opinions and emotion polarities that users express in text, and is used to help other users make decisions. Obtaining emotion tendencies relies mainly on emotion information extraction, that is, extracting the emotion words and evaluation objects in a text so as to mine the emotion value expressed by each combination of evaluation object and emotion word. A user can learn the opinions or emotions expressed by a text by browsing these subjectively colored words.
However, when the same emotion word modifies different evaluation objects, its polarity can differ. For example, the semantic polarity of the emotion word "decline" is negative, and "profit decline" is a negative-polarity emotion phrase, but "cost decline" is a positive-polarity emotion phrase. Identifying evaluation object-emotion word pairs therefore helps in judging the emotion tendency toward each evaluation object.
In the prior art, text emotion mining is mainly focused on the field of commodity comments, and is generally carried out by means of word rules, dictionary matching and the like.
In carrying out the present invention, the inventors have found that at least the following problems exist in the prior art:
In the emotion mining approaches of the prior art, the relations within the sentence structure of the text are often ignored and only the core words of a sentence are extracted, so part of the evaluation object information is lost. The semantics of the evaluation object are then incomplete, which hinders text emotion mining and leads to emotion analysis errors.
Disclosure of Invention
In view of this, the embodiments of the present invention provide a method and an apparatus for text feature analysis, which at least can solve the problem of text feature analysis errors caused by incomplete evaluation object semantics in the prior art.
To achieve the above object, according to one aspect of the embodiments of the present invention, there is provided a method for text feature analysis, including: acquiring feature words in a target text, and determining the grammar structures between the feature words and the words in the target text; determining an evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule; and calculating the feature values of the feature words and the evaluation object to obtain the feature value of the target text, and determining the feature of the target text.
Optionally, determining the evaluation object corresponding to the feature word in the target text according to the grammar structure and the preset evaluation object grammar structure extraction rule includes:
when the feature word serves as a predicate in the target text, determining the subject corresponding to the feature word as a first evaluation object; or
when the feature word serves as an object, adverbial or complement of a predicate in the target text, determining the subject and the predicate corresponding to the feature word as a second evaluation object; or
when the feature word serves as an adverbial or complement of an object in the target text, determining the subject, the predicate and the object corresponding to the feature word as a third evaluation object; or
when the feature word serves as an attributive in the target text, determining the non-feature word corresponding to the feature word as a fourth evaluation object.
Optionally, the method further comprises: when a parallel word corresponding to the evaluation object exists, determining the parallel word as a fifth evaluation object; or
when an attributive corresponding to the evaluation object exists, adding the non-feature word corresponding to the attributive to the evaluation object to generate a sixth evaluation object.
Optionally, after determining the evaluation object corresponding to the feature word in the target text according to the grammar structure and the preset evaluation object grammar structure extraction rule, the method further includes: when parallel words or serial-verb (linked) words corresponding to the feature word exist, determining the parallel words or the serial-verb words as first feature words, and determining the evaluation object as a seventh evaluation object corresponding to the first feature words;
calculating the feature values of the feature words and the evaluation object to obtain the feature value of the target text, and determining the feature of the target text, then comprises: calculating the feature values of the feature words and the evaluation objects, and the feature values of the first feature words and the seventh evaluation objects, to obtain the feature value of the target text, and determining the feature of the target text.
Optionally, the method further comprises: analyzing the part-of-speech features of each word in the target text, and acquiring the nouns whose distance from the feature word is within a preset distance range as eighth evaluation objects.
Optionally, calculating the feature values of the feature words and the evaluation object to obtain the feature value of the target text, and determining the feature of the target text, includes: calculating the similarity between the evaluation object and each preset representative object, determining the representative object whose similarity exceeds a preset similarity threshold, and replacing the evaluation object with the determined representative object; and calculating the feature values of the feature words and the determined representative object to obtain the feature value of the target text, thereby obtaining the feature of the target text.
Optionally, the feature word is an emotion word, and the feature value is an emotion value.
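As a hedged illustration of the representative-object replacement described above, the following Python sketch maps an extracted evaluation object onto the closest preset representative object. The text does not specify the similarity measure; the character-level ratio from Python's difflib, the function name normalize_object and the threshold 0.8 are assumptions made only for this example.

```python
from difflib import SequenceMatcher
from typing import Iterable


def normalize_object(obj: str, representatives: Iterable[str],
                     threshold: float = 0.8) -> str:
    """Replace obj with its most similar representative object, provided the
    similarity exceeds the threshold; otherwise return obj unchanged."""
    best, best_score = obj, threshold
    for rep in representatives:
        # ratio() is a stand-in similarity in [0, 1] (assumption); a semantic
        # similarity measure could be substituted here.
        score = SequenceMatcher(None, obj, rep).ratio()
        if score > best_score:
            best, best_score = rep, score
    return best


reps = ["market demand", "stock price"]
print(normalize_object("market demands", reps))  # mapped to a representative
print(normalize_object("weather", reps))         # left unchanged
```

Choosing the best-scoring representative above the threshold is one reasonable reading of "the representative object with the similarity exceeding a preset similarity threshold"; other tie-breaking policies are possible.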
To achieve the above object, according to another aspect of an embodiment of the present invention, there is provided an apparatus for text feature analysis, including: the grammar structure determining module is used for obtaining feature words in the target text and determining grammar structures between the feature words and words in the target text; the evaluation object determining module is used for determining an evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset extraction rule of the evaluation object grammar structure; and the text feature determining module is used for calculating the feature values of the feature words and the evaluation object, obtaining the feature values of the target text and determining the features of the target text.
Optionally, the evaluation object determining module is configured to:
when the feature word serves as a predicate in the target text, determine the subject corresponding to the feature word as a first evaluation object; or
when the feature word serves as an object, adverbial or complement of a predicate in the target text, determine the subject and the predicate corresponding to the feature word as a second evaluation object; or
when the feature word serves as an adverbial or complement of an object in the target text, determine the subject, the predicate and the object corresponding to the feature word as a third evaluation object; or
when the feature word serves as an attributive in the target text, determine the non-feature word corresponding to the feature word as a fourth evaluation object.
Optionally, the device further comprises an evaluation object expansion module for: when a parallel word corresponding to the evaluation object exists, determining the parallel word as a fifth evaluation object; or
when an attributive corresponding to the evaluation object exists, adding the non-feature word corresponding to the attributive to the evaluation object to generate a sixth evaluation object.
Optionally, the device further comprises a feature word expansion module for: when parallel words or serial-verb (linked) words corresponding to the feature word exist, determining the parallel words or the serial-verb words as first feature words, and determining the evaluation object as a seventh evaluation object corresponding to the first feature words;
the text feature determining module is then used for: calculating the feature values of the feature words and the evaluation objects, and the feature values of the first feature words and the seventh evaluation objects, to obtain the feature value of the target text, and determining the feature of the target text.
Optionally, the device further comprises a part-of-speech feature analysis module for: analyzing the part-of-speech features of each word in the target text, and acquiring the nouns whose distance from the feature word is within a preset distance range as eighth evaluation objects.
Optionally, the text feature determining module is configured to: calculate the similarity between the evaluation object and each preset representative object, determine the representative object whose similarity exceeds a preset similarity threshold, and replace the evaluation object with the determined representative object; and calculate the feature values of the feature words and the determined representative object to obtain the feature value of the target text, thereby obtaining the feature of the target text.
Optionally, the feature word is an emotion word, and the feature value is an emotion value.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided an electronic device for text feature analysis.
The electronic equipment of the embodiment of the invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of text feature analysis as described in any of the above.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the method of text feature analysis described in any of the above.
According to the solution provided by the present invention, one embodiment of the above invention has the following advantages or beneficial effects: according to the grammar structure of the feature words and each word, the coverage range and the acquisition accuracy of the evaluation object are improved, and the accuracy of determining the target text features is improved.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic flow diagram of a method of text feature analysis according to an embodiment of the present invention;
FIG. 2 is a flow diagram of an alternative method of text feature analysis according to an embodiment of the invention;
FIG. 3 is a flow diagram of another alternative method of text feature analysis according to an embodiment of the invention;
FIG. 4 is a flow diagram of a method of yet another alternative text feature analysis in accordance with an embodiment of the present invention;
FIG. 5 is a flow diagram of a method of still another alternative text feature analysis in accordance with an embodiment of the present invention;
FIG. 6 is a schematic diagram of the main modules of an apparatus for text feature analysis according to an embodiment of the present invention;
FIG. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 8 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the text features provided by the embodiments of the present invention include, but are not limited to, text emotion; a text feature may also be the category of the text, such as news, entertainment, administrative affairs, etc. Accordingly, the feature words include, but are not limited to, emotion words, and may also be other words that can represent text features, such as "entertainment circle". The embodiments of the invention are described taking emotion words, emotion values and text emotion as examples.
In addition, the target text may be text data in the form of a microblog, a commodity comment, a forum comment, a blog, or the like. The emotion words may be words with emotional coloring in the target text, including positive, negative and neutral emotion words, e.g., "beautiful", "angry", etc., and the invention is not limited herein.
Referring to FIG. 1, a main flowchart of a text feature analysis method provided by an embodiment of the present invention is shown, including the following steps:
S101: Acquire feature words in the target text, and determine the grammar structures between the feature words and the words in the target text.
S102: Determine the evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule.
S103: Calculate the feature values of the feature words and the evaluation objects to obtain the feature value of the target text, and determine the feature of the target text.
In the above embodiment, for step S101, the feature words may be obtained by performing word segmentation on the target text, matching each word in the text against the feature words in a feature word dictionary, and determining each successfully matched word as a feature word of the target text. For emotion analysis, the dictionary may be an existing emotion dictionary, such as a large company emotion dictionary or a web emotion dictionary, an emotion dictionary built according to the characteristics of the target text, or an emotion dictionary extracted from the target text by a machine learning method; the embodiment of the invention is not limited herein. For example, segmenting the target text "the flower is very beautiful" yields flower / very / beautiful, where "beautiful" is an emotion word.
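The dictionary-matching step of S101 can be sketched as follows. This is a minimal illustration that assumes the text has already been segmented into tokens; the function name find_feature_words and the tiny lexicon are hypothetical and stand in for a real segmenter and emotion dictionary.

```python
from typing import List, Set, Tuple


def find_feature_words(tokens: List[str], lexicon: Set[str]) -> List[Tuple[int, str]]:
    """Return (position, word) for every token that appears in the lexicon."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok in lexicon]


# Hypothetical segmented text and emotion lexicon.
tokens = ["flower", "very", "beautiful"]
emotion_lexicon = {"beautiful", "angry"}

print(find_feature_words(tokens, emotion_lexicon))  # [(2, 'beautiful')]
```

Keeping the token position alongside each matched word is a design choice for this sketch: later steps need to locate the feature word inside the parsed sentence.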
After word segmentation is performed on the target text, predicate verbs (predicates for short) can be identified in the target text by semantic role labeling (Semantic Role Labeling, SRL). SRL does not analyze the semantic information contained in a sentence in depth; taking the predicate as the center, it only analyzes the structure between each word in the sentence and the predicate. For example, in stock price / rise / fast, "fast" is an emotion word, "stock price" and "rise" form a subject-verb structure (SBV), and "rise" and "fast" form a verb-complement structure (CMP).
Semantic role labeling is the main form of shallow semantic analysis at present; the meaning of each label is shown in Table 1:
TABLE 1 Semantic role labels
Label   Interpretation
A0      Agent (subject)
A2      Indirect object
DIR     Direction
EXT     Extent, range
MOD     General modification
MNR     Manner
PRP     Purpose
TMP     Time
A1      Patient (object)
DIS     Discourse marker
LOC     Location
NEG     Negation
PRD     Secondary predicate
REC     Reciprocal (reflexive)
ADV     Adverbial modification
However, feature words do not serve only as predicates in sentences. When a feature word serves as another component, the analysis needs to be extended beyond semantic role labeling: the dependency relations among the words of the target text can be analyzed by dependency parsing (Dependency Parsing, DP) to reveal the grammar structure around the feature word.
That is, dependency parsing identifies the dependency relations among the words of the target text, reveals their semantic modification relations, points out the syntactic collocations of the words in a sentence, and analyzes the subject, predicate, object, attributive, adverbial and complement structure of a sentence. The Language Technology Platform provides a set of Chinese language processing conventions that define 24 dependency relations in total, as shown in Table 2:
TABLE 2 Dependency grammar tagging system
Label   Full name               Interpretation
ATT     attribute               Attributive-head relation
QUN     quantity                Quantity relation
COO     coordinate              Coordinate (parallel) relation
APP     appositive              Appositive relation
LAD     left adjunct            Left-adjunct relation
RAD     right adjunct           Right-adjunct relation
VOB     verb-object             Verb-object relation
POB     preposition-object      Preposition-object relation
SBV     subject-verb            Subject-verb relation
SIM     similarity              Simile relation
HED     head                    Core (head of the sentence)
VV      verb-verb               Serial-verb structure
DE                              "的" (de) structure
DI                              "地" (di) structure
DEI                             "得" (dei) structure
BA                              "把" (ba) structure
BEI                             "被" (bei) structure
ADV     adverbial               Adverbial-head structure
MT      mood-tense              Mood-tense structure
CMP     complement              Verb-complement structure
IS      independent structure   Independent structure
CNJ     conjunctive             Conjunctive structure
IC      independent clause      Independent clause
DC      dependent clause        Dependent clause
For step S102, the structural component that the feature word serves as in a sentence of the target text may be a predicate, an object, a subject, or the like. Each case may have its own rule for extracting the evaluation object: when a dependency relation exists between two words in the target text and one of them is a feature word, the other word in the relation may be determined as the evaluation object associated with the feature word. The dependency relations include the subject-verb relation, the verb-object relation, the attributive-head relation, the appositive relation, and the like, which is not limited herein.
The evaluation object associated with a feature word is determined in the following manner:
(1) When a feature word serves as a predicate in a sentence, its evaluation object is the word that the predicate describes, which may be the subject associated with the feature word.
Taking "maximum elasticity of insurance rise" as an example, "rise" is the feature word, and "insurance" and "rise" are in a subject-verb relation (SBV), so <insurance (evaluation object of the predicate), rise> is extracted.
Taking "stable growth of the current market demand of solar photovoltaic power generation and good long-term development prospect" as an example, the feature word "growth" forms subject-verb relations with "market demand", "solar energy", "photovoltaic" and "power generation", but "market demand" is nearest to "growth", so <market demand (evaluation object of the predicate), growth> is extracted; similarly, for the feature word "good", <long-term development prospect (evaluation object of the predicate), good> is extracted.
Further, the evaluation object can also be the nearest agent, patient or indirect object adjacent to the feature word, where the agent is usually the subject and the patient is usually the object. When several agents, patients and indirect objects exist at the same time, the evaluation object is the combination of them.
(2) When a feature word serves as a non-predicate component in a sentence, such as the object, adverbial or complement of a predicate, the adverbial or complement of an object, or the attributive of a subject or object, the associated predicate needs to be determined first, and the evaluation object is then determined based on that predicate in the manner of rule (1) above.
(1) The feature word is the object, adverbial or complement of a predicate
When the feature word is the object, adverbial or complement of a predicate, it modifies the whole subject-predicate structure. The associated predicate, which is the core word of the VOB, ADV or CMP relation, can be determined from the VOB (verb-object relation), ADV (adverbial structure) or CMP (verb-complement structure), and the subject-predicate structure is then taken as the evaluation object of the feature word; that is, the evaluation object-feature word pair is <evaluation object of the predicate + predicate, feature word>.
Taking "the stock price rises fast" as an example, "fast" is the feature word and "rise" is the predicate; "stock price" and "rise" form a subject-verb structure, and "rise" and "fast" form a verb-complement structure. According to sub-rule (1), "fast" is the complement of the predicate "rise", so <stock price (evaluation object of the predicate) + rise (predicate), fast> should be extracted.
(2) The feature word is the adverbial or complement of an object
When the feature word is the adverbial or complement of an object, the word it directly modifies is the object; however, to preserve semantic integrity, the subject-predicate-object structure of the whole sentence is selected as its evaluation object.
Specifically: first, when the feature word is the adverbial or complement of an object, it is the modifier in an ADV (adverbial structure) or CMP (verb-complement structure), and the object associated with the feature word can be determined from the ADV or CMP; second, the object is the core word of the ADV or CMP and also the modifier in a VOB, while the predicate is the core word of the VOB, so the associated predicate can be determined from the VOB (verb-object relation); finally, the evaluation object-feature word pair is <evaluation object of the predicate + predicate + object, feature word>.
Taking "the silicon raw material supply problem gets an effective solution" as an example, "effective" is the feature word, and "effective" and "solution" form an adverbial structure; according to sub-rule (2), <silicon raw material supply problem (subject) + get (predicate) + solution (object), effective> should be extracted.
(3) The feature word is the attributive of a subject or object
When the feature word is the attributive of a subject or object (an attributive is not associated with a predicate), the evaluation object is the word modified by the attributive, that is, the core word of the ATT relation, and the evaluation object-feature word pair is <core word of ATT, feature word>.
Taking "the sluggish market condition in the quarter of last year" as an example, "sluggish" is the feature word, and "sluggish" and "condition" are in an attributive-head relation (ATT), so <condition (core word of the attributive-head relation), sluggish> is extracted according to sub-rule (3).
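The extraction rules above can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: it assumes a dependency parse is already available as hand-written (dependent index, head index, relation label) triples using the labels of Table 2, and the helper names and English tokens are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Arc = Tuple[int, int, str]  # (dependent index, head index, relation label)


def head_of(arcs: List[Arc], dep: int, rels: Set[str]) -> Optional[int]:
    """Head of `dep` if it attaches through one of `rels`, else None."""
    for d, h, r in arcs:
        if d == dep and r in rels:
            return h
    return None


def nearest_subject(arcs: List[Arc], pred: int) -> Optional[int]:
    """SBV dependent closest to the predicate `pred`, or None."""
    subs = [d for d, h, r in arcs if h == pred and r == "SBV"]
    return min(subs, key=lambda d: abs(d - pred)) if subs else None


def evaluation_object(words: List[str], arcs: List[Arc], feat: int) -> List[str]:
    # Rule (1): the feature word is a predicate -> its nearest subject.
    subj = nearest_subject(arcs, feat)
    if subj is not None:
        return [words[subj]]
    head = head_of(arcs, feat, {"VOB", "ADV", "CMP"})
    if head is not None:
        # Sub-rule (1): the head is a predicate -> subject + predicate.
        subj = nearest_subject(arcs, head)
        if subj is not None:
            return [words[subj], words[head]]
        # Sub-rule (2): the head is an object -> subject + predicate + object.
        if head_of(arcs, feat, {"ADV", "CMP"}) == head:
            pred = head_of(arcs, head, {"VOB"})
            if pred is not None:
                subj = nearest_subject(arcs, pred)
                if subj is not None:
                    return [words[subj], words[pred], words[head]]
    # Sub-rule (3): the feature word is an attributive -> core word of ATT.
    core = head_of(arcs, feat, {"ATT"})
    return [words[core]] if core is not None else []


words = ["stock price", "rise", "fast"]
arcs = [(0, 1, "SBV"), (2, 1, "CMP")]
print(evaluation_object(words, arcs, 2))  # ['stock price', 'rise']
```

The sketch returns only the nearest subject for each predicate; combining several agents, patients and indirect objects, as the text also allows, would need semantic role labels that are not modeled here.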
For step S103, after the <evaluation object, feature word> pairs of the target text are determined, the polarity intensity of the target text can be determined. Specifically, it can be calculated from the polarity of each sentence in the target text; see the formula:
where θ(o_i, t_j) represents the polarity intensity of one <evaluation object, feature word> pair, E(o_i) represents the polarity expectation of the evaluation object o_i in the target text, E(o, Text) represents the polarity of the target text Text with respect to the evaluation object o_i, n represents the number of feature words in a sentence, and f(o_i) represents the importance of the evaluation object o_i at different positions in the target text. The feature value of the target text can be determined from the average polarity intensity of all <evaluation object, feature word> pairs and can be used for text analysis, for example to decide whether the text is positive or negative.
The weight f(o_i) can take its values in various ways. Sentences in the title, at the beginning or end of the text, and at the beginning of a paragraph are more representative of the features of the text than sentences at other positions, so these positions are weighted higher than the others. For example, the values 5, 4, 3 and 1 may be used: 5 when the evaluation object o_i is in the title, 4 when it is at the beginning or end of the text, 3 when it is at the beginning of a paragraph, and 1 when it is at any other position. For an evaluation object that occupies several positions at the same time, for example both the beginning of a paragraph and the beginning of the text, the maximum of the applicable values is taken, that is, 4.
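The position weighting can be sketched as follows. The weight values 5, 4, 3 and 1 and the take-the-maximum rule come from the text; the formula itself is not reproduced in this text, so the position-weighted average used in text_polarity below is an assumption, as are the position names and function names.

```python
from typing import Iterable, List, Tuple

# Weights from the description; the position names are hypothetical labels.
POSITION_WEIGHTS = {
    "title": 5,
    "text_start": 4,
    "text_end": 4,
    "paragraph_start": 3,
    "other": 1,
}


def position_weight(positions: Iterable[str]) -> int:
    """f(o_i): the maximum weight over all positions the object occupies."""
    return max((POSITION_WEIGHTS.get(p, 1) for p in positions), default=1)


def text_polarity(pairs: List[Tuple[float, List[str]]]) -> float:
    """Position-weighted average of pair polarities (assumed aggregation).

    Each item is (polarity intensity of one pair, positions of its object).
    """
    total_weight = sum(position_weight(pos) for _, pos in pairs)
    if total_weight == 0:
        return 0.0
    weighted = sum(pol * position_weight(pos) for pol, pos in pairs)
    return weighted / total_weight


pairs = [(1.0, ["title"]), (-1.0, ["other"])]
print(position_weight(["paragraph_start", "text_start"]))  # 4
print(text_polarity(pairs))  # positive, since the title pair dominates
```

A positive result would be read as positive text polarity and a negative one as negative polarity; the cut-off between the two is not fixed by the text.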
The method of this embodiment provides an approach to extracting <evaluation object, feature word> pairs that combines syntactic analysis with semantic role labeling. Based on the grammar structure between the feature word and each word, it fully considers the structural features and semantic integrity of phrase-level evaluation objects, which improves the accuracy of evaluation object extraction. In addition, feature values are calculated with weighting by text position, which improves the accuracy of determining the target text features.
Referring to FIG. 2, a main flowchart of an alternative text feature analysis method provided by an embodiment of the present invention is shown, including the following steps:
S201: Acquire feature words in the target text, and determine the grammar structures between the feature words and the words in the target text.
S202: Determine the evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule.
S203: When a parallel word corresponding to the evaluation object exists, determine the parallel word as a fifth evaluation object.
S204: When an attributive corresponding to the evaluation object exists, add the non-feature word corresponding to the attributive to the evaluation object to generate a sixth evaluation object.
S205: Determine all evaluation objects associated with the feature word.
S206: Calculate the feature values of the feature words and the evaluation objects to obtain the feature value of the target text, and determine the feature of the target text.
In the above embodiment, for steps S201, S202 and S206, reference may be made to the descriptions of steps S101, S102 and S103 shown in FIG. 1, which are not repeated here.
The extraction described in step S102 of fig. 1 mainly yields the core word associated with the feature word, so part of the evaluation object may be lost or its semantics may be incomplete. For example, under rules (2) and (3) of step S102, the evaluation object-feature word pair extracted for the feature word "sluggish" is < situation, sluggish >, but contextual semantic analysis shows that "market situation" is a more accurate evaluation object than "situation". The evaluation object obtained can therefore be expanded.
For step S203 in the above embodiment, when the evaluation object has several parallel words, the parallel structure can be used to extract multiple evaluation objects, each of which is combined with the feature word to generate multiple evaluation object-feature word pairs.
Take "the company's important advantage is inexpensive electricity and labor" as an example, where "inexpensive" is the feature word. According to rule (3) in step S102 shown in fig. 1, < electricity (core word of the attributive-head relation), inexpensive > can be extracted. Since "electricity" and "labor" form a parallel structure, < labor (parallel to electricity), inexpensive > can also be extracted.
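The parallel-word expansion of step S203 can be sketched as below. This is only an illustration under stated assumptions: the dependency parse is given as `(dependent, relation, head)` tuples, and the labels `"COO"` (parallel relation) and `"ATT"` (attributive relation) are assumed names for whatever the parser actually emits. The function name `expand_parallel_objects` is hypothetical.

```python
def expand_parallel_objects(pairs, arcs, parallel_label="COO"):
    """Add a pair for every word that stands in a parallel relation to an evaluation object.

    pairs: list of (evaluation_object, feature_word) tuples.
    arcs:  iterable of (dependent, relation, head) tuples from a dependency parse;
           the relation label for parallel structures is an assumption.
    """
    expanded = list(pairs)
    for obj, feature in pairs:
        for dependent, relation, head in arcs:
            if relation != parallel_label:
                continue
            # Take the other member of the parallel structure, whichever side obj is on.
            if head == obj:
                other = dependent
            elif dependent == obj:
                other = head
            else:
                continue
            if (other, feature) not in expanded:
                expanded.append((other, feature))
    return expanded


# "inexpensive electricity and labor": "labor" is parallel to "electricity".
arcs = [("inexpensive", "ATT", "electricity"), ("labor", "COO", "electricity")]
pairs = expand_parallel_objects([("electricity", "inexpensive")], arcs)
```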
For step S204, when the evaluation object is modified by an attributive, the attributive-head structure associated with that attributive can be determined, and the non-feature modifier in that structure is added to the evaluation object to supplement it.
Similarly, under rules (2) and (3), take "market sluggish status of the last year in the quarter" as an example, where "sluggish" is the feature word. The evaluation object "status" takes part in three attributive-head structures, namely (status, last year), (status, market) and (status, sluggish), and "last year" itself forms a further attributive-head structure (last year, last year). The expanded evaluation object-feature word pair is therefore < last year market status (with non-feature modifiers), sluggish >.
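A minimal sketch of the attributive expansion of step S204 follows. It assumes the sentence is available as a token list in original word order and the parse as `(dependent, relation, head)` tuples with `"ATT"` as an assumed label for the attributive-head relation; the function name is hypothetical. Modifiers of modifiers are followed as well, and feature words are skipped. The tokens "market", "sluggish" and "status" stand in for the words of the example.

```python
def expand_with_attributives(obj, tokens, arcs, feature_words, att_label="ATT"):
    """Prepend the non-feature attributive modifiers of obj, keeping sentence order.

    tokens: words of the sentence in their original order.
    arcs:   (dependent, relation, head) tuples; att_label is an assumed label name.
    """
    keep = {obj}
    frontier = [obj]
    while frontier:
        head = frontier.pop()
        for dependent, relation, arc_head in arcs:
            if (relation == att_label and arc_head == head
                    and dependent not in feature_words and dependent not in keep):
                keep.add(dependent)
                frontier.append(dependent)  # also follow modifiers of this modifier
    return " ".join(token for token in tokens if token in keep)


tokens = ["market", "sluggish", "status"]
arcs = [("market", "ATT", "status"), ("sluggish", "ATT", "status")]
expanded = expand_with_attributives("status", tokens, arcs, feature_words={"sluggish"})
```

Joining with spaces suits the English rendering used here; for unsegmented Chinese output an empty separator would be the more natural choice.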
Further, after the parallel evaluation objects are determined in step S203, each of them may also be expanded according to step S204. In the example "the company's important advantage is inexpensive electricity and labor", "electricity" does not form an attributive-head structure with other words, so no further expansion is needed.
The method of this embodiment expands the evaluation object to reduce cases where part of it is lost or its semantics are incomplete. By taking phrase structure into account, it improves the accuracy of evaluation object extraction and, in turn, the accuracy of text feature analysis.
Referring to fig. 3, a main flow chart of another alternative text feature analysis method provided by the embodiment of the present invention is shown, including the following steps:
S301: Acquire the feature words in the target text, and determine the grammatical structure between the feature words and the other words in the target text.
S302: Determine the evaluation object corresponding to the feature word in the target text according to the grammatical structure and a preset evaluation object grammatical structure extraction rule.
S303: When a parallel word or serial-verb word corresponding to the feature word exists, determine that word as a first feature word, and determine the evaluation object as a seventh evaluation object corresponding to the first feature word.
S304: Calculate the feature values of the feature word and the evaluation object, and of the first feature word and the seventh evaluation object, to obtain the feature value of the target text, and determine the features of the target text.
In the above embodiment, for the steps S301 and S302, reference may be made to the descriptions of the steps S101 and S102 shown in fig. 1, and the descriptions are omitted here.
Besides serving as sentence components (predicate, attributive, adverbial, etc.) in a sentence of the target text, a feature word may also occur in a parallel structure or a serial-verb structure; that is, the feature word is linked to the predicate, object, attributive, adverbial or complement of the sentence through a parallel or serial-verb structure.
For step S303 in the above embodiment, when the feature word occurs in the modifier position of a parallel or serial-verb structure, its evaluation object is the evaluation object of the core word of that structure.
Note that the core word of the parallel or serial-verb structure may or may not be a feature word. If it is a feature word, its evaluation object can be determined by the rules in step S102 shown in fig. 1; if it is not, the evaluation object of the non-feature core word can be determined according to the rules of steps S203 and S204 shown in fig. 2. When the feature word is nested in several levels of parallel or serial-verb structures, the core word must be traced level by level until it is a predicate, object, attributive, adverbial or complement of the sentence, and the evaluation object is then determined from that core word.
Take "the asset scale of peers dropped sharply, decreasing by 33.86% from the last quarter" as an example:
according to rule (1) in step S102 shown in fig. 1, "scale" and "drop" are in a subject-predicate relation, so < scale, drop > is extracted; according to the rule of step S303, "drop" and "decrease" are both feature words and are in a parallel relation, so < scale, decrease > is also extracted.
Further, after the first feature word and the corresponding seventh evaluation object are determined, the seventh evaluation object may be expanded. For example, according to the rule of step S204 shown in fig. 2, "peer" and "asset", and "asset" and "scale", are each in an attributive-head relation, so the evaluation object is expanded to "peer asset scale", and the pair formed by "peer asset scale" and the first feature word is extracted.
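The core-word tracing described for step S303 can be sketched as follows. This is an illustrative reading, not the patented procedure itself: arcs are `(dependent, relation, head)` tuples, and the labels `"COO"` (parallel), `"SVC"` (serial-verb) and `"SBV"` (subject-predicate) are assumed names. Both function names are hypothetical.

```python
def resolve_core_word(word, arcs, link_relations=("COO", "SVC")):
    """Follow parallel / serial-verb arcs upward until the word is no longer
    a dependent in such a structure. The relation labels are assumptions."""
    head_of = {dep: head for dep, rel, head in arcs if rel in link_relations}
    seen = set()
    while word in head_of and word not in seen:  # 'seen' guards against cyclic input
        seen.add(word)
        word = head_of[word]
    return word


def inherit_evaluation_objects(feature_words, arcs, object_of_core):
    """Give each feature word the evaluation object of its core word.

    object_of_core maps a core word to the evaluation object already found for it,
    for example by the rules of step S102.
    """
    result = []
    for feature in feature_words:
        core = resolve_core_word(feature, arcs)
        if core in object_of_core:
            result.append((object_of_core[core], feature))
    return result


# "scale" is the subject of "drop"; "decrease" is parallel to "drop".
arcs = [("scale", "SBV", "drop"), ("decrease", "COO", "drop")]
pairs = inherit_evaluation_objects(["drop", "decrease"], arcs, {"drop": "scale"})
```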
The method of this embodiment extracts evaluation object-feature word pairs based on the parallel or serial-verb structure of feature words, which widens the range of feature words extracted and improves the accuracy of text feature determination.
Referring to fig. 4, there is shown a main flow chart of a method for still another alternative text feature analysis provided by an embodiment of the present invention, including the steps of:
S401: Acquire the feature words in the target text, analyze the part of speech of each word in the target text, and take nouns whose distance from a feature word is within a preset distance range as eighth evaluation objects.
S402: Calculate the feature values of the feature words and the eighth evaluation objects to obtain the feature value of the target text, and determine the features of the target text.
For step S401 in the above embodiment, after the target text is segmented by a word segmenter, part-of-speech tagging may be performed on it using the parts of speech recorded for each word in the lexicon. The parts of speech include, but are not limited to, verbs, nouns, adjectives and adverbs; the invention is not limited in this respect.
If a noun exists within the predetermined distance range of a feature word, the noun is extracted as the evaluation object of that feature word. The distance between a noun and a feature word is related to the feature value: the farther the noun is from the feature word, the smaller the feature value. A distance threshold between the noun and the feature word may therefore be preset and the corresponding distance range determined from it; for example, the distance may be set to 1. Alternatively, the noun nearest to the feature word may be selected. The noun may be located before or after the feature word.
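A minimal sketch of the distance-based noun selection follows. It assumes the text has already been segmented and tagged into `(word, pos)` tuples, that distance is measured in tokens, and that `"n"` marks a noun; the tag names and the function name `nearest_noun` are hypothetical.

```python
def nearest_noun(tagged, feature_index, max_distance=1, noun_tag="n"):
    """Return the noun closest to the feature word within max_distance tokens.

    tagged: list of (word, pos) tuples in sentence order; the tag set is an assumption.
    The noun may precede or follow the feature word. On a tie, the earlier token wins.
    Returns None when no noun lies within the distance range.
    """
    best = None
    for index, (word, pos) in enumerate(tagged):
        if pos != noun_tag or index == feature_index:
            continue
        distance = abs(index - feature_index)
        if distance <= max_distance and (best is None or distance < best[0]):
            best = (distance, word)
    return best[1] if best else None


tagged = [("market", "n"), ("demand", "n"), ("steadily", "d"), ("increase", "v")]
found = nearest_noun(tagged, feature_index=3, max_distance=2)
```

With `max_distance=1` the same call returns `None`, because the adverb sits between the noun and the feature word; how distance is counted in practice is a design choice the source leaves open.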
After determining the nouns associated with the feature words, the obtained nouns may be expanded according to steps S203 and S204 shown in fig. 2, or the feature words may be expanded according to step S303 shown in fig. 3, so as to improve the accuracy of determining the features of the target text.
Take the example from rule (1) of step S102 shown in fig. 1, "the solar photovoltaic power generation market demand steadily increases at present and the long-term development prospect looks good". The noun closest to the feature word "increase" is "demand", so < demand, increase > is extracted according to the rule of step S401. In addition, "market" and "demand" are in a parallel relation, so < market demand (evaluation object of the predicate), increase > is extracted according to the rule of step S203 shown in fig. 2.
For step S402, only the feature values of the nouns and feature words obtained need to be calculated; for the rest, refer to the description of step S103 shown in fig. 1, which is not repeated here.
The method of this embodiment extracts evaluation objects based on part-of-speech features. Setting a distance range avoids extracting words that are far from the feature word and have small feature values, which widens the coverage of evaluation objects while preserving the accuracy of determining the target text's features.
Referring to fig. 5, a main flowchart of a method for still another optional text feature analysis according to an embodiment of the present invention is shown, including the following steps:
S501: Acquire the feature words in the target text, and determine the grammatical structure between the feature words and the other words in the target text.
S502: Determine the evaluation object corresponding to the feature word in the target text according to the grammatical structure and a preset evaluation object grammatical structure extraction rule.
S503: Calculate the similarity between the evaluation object and each preset representative object, determine the representative object whose similarity exceeds a preset similarity threshold, and replace the evaluation object with the determined representative object.
S504: Calculate the feature values of the feature words and the determined representative objects to obtain the feature value of the target text and the features of the target text.
In the above embodiment, for step S501, reference is made to the description of step S101 shown in fig. 1; step S502 may refer to the description of step S102 shown in fig. 1, and may refer to the descriptions of fig. 2 to 4, which are not repeated here.
For step S503 in the above embodiment, a target text with many feature words also yields many evaluation objects. To reduce the processing workload, a clustering algorithm, for example single-pass (a text clustering algorithm) or LDA (Latent Dirichlet Allocation, a topic model algorithm), may be used to cluster the evaluation objects online. A specific processing manner may be as follows:
(1) Obtain a representative object library in which each representative object corresponds to one class cluster. The representative object may be one of several semantically similar evaluation objects; for sweater, skirt and shorts, for example, "sweater" may be selected as the representative object. Alternatively, an aggregate object may be set as the representative object according to the actual meaning of the semantically similar evaluation objects; for sweater, skirt and shorts, for example, the representative object may be "clothes";
(2) Calculate the similarity between each evaluation object obtained and the representative objects of all class clusters, for example using cosine similarity or the Jaccard index;
(3) Select the class cluster with the highest similarity; or set a preset similarity threshold and judge whether the calculated similarity reaches it. If the similarity is greater than or equal to the threshold, assign the evaluation object to that class cluster; if it is smaller, create a new class cluster and make the evaluation object the representative object of the new class cluster.
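Steps (1) to (3) can be sketched as a single-pass clustering loop. This is an illustration under assumptions: similarity is the Jaccard index over character sets (one possible choice; cosine similarity over word vectors would fit the description equally well), the threshold value 0.5 is arbitrary, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index over the character sets of two strings."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def single_pass(objects, threshold=0.5, similarity=jaccard):
    """Assign each evaluation object to the most similar existing cluster,
    or open a new cluster with the object as its representative.

    Returns a dict mapping each representative object to its cluster members.
    """
    clusters = {}
    for obj in objects:
        best_rep, best_sim = None, 0.0
        for rep in clusters:
            sim = similarity(obj, rep)
            if sim > best_sim:
                best_rep, best_sim = rep, sim
        if best_rep is not None and best_sim >= threshold:
            clusters[best_rep].append(obj)
        else:
            clusters[obj] = [obj]  # the object becomes the new cluster's representative
    return clusters


clusters = single_pass(["market demand", "market demands", "battery"])
```

A character-level measure cannot group words such as "sweater" and "skirt" that are related only in meaning; that case calls for a semantic similarity or a pre-built representative object library as described in step (1).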
For example, given < sweater, beautiful >, < skirt, like > and < shorts, preference >, the evaluation objects are sweater, skirt and shorts. If the representative object is one of the semantically similar evaluation objects, for example "sweater", the correspondence between feature words and evaluation objects becomes < sweater, beautiful >, < sweater, like >, < sweater, preference >. If the representative object is an aggregate object, for example "clothes", the correspondence becomes < clothes, beautiful >, < clothes, like >, < clothes, preference >.
In addition, the representative object may be established in advance, or it may be extracted from the target text according to the feature words: after the evaluation objects have been assigned to class clusters whose similarity exceeds the predetermined similarity threshold, the number of occurrences of each evaluation object in a class cluster is counted, and the evaluation object with the most occurrences is selected as the representative object.
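Choosing the most frequent evaluation object in a class cluster as its representative can be sketched with the standard library's `collections.Counter`. The function name `pick_representative` is hypothetical, and the tie-breaking behavior (the object seen first wins) is an assumption, since the source does not say how ties are resolved.

```python
from collections import Counter


def pick_representative(cluster_members):
    """Return the evaluation object that occurs most often in the cluster.

    cluster_members: non-empty list of evaluation objects, repeats included.
    On a tie, Counter.most_common keeps first-seen order, so the earliest object wins.
    """
    return Counter(cluster_members).most_common(1)[0][0]


representative = pick_representative(["sweater", "skirt", "sweater", "shorts"])
```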
For step S504, after the representative object corresponding to an evaluation object, or the representative object of each class cluster, has been determined, all evaluation objects in the class cluster must be replaced with the representative object when the text feature is calculated.
Matching is then performed on the representative objects and feature words to obtain, for each sentence in the target text, the feature value with respect to the representative object. For example, for < sweater, beautiful >, < skirt, like > and < shorts, preference >, replacing the evaluation objects with the representative object gives < clothes, beautiful >, < clothes, like > and < clothes, preference >, and the feature value of "clothes" in the text is then calculated.
However, when no class cluster or representative object corresponds to an evaluation object, the evaluation object itself serves as the representative object. For < sweater, beautiful >, for example, the feature value of "sweater" in the text is calculated.
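The replacement step, including the fallback for evaluation objects that have no representative, can be sketched as a dictionary lookup. The mapping `representative_of` and the function name are hypothetical; the extra pair in the usage example is invented purely to show the fallback.

```python
def replace_with_representatives(pairs, representative_of):
    """Replace each evaluation object with its representative object.

    pairs:             list of (evaluation_object, feature_word) tuples.
    representative_of: dict mapping an evaluation object to its representative.
    An object with no entry is kept as its own representative.
    """
    return [(representative_of.get(obj, obj), feature) for obj, feature in pairs]


representative_of = {"sweater": "clothes", "skirt": "clothes", "shorts": "clothes"}
pairs = [("sweater", "beautiful"), ("skirt", "like"), ("shorts", "preference"), ("phone", "fast")]
replaced = replace_with_representatives(pairs, representative_of)
```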
Further, to reduce the calculation workload, the class clusters may be sorted in descending order by the number of evaluation objects they contain, the first n class clusters selected, and the feature value in the target text calculated only for the representative objects of those class clusters.
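Selecting the first n class clusters can be sketched as below, assuming clusters are ranked by how many evaluation objects they contain (one reading of "from large to small") and are stored as a dict from representative object to member list. The function name `top_clusters` is hypothetical.

```python
def top_clusters(clusters, n):
    """Keep the n clusters with the most evaluation objects.

    clusters: dict mapping a representative object to its list of members.
    The sort is stable, so clusters of equal size keep their original order.
    """
    ranked = sorted(clusters.items(), key=lambda item: len(item[1]), reverse=True)
    return dict(ranked[:n])


clusters = {
    "clothes": ["sweater", "skirt", "shorts"],
    "phone": ["phone"],
    "price": ["price", "cost"],
}
selected = top_clusters(clusters, n=2)
```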
The method of this embodiment replaces evaluation objects with representative objects, which reduces the workload of the subsequent feature value calculation for < evaluation object, feature word > pairs and improves the accuracy of obtaining the target text's features.
The method provided by the embodiment of the invention extracts < evaluation object, feature word > pairs by combining syntactic analysis with semantic role labeling. It takes into account the structural features and semantic integrity of phrase-level evaluation objects according to the grammatical structure between the feature word and the other words, and it improves the coverage and extraction accuracy of evaluation objects by expanding both the evaluation objects and the feature words. Calculating the feature score with text-position weighting further improves the accuracy of determining the target text's features.
Referring to fig. 6, a schematic diagram of main modules of an apparatus 600 for text feature analysis according to an embodiment of the present invention is shown, including:
a grammar structure determining module 601, configured to acquire the feature words in a target text and determine the grammatical structure between the feature words and the other words in the target text;
an evaluation object determining module 602, configured to determine the evaluation object corresponding to the feature word in the target text according to the grammatical structure and a preset evaluation object grammatical structure extraction rule;
and a text feature determining module 603, configured to calculate the feature values of the feature words and the evaluation objects, obtain the feature value of the target text, and determine the features of the target text.
In the apparatus provided by the embodiment of the present invention, the evaluation object determining module 602 is configured to:
when the feature word serves as a predicate in the target text, determine the subject corresponding to the feature word as a first evaluation object; or
when the feature word serves as an object, an adverbial or a complement of the predicate in the target text, determine the subject and the predicate corresponding to the feature word as a second evaluation object; or
when the feature word serves as an object's object or complement in the target text, determine the subject, the predicate and the object corresponding to the feature word as a third evaluation object; or
when the feature word serves as an attributive in the target text, determine the non-feature word corresponding to the feature word as a fourth evaluation object.
The device provided by the embodiment of the invention further comprises an evaluation object expansion module 604, which is used for:
when a parallel word corresponding to the evaluation object exists, determine the parallel word as a fifth evaluation object; or
when an attributive corresponding to the evaluation object exists, add the non-feature word corresponding to the attributive to the evaluation object to generate a sixth evaluation object.
The device provided by the embodiment of the invention further comprises a feature word expansion module 605, which is used for:
when a parallel word or serial-verb word corresponding to the feature word exists, determine that word as a first feature word, and determine the evaluation object as a seventh evaluation object corresponding to the first feature word;
the text feature determining module is configured to:
calculate the feature values of the feature word and the evaluation object, and of the first feature word and the seventh evaluation object, to obtain the feature value of the target text, and determine the features of the target text.
The apparatus provided by the embodiment of the present invention further includes a part-of-speech feature analysis module 606, configured to:
analyze the part of speech of each word in the target text, and take nouns whose distance from a feature word is within a preset distance range as eighth evaluation objects.
In the apparatus provided by the embodiment of the present invention, the text feature determining module 603 is configured to:
calculate the similarity between the evaluation object and each preset representative object, determine the representative object whose similarity exceeds a preset similarity threshold, and replace the evaluation object with the determined representative object;
and calculate the feature values of the feature words and the determined representative objects to obtain the feature value of the target text and the features of the target text.
In the device provided by the embodiment of the invention, the characteristic words are emotion words, and the characteristic values are emotion values.
The device provided by the embodiment of the invention extracts < evaluation object, feature word > pairs by combining syntactic analysis with semantic role labeling. It takes into account the structural features and semantic integrity of phrase-level evaluation objects according to the grammatical structure between the feature word and the other words, and it improves the coverage and extraction accuracy of evaluation objects by expanding both the evaluation objects and the feature words. Calculating the feature score with text-position weighting further improves the accuracy of determining the target text's features.
Referring to fig. 7, an exemplary system architecture 700 to which the text feature analysis method or text feature analysis apparatus of embodiments of the present invention may be applied is shown.
As shown in fig. 7, a system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 is the medium used to provide communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may use the terminal devices 701, 702, 703 to interact with the server 705 via the network 704 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 701, 702, 703, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients and social platform software (by way of example only).
The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 705 may be a server providing various services, such as a background management server (by way of example only) that supports shopping websites browsed by users on the terminal devices 701, 702, 703. The background management server may analyze and otherwise process received data such as a product information query request, and feed the processing result (for example, target push information or product information, by way of example only) back to the terminal device.
It should be noted that the text feature analysis method provided by the embodiment of the present invention is generally performed by the server 705, and accordingly, the text feature analysis device is generally disposed in the server 705.
It should be understood that the number of terminal devices, networks and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 8, there is shown a schematic diagram of a computer system 800 suitable for use in implementing an embodiment of the invention. The terminal device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, etc.; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 801.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor including a grammar structure determining module, an evaluation object determining module, and a text feature determining module. In some cases the names of these modules do not limit the modules themselves; for example, the grammar structure determining module may also be described as a "feature word and grammar structure determining module".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the device described in the above embodiments or may exist alone without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform:
acquiring feature words in a target text, and determining grammar structures between the feature words and words in the target text;
determining an evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule;
and calculating the characteristic values of the characteristic words and the evaluation object to obtain the characteristic value of the target text, and determining the characteristic of the target text.
According to the technical solution of the embodiment of the invention, the structural features and semantic integrity of phrase-level evaluation objects are taken into account according to the grammatical structure between the feature word and the other words, and expanding both the evaluation objects and the feature words improves the coverage and extraction accuracy of evaluation objects as well as the accuracy of determining the target text's features.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (14)

1. A method of text feature analysis, comprising:
acquiring feature words in a target text, and determining grammar structures between the feature words and words in the target text;
determining an evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule; when the feature word serves as a predicate in a sentence, the evaluation object of the feature word is the word modified by the predicate; when the feature word serves as a non-predicate component in the sentence, the predicate associated with the feature word is determined first, and the evaluation object is then determined according to the predicate;
when a parallel word corresponding to the evaluation object exists, determining the parallel word as a fifth evaluation object; or when an attributive corresponding to the evaluation object exists, adding a non-feature word corresponding to the attributive to the evaluation object to generate a sixth evaluation object;
determining weights according to the importance of the obtained evaluation objects at different positions in the target text, calculating the feature values of the feature words and the obtained evaluation objects, obtaining the feature value of the target text, and determining the features of the target text; wherein the obtained evaluation object is one of: the evaluation object and the fifth evaluation object; the evaluation object and the sixth evaluation object; or the evaluation object, the fifth evaluation object and the sixth evaluation object.
2. The method according to claim 1, wherein the determining, according to the grammar structure and a preset evaluation object grammar structure extraction rule, an evaluation object corresponding to the feature word in the target text includes:
when the feature word serves as a predicate in the target text, determining that the subject corresponding to the feature word is a first evaluation object; or
when the feature word serves as an object, an adverbial or a complement of the predicate in the target text, determining the subject and the predicate corresponding to the feature word as a second evaluation object; or
when the feature word serves as an object or a complement of the object in the target text, determining the subject, the predicate and the object corresponding to the feature word as a third evaluation object; or
when the feature word serves as an attributive in the target text, determining that the non-feature word corresponding to the feature word is a fourth evaluation object.
3. The method according to claim 1, wherein after determining the evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule, the method further comprises:
when parallel words or interlocking words corresponding to the feature words exist, determining the parallel words or the interlocking words as first feature words, and determining the evaluation object as a seventh evaluation object corresponding to the first feature words;
and wherein the calculating of the feature values of the feature words and the obtained evaluation objects to obtain the feature value of the target text, and the determining of the features of the target text, comprise:
calculating the feature values of the feature words and the obtained evaluation objects, and the feature values of the first feature words and the seventh evaluation objects, to obtain the feature value of the target text, and determining the features of the target text.
4. The method as recited in claim 1, further comprising:
analyzing part-of-speech features of each word in the target text, and acquiring nouns whose distance from the feature word is within a preset distance range as eighth evaluation objects.
5. The method of claim 1, wherein the calculating the feature value of the feature word and the obtained evaluation object to obtain the feature value of the target text, and determining the feature of the target text comprises:
calculating the similarity between the obtained evaluation object and each preset representative object, determining the representative object with the similarity exceeding a preset similarity threshold value, and replacing the obtained evaluation object with the determined representative object;
and calculating the feature values of the feature words and the determined representative objects to obtain the feature value of the target text, and determining the features of the target text.
6. The method of any one of claims 1-5, wherein the feature word is an emotion word and the feature value is an emotion value.
7. An apparatus for text feature analysis, comprising:
the grammar structure determining module is used for obtaining feature words in the target text and determining grammar structures between the feature words and words in the target text;
the evaluation object determining module is used for determining an evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset extraction rule of the evaluation object grammar structure;
the evaluation object expansion module is used for determining, when a parallel word corresponding to the evaluation object exists, the parallel word as a fifth evaluation object; or, when an attributive corresponding to the evaluation object exists, adding the non-feature word corresponding to the attributive to the evaluation object to generate a sixth evaluation object;
the text feature determining module is used for determining weights according to the importance of the positions at which the obtained evaluation objects appear in the target text, calculating feature values of the feature words and the obtained evaluation objects to obtain a feature value of the target text, and determining the features of the target text; wherein the obtained evaluation object is one of: the evaluation object and the fifth evaluation object; the evaluation object and the sixth evaluation object; or the evaluation object, the fifth evaluation object and the sixth evaluation object.
8. The apparatus of claim 7, wherein the evaluation object determination module is configured to:
when the feature word serves as a predicate in the target text, determining that the subject corresponding to the feature word is a first evaluation object; or
when the feature word serves as an object, an adverbial or a complement of the predicate in the target text, determining the subject and the predicate corresponding to the feature word as a second evaluation object; or
when the feature word serves as an object or a complement of the object in the target text, determining the subject, the predicate and the object corresponding to the feature word as a third evaluation object; or
when the feature word serves as an attributive in the target text, determining that the non-feature word corresponding to the feature word is a fourth evaluation object.
9. The apparatus of claim 7, further comprising a feature word expansion module configured to:
when parallel words or interlocking words corresponding to the feature words exist, determining the parallel words or the interlocking words as first feature words, and determining the evaluation object as a seventh evaluation object corresponding to the first feature words;
the text feature determining module is used for:
calculating the feature values of the feature words and the obtained evaluation objects, and the feature values of the first feature words and the seventh evaluation objects, to obtain the feature value of the target text, and determining the features of the target text.
10. The apparatus of claim 7, further comprising a part-of-speech feature analysis module configured to:
analyze part-of-speech features of each word in the target text, and acquire nouns whose distance from the feature word is within a preset distance range as eighth evaluation objects.
11. The apparatus of claim 7, wherein the text feature determination module is configured to:
calculating the similarity between the obtained evaluation object and each preset representative object, determining the representative object with the similarity exceeding a preset similarity threshold value, and replacing the obtained evaluation object with the determined representative object;
and calculating the feature values of the feature words and the determined representative objects to obtain the feature value of the target text, and determining the features of the target text.
12. The apparatus according to any one of claims 7 to 11, wherein the feature words are emotion words and the feature values are emotion values.
13. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
14. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.
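As an illustration of the position-weighted feature value calculation and the representative-object replacement recited in the claims, the following is a minimal sketch. The claims do not give a concrete formula, so everything numeric here is an assumption: the position categories and weights in `POSITION_WEIGHT`, the tiny `LEXICON` of emotion values, the 0.8 threshold, the weighted-sum aggregation, and the use of `difflib.SequenceMatcher` as a stand-in similarity measure.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical weights for where an evaluation object appears in the text.
POSITION_WEIGHT = {"title": 2.0, "first": 1.5, "body": 1.0}
# Hypothetical feature (emotion) values of feature words.
LEXICON = {"good": 1.0, "poor": -1.0}


def normalize(obj: str, representatives: List[str], threshold: float = 0.8) -> str:
    """Replace `obj` with the most similar representative object whose
    similarity exceeds `threshold`; otherwise keep `obj` unchanged."""
    best, best_sim = obj, threshold
    for rep in representatives:
        sim = SequenceMatcher(None, obj, rep).ratio()
        if sim > best_sim:
            best, best_sim = rep, sim
    return best


def text_feature(
    pairs: List[Tuple[str, str, str]], representatives: List[str]
) -> Dict[str, float]:
    """Aggregate (feature word, evaluation object, position) triples into a
    position-weighted feature value per (normalized) evaluation object."""
    scores: Dict[str, float] = {}
    for word, obj, position in pairs:
        key = normalize(obj, representatives)
        value = POSITION_WEIGHT[position] * LEXICON.get(word, 0.0)
        scores[key] = scores.get(key, 0.0) + value
    return scores


pairs = [
    ("good", "screens", "title"),
    ("poor", "screen", "body"),
    ("good", "battery", "body"),
]
print(text_feature(pairs, ["screen", "battery"]))
```

In this sketch "screens" is folded into the representative object "screen" before aggregation, so opinions about the same aspect are accumulated under one key. A production system would more plausibly use a semantic similarity measure than character-level matching.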
CN201711459613.1A 2017-12-28 2017-12-28 Text feature analysis method and device Active CN109977392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711459613.1A CN109977392B (en) 2017-12-28 2017-12-28 Text feature analysis method and device

Publications (2)

Publication Number Publication Date
CN109977392A CN109977392A (en) 2019-07-05
CN109977392B CN109977392B (en) 2024-02-09

Family

ID=67074639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711459613.1A Active CN109977392B (en) 2017-12-28 2017-12-28 Text feature analysis method and device

Country Status (1)

Country Link
CN (1) CN109977392B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881402A (en) * 2015-06-02 2015-09-02 北京京东尚科信息技术有限公司 Method and device for analyzing semantic orientation of Chinese network topic comment text
CN105243129A (en) * 2015-09-30 2016-01-13 清华大学深圳研究生院 Commodity property characteristic word clustering method
CN106339368A (en) * 2016-08-24 2017-01-18 乐视控股(北京)有限公司 Text emotional tendency acquiring method and device
JP2017120634A (en) * 2015-12-28 2017-07-06 株式会社リコー Analytical method and device for sentimental word polarity
CN107291689A (en) * 2017-05-31 2017-10-24 温州市鹿城区中津先进科技研究院 A kind of analysis method based on the Chinese network comments sentence theme semantic tendency of big data

Also Published As

Publication number Publication date
CN109977392A (en) 2019-07-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant