CN109977392B

CN109977392B - Text feature analysis method and device

Info

Publication number: CN109977392B
Application number: CN201711459613.1A
Authority: CN
Inventors: 王鑫; 董浩
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2017-12-28
Filing date: 2017-12-28
Publication date: 2024-02-09
Anticipated expiration: 2037-12-28
Also published as: CN109977392A

Abstract

The invention discloses a text feature analysis method and device, and relates to the technical field of computers. One embodiment of the method comprises the following steps: acquiring feature words in a target text, and determining the grammar structures between the feature words and the words in the target text; determining an evaluation object corresponding to each feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule; and calculating the feature values of the feature words and the evaluation objects to obtain the feature value of the target text, and determining the feature of the target text. By using the grammar structures between the feature words and the other words, this embodiment improves the coverage and acquisition accuracy of evaluation objects and thereby improves the accuracy of determining the target text feature.

Description

Text feature analysis method and device

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for text feature analysis.

Background

Emotion analysis mainly mines the opinions and emotion polarities that users express in text, and helps other users make decisions. Acquiring emotion tendency relies chiefly on emotion information extraction, that is, extracting the emotion words and evaluation objects in a text so as to mine the emotion value expressed by each combination of evaluation object and emotion word. A user can learn the opinions or emotions expressed by a text by browsing its subjectively colored words.

However, the same emotion word can have different polarities when it modifies different evaluation objects. For example, the semantic polarity of the emotion word "decline" is negative, and "profit decline" is a negative-polarity emotion phrase, but "cost decline" is a positive-polarity emotion phrase. Identifying evaluation object-emotion word pairs therefore helps to further judge the emotion tendency toward each evaluation object.

In the prior art, text emotion mining is mainly focused on the commodity comment field and is generally carried out by means of word rules, dictionary matching, and the like.

In carrying out the present invention, the inventors have found that at least the following problems exist in the prior art:

In the emotion mining approaches of the prior art, the relations within the sentence structure of a text are often ignored and only the core words of a sentence are extracted. Part of the evaluation object information is lost, so the semantics of the evaluation object are incomplete, which hinders text emotion mining and leads to emotion analysis errors.

Disclosure of Invention

In view of this, the embodiments of the present invention provide a method and an apparatus for text feature analysis, which at least can solve the problem of text feature analysis errors caused by incomplete evaluation object semantics in the prior art.

To achieve the above object, according to one aspect of the embodiments of the present invention, there is provided a method for text feature analysis, including: acquiring feature words in a target text, and determining the grammar structures between the feature words and the words in the target text; determining an evaluation object corresponding to each feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule; and calculating the feature values of the feature words and the evaluation objects to obtain the feature value of the target text, and determining the feature of the target text.

Optionally, the determining, according to the grammar structure and a preset evaluation object grammar structure extraction rule, the evaluation object corresponding to the feature word in the target text includes:

when the feature word serves as a predicate in the target text, determining that the subject corresponding to the feature word is a first evaluation object; or

when the feature word serves as an object, an adverbial, or a complement of the predicate in the target text, determining that the subject and the predicate corresponding to the feature word are a second evaluation object; or

when the feature word serves as an adverbial or a complement of the object in the target text, determining that the subject, the predicate, and the object corresponding to the feature word are a third evaluation object; or

when the feature word serves as an attributive in the target text, determining that the non-feature word corresponding to the feature word is a fourth evaluation object.

Optionally, the method further comprises: when a parallel word corresponding to the evaluation object exists, determining the parallel word as a fifth evaluation object; or

when an attributive corresponding to the evaluation object exists, adding the non-feature word corresponding to the attributive to the evaluation object to generate a sixth evaluation object.

Optionally, after determining the evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule, the method further includes: when parallel words or linked words corresponding to the feature word exist, determining the parallel words or the linked words as first feature words, and determining the evaluation object as a seventh evaluation object corresponding to the first feature words;

the calculating the feature values of the feature words and the evaluation objects to obtain the feature value of the target text, and determining the feature of the target text comprises: calculating the feature values of the feature words and the evaluation objects, and the feature values of the first feature words and the seventh evaluation objects, to obtain the feature value of the target text, and determining the feature of the target text.

Optionally, the method further comprises: analyzing the part-of-speech features of each word in the target text, and acquiring nouns whose distance from the feature word is within a preset distance range as eighth evaluation objects.

Optionally, the calculating the feature values of the feature words and the evaluation objects to obtain the feature value of the target text, and determining the feature of the target text includes: calculating the similarity between the evaluation object and each preset representative object, determining the representative object whose similarity exceeds a preset similarity threshold, and replacing the evaluation object with the determined representative object; and calculating the feature values of the feature words and the determined representative objects to obtain the feature value of the target text, and obtaining the feature of the target text.

Optionally, the feature word is an emotion word, and the feature value is an emotion value.

To achieve the above object, according to another aspect of an embodiment of the present invention, there is provided an apparatus for text feature analysis, including: the grammar structure determining module is used for obtaining feature words in the target text and determining grammar structures between the feature words and words in the target text; the evaluation object determining module is used for determining an evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset extraction rule of the evaluation object grammar structure; and the text feature determining module is used for calculating the feature values of the feature words and the evaluation object, obtaining the feature values of the target text and determining the features of the target text.

Optionally, the evaluation object determining module is configured to:

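Optionally, the device further comprises an evaluation object expansion module for: when a parallel word corresponding to the evaluation object exists, determining the parallel word as a fifth evaluation object; or, when an attributive corresponding to the evaluation object exists, adding the non-feature word corresponding to the attributive to the evaluation object to generate a sixth evaluation object.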

Optionally, the device further comprises a feature word expansion module for: when parallel words or linked words corresponding to the feature word exist, determining the parallel words or the linked words as first feature words, and determining the evaluation object as a seventh evaluation object corresponding to the first feature words;

the text feature determining module is used for: calculating the feature values of the feature words and the evaluation objects, and the feature values of the first feature words and the seventh evaluation objects, to obtain the feature value of the target text, and determining the feature of the target text.

Optionally, the device further comprises a part-of-speech feature analysis module for: analyzing the part-of-speech features of each word in the target text, and acquiring nouns whose distance from the feature word is within a preset distance range as eighth evaluation objects.

Optionally, the text feature determining module is configured to: calculate the similarity between the evaluation object and each preset representative object, determine the representative object whose similarity exceeds a preset similarity threshold, and replace the evaluation object with the determined representative object; and calculate the feature values of the feature words and the determined representative objects to obtain the feature value of the target text, and obtain the feature of the target text.

To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided an electronic device for text feature analysis.

The electronic equipment of the embodiment of the invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of text feature analysis as described in any of the above.

To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the method of text feature analysis described in any of the above.

According to the solution provided by the present invention, one embodiment of the above invention has the following advantages or beneficial effects: by using the grammar structures between the feature words and the other words, the coverage and acquisition accuracy of evaluation objects are improved, and the accuracy of determining the target text features is improved.

Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic flow diagram of a method of text feature analysis according to an embodiment of the present invention;

FIG. 2 is a flow diagram of an alternative method of text feature analysis according to an embodiment of the invention;

FIG. 3 is a flow diagram of another alternative method of text feature analysis according to an embodiment of the invention;

FIG. 4 is a flow diagram of a method of yet another alternative text feature analysis in accordance with an embodiment of the present invention;

FIG. 5 is a flow diagram of a method of still another alternative text feature analysis in accordance with an embodiment of the present invention;

FIG. 6 is a schematic diagram of the main modules of an apparatus for text feature analysis according to an embodiment of the present invention;

FIG. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;

FIG. 8 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that the text features provided by the embodiments of the present invention include, but are not limited to, text emotion; they may also indicate that the text belongs to a category such as news, entertainment, or administration. Accordingly, the feature words include, but are not limited to, emotion words, and may also be other words that can represent text features, such as "entertainment circle". The embodiments of the invention are described by taking emotion words, emotion values, and text emotion as examples.

In addition, the target text may be text data in the form of a microblog, a commodity comment, a forum comment, a blog, or the like. The emotion words may be emotionally colored words in the target text, including positive, negative, and neutral emotion words, e.g., beautiful, angry, etc., and the invention is not limited herein.

Referring to FIG. 1, a main flowchart of a text feature analysis method provided by an embodiment of the present invention is shown, including the following steps:

S101: Acquire feature words in the target text, and determine the grammar structures between the feature words and the words in the target text.

S102: Determine the evaluation object corresponding to each feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule.

S103: Calculate the feature values of the feature words and the evaluation objects to obtain the feature value of the target text, and determine the feature of the target text.

In the above embodiment, for step S101, the feature words may be obtained by performing word segmentation on the target text, matching the words in the text against the feature words in a feature word dictionary, and determining the successfully matched words as the feature words of the target text. For example, the emotion dictionary may be an existing emotion dictionary, such as a large company emotion dictionary or a web emotion dictionary, an emotion dictionary set according to the characteristics of the target text, or an emotion dictionary extracted from the target text by a machine learning method; the embodiment of the invention is not limited herein. For example, after word segmentation is performed on the target text "the flower is very beautiful", "flower / very / beautiful" is obtained, where "beautiful" is an emotion word.
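The dictionary-matching step can be illustrated with a minimal sketch. It assumes the text has already been segmented into tokens by some upstream tool; the dictionary contents and the function name `find_feature_words` are hypothetical and chosen only for illustration.

```python
# Hypothetical emotion dictionary mapping a word to its polarity
# (+1 positive, -1 negative); real dictionaries are far larger.
EMOTION_DICT = {"beautiful": 1, "angry": -1, "decline": -1}


def find_feature_words(tokens, dictionary):
    """Return (position, word) for every token that appears in the dictionary."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok in dictionary]


# The tokens are assumed to come from an upstream word segmenter.
tokens = ["flower", "very", "beautiful"]
feature_words = find_feature_words(tokens, EMOTION_DICT)
print(feature_words)  # [(2, 'beautiful')]
```

Keeping the token position alongside the word makes it possible to look up the word's dependency arcs in later steps.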

After word segmentation is performed on the target text, the predicate verbs (predicates for short) can be identified in the target text by semantic role labeling (SRL). SRL takes the predicate as the center and does not analyze the semantic information contained in the sentence in depth; it only analyzes the grammar structures between the words of the sentence and the predicate. For example, in "stock price / up / fast", "fast" is an emotion word, "stock price" and "up" form a subject-verb structure (SBV), and "up" and "fast" form a verb-complement structure (CMP).

Semantic role labeling is currently the main form of shallow semantic analysis; the meaning of each label is shown in Table 1:

TABLE 1 Semantic role labels

Label	Meaning
A0	Agent (subject)
A1	Patient (object)
A2	Indirect object
DIR	Direction
EXT	Extent (length, range)
MOD	General modification
MNR	Manner
PRP	Purpose
TMP	Time
DIS	Discourse marker
LOC	Location
NEG	Negation
PRD	Secondary predicate
REC	Reciprocal
ADV	Adverbial modification

However, feature words do not serve only as predicates in sentences. When they serve as other components, the analysis must be extended beyond semantic role labeling; specifically, the dependency relations among the words in the target text can be analyzed by dependency parsing (DP) so as to reveal the grammar structure of the feature words.

That is, dependency parsing can identify the dependency relations among the words of the target text, reveal the semantic modification relations, indicate the syntactic collocation of the words in a sentence, and analyze the subject, predicate, object, attributive, adverbial, and complement structure of a sentence. The Language Technology Platform (LTP) provides a set of Chinese language processing rules and defines 24 dependency relations in total, as shown in Table 2:

TABLE 2 Dependency grammar tagging system

Label	Full name	Meaning
ATT	attribute	Attributive-head relation
QUN	quantity	Quantity relation
COO	coordinate	Parallel (coordinate) relation
APP	appositive	Appositive relation
LAD	left adjunct	Left-adjunct relation
RAD	right adjunct	Right-adjunct relation
VOB	verb-object	Verb-object relation
POB	preposition-object	Preposition-object relation
SBV	subject-verb	Subject-verb relation
SIM	similarity	Analogy relation
HED	head	Core
VV	verb-verb	Serial-verb (linkage) structure
DE		"的" (de) structure
DI		"地" (di) structure
DEI		"得" (dei) structure
BA		"把" (ba) structure
BEI		"被" (bei) structure
ADV	adverbial	Adverbial structure
MT	mood-tense	Mood-tense structure
CMP	complement	Verb-complement structure
IS	independent structure	Independent structure
CNJ	conjunctive	Conjunctive structure
IC	independent clause	Independent clause
DC	dependent clause	Dependent clause

For step S102, the structural component that the feature word serves as in a sentence of the target text may be a predicate, an object, a subject, or the like. Each case may have a corresponding evaluation object extraction rule: when a dependency structure exists between two words in the target text and one of them is a feature word, the other word in the dependency structure may be determined as the evaluation object associated with the feature word. The dependency relations include the subject-verb relation (SBV), the verb-object relation (VOB), the attributive-head relation (ATT), the parallel relation (COO), and the like, which is not limited herein.
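The general principle, that the word on the other end of a dependency arc from a feature word is a candidate evaluation object, can be sketched as follows. The `Arc` type, the function `linked_words`, and the hand-written parse are assumptions made for illustration; a real system would take the arcs from a dependency parser.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    dep: int   # index of the dependent (modifier) token
    head: int  # index of the head (core) token
    rel: str   # dependency label, e.g. "SBV", "VOB", "ATT", "COO"


def linked_words(tokens, arcs, feature_idx, relations=("SBV", "VOB", "ATT", "COO")):
    """Return (word, relation) for each word joined to the feature word by one arc."""
    out = []
    for arc in arcs:
        if arc.rel not in relations:
            continue
        if arc.dep == feature_idx:
            out.append((tokens[arc.head], arc.rel))
        elif arc.head == feature_idx:
            out.append((tokens[arc.dep], arc.rel))
    return out


# Hand-written parse of "insurance / rise": "insurance" is the subject of "rise".
tokens = ["insurance", "rise"]
arcs = [Arc(dep=0, head=1, rel="SBV")]
print(linked_words(tokens, arcs, feature_idx=1))  # [('insurance', 'SBV')]
```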

The evaluation object associated with the feature word is determined in the following manner:

(1) When a feature word serves as the predicate of a sentence, its evaluation object is the word modified by the predicate, which may be the subject associated with the feature word.

Taking "maximum elasticity of insurance rise" as an example, "rise" is the feature word, and "insurance" and "rise" form a subject-verb relation (SBV), so <insurance (evaluation object of the predicate), rise> is extracted.

Taking "the current market demand for solar photovoltaic power generation is growing steadily and the long-term development prospect is good" as an example, the feature word "growth" forms a subject-verb relation with "market demand", "solar energy", "photovoltaic", and "power generation", but "market demand" is nearest to "growth", so <market demand (evaluation object of the predicate), growth> is extracted. Similarly, for the feature word "good", <long-term development prospect (evaluation object of the predicate), good> is extracted.

Further, the evaluation object may also be the nearest agent, patient, or indirect object adjacent to the feature word, where the agent is usually the subject and the patient is usually the object. When several agents, patients, and indirect objects exist at the same time, the evaluation object is their combination.
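Rule (1) can be sketched as picking, among the words in a subject-verb relation with the predicate, the one closest to it. The arcs below are a hand-written stand-in for parser output on the solar power example, and `nearest_subject` is a hypothetical helper name.

```python
def nearest_subject(tokens, arcs, pred_idx):
    """Return the SBV dependent closest to the predicate, or None if there is none."""
    subjects = [dep for dep, head, rel in arcs if rel == "SBV" and head == pred_idx]
    if not subjects:
        return None
    return tokens[min(subjects, key=lambda i: abs(i - pred_idx))]


# Arcs are (dependent index, head index, label); this parse is assumed.
tokens = ["solar energy", "photovoltaic", "power generation", "market demand", "growth"]
arcs = [(0, 4, "SBV"), (1, 4, "SBV"), (2, 4, "SBV"), (3, 4, "SBV")]
pair = (nearest_subject(tokens, arcs, pred_idx=4), tokens[4])
print(pair)  # ('market demand', 'growth')
```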

(2) When a feature word serves as a non-predicate component of a sentence, such as an object, adverbial, or complement of the predicate, an adverbial or complement of the object, or an attributive of the subject or object, the predicate associated with it must be determined first, and its evaluation object is then determined based on that predicate; the manner of determining the evaluation object based on the predicate is given in rule (1) above.

(1) The feature word is an object, adverbial, or complement of the predicate

When the feature word is an object, adverbial, or complement of the predicate, it modifies the whole subject-predicate structure. The associated predicate, which is the core word of the VOB, ADV, or CMP relation, can be determined from the VOB (verb-object relation), ADV (adverbial structure), or CMP (verb-complement structure) relation, and the subject-predicate structure is taken as the evaluation object of the feature word. That is, the evaluation object-feature word pair is <evaluation object of the predicate + predicate, feature word>.

Taking "the stock price is rising fast" as an example, "fast" is the feature word and "rising" is the predicate; "stock price" and "rising" form a subject-verb structure, and "rising" and "fast" form a verb-complement structure. According to sub-rule (1) of rule (2), "fast" is the complement of the predicate "rising", so <stock price (evaluation object of the predicate) + rising (predicate), fast> is extracted.
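A minimal sketch of this sub-rule follows: the predicate is found as the head of the VOB, ADV, or CMP arc leaving the feature word, and its subject is then prepended. The function name and the hand-written arcs are assumptions for illustration only.

```python
def pair_for_predicate_modifier(tokens, arcs, feat_idx):
    """Build <subject + predicate, feature word> when the feature word is an
    object, adverbial, or complement of the predicate; otherwise return None."""
    for dep, head, rel in arcs:
        if dep == feat_idx and rel in ("VOB", "ADV", "CMP"):
            pred = head
            subjects = sorted(d for d, h, r in arcs if h == pred and r == "SBV")
            words = [tokens[i] for i in subjects] + [tokens[pred]]
            return (" + ".join(words), tokens[feat_idx])
    return None


# Arcs are (dependent index, head index, label); this parse is assumed.
tokens = ["stock price", "rising", "fast"]
arcs = [(0, 1, "SBV"), (2, 1, "CMP")]
result = pair_for_predicate_modifier(tokens, arcs, feat_idx=2)
print(result)  # ('stock price + rising', 'fast')
```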

(2) The feature word is an adverbial or complement of the object

When the feature word is an adverbial or complement of the object, the word it directly modifies is the object, but to preserve semantic integrity the evaluation object is taken to be the main structure of the whole sentence (subject, predicate, and object).

Specifically: first, when the feature word is an adverbial or complement of the object, it is the modifier in an ADV (adverbial structure) or CMP (verb-complement structure) relation, and the object associated with the feature word can be determined from that relation; second, the object is the core word of the ADV or CMP relation and also the modifier in a VOB relation whose core word is the predicate, so the associated predicate can be determined from the VOB (verb-object relation); finally, the evaluation object-feature word pair is <evaluation object of the predicate + predicate + object, feature word>.

Taking "the silicon raw material supply problem is effectively solved" as an example, "effective" is the feature word, and "effective" and "solution" form an adverbial structure. According to sub-rule (2) of rule (2), <silicon raw material supply problem (subject) + get (predicate) + solution (object), effective> is extracted.
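The two-hop lookup described here (feature word to object through ADV or CMP, then object to predicate through VOB) can be sketched as below. The parse of the silicon example is written by hand and the function name is hypothetical.

```python
def pair_for_object_modifier(tokens, arcs, feat_idx):
    """Build <subject + predicate + object, feature word> when the feature word
    is an adverbial or complement of the object; otherwise return None."""
    head_of = {dep: (head, rel) for dep, head, rel in arcs}
    obj, rel = head_of.get(feat_idx, (None, None))
    if rel not in ("ADV", "CMP"):
        return None
    pred, rel = head_of.get(obj, (None, None))
    if rel != "VOB":
        return None
    subjects = sorted(d for d, h, r in arcs if h == pred and r == "SBV")
    words = [tokens[i] for i in subjects] + [tokens[pred], tokens[obj]]
    return (" + ".join(words), tokens[feat_idx])


# Arcs are (dependent index, head index, label); this parse is assumed.
tokens = ["silicon raw material supply problem", "get", "effective", "solution"]
arcs = [(0, 1, "SBV"), (3, 1, "VOB"), (2, 3, "ADV")]
result = pair_for_object_modifier(tokens, arcs, feat_idx=2)
print(result)
# ('silicon raw material supply problem + get + solution', 'effective')
```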

(3) The feature word is an attributive of the subject or object

When the feature word is an attributive of the subject or object (the attributive is not associated with a predicate), the evaluation object is the word modified by the attributive, that is, the core word of the ATT relation, and the evaluation object-feature word pair is <core word of ATT, feature word>.

Taking "the sluggish market situation in the quarter of last year" as an example, "sluggish" is the feature word, and "sluggish" and "situation" form an attributive-head relation (ATT), so according to sub-rule (3) of rule (2), <situation (core word of the attributive-head relation), sluggish> is extracted.

For step S103, after the <evaluation object, feature word> pairs of the target text are determined, the polarity intensity of the target text can be determined; specifically, it can be calculated from the polarity of each sentence in the target text, see the formula:

where θ(o_i, t_j) denotes the polarity intensity of one <evaluation object, feature word> pair; E(o_i) denotes the expected polarity of the evaluation object o_i in the target text; E(o, Text) denotes the polarity of the target text Text with respect to the evaluation object o_i; n denotes the number of feature words in a sentence; and f(o_i) denotes the importance of the evaluation object o_i at different positions in the target text. The feature value of the target text can be determined from the average polarity intensity of all <evaluation object, feature word> pairs and can be used for text analysis, for example, to judge whether the text is positive or negative.

The weight f(o_i) can take its value in various ways. Sentences in the text title, at the beginning or end of the text, and at the beginning of a paragraph are more representative of the text's features than sentences elsewhere, so these positions are weighted higher than other positions. For example, the values may be 5, 4, 3, and 1, where 5 corresponds to the evaluation object o_i being in the title of the text, 4 to o_i being at the beginning or end of the text, 3 to o_i being at the beginning of a paragraph, and 1 to o_i being at any other position. For an evaluation object that is located at several positions at the same time, for example, at both the beginning of a paragraph and the beginning of the text, the maximum value is taken, that is, 4.
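The position weighting can be sketched as follows. The weights 5, 4, 3, and 1 and the take-the-maximum rule come from the description above; combining the pair polarities as a weight-normalized average is only an assumed form, since the exact formula is not reproduced here, and all names in the sketch are hypothetical.

```python
# Position weights taken from the example values in the description.
POSITION_WEIGHTS = {
    "title": 5,
    "text_start": 4,
    "text_end": 4,
    "paragraph_start": 3,
    "other": 1,
}


def position_weight(positions):
    """Weight f(o_i) of an evaluation object; with several positions, use the maximum.
    `positions` must be a non-empty list of keys of POSITION_WEIGHTS."""
    return max(POSITION_WEIGHTS[p] for p in positions)


def text_polarity(pairs):
    """Assumed aggregation: weighted mean of pair polarities theta(o_i, t_j).
    `pairs` is a list of (polarity, positions) tuples."""
    total_weight = sum(position_weight(pos) for _, pos in pairs)
    if total_weight == 0:
        return 0.0
    return sum(theta * position_weight(pos) for theta, pos in pairs) / total_weight


pairs = [(1.0, ["title"]), (-1.0, ["other"])]
score = text_polarity(pairs)                               # (5 - 1) / 6
both = position_weight(["paragraph_start", "text_start"])  # 4
```

Under this assumed aggregation, a positive pair in the title outweighs a negative pair in the body, so the text as a whole is scored positive.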

The method provided by this embodiment extracts <evaluation object, feature word> pairs by combining syntactic analysis with semantic role labeling. Using the grammar structures between the feature words and the other words, it takes full account of the structural features and semantic integrity of phrase-level evaluation objects, which improves the accuracy of acquiring evaluation objects. In addition, the feature value is calculated with weights based on text position, which improves the accuracy of determining the target text features.

Referring to FIG. 2, a main flowchart of an alternative text feature analysis method provided by an embodiment of the present invention is shown, including the following steps:

S201: Acquire feature words in the target text, and determine the grammar structures between the feature words and the words in the target text.

S202: Determine the evaluation object corresponding to each feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule.

S203: When a parallel word corresponding to the evaluation object exists, determine the parallel word as a fifth evaluation object.

S204: When an attributive corresponding to the evaluation object exists, add the non-feature word corresponding to the attributive to the evaluation object to generate a sixth evaluation object.

S205: Determine all evaluation objects associated with the feature word.

S206: Calculate the feature values of the feature words and the evaluation objects to obtain the feature value of the target text, and determine the feature of the target text.

In the above embodiment, for steps S201, S202, and S206, reference may be made to the descriptions of steps S101, S102, and S103 shown in FIG. 1, which are not repeated here.

When evaluation objects are extracted in the manner shown in step S102 of FIG. 1, mainly the core word associated with the feature word is extracted, which may cause part of the evaluation object to be lost or its semantics to be incomplete. For example, under sub-rule (3) of rule (2) in step S102, the pair extracted for the feature word "sluggish" is <situation, sluggish>, but contextual semantic analysis shows that "market situation" is a more accurate evaluation object than "situation". The acquired evaluation object can therefore be expanded.

In the above embodiment, for step S203, when the evaluation object contains several parallel words, the several evaluation objects can be extracted through the parallel structure (COO) and combined with the feature word to generate several evaluation object-feature word pairs.

Taking "the company's important advantage is inexpensive electricity and labor" as an example, "inexpensive" is the feature word, and according to sub-rule (3) of rule (2) in step S102 shown in FIG. 1, <electricity (core word of the attributive-head relation), inexpensive> can be extracted. Since "electricity" and "labor" form a parallel structure, <labor (parallel to electricity), inexpensive> can also be extracted.
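The parallel expansion of step S203 can be sketched as follows: the evaluation object is first found as the head of the ATT arc leaving the feature word, and every word joined to it by a COO arc yields one more pair. The parse and the helper name `expand_parallel` are assumptions for illustration.

```python
def expand_parallel(arcs, obj_idx):
    """Return indices of words joined to the evaluation object by a COO arc,
    in either direction."""
    out = []
    for dep, head, rel in arcs:
        if rel != "COO":
            continue
        if head == obj_idx:
            out.append(dep)
        elif dep == obj_idx:
            out.append(head)
    return out


# Arcs are (dependent index, head index, label); this parse is assumed.
tokens = ["inexpensive", "electricity", "and", "labor"]
arcs = [(0, 1, "ATT"), (2, 3, "LAD"), (3, 1, "COO")]
feature = 0

# Attributive rule: the evaluation object is the head of the ATT arc.
obj = next(head for dep, head, rel in arcs if dep == feature and rel == "ATT")
pairs = [(tokens[obj], tokens[feature])]
pairs += [(tokens[i], tokens[feature]) for i in expand_parallel(arcs, obj)]
print(pairs)  # [('electricity', 'inexpensive'), ('labor', 'inexpensive')]
```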

For step S204, when the evaluation object is modified by an attributive, the attributive-head structure associated with that attributive can be determined, and the non-feature modifiers in that structure can be added to the evaluation object to supplement it.

Again taking the example of sub-rule (3) of rule (2), "the sluggish market situation in the quarter of last year", "sluggish" is the feature word, and the evaluation object "situation" takes part in three attributive-head structures, namely (situation, quarter), (situation, market), and (situation, sluggish); "quarter" in turn forms an attributive-head structure with "last year", namely (quarter, last year). The expanded evaluation object-feature word pair is therefore <last-year quarter market situation (with its non-feature modifiers), sluggish>.
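The attributive expansion of step S204 can be sketched as a traversal that collects every non-feature word reachable from the evaluation object through ATT arcs and then restores the original word order. The market-situation example is used with its feature word rendered as "sluggish"; the parse and the function name are assumptions made for illustration.

```python
def expand_with_attributives(tokens, arcs, obj_idx, feature_idxs):
    """Join the evaluation object with all of its non-feature ATT modifiers,
    including modifiers of modifiers, in original word order."""
    keep = {obj_idx}
    stack = [obj_idx]
    while stack:
        head = stack.pop()
        for dep, h, rel in arcs:
            if h == head and rel == "ATT" and dep not in feature_idxs and dep not in keep:
                keep.add(dep)
                stack.append(dep)
    return " ".join(tokens[i] for i in sorted(keep))


# Arcs are (dependent index, head index, label); this parse is assumed.
tokens = ["last year", "quarter", "market", "sluggish", "situation"]
arcs = [(0, 1, "ATT"), (1, 4, "ATT"), (2, 4, "ATT"), (3, 4, "ATT")]
expanded = expand_with_attributives(tokens, arcs, obj_idx=4, feature_idxs={3})
print((expanded, tokens[3]))
# ('last year quarter market situation', 'sluggish')
```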

Further, after the parallel evaluation objects are determined in step S203, each of them may also be expanded through step S204. In the example "the company's important advantage is inexpensive electricity and labor", "electricity" does not form an attributive-head structure with any other word, so no further expansion is required.

The method provided by this embodiment expands the evaluation object so as to reduce cases in which the evaluation object is lost or its semantics are incomplete. It takes full account of the structural features of phrases, improves the accuracy of acquiring evaluation objects, and thereby improves the accuracy of text feature analysis.

Referring to FIG. 3, a main flowchart of another alternative text feature analysis method provided by an embodiment of the present invention is shown, including the following steps:

S301: Acquire feature words in the target text, and determine the grammar structures between the feature words and the words in the target text.

S302: Determine the evaluation object corresponding to each feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule.

S303: When a parallel word or a linked word corresponding to the feature word exists, determine the parallel word or the linked word as a first feature word, and determine the evaluation object as a seventh evaluation object corresponding to the first feature word.

S304: Calculate the feature values of the feature words and the evaluation objects, and the feature values of the first feature words and the seventh evaluation objects, to obtain the feature value of the target text, and determine the feature of the target text.

In the above embodiment, for steps S301 and S302, reference may be made to the descriptions of steps S101 and S102 shown in FIG. 1, which are not repeated here.

In addition to serving directly as sentence components (predicate, object, attributive, adverbial, complement, etc.) in the target text, feature words may appear in parallel structures and linkage structures; that is, the feature word is linked to a predicate, object, attributive, adverbial or complement of the sentence through a parallel structure or a linkage structure.

In the above embodiment, for step S303, when the feature word appears at the modifier position of a parallel structure or a linkage structure, its evaluation object is the evaluation object of the core word of that parallel structure or linkage structure.

It should be noted that the core word of the parallel structure or linkage structure may or may not be a feature word. If it is a feature word, its evaluation object may be determined by the rules in step S102 shown in fig. 1; if it is not a feature word, the evaluation object of the non-feature core word may be determined according to the rules of steps S203 and S204 shown in fig. 2. When the feature word is nested in several parallel structures or linkage structures, the core word must be traced through those structures until a core word is reached that serves as a predicate, object, attributive, adverbial or complement of the sentence, and the evaluation object is then identified through that core word.

Taking "the size of the property of the industry is greatly reduced, 33.86% is reduced in the last quarter" as an example:

According to rule (1) in step S102 shown in fig. 1, "scale" and "drop" are in a subject-predicate relation, so <scale, drop> is extracted. According to the rule of step S303, "drop" and "decrease" are both feature words and are in a parallel relationship, so <scale, decrease> is obtained as well.

Further, after the first feature word and the corresponding seventh evaluation object are determined, the seventh evaluation object may itself be expanded. For example, according to the rule of step S204 shown in fig. 2, "peer" and "asset", and "asset" and "scale", are each in a centering relationship, so the evaluation object is expanded to "peer asset scale" and the pairs are extracted with "peer asset scale" as the evaluation object.
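The tracing of a feature word back to the core word of its parallel or linkage structure (step S303) can be sketched as follows. The data layout is an assumption made for illustration: `link_head` maps a word to the word it is linked to by a parallel or linkage structure, and `object_of` holds the evaluation objects already found for core words by the earlier rules.

```python
def core_word(word: str, link_head: dict[str, str]) -> str:
    """Follow parallel/linkage links until a word with no further head is reached."""
    seen = {word}
    while word in link_head and link_head[word] not in seen:
        word = link_head[word]
        seen.add(word)  # guards against accidental cycles in the input
    return word


def pairs_for_feature_words(feature_words: list[str],
                            link_head: dict[str, str],
                            object_of: dict[str, str]) -> list[tuple[str, str]]:
    """Give every linked feature word the evaluation object of its core word."""
    pairs = []
    for word in feature_words:
        obj = object_of.get(core_word(word, link_head))
        if obj is not None:
            pairs.append((obj, word))
    return pairs


# "decrease" is linked to "drop", whose evaluation object is "scale".
result = pairs_for_feature_words(
    ["drop", "decrease"],
    link_head={"decrease": "drop"},
    object_of={"drop": "scale"},
)
print(result)  # [('scale', 'drop'), ('scale', 'decrease')]
```

The loop in `core_word` covers the nested case mentioned above, where a feature word sits inside several parallel or linkage structures and the chain has to be followed more than once.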

The method provided by this embodiment extracts evaluation object-feature word pairs on the basis of the parallel structure or linkage structure of the feature words, which widens the extraction range of the feature words and improves the accuracy of text feature determination.

Referring to fig. 4, there is shown a main flow chart of a method for still another alternative text feature analysis provided by an embodiment of the present invention, including the steps of:

S401: acquiring feature words in the target text, analyzing part-of-speech features of each word in the target text, and acquiring nouns whose distance from the feature words is within a preset distance range as eighth evaluation objects.

S402: calculating the feature values of the feature words and the eighth evaluation objects to obtain the feature value of the target text, and determining the features of the target text.

In the above embodiment, for step S401, after performing word segmentation processing on the target text by using the word segmentation device, part-of-speech tagging may also be performed on the target text by combining the parts-of-speech of each word in the word stock; wherein the parts of speech include, but are not limited to, verbs, nouns, adjectives, adverbs, and the invention is not limited thereto.

If a noun exists within a predetermined distance range of the feature word, the noun is extracted as an evaluation object of the feature word. The relevance of a noun to a feature word depends on the distance between them: the farther the noun is from the feature word, the smaller the feature value. Therefore, a distance threshold between the noun and the feature word may be preset and the corresponding distance range determined from it; for example, the distance may be set to 1. Alternatively, the noun nearest to the feature word may be selected. The noun may be located either before or after the feature word.

After determining the nouns associated with the feature words, the obtained nouns may be expanded according to steps S203 and S204 shown in fig. 2, or the feature words may be expanded according to step S303 shown in fig. 3, so as to improve the accuracy of determining the features of the target text.

Taking the example in rule (1) of step S102 shown in fig. 1, "the solar photovoltaic power generation market demand steadily increases at present and the long-term development prospect looks good": the noun closest to the feature word "increase" is "demand", so <demand, increase> is extracted according to the rule of step S401. In addition, "market" and "demand" are in a parallel relation, so <market demand (evaluation object of the predicate), increase> is extracted according to the rule of step S203 shown in fig. 2.
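The part-of-speech rule of step S401 can be sketched as a simple window search. The sketch assumes the text has already been segmented and tagged, that distance is counted in tokens, and that nouns carry the tag "n"; the tag set, the function name and the shortened example sentence are all illustrative assumptions.

```python
def nouns_in_range(tokens: list[tuple[str, str]], feature_index: int,
                   max_distance: int = 1) -> list[str]:
    """Return nouns within max_distance tokens of the feature word, nearest first.

    tokens is a list of (word, part_of_speech) pairs; nouns before and after
    the feature word are both considered.
    """
    hits = []
    for i, (word, pos) in enumerate(tokens):
        distance = abs(i - feature_index)
        if pos == "n" and 0 < distance <= max_distance:
            hits.append((distance, i, word))
    return [word for _, _, word in sorted(hits)]


# Shortened, hypothetical tagging of "market demand steadily increases".
tokens = [("market", "n"), ("demand", "n"), ("steadily", "d"), ("increases", "v")]
candidates = nouns_in_range(tokens, feature_index=3, max_distance=2)
print(candidates)  # ['demand']
```

Because the result is ordered by distance, taking only the first element gives the "nearest noun" variant described above.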

For step S402, it suffices to calculate the feature values of the obtained nouns and the feature words; for the rest, reference may be made to the description of step S103 shown in fig. 1, which is not repeated here.

The method provided by this embodiment extracts evaluation objects on the basis of part-of-speech features. Setting a distance range prevents words that are far from the feature word and have small feature values from being extracted, so that the coverage of the evaluation objects is improved while the accuracy of determining the features of the target text is maintained.

Referring to fig. 5, a main flowchart of a method for still another optional text feature analysis according to an embodiment of the present invention is shown, including the following steps:

S501: acquiring feature words in the target text, and determining grammar structures between the feature words and the words in the target text.

S502: determining an evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule.

S503: calculating the similarity between the evaluation object and each preset representative object, determining the representative object whose similarity exceeds a preset similarity threshold, and replacing the evaluation object with the determined representative object.

S504: calculating the feature values of the feature words and the determined representative objects to obtain the feature value of the target text, and determining the features of the target text.

In the above embodiment, for step S501, reference is made to the description of step S101 shown in fig. 1; step S502 may refer to the description of step S102 shown in fig. 1, and may refer to the descriptions of fig. 2 to 4, which are not repeated here.

In the above embodiment, for step S503, a target text containing many feature words also yields many evaluation objects. To reduce the processing workload, a clustering algorithm, for example single-pass (a text clustering algorithm) or LDA (Latent Dirichlet Allocation, a topic model algorithm), may be used to cluster the evaluation objects online. A specific processing manner may be as follows:

(1) Obtaining a representative object library, in which each representative object corresponds to one class cluster. The representative object may be one of a group of semantically similar evaluation objects; for example, for sweater, skirt and shorts, "sweater" may be selected as the representative object. Alternatively, an aggregate object may be configured as the representative object according to the actual meaning of the semantically similar evaluation objects; for example, for sweater, skirt and shorts, the representative object is "clothes";

(2) Calculating the similarity between the obtained evaluation object and the representative objects of all class clusters, for example using cosine similarity or the Jaccard index;

(3) Selecting the class cluster with the highest similarity; or setting a preset similarity threshold and judging whether the calculated similarity reaches it: if the similarity is greater than or equal to the preset similarity threshold, the evaluation object is classified into that class cluster; if it is smaller than the threshold, a new class cluster is created and the evaluation object is determined to be the representative object of the new class cluster.
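Steps (1) to (3) can be sketched as a single-pass clustering loop. The sketch assumes each evaluation object has some numeric vector representation (the two-dimensional vectors below are made up for illustration), uses cosine similarity, and lets the first member of each cluster act as its representative object; none of these choices is prescribed by the embodiment.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def single_pass(objects: list[str], vectors: dict[str, list[float]],
                threshold: float) -> dict[str, list[str]]:
    """Assign each object to its most similar cluster, or open a new cluster."""
    clusters: dict[str, list[str]] = {}  # representative object -> members
    for obj in objects:
        best, best_sim = None, -1.0
        for rep in clusters:
            sim = cosine(vectors[obj], vectors[rep])
            if sim > best_sim:
                best, best_sim = rep, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(obj)
        else:
            clusters[obj] = [obj]  # obj becomes the representative of a new cluster
    return clusters


# Made-up vectors: three garments point one way, "battery" another.
vectors = {"sweater": [1.0, 0.1], "skirt": [0.9, 0.2],
           "shorts": [0.8, 0.3], "battery": [0.0, 1.0]}
clusters = single_pass(["sweater", "skirt", "shorts", "battery"], vectors, 0.8)
print(clusters)  # {'sweater': ['sweater', 'skirt', 'shorts'], 'battery': ['battery']}
```

Each object is compared only against the current representatives, so the evaluation objects can be processed one at a time as they are extracted, which is what makes the scheme suitable for online use.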

For example, given <sweater, beautiful>, <skirt, like> and <shorts, preference>, the evaluation objects are sweater, skirt and shorts. If the representative object is one of the semantically similar evaluation objects, for example "sweater", the correspondence between the feature words and the evaluation object becomes <sweater, beautiful>, <sweater, like>, <sweater, preference>. If the representative object is an aggregate object, for example "clothes", the correspondence becomes <clothes, beautiful>, <clothes, like>, <clothes, preference>.

In addition, the representative object may be established in advance, or it may be extracted from the target text according to the feature words: after the evaluation objects are classified into the class clusters whose similarity exceeds the predetermined similarity threshold, the number of occurrences of each evaluation object in a class cluster is counted, and the evaluation object that occurs most often is selected as the representative object.
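Selecting the most frequent evaluation object of a class cluster as its representative can be sketched with a counter. The function name and the sample cluster are hypothetical.

```python
from collections import Counter


def pick_representative(cluster_members: list[str]) -> str:
    """Return the evaluation object that occurs most often in the cluster."""
    return Counter(cluster_members).most_common(1)[0][0]


# Occurrences of evaluation objects that fell into one class cluster.
members = ["sweater", "skirt", "sweater", "shorts", "sweater", "skirt"]
print(pick_representative(members))  # sweater
```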

For step S504, after the representative object corresponding to the evaluation object, or the representative object of each class cluster, has been determined, all the evaluation objects in the class cluster are replaced with the representative object when the text feature is calculated.

Matching is then performed on the basis of the representative object and the feature words, and the feature value of each sentence in the target text is obtained with respect to the representative object. For example, after replacement with the representative object, <sweater, beautiful>, <skirt, like>, <shorts, preference> become <clothes, beautiful>, <clothes, like>, <clothes, preference>, and the feature value of "clothes" in the text is calculated.

However, when there is no cluster or representative object corresponding to the evaluation object, the evaluation object is the representative object at this time, for example, < sweater, beautiful >, and the feature value of "sweater" in the text is calculated at this time.
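The replacement described above, including the fallback in which an evaluation object without a cluster acts as its own representative, can be sketched as a dictionary lookup. The mapping `representative_of` and both function names are assumptions for illustration.

```python
from collections import defaultdict


def replace_with_representatives(pairs: list[tuple[str, str]],
                                 representative_of: dict[str, str]) -> list[tuple[str, str]]:
    """Swap each evaluation object for its representative; keep it if none exists."""
    return [(representative_of.get(obj, obj), word) for obj, word in pairs]


def group_by_object(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collect the feature words attached to each (representative) object."""
    grouped = defaultdict(list)
    for obj, word in pairs:
        grouped[obj].append(word)
    return dict(grouped)


pairs = [("sweater", "beautiful"), ("skirt", "like"), ("shorts", "preference")]
mapping = {"sweater": "clothes", "skirt": "clothes", "shorts": "clothes"}
replaced = replace_with_representatives(pairs, mapping)
print(group_by_object(replaced))  # {'clothes': ['beautiful', 'like', 'preference']}
```

With an empty mapping the pairs come back unchanged, which corresponds to calculating the feature value of "sweater" directly.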

Further, in order to reduce the calculation workload, the class clusters can be ranked from large to small by the number of evaluation objects they contain, the first n class clusters selected, and only the feature values in the target text of the representative objects of those class clusters calculated.
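One reading of this pruning step, ranking class clusters by member count and keeping the first n, can be sketched as follows; the ranking criterion is an interpretation of the text above, and the function name and sample data are hypothetical.

```python
def top_n_clusters(clusters: dict[str, list[str]], n: int) -> dict[str, list[str]]:
    """Keep the n clusters with the most evaluation objects, largest first."""
    ranked = sorted(clusters.items(), key=lambda item: len(item[1]), reverse=True)
    return dict(ranked[:n])


clusters_by_rep = {
    "battery": ["battery"],
    "clothes": ["sweater", "skirt", "shorts"],
    "price": ["price", "cost"],
}
print(list(top_n_clusters(clusters_by_rep, 2)))  # ['clothes', 'price']
```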

The method provided by the embodiment provides a concept of replacing the evaluation object with the representative object, reduces the processing workload of subsequent calculation of the feature value of the < evaluation object and the feature word >, and improves the accuracy of acquiring the target text feature.

The method provided by the embodiment of the invention extracts <evaluation object, feature word> pairs on the basis of a combination of syntactic analysis and semantic role labeling. By working from the grammar structure between the feature words and each word, it takes full account of the structural characteristics and semantic integrity of phrase-type evaluation objects; by expanding both the evaluation objects and the feature words, it improves the coverage and acquisition accuracy of the evaluation objects; and by calculating feature scores weighted by text position, it further improves the accuracy of determining the target text features.
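A position-weighted feature score could take many forms; the sketch below shows one possibility only. The position labels, the weight values, the per-pair scores and the normalized weighted average are all assumptions chosen for illustration and are not taken from the embodiment.

```python
# Assumed weights: pairs found in more prominent positions count for more.
POSITION_WEIGHTS = {"title": 2.0, "first_sentence": 1.5, "body": 1.0}

# (evaluation object, feature word, pair score, position in the text)
ScoredPair = tuple[str, str, float, str]


def text_feature_value(scored_pairs: list[ScoredPair]) -> float:
    """Weighted average of pair scores, weighted by where each pair occurs."""
    total = sum(POSITION_WEIGHTS[pos] * score for _, _, score, pos in scored_pairs)
    weight = sum(POSITION_WEIGHTS[pos] for _, _, _, pos in scored_pairs)
    return total / weight if weight else 0.0


def text_feature(value: float) -> str:
    """Map the numeric feature value to a coarse label."""
    return "positive" if value > 0 else "negative" if value < 0 else "neutral"


scored = [("scale", "drop", -1.0, "title"), ("demand", "increase", 1.0, "body")]
value = text_feature_value(scored)
print(round(value, 3), text_feature(value))  # -0.333 negative
```

In this toy input the negative pair sits in the title and therefore outweighs the positive pair in the body, so the overall label is negative.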

Referring to fig. 6, a schematic diagram of main modules of an apparatus 600 for text feature analysis according to an embodiment of the present invention is shown, including:

the grammar structure determining module 601 is configured to obtain feature words in a target text, and determine grammar structures between the feature words and words in the target text;

the evaluation object determining module 602 is configured to determine an evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset extraction rule of the evaluation object grammar structure;

and a text feature determining module 603, configured to calculate feature values of the feature words and the evaluation object, obtain feature values of the target text, and determine features of the target text.
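The division into three modules can be pictured as a small pipeline. In the sketch below each module is represented by a plain callable; the class name, field names and stub implementations are hypothetical and only show how the output of module 601 feeds module 602 and then module 603.

```python
from dataclasses import dataclass
from typing import Callable

Pair = tuple[str, str]  # (evaluation object, feature word)


@dataclass
class TextFeatureAnalyzer:
    find_structure: Callable[[str], dict]        # role of module 601
    find_objects: Callable[[dict], list[Pair]]   # role of module 602
    score: Callable[[list[Pair]], float]         # role of module 603

    def analyze(self, text: str) -> float:
        structure = self.find_structure(text)
        pairs = self.find_objects(structure)
        return self.score(pairs)


# Stub callables standing in for real parsing, extraction and scoring.
analyzer = TextFeatureAnalyzer(
    find_structure=lambda text: {"tokens": text.split()},
    find_objects=lambda s: ([("electricity", "inexpensive")]
                            if "inexpensive" in s["tokens"] else []),
    score=lambda pairs: float(len(pairs)),
)
print(analyzer.analyze("inexpensive electricity"))  # 1.0
```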

In the apparatus provided by the embodiment of the present invention, the evaluation object determining module 602 is configured to:

The device provided by the embodiment of the invention further comprises an evaluation object expansion module 604, which is used for:

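when a parallel word corresponding to the evaluation object exists, determining the parallel word as a fifth evaluation object; or when an attributive corresponding to the evaluation object exists, adding a non-feature word corresponding to the attributive to the evaluation object to generate a sixth evaluation object.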

The device provided by the embodiment of the invention further comprises a feature word expansion module 605, which is used for:

When parallel words or interlocking words corresponding to the feature words exist, determining the parallel words or the interlocking words as first feature words, and determining the evaluation object as a seventh evaluation object corresponding to the first feature words;

the text feature determining module is used for:

and calculating the characteristic values of the characteristic words and the evaluation objects, and the characteristic values of the first characteristic words and the seventh evaluation objects to obtain the characteristic values of the target text, and determining the characteristics of the target text.

The apparatus provided by the embodiment of the present invention further includes a part-of-speech feature analysis module 606, configured to:

and analyzing part-of-speech features of each word in the target text, and acquiring nouns with the distance from the feature words within a preset distance range as eighth evaluation objects.

In the apparatus provided by the embodiment of the present invention, the text feature determining module 603 is configured to:

calculating the similarity between the evaluation object and each preset representative object, determining the representative object whose similarity exceeds a preset similarity threshold, and replacing the evaluation object with the determined representative object;

and calculating the feature values of the feature words and the determined representative objects to obtain the feature value of the target text, and determining the features of the target text.

In the device provided by the embodiment of the invention, the characteristic words are emotion words, and the characteristic values are emotion values.

The device provided by the embodiment of the invention extracts <evaluation object, feature word> pairs on the basis of a combination of syntactic analysis and semantic role labeling. By working from the grammar structure between the feature words and each word, it takes full account of the structural characteristics and semantic integrity of phrase-type evaluation objects; by expanding both the evaluation objects and the feature words, it improves the coverage and acquisition accuracy of the evaluation objects; and by calculating feature scores weighted by text position, it further improves the accuracy of determining the target text features.

Referring to fig. 7, an exemplary system architecture 700 to which the text feature analysis method or text feature analysis apparatus of embodiments of the present invention may be applied is shown.

As shown in fig. 7, a system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 is the medium used to provide communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

A user may interact with the server 705 via the network 704 using the terminal devices 701, 702, 703 to receive or send messages and the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients and social platform software (by way of example only), may be installed on the terminal devices 701, 702, 703.

The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 705 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping websites browsed by users using the terminal devices 701, 702, 703. The background management server may analyze and process received data such as a product information query request, and feed back the processing result (e.g., target push information or product information, by way of example only) to the terminal device.

It should be noted that the text feature analysis method provided by the embodiment of the present invention is generally performed by the server 705, and accordingly, the text feature analysis device is generally disposed in the server 705.

It should be understood that the number of terminal devices, networks and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring to fig. 8, there is shown a schematic diagram of a computer system 800 suitable for use in implementing an embodiment of the invention. The terminal device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.

As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, etc.; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 801.

The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, for example, as: a processor includes a grammar structure determination module, an evaluation object determination module, and a text feature determination module. The names of these modules do not constitute limitations on the module itself in some cases, and for example, the grammar structure determination module may also be described as a "feature word and grammar structure determination module".

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform:

acquiring feature words in a target text, and determining grammar structures between the feature words and words in the target text;

determining an evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule;

and calculating the characteristic values of the characteristic words and the evaluation object to obtain the characteristic value of the target text, and determining the characteristic of the target text.

According to the technical scheme of the embodiment of the invention, the structural characteristics and semantic integrity of phrase-type evaluation objects are fully considered on the basis of the grammar structure between the feature words and each word; by combining this with the expansion of the evaluation objects and the feature words, the coverage and acquisition accuracy of the evaluation objects are improved, and so is the accuracy of determining the target text features.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A method of text feature analysis, comprising:

acquiring feature words in a target text, and determining grammar structures between the feature words and words in the target text;

determining an evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule; when the feature word serves as a predicate in a sentence, the evaluation object of the feature word is the word modified by the predicate; when the feature word serves as a non-predicate component in the sentence, the predicate associated with the feature word is determined first, and the evaluation object is then determined according to the predicate;

when a parallel word corresponding to the evaluation object exists, determining the parallel word as a fifth evaluation object; or when an attributive corresponding to the evaluation object exists, adding a non-feature word corresponding to the attributive to the evaluation object to generate a sixth evaluation object;

determining weights according to the importance degrees of the obtained evaluation objects at different positions in the target text, calculating the feature values of the feature words and the obtained evaluation objects to obtain the feature value of the target text, and determining the features of the target text; wherein the obtained evaluation objects are one of: the evaluation object and the fifth evaluation object; the evaluation object and the sixth evaluation object; or the evaluation object, the fifth evaluation object and the sixth evaluation object.

2. The method according to claim 1, wherein the determining, according to the grammar structure and a preset evaluation object grammar structure extraction rule, an evaluation object corresponding to the feature word in the target text includes:

3. The method according to claim 1, wherein after determining the evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset evaluation object grammar structure extraction rule, the method further comprises:

calculating the feature value of the feature word and the obtained evaluation object to obtain the feature value of the target text, wherein determining the feature of the target text comprises the following steps:

and calculating the characteristic values of the characteristic words and the obtained evaluation objects, and the characteristic values of the first characteristic words and the seventh evaluation objects to obtain the characteristic values of the target text, and determining the characteristics of the target text.

4. The method as recited in claim 1, further comprising:

5. The method of claim 1, wherein the calculating the feature value of the feature word and the obtained evaluation object to obtain the feature value of the target text, and determining the feature of the target text comprises:

calculating the similarity between the obtained evaluation object and each preset representative object, determining the representative object with the similarity exceeding a preset similarity threshold value, and replacing the obtained evaluation object with the determined representative object;

6. The method of any one of claims 1-5, wherein the feature word is an emotion word and the feature value is an emotion value.

7. An apparatus for text feature analysis, comprising:

the grammar structure determining module is used for obtaining feature words in the target text and determining grammar structures between the feature words and words in the target text;

the evaluation object determining module is used for determining an evaluation object corresponding to the feature word in the target text according to the grammar structure and a preset extraction rule of the evaluation object grammar structure;

the evaluation object expansion module is used for determining the parallel word as a fifth evaluation object when a parallel word corresponding to the evaluation object exists; or, when an attributive corresponding to the evaluation object exists, adding a non-feature word corresponding to the attributive to the evaluation object to generate a sixth evaluation object;

the text feature determining module is used for determining weights according to the importance degrees of the obtained evaluation objects at different positions in the target text, calculating the feature values of the feature words and the obtained evaluation objects to obtain the feature value of the target text, and determining the features of the target text; wherein the obtained evaluation objects are one of: the evaluation object and the fifth evaluation object; the evaluation object and the sixth evaluation object; or the evaluation object, the fifth evaluation object and the sixth evaluation object.

8. The apparatus of claim 7, wherein the evaluation object determination module is configured to:

9. The apparatus of claim 7, further comprising a feature word expansion module configured to:

the text feature determining module is used for:

10. The apparatus of claim 7, further comprising a part-of-speech feature analysis module to:

11. The apparatus of claim 7, wherein the text feature determination module is configured to:

12. The apparatus according to any one of claims 7 to 11, wherein the feature words are emotion words and the feature values are emotion values.

13. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.

14. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.