CN108376134A - A kind of electric business online comment text word analysis method based on position-order statistics - Google Patents
- Publication number
- CN108376134A (application CN201810355960.8A)
- Authority
- CN
- China
- Prior art keywords
- word
- text
- target word
- electric business
- presult
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0623—Item investigation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0639—Item locations
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
Abstract
The invention belongs to the technical field of information processing, and in particular relates to a word analysis method for e-commerce online review text based on position-order statistics. The technical problem it addresses is that existing analysis of e-commerce online review text ignores the order in which words appear. The method analyzes the order in which words appear in a text and what that order signifies, which helps e-commerce enterprises and customers re-examine the importance of topic words.
Description
Technical field
The invention belongs to the technical field of information processing, and in particular relates to a word analysis method for e-commerce online review text based on position-order statistics.
Background
E-commerce online reviews are text data. After word segmentation and statistics, they yield information that supports the management decisions of e-commerce enterprises and the purchase decisions of customers.
In existing practice, once an e-commerce online review text has been segmented and counted, word importance is judged by word frequency alone. The order in which words appear in the text, and what that order signifies, is ignored. This omission loses information about topic importance in the analysis of e-commerce online review text and can easily lead to wrong decisions.
A word analysis method for e-commerce online review text based on position-order statistics is therefore urgently needed in the e-commerce field.
Summary of the invention
In view of the above deficiencies of the prior art, the technical problem to be solved by the invention is to provide a word analysis method for e-commerce online review text based on position-order statistics. By analyzing the order in which words appear in a text and what that order signifies, the method addresses the neglect of word position order in the analysis of e-commerce online review text and helps e-commerce enterprises and customers re-examine the importance of topic words.
To solve the above technical problem, the invention adopts the following technical solution:
A word analysis method for e-commerce online review text based on position-order statistics, comprising the following steps:
Step S1: Segment the e-commerce online review texts into words, compute and screen the word frequencies, and select a target word set from the result;
Step S2: For each target word in the target word set, the computer obtains its position record numbers in each text and counts the total number of characters in each text; if a target word occurs several times in a text, the position of its first occurrence is recorded, and if the word does not occur, 0 is recorded;
Step S3: Build a statistical model and calculate the position-order presult value of each target word, where position order refers to the order of positions at which the target word appears in the text;
Step S4: Text word judgment and decision: compare the positions of the target words according to their position-order result (presult) values;
Step S5: Text word analysis and judgment: make judgments and decisions in combination with actual needs or management methods.
Preferably, the position-order presult value is calculated with the formula
$$\mathrm{presult} = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i}{N_i} \times 100\%$$
where presult is a coined term that combines "position" and "result"; $x_i$ is the first number of the position record of the target word in text $i$ (if the target word occurs several times in a text, only the first number of the position record of its first occurrence is used, and $x_i = 0$ if the word does not occur); $N_i$ denotes the total character count of text $i$ counted in step S2; and $n$ is the total number of texts (sentences), which serves as the denominator. Because the target words are screened from the words obtained by segmenting the texts, the presult value is non-zero.
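The calculation described here can be sketched in Python. This is a minimal sketch, not the patent's own implementation: it assumes that presult averages, over all n texts, the first position of the target word divided by the character count of that text, a reading that reproduces the 22.59% reported for "taste" in the worked example of the description. The function name `presult` and the toy data are illustrative only.

```python
from typing import Sequence, Tuple


def presult(records: Sequence[Tuple[int, int]]) -> float:
    """Mean relative first position over all texts, as a fraction.

    Each record is (x_i, total_chars_i): x_i is the 1-based first position
    of the target word in text i (0 if the word is absent), and
    total_chars_i is the character count of that text.
    """
    if not records:
        raise ValueError("at least one text is required")
    n = len(records)  # total number of texts, used as the denominator
    return sum(x / total for x, total in records) / n


# Hypothetical toy data: the word first appears at character 2 of a
# 10-character text and is absent from a second, 20-character text.
value = presult([(2, 10), (0, 20)])
print(f"{value:.2%}")  # 10.00%
```

Texts that do not contain the word contribute 0 to the sum but still count toward n, so a word that appears in few texts is not penalized or rewarded through its own frequency.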
Preferably, the presult value is expressed as a percentage and rounded half up to two decimal places.
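A small sketch of this formatting step follows. It uses Python's `decimal` module with `ROUND_HALF_UP`, because the built-in `round()` rounds halves to even and works on binary floats. The helper name `as_percent` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def as_percent(fraction: float) -> str:
    """Format a fraction as a percentage with two decimals, rounding half up."""
    percent = Decimal(str(fraction)) * 100
    return f"{percent.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)}%"


print(as_percent(0.2259259))  # 22.59%
print(as_percent(0.12345))    # 12.35%
```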
Beneficial effects of the invention: the invention addresses the neglect of word position order in the analysis of e-commerce online review text. It can reveal word order and influence factors, offers good pattern discovery and information mining capabilities when analyzing large volumes of e-commerce online review text, and can effectively help e-commerce enterprises and customers re-examine the importance of topic words, thereby improving enterprises' operation and management capability and customers' purchase decisions.
Brief description of the drawings
The content of the drawing and the labels in it are briefly described below:
Fig. 1 is a flow chart of the method in a specific embodiment of the invention.
Specific embodiments
The specific embodiments of the invention, such as the shape and structure of each component involved, the relative positions and connections between the parts, the function and working principle of each part, and the manufacturing process and method of operation and use, are described in further detail below through the description of an embodiment, to help those skilled in the art gain a more complete, accurate and thorough understanding of the inventive concept and technical solution of the invention.
A word analysis method for e-commerce online review text based on position-order statistics, as shown in Fig. 1, comprises:
Step S1: Segment the e-commerce online review texts into words, compute and screen the word frequencies, and select a target word set from the result.
Step S2: For each target word in the target word set, the computer obtains its position record numbers in each text and counts the total number of characters in each text. If a target word occurs several times in a text, the position of its first occurrence is recorded; if the word does not occur, 0 is recorded.
Step S3: Build a statistical model and calculate the position-order presult value of each target word. Position order refers to the order of positions at which the target word appears in the text. The presult value is calculated with the formula
$$\mathrm{presult} = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i}{N_i} \times 100\%$$
where presult is a coined term that combines "position" and "result"; $x_i$ is the first number of the position record of the target word in text $i$ (if the target word occurs several times in a text, only the first number of the position record of its first occurrence is used); and $N_i$ denotes the total character count of text $i$. In the calculation, $n$ is the total number of texts (sentences) and serves as the denominator. The main consideration is to remove the effect that the differing word frequencies of the target words would otherwise have on the overall result, which is why the word frequency of the target word is not used as the denominator. Because the target words are screened from the words obtained by segmenting the texts, the presult value is non-zero. The presult value is expressed as a percentage and rounded half up to two decimal places.
Step S4: Text word judgment and decision: compare the positions of the target words according to their position-order result (presult) values.
Step S5: Text word analysis and judgment: make judgments and decisions in combination with actual needs or management methods.
The following five e-commerce online review texts serve as an implementation example of the method:
Example sentence 1: The packaging is pretty good, but this mango has no mango taste; there is only the packaging.
Example sentence 2: The item ("dotey") is packaged well, and the logistics are very good too. The quantity seems smaller than what was bought before, but the taste is still very much worth praising!
Example sentence 3: Good; the express delivery is very good too, and the boss's attitude is also very good. Will come again if needed.
Example sentence 4: Ate two bags; the green fruit tastes excellent, and my cousin also says it tastes excellent! The packaging is also fine.
Example sentence 5: Tastes good!
Step S1: Segment the e-commerce online review texts into words, compute and screen the word frequencies, and select a target word set from the result. The Chinese word segmentation results are as follows (only multi-character words are kept; single-character words such as "eat" are removed):
Example sentence 1:Packaging/good/mango/has no/mango/packaging
Example sentence 2:Dotey/packaging/logistics/as/before/but/taste/still/praise very much/
Example sentence 3:Well/express delivery/boss/attitude/needs/
Example sentence 4:Two bags/green fruit/taste/excellent/cousin/taste/excellent/packaging/
Example sentence 5:Well/
The word frequency statistics for the example are as follows:
Word | Occurrence count |
---|---|
Packaging | 4 |
Well | 3 |
Taste | 3 |
Mango | 2 |
It is excellent | 2 |
But | 1 |
Express delivery | 1 |
Still | 1 |
It needs | 1 |
Only | 1 |
Logistics | 1 |
Attitude | 1 |
Boss | 1 |
It has no | 1 |
Green fruit | 1 |
Cousin | 1 |
It praises very much | 1 |
Two bags | 1 |
Seem | 1 |
Before | 1 |
Dotey | 1 |
Sweet taste | 1 |
Selected target word set: {packaging, taste}
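Step S1 can be sketched with a frequency counter over the segmented tokens. The tokens below are transcribed from the example segmentation as lowercase English glosses. The screening rule is an assumption of this sketch: the source does not state its criterion, and since "Well" has three occurrences in the table yet is not selected, the sketch excludes generic evaluative words through a hypothetical `EVALUATIVE` set in addition to applying a hypothetical `MIN_COUNT` threshold.

```python
from collections import Counter

# Tokens transcribed from the example segmentation (English glosses, lowercased).
segmented = [
    ["packaging", "good", "mango", "has no", "mango", "packaging"],
    ["dotey", "packaging", "logistics", "as", "before", "but",
     "taste", "still", "praise very much"],
    ["well", "express delivery", "boss", "attitude", "needs"],
    ["two bags", "green fruit", "taste", "excellent", "cousin",
     "taste", "excellent", "packaging"],
    ["well"],
]

# Word frequency over all example sentences.
counts = Counter(token for sentence in segmented for token in sentence)

# Hypothetical screening rule (not stated in the source): keep words seen at
# least MIN_COUNT times, excluding generic evaluative words.
MIN_COUNT = 3
EVALUATIVE = {"well", "good"}
target_words = {
    word for word, count in counts.items()
    if count >= MIN_COUNT and word not in EVALUATIVE
}

print(counts.most_common(2))  # [('packaging', 4), ('taste', 3)]
print(sorted(target_words))   # ['packaging', 'taste']
```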
Step S2: For each target word in the target word set, the computer obtains its position record numbers in each text and counts the total number of characters in each text. If a target word occurs several times in a text, the position of its first occurrence is recorded; if the word does not occur, 0 is recorded.
Position record numbers of the example target words in each example sentence, together with the total character count of each text (positions are character positions in the text):
Example sentence 1: packaging: 1 (position of the first character of the word), 2 (position of its second character); taste: 0, 0; total characters: 18
Example sentence 2: packaging: 3, 4; taste: 25, 26; total characters: 30
Example sentence 3: packaging: 0, 0; taste: 0, 0; total characters: 30
Example sentence 4: packaging: 23, 24; taste: 8, 9; total characters: 27
Example sentence 5: packaging: 0, 0; taste: 0, 0; total characters: 5
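The position records of step S2 can be produced with a simple first-occurrence search. The sketch below is illustrative: the function name `first_position_record` and the sample text are hypothetical (the sample is not one of the patent's example sentences), and counting punctuation as characters is an assumption.

```python
from typing import List, Tuple


def first_position_record(text: str, word: str) -> Tuple[List[int], int]:
    """Return (positions, total_chars) for `word` in `text`.

    positions holds the 1-based position of each character of the first
    occurrence of `word`; every entry is 0 when the word is absent.
    """
    start = text.find(word)  # 0-based index of the first occurrence, or -1
    if start == -1:
        positions = [0] * len(word)
    else:
        positions = [start + 1 + offset for offset in range(len(word))]
    return positions, len(text)


# Hypothetical review text, used only to show the record format.
sample = "包装很好，味道也好"
print(first_position_record(sample, "包装"))  # ([1, 2], 9)
print(first_position_record(sample, "味道"))  # ([6, 7], 9)
print(first_position_record(sample, "物流"))  # ([0, 0], 9)
```

Only the first number of each record (the position of the word's first character) enters the presult calculation.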
Step S3: Build a statistical model and calculate the position-order presult value of each target word. Position order refers to the order of positions at which the target word appears in the text. The presult value is calculated with the formula
$$\mathrm{presult} = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i}{N_i} \times 100\%$$
where presult is a coined term that combines "position" and "result"; $x_i$ is the first number of the position record of the target word in text $i$ (if the target word occurs several times in a text, only the first number of the position record of its first occurrence is used; for example, $x_1 = 1$ for "packaging" in example sentence 1); and $N_i$ denotes the total character count of text $i$. In the calculation, $n$ is the total number of texts (sentences) and serves as the denominator. The main consideration is to remove the effect that the differing word frequencies of the target words would otherwise have on the overall result, which is why the word frequency of the target word is not used as the denominator. Because the target words are screened from the words obtained by segmenting the texts, the presult value is non-zero. The presult value is expressed as a percentage and rounded half up to two decimal places.
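For example, the presult values of the target word set are calculated as follows.
The position order of "packaging":
$$\mathrm{presult} = \frac{1}{5}\left(\frac{1}{18} + \frac{3}{30} + \frac{0}{30} + \frac{23}{27} + \frac{0}{5}\right) \times 100\% \approx 20.15\%$$
The position order of the target word "taste" is calculated in the same way: presult = 22.59%.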
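The worked example can be checked numerically from the position records listed under step S2. This is a sketch under one assumption: presult is the average, over all five example sentences, of the first position divided by the total character count. With that reading the value for "taste" matches the stated 22.59%. Exact fractions are used to avoid floating-point noise; the names `records` and `presult_percent` are illustrative.

```python
from fractions import Fraction

# (first position, total characters) per example sentence, taken from the
# position records of the worked example.
records = {
    "packaging": [(1, 18), (3, 30), (0, 30), (23, 27), (0, 5)],
    "taste": [(0, 18), (25, 30), (0, 30), (8, 27), (0, 5)],
}


def presult_percent(pairs):
    """Average of x_i / total_i over all texts, as an exact percentage."""
    n = len(pairs)  # total number of texts
    return sum(Fraction(x, total) for x, total in pairs) / n * 100


for word, pairs in records.items():
    print(f"{word}: {float(presult_percent(pairs)):.2f}%")
# packaging: 20.15%
# taste: 22.59%
```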
Step S5: Text word analysis and judgment: make judgments and decisions in combination with actual needs or management methods.
According to the principle for text topics that "what is mentioned first is more important", for the target words "packaging" and "taste", a smaller presult value means the word is mentioned earlier. The position order of "packaging" is therefore considered to be further forward, and packaging is of greater concern in the e-commerce online reviews.
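The comparison in steps S4 and S5 amounts to sorting the target words by ascending presult value. In the sketch below, the value for "taste" is the one stated in the example, and the value for "packaging" (20.15%) is obtained by applying the same calculation to the example position records. The variable names are illustrative.

```python
# presult values as percentages for the example target words.
presult_values = {"taste": 22.59, "packaging": 20.15}

# Smaller presult means mentioned earlier, which the method treats as
# more important, so the ranking is ascending.
ranking = sorted(presult_values, key=presult_values.get)
print(ranking)  # ['packaging', 'taste']
```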
The invention has been described above by way of example. Its specific implementation is clearly not limited to the manner described. Any insubstantial improvement made using the inventive concept and technical solution of the invention, or any direct application of that concept and solution to other settings without improvement, falls within the scope of protection of the invention. The scope of protection of the invention is determined by the claims.
Claims (3)
1. A word analysis method for e-commerce online review text based on position-order statistics, characterized by comprising the following steps:
Step S1: Segment the e-commerce online review texts into words, compute and screen the word frequencies, and select a target word set from the result;
Step S2: For each target word in the target word set, the computer obtains its position record numbers in each text and counts the total number of characters in each text; if a target word occurs several times in a text, the position of its first occurrence is recorded, and if the word does not occur, 0 is recorded;
Step S3: Build a statistical model and calculate the position-order presult value of each target word, where position order refers to the order of positions at which the target word appears in the text;
Step S4: Text word judgment and decision: compare the positions of the target words according to their position-order result (presult) values;
Step S5: Text word analysis and judgment: make judgments and decisions in combination with actual needs or management methods.
2. The word analysis method for e-commerce online review text based on position-order statistics according to claim 1, characterized in that the position-order presult value is calculated with the formula
$$\mathrm{presult} = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i}{N_i} \times 100\%$$
where presult is a coined term that combines "position" and "result"; $x_i$ is the first number of the position record of the target word in text $i$, and if the target word occurs several times in a text, only the first number of the position record of its first occurrence is used; $N_i$ denotes the total character count of text $i$; in the calculation, $n$ is the total number of texts (sentences) and serves as the denominator; and, because the target words are screened from the words obtained by segmenting the texts, the presult value is non-zero.
3. The word analysis method for e-commerce online review text based on position-order statistics according to claim 2, characterized in that the presult value is expressed as a percentage and rounded half up to two decimal places.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810355960.8A CN108376134A (en) | 2018-04-19 | 2018-04-19 | A kind of electric business online comment text word analysis method based on position-order statistics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810355960.8A CN108376134A (en) | 2018-04-19 | 2018-04-19 | A kind of electric business online comment text word analysis method based on position-order statistics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108376134A true CN108376134A (en) | 2018-08-07 |
Family
ID=63032352
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810355960.8A Pending CN108376134A (en) | 2018-04-19 | 2018-04-19 | A kind of electric business online comment text word analysis method based on position-order statistics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108376134A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740229A (en) * | 2016-01-26 | 2016-07-06 | 中国人民解放军国防科学技术大学 | Keyword extraction method and device |
US20160314191A1 (en) * | 2015-04-24 | 2016-10-27 | Linkedin Corporation | Topic extraction using clause segmentation and high-frequency words |
CN107357779A (en) * | 2017-06-27 | 2017-11-17 | 北京神州泰岳软件股份有限公司 | A kind of method and device for obtaining organization names |
CN107748743A (en) * | 2017-09-20 | 2018-03-02 | 安徽商贸职业技术学院 | A kind of electric business online comment text emotion analysis method |
Non-Patent Citations (1)
Title |
---|
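Guo Jianbo et al.: "Keyword extraction algorithm based on multiple features" (基于多特征的关键词抽取算法), Journal of Hefei University of Technology (Natural Science Edition) * |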
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110969025A (en) * | 2019-11-19 | 2020-04-07 | 维沃移动通信有限公司 | Text comment method and electronic equipment |
CN110969025B (en) * | 2019-11-19 | 2024-01-23 | 维沃移动通信有限公司 | Text comment method and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108491377A (en) | A kind of electric business product comprehensive score method based on multi-dimension information fusion | |
WO2019214236A1 (en) | User-generated content summary determining and user-generated content recommending | |
US11880382B2 (en) | Systems and methods for generating tables from print-ready digital source documents | |
CN109388712A (en) | A kind of trade classification method and terminal device based on machine learning | |
CN104035968B (en) | The construction method and device of training corpus collection based on social networks | |
CN104881458B (en) | A kind of mask method and device of Web page subject | |
CN111666761B (en) | Fine-grained emotion analysis model training method and device | |
CN109522412B (en) | Text emotion analysis method, device and medium | |
CN108388660A (en) | A kind of improved electric business product pain spot analysis method | |
CN102929860B (en) | Chinese clause emotion polarity distinguishing method based on context | |
CN107688630B (en) | Semantic-based weakly supervised microbo multi-emotion dictionary expansion method | |
CN109858034A (en) | A kind of text sentiment classification method based on attention model and sentiment dictionary | |
CN104850617A (en) | Short text processing method and apparatus | |
CN109508373A (en) | Calculation method, equipment and the computer readable storage medium of enterprise's public opinion index | |
Singh et al. | Sentiment analysis of Twitter data using TF-IDF and machine learning techniques | |
CN108170685B (en) | Text emotion analysis method and device and computer readable storage medium | |
CN105279148A (en) | User review consistency judgment method of APP (Application) software | |
CN110134799A (en) | A kind of text corpus based on BM25 algorithm build and optimization method | |
CN113360647A (en) | 5G mobile service complaint source-tracing analysis method based on clustering | |
JP7221526B2 (en) | Analysis method, analysis device and analysis program | |
CN108376134A (en) | A kind of electric business online comment text word analysis method based on position-order statistics | |
CN106227720B (en) | A kind of APP software users comment mode identification method | |
CN108491390A (en) | A kind of main line logistics goods title automatic recognition classification method | |
CN106294689B (en) | A kind of method and apparatus for selecting to carry out dimensionality reduction based on text category feature | |
CN105574530B (en) | The method and apparatus for extracting the line of text in document |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20180807 |