CN108376134A - Word analysis method for e-commerce online review text based on position-order statistics - Google Patents

Word analysis method for e-commerce online review text based on position-order statistics

Info

Publication number
CN108376134A
Authority
CN
China
Prior art keywords
word
text
target word
electric business
presult
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810355960.8A
Other languages
Chinese (zh)
Inventor
刘玉林
王召义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Business College
Original Assignee
Anhui Business College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Business College
Priority to CN201810355960.8A
Publication of CN108376134A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0639 Item locations

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of information processing, and in particular relates to a word analysis method for e-commerce online review text based on position-order statistics. The technical problem to be solved is that word position order is ignored when e-commerce online review text is analyzed. The method addresses this by analyzing the order in which words appear in a text and the meaning that order carries, which helps e-commerce enterprises and customers re-examine the importance of topic words.

Description

Word analysis method for e-commerce online review text based on position-order statistics
Technical field
The invention belongs to the technical field of information processing, and in particular relates to a word analysis method for e-commerce online review text based on position-order statistics.
Background
E-commerce online reviews are a form of text data. Segmenting the text into words and compiling statistics yields information that supports management decisions by e-commerce enterprises and purchase decisions by customers.
Existing approaches, after segmenting e-commerce online review text and compiling statistics, judge the importance of a word by its frequency alone and ignore the order in which words appear in the text and the meaning that order carries. This omission loses topic-importance information in the analysis of e-commerce online review text and can easily lead to wrong decisions.
A word analysis method for e-commerce online review text based on position-order statistics is therefore urgently needed in the e-commerce field.
Summary of the invention
In view of the above deficiencies of the prior art, the technical problem to be solved by the invention is to provide a word analysis method for e-commerce online review text based on position-order statistics. By analyzing the order in which words appear in a text and the meaning that order carries, the method solves the problem that word position order is ignored in the analysis of e-commerce online review text, and it helps e-commerce enterprises and customers re-examine the importance of topic words.
To solve the above technical problem, the invention adopts the following technical solution:
A word analysis method for e-commerce online review text based on position-order statistics, comprising the following steps:
Step S1: Segment the e-commerce online review texts into words, compile and screen word-frequency statistics, and select a target word set from the result.
Step S2: The computer obtains the position record numbers of each target word in the target word set and counts the total number of characters in each text. A target word that occurs several times in a text is recorded at the position of its first occurrence; a word that does not occur in a text is recorded as 0.
Step S3: Build the statistical model and calculate the position-order presult value of each target word, where position order refers to the order of the positions at which the target word appears in the texts.
Step S4: Text word judgment and decision: compare the positions of the target words according to their position-order presult values.
Step S5: Text word analysis and judgment: make judgments and decisions in light of actual needs or management methods.
Preferably, the position-order presult value is calculated by the formula
$$\mathrm{presult} = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i}{N_i}\times 100\%$$
where presult is a coined term combining "position" and "result"; $x_i$ is the first number of the position record of the target word in text $i$ (if the target word occurs several times in a text, only the first number of the record for its first occurrence is used); $N_i$ is the total character count of text $i$ obtained in step S2; and $n$ is the total number of texts, which is used as the denominator. Since the target words are screened from the words obtained by segmenting the texts, the presult value is non-zero.
Preferably, the presult value is expressed as a percentage, rounded half up to two decimal places.
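The calculation described above can be sketched in Python. This is a minimal sketch under one stated assumption: the presult value is taken to be the mean, over all n texts, of the first-occurrence position divided by the total character count of that text. The form is inferred from the definitions and from the worked figure of 22.59% given later for "taste", which it reproduces. The function name `presult_percent` and its parameter names are illustrative and do not come from the source.

```python
from decimal import Decimal, ROUND_HALF_UP


def presult_percent(first_positions, text_lengths):
    """Mean of x_i / N_i over all n texts, as a percentage with two decimals.

    first_positions[i] is x_i: the 1-based position of the first occurrence
    of the target word in text i, or 0 if the word is absent.
    text_lengths[i] is N_i: the total character count of text i
    (N_i is an assumed symbol, not one used in the source).
    """
    if len(first_positions) != len(text_lengths) or not text_lengths:
        raise ValueError("need one position and one length per text")
    n = len(text_lengths)
    total = sum(Decimal(x) / Decimal(length)
                for x, length in zip(first_positions, text_lengths))
    percent = total / n * 100
    # Round half up to two decimals, as the text requires. The built-in
    # round() rounds half to even and can differ on exact ties.
    return percent.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Position records and character counts for "taste" from the worked example.
print(presult_percent([0, 25, 0, 8, 0], [18, 30, 30, 27, 5]))  # 22.59
```

Using `Decimal` with `ROUND_HALF_UP` is a design choice made here so that the rounding rule matches the wording of the text exactly; plain floating-point arithmetic would give the same result in most cases.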
The beneficial effects of the invention are as follows. The invention solves the problem that word position order is ignored in the analysis of e-commerce online review text. It can reveal word order and influence factors, offers good pattern discovery and information mining in the analysis of large volumes of e-commerce online review text, and can effectively help e-commerce enterprises and customers re-examine the importance of topic words, thereby improving enterprise operation and management and customers' purchase decisions.
Brief description of the drawings
The content of the drawing of this specification and the labels in it are briefly described below:
Fig. 1 is a flow chart of the method according to a specific embodiment of the invention.
Detailed description
The specific embodiments of the invention, such as the shape and structure of each component involved, the relative positions and connections between the parts, the function and working principle of each part, and the manufacturing process and method of operation and use, are described in further detail below through the description of an embodiment, to help those skilled in the art gain a more complete, accurate, and thorough understanding of the inventive concept and technical solution of the invention.
A word analysis method for e-commerce online review text based on position-order statistics, as shown in Fig. 1, comprises:
Step S1: Segment the e-commerce online review texts into words, compile and screen word-frequency statistics, and select a target word set from the result.
Step S2: The computer obtains the position record numbers of each target word in the target word set and counts the total number of characters in each text. A target word that occurs several times in a text is recorded at the position of its first occurrence; a word that does not occur in a text is recorded as 0.
Step S3: Build the statistical model and calculate the position-order presult value of each target word. Position order refers to the order of the positions at which the target word appears in the texts. The position-order presult value is calculated by the formula
$$\mathrm{presult} = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i}{N_i}\times 100\%$$
where presult is a coined term combining "position" and "result"; $x_i$ is the first number of the position record of the target word in text $i$ (if the target word occurs several times in a text, only the first number of the record for its first occurrence is used); $N_i$ is the total character count of text $i$ obtained in step S2; and $n$ is the total number of texts. The total number of texts $n$ is used as the denominator mainly to eliminate the effect that differing word frequencies among target words would have on the overall result, which is why the frequency of the target word is not used as the denominator. Since the target words are screened from the words obtained by segmenting the texts, the presult value is non-zero. The presult value is expressed as a percentage, rounded half up to two decimal places.
Step S4: Text word judgment and decision: compare the positions of the target words according to their position-order presult values.
Step S5: Text word analysis and judgment: make judgments and decisions in light of actual needs or management methods.
The following five e-commerce online review texts serve as a worked example of the method:
Example sentence 1: The packaging is good, but this mango has no mango flavor; only the packaging is good.
Example sentence 2: The item's packaging is fine, and the logistics were great too. The amount seems smaller than what I bought before, but the taste is still great!
Example sentence 3: Good, the express delivery was great too, and the boss's attitude was very good. I will come again when I need more.
Example sentence 4: Ate two bags; the green fruit tastes excellent, and my cousin also says it tastes excellent! The packaging is fine too.
Example sentence 5: Tastes good!
Step S1: Segment the e-commerce online review texts into words, compile and screen word-frequency statistics, and select a target word set from the result. The Chinese word segmentation results are as follows (only multi-character words are kept; single-character words such as "eat" are removed):
Example sentence 1: packaging / good / mango / has no / mango / packaging
Example sentence 2: item / packaging / logistics / seems / before / but / taste / still / praise very much
Example sentence 3: good / express delivery / boss / attitude / needs
Example sentence 4: two bags / green fruit / taste / excellent / cousin / taste / excellent / packaging
Example sentence 5: good
The word-frequency statistics for the example are as follows:
Word               Occurrences
packaging          4
good               3
taste              3
mango              2
excellent          2
but                1
express delivery   1
still              1
needs              1
only               1
logistics          1
attitude           1
boss               1
has no             1
green fruit        1
cousin             1
praise very much   1
two bags           1
seems              1
before             1
item               1
sweet taste        1
Selected target word set:
{packaging, taste}
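Step S1 can be illustrated with a short Python sketch that counts word frequencies over already-segmented texts and screens a target word set. The source does not state the screening criterion, so both the minimum count `min_count` and the allow-list `topic_words` below are assumptions made purely for illustration, as is the function name `select_targets`. The tokens are English glosses of the segmented example texts; a real pipeline would operate on the Chinese tokens produced by a word segmenter.

```python
from collections import Counter

# English glosses of the segmented example texts (one list per review).
segmented = [
    ["packaging", "good", "mango", "has no", "mango", "packaging"],
    ["item", "packaging", "logistics", "seems", "before", "but",
     "taste", "still", "praise very much"],
    ["good", "express delivery", "boss", "attitude", "needs"],
    ["two bags", "green fruit", "taste", "excellent", "cousin",
     "taste", "excellent", "packaging"],
    ["good"],
]


def select_targets(texts, min_count, topic_words):
    """Count token frequencies and keep frequent tokens that are topic words.

    min_count and topic_words are hypothetical screening rules; the source
    only says that word frequencies are "screened".
    """
    counts = Counter(token for text in texts for token in text)
    targets = [word for word, count in counts.most_common()
               if count >= min_count and word in topic_words]
    return counts, targets


# Hypothetical allow-list of product-topic nouns chosen by an analyst.
topic_words = {"packaging", "taste", "logistics", "express delivery"}
counts, targets = select_targets(segmented, min_count=3, topic_words=topic_words)
print(counts.most_common(5))
print(targets)
```

With these assumed rules the frequent but non-topical word "good" is dropped, and the result is the same two-word target set that the example selects.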
Step S2: The computer obtains the position record numbers of each target word in the target word set and counts the total number of characters in each text. A target word that occurs several times in a text is recorded at the position of its first occurrence; a word that does not occur in a text is recorded as 0.
Position record numbers of the example target word set in each example sentence, with the total character count of each text (positions are character positions in the original Chinese text):
Example sentence 1: packaging: 1 (position of the character 包), 2 (position of the character 装); taste: 0, 0; total characters: 18
Example sentence 2: packaging: 3, 4; taste: 25, 26; total characters: 30
Example sentence 3: packaging: 0, 0; taste: 0, 0; total characters: 30
Example sentence 4: packaging: 23, 24; taste: 8, 9; total characters: 27
Example sentence 5: packaging: 0, 0; taste: 0, 0; total characters: 5
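The position records of step S2 can be obtained with a simple string search, sketched below in Python. The function name `first_position_record` is illustrative, and the sample string is a hypothetical review, not one of the five example sentences, whose original Chinese wording is not reproduced in this text. The sketch also assumes that every character, including punctuation, counts toward the total; the source does not say how punctuation is treated.

```python
def first_position_record(text, target):
    """Return (first, last, total) for the first occurrence of target in text.

    first and last are the 1-based positions of the first and last character
    of the first occurrence; both are 0 if the target is absent.
    total is the character count of the whole text (punctuation included,
    which is an assumption of this sketch).
    """
    total = len(text)
    index = text.find(target) if target else -1
    if index == -1:
        return 0, 0, total
    return index + 1, index + len(target), total


# Hypothetical review text used only to illustrate the record format.
sample = "包装很好，味道不错"
print(first_position_record(sample, "包装"))
print(first_position_record(sample, "味道"))
print(first_position_record(sample, "物流"))
```

Because `str.find` returns the lowest matching index, repeated occurrences of a target word are automatically recorded at their first position, as step S2 requires.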
Step S3: Build the statistical model and calculate the position-order presult value of each target word. Position order refers to the order of the positions at which the target word appears in the texts. The position-order presult value is calculated by the formula
$$\mathrm{presult} = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i}{N_i}\times 100\%$$
where presult is a coined term combining "position" and "result"; $x_i$ is the first number of the position record of the target word in text $i$ (if the target word occurs several times in a text, only the first number of the record for its first occurrence is used), for example $x_1 = 1$ for "packaging" in example sentence 1; $N_i$ is the total character count of text $i$ obtained in step S2; and $n$ is the total number of texts. The total number of texts $n$ is used as the denominator mainly to eliminate the effect that differing word frequencies among target words would have on the overall result, which is why the frequency of the target word is not used as the denominator. Since the target words are screened from the words obtained by segmenting the texts, the presult value is non-zero. The presult value is expressed as a percentage, rounded half up to two decimal places.
For example, the presult values of the target word set are calculated as follows.
Position order of "packaging":
$$\mathrm{presult} = \frac{1}{5}\left(\frac{1}{18}+\frac{3}{30}+\frac{0}{30}+\frac{23}{27}+\frac{0}{5}\right)\times 100\% \approx 20.15\%$$
Calculated in the same way, the position order of the target word "taste" is presult = 22.59%.
Step S5: Text word analysis and judgment: make judgments and decisions in light of actual needs or management methods.
By the principle that the topic mentioned first in a text is the more important one, a smaller presult value means the word is mentioned earlier. Of the target words "packaging" and "taste", "packaging" therefore has the earlier position order and is the greater concern in these e-commerce online reviews.
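The comparison in steps S4 and S5 amounts to sorting the target words by ascending presult. The Python sketch below applies this to the position records and character counts of the worked example. It assumes that presult is the mean over all texts of first position divided by total character count, a form inferred from the example that reproduces the stated 22.59% for "taste"; the names `records`, `lengths`, and `presult` are illustrative.

```python
# First-occurrence positions per example sentence (0 means absent).
records = {
    "packaging": [1, 3, 0, 23, 0],
    "taste": [0, 25, 0, 8, 0],
}
# Total character count of each example sentence.
lengths = [18, 30, 30, 27, 5]


def presult(first_positions, text_lengths):
    """Mean of position / length over all texts, as a percentage (assumed form)."""
    n = len(text_lengths)
    ratios = (x / total for x, total in zip(first_positions, text_lengths))
    return sum(ratios) / n * 100


# A smaller presult means the word tends to be mentioned earlier.
ranking = sorted(records, key=lambda word: presult(records[word], lengths))
for word in ranking:
    print(f"{word}: {presult(records[word], lengths):.2f}%")
```

One property of this form is worth keeping in mind when interpreting the ranking: a text that does not contain the word contributes 0 to the sum, which lowers the presult value in the same direction as an early mention. The source does not say whether this effect is intended.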
The invention has been described above by way of example. Its specific implementation is clearly not limited to the manner described: any insubstantial improvement made using the inventive concept and technical solution of the invention, and any direct application of the concept and technical solution to other settings without improvement, falls within the scope of protection of the invention. The scope of protection of the invention is defined by the claims.

Claims (3)

1. A word analysis method for e-commerce online review text based on position-order statistics, characterized by comprising the following steps:
Step S1: Segment the e-commerce online review texts into words, compile and screen word-frequency statistics, and select a target word set from the result;
Step S2: The computer obtains the position record numbers of each target word in the target word set and counts the total number of characters in each text; a target word that occurs several times in a text is recorded at the position of its first occurrence, and a word that does not occur in a text is recorded as 0;
Step S3: Build the statistical model and calculate the position-order presult value of each target word, where position order refers to the order of the positions at which the target word appears in the texts;
Step S4: Text word judgment and decision: compare the positions of the target words according to their position-order presult values;
Step S5: Text word analysis and judgment: make judgments and decisions in light of actual needs or management methods.
2. The word analysis method for e-commerce online review text based on position-order statistics according to claim 1, characterized in that the position-order presult value is calculated by the formula
$$\mathrm{presult} = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i}{N_i}\times 100\%$$
where presult is a coined term combining "position" and "result"; $x_i$ is the first number of the position record of the target word in text $i$, and if the target word occurs several times in a text, only the first number of the record for its first occurrence is used; $N_i$ is the total character count of text $i$; and $n$ is the total number of texts, which is used as the denominator; and in that, since the target words are screened from the words obtained by segmenting the texts, the presult value is non-zero.
3. The word analysis method for e-commerce online review text based on position-order statistics according to claim 2, characterized in that the presult value is expressed as a percentage, rounded half up to two decimal places.
CN201810355960.8A 2018-04-19 2018-04-19 Word analysis method for e-commerce online review text based on position-order statistics Pending CN108376134A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810355960.8A CN108376134A (en) 2018-04-19 2018-04-19 Word analysis method for e-commerce online review text based on position-order statistics

Publications (1)

Publication Number Publication Date
CN108376134A true CN108376134A (en) 2018-08-07

Family

ID=63032352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810355960.8A Pending CN108376134A (en) 2018-04-19 2018-04-19 Word analysis method for e-commerce online review text based on position-order statistics

Country Status (1)

Country Link
CN (1) CN108376134A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740229A (en) * 2016-01-26 2016-07-06 中国人民解放军国防科学技术大学 Keyword extraction method and device
US20160314191A1 (en) * 2015-04-24 2016-10-27 Linkedin Corporation Topic extraction using clause segmentation and high-frequency words
CN107357779A (en) * 2017-06-27 2017-11-17 北京神州泰岳软件股份有限公司 A kind of method and device for obtaining organization names
CN107748743A (en) * 2017-09-20 2018-03-02 安徽商贸职业技术学院 A kind of electric business online comment text emotion analysis method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭建波,等: "基于多特征的关键词抽取算法", 《合肥工业大学学报(自然科学版)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110969025A (en) * 2019-11-19 2020-04-07 维沃移动通信有限公司 Text comment method and electronic equipment
CN110969025B (en) * 2019-11-19 2024-01-23 维沃移动通信有限公司 Text comment method and electronic equipment

Legal Events

Code   Title
PB01   Publication
SE01   Entry into force of request for substantive examination
RJ01   Rejection of invention patent application after publication

Application publication date: 20180807