CN109918508A - User's portrait generation method based on web crawlers acquisition technique - Google Patents
User's portrait generation method based on web crawlers acquisition technique
- Publication number
- CN109918508A (application number CN201910177182.2A)
- Authority
- CN
- China
- Prior art keywords
- user
- sentence
- portrait
- classification
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Abstract
The invention discloses a user portrait (persona) generation method based on web crawler acquisition technology, comprising the following steps: S1: obtain the keyword and target URL specified by the user; S2: fetch the data stream of the target URL and extract the text content; S3: denoise the text content, removing sentences that carry no practical meaning; S4: match the retained sentences against the words in the expert lexicon, each of which belongs to a specific user portrait category, and classify the sentences into the corresponding user portrait categories; S5: find words similar to the keyword in the expert lexicon, keep the classified sentences that contain the keyword or one of its similar words, and remove all unmatched sentences; S6: push all matched, classified sentences to the Web server, generate a user portrait canvas, and fill the sentences of each category into the corresponding category of the canvas.
Description
Technical field
The present invention relates to the technical field of software development, and in particular to a user portrait generation method based on web crawler acquisition technology.
Background technique
A user portrait is a very important tool in product development, experience innovation, and design. It helps us understand the behavioral characteristics of target users vividly and judge user needs. A user portrait is built on a deep understanding of real users and on a highly accurate summary of the relevant data; it is a virtual representation of real users. First, it is based on real data but is not a specific person. Second, users are divided into different types according to differences in their behaviors and viewpoints and quickly grouped, and each resulting type is then distilled into a user portrait of that type. The detailed characteristics described in each user portrait should be true, built on real user data collected through qualitative and quantitative research methods such as user interviews, focus groups, cultural inquiry, and questionnaires.
When building user portraits, the lack of reference data means that researchers or designers produce them merely by brainstorming and holding discussion meetings. This wastes time that should have been spent collecting information from real users, and it biases the resulting understanding of users.
Existing techniques that do generate user portraits from reference data usually extract feature tags from massive data (including transaction data, social media data, and so on). Because these tags are "signatures" of individual real users, they are typically used for precision marketing during product sales; they are hard to apply in the product design process and hard for designers to understand.
Specifically, the shortcoming of prior-art user portrait generation is that, after a large amount of data has been collected manually, several researchers brainstorm in a meeting and only then produce the user portrait. This approach has two main problems. First, it places demands on personnel quality: only professionals can carry it out. Second, because the portrait is produced manually after group discussion, the results of different groups diverge, so the resulting user portraits are unstable.
Summary of the invention
To solve the above problems, the present invention adopts the following technical solution:
The present invention provides a user portrait generation method based on web crawler acquisition technology, comprising the following steps:
S1: obtain the keyword and target URL specified by the user;
S2: fetch the data stream of the target URL and extract the text content;
S3: denoise the text content, removing sentences that carry no practical meaning;
S4: match the retained sentences against the words in the expert lexicon, each of which belongs to a specific user portrait category, and classify the sentences into the corresponding user portrait categories;
S5: find words similar to the keyword in the expert lexicon, keep the classified sentences that contain the keyword or one of its similar words, and remove all unmatched sentences;
S6: push all matched, classified sentences to the Web server, generate a user portrait canvas, and fill the sentences of each category into the corresponding category of the canvas.
As a preferred technical solution, the text content refers to all text information visible when the target URL is opened in a browser, including the text in the page title, menus, body, and sidebars.
As a preferred technical solution, in step S3 above, whether a sentence is meaningful is determined as follows:
S301: split the text into sentences based on spaces and a Chinese sentence-segmentation algorithm;
S302: score the completeness of each sentence based on Chinese syntactic structure from natural language processing; a sentence with a subject-predicate-object structure is judged meaningful;
S303: for a sentence containing only English, check its special characters; if the special-character density of the sentence is below 0.3%, the sentence is judged meaningful.
As a preferred technical solution, in step S4 above, the user portrait categories are thoughts, seeing, feelings, and actions.
As a preferred technical solution, the expert lexicon is created by the system, and users can define the attributes of words in it; the attribute of a word is the user portrait category to which the word belongs.
As a preferred technical solution, a natural language algorithm can also judge whether a sentence carries emotion; if it does, the sentence is assigned to the feelings category of the user portrait.
Compared with the prior art, the present invention has the following beneficial effects:
Existing user portraits are generally produced by professional consultants after they consult staff in the relevant field. This involves a heavy workload and demands a high level of expertise, so it cannot be widely adopted across industries.
The steps of this method are carried out by a software platform, which implements the process and the information-structure acceptance specification that the method provides. With simple operations, data on the order of millions of items can be collected and organized quickly to generate user portraits.
The invention builds its user portrait model on user-insight methods from the product innovation process. All collected data are classified and aggregated through the expert lexicon into the corresponding concrete components of the model, so that user researchers and designers can understand it.
Detailed description of the invention
Fig. 1 is a flow diagram of the invention.
Fig. 2 is a schematic diagram of a user portrait.
Specific embodiment
The present invention is further described below through a specific embodiment. The invention can also be described through other schemes that do not depart from its technical features; therefore, all changes within the scope of the invention or within its equivalent scope are encompassed by the invention.
Embodiment
The user portrait generation method based on web crawler acquisition technology is implemented with a user portrait generation system. The system database holds an expert lexicon, whose role is to define words; the definition specifies which user portrait category a word describes. A user portrait comprises four quadrants (that is, four categories): thoughts, seeing, feelings, and actions.
Every word in the expert lexicon has a user portrait category. Users can also define the attributes of words in the expert lexicon themselves and add new words. For example, the expert lexicon defines words such as "think" and "feel" as belonging to the thoughts category; words such as "visible", "see", "look", and "find" as the seeing category; words such as "impression" and "feeling" as the feelings category; and action verbs as the actions category.
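The expert lexicon described above is essentially a mapping from words to one of the four portrait categories that users can extend. A minimal sketch follows, assuming an in-memory dictionary; the class name `ExpertLexicon`, its methods, and the English category labels are hypothetical and do not come from the patent, which stores the lexicon in the system database.

```python
# Hypothetical in-memory stand-in for the expert lexicon; the patent keeps it
# in the system database and does not specify a schema.
CATEGORIES = ("thoughts", "seeing", "feelings", "actions")


class ExpertLexicon:
    """Maps each word to the user portrait category it describes."""

    def __init__(self):
        self._category_of = {}

    def define(self, word, category):
        """Add a word or redefine its category (the user-editable attribute)."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        self._category_of[word] = category

    def category_of(self, word):
        """Return the category of a word, or None if the word is not listed."""
        return self._category_of.get(word)

    def words_in(self, category):
        """Return all words of one category in alphabetical order."""
        return sorted(w for w, c in self._category_of.items() if c == category)


lexicon = ExpertLexicon()
for word in ("think", "feel"):
    lexicon.define(word, "thoughts")
for word in ("visible", "see", "look", "find"):
    lexicon.define(word, "seeing")
for word in ("impression", "feeling"):
    lexicon.define(word, "feelings")
```

Keeping the category as a per-word attribute mirrors the description: redefining a word is a single `define` call, and unknown words simply have no category.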
Specifically, the user portrait generation method based on web crawler acquisition technology comprises the following steps:
Step S1: obtain the keyword and target URL specified by the user. The keyword may be any Chinese or English text entered by the user and may contain digits; the target URL must be a valid URL.
In this step, the user enters the keyword and the target URL into the system, so that the system obtains both. The keyword defines the subject of the user portrait. For example, if the user portrait is for the group "geeks", the keyword entered into the system is "geek". The target URL is the address of a page related to the keyword that the user finds independently with a search engine. For example, searching for "geek" on Baidu returns the URL of the Baidu Baike entry for "geek", and the user can enter that URL into the system. The user can also enter any other URL considered relevant to geeks; for example, if a news page mentions "geek", the user can enter the URL of that news page as the target URL.
A specific example: a user who wants to build the user portrait of a coffee consumer can generate it from one or several URLs that contain relevant information, together with a keyword. The operations are as follows: the user enters the URL http://www.yingxiao360.com/htm/2014313/10856.htm; the user enters the subject "coffee consumer"; the user clicks the start-crawling button provided by the user portrait generation system.
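Step S1 only requires a non-empty keyword and a valid URL. A minimal sketch of that input check follows, assuming that "valid" means an http or https URL with a host; the function name `validate_inputs` is hypothetical, and the patent does not say how validity is tested.

```python
from urllib.parse import urlparse


def validate_inputs(keyword, url):
    """Return the trimmed (keyword, url) pair or raise ValueError.

    Assumption: a target URL counts as valid when it uses http or https
    and names a host. The patent only says the URL must be valid.
    """
    keyword = keyword.strip()
    url = url.strip()
    if not keyword:
        raise ValueError("keyword must not be empty")
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a valid http(s) URL: {url!r}")
    return keyword, url


keyword, target_url = validate_inputs(
    " coffee consumer ",
    "http://www.yingxiao360.com/htm/2014313/10856.htm",
)
```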
Step S2: fetch the data stream of the target URL and extract the text content.
In this step, the text content refers to all text information visible when the target URL is opened in a browser, including the text in the page title, menus, body, and sidebars.
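The text extraction in step S2 can be sketched with Python's standard `html.parser` module. This is an illustration under assumptions, not the patent's crawler: the class name `VisibleTextExtractor` is hypothetical, the HTML is an inline sample rather than a fetched page, and "visible" is approximated as every text node outside `script`, `style`, and `noscript` elements.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping elements a browser does not render as text."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the text chunks of an HTML document in document order."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


sample_html = (
    "<html><head><title>Coffee</title><style>p{color:red}</style></head>"
    "<body><nav>Home</nav><p>I think coffee is great.</p>"
    "<script>var x = 1;</script></body></html>"
)
chunks = extract_text(sample_html)
```

In a real run the HTML would first be downloaded from the target URL, for example with `urllib.request.urlopen`; that step is left out so the sketch works offline.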
Step S3: denoise the text content, removing sentences that carry no practical meaning and retaining those that do. Denoising also includes removing irrelevant information from the page, such as menus, columns, and advertisements.
In this step, whether a sentence is meaningful is determined as follows:
S301: split the text into sentences based on spaces and a Chinese sentence-segmentation algorithm.
S302: score the completeness of each sentence based on Chinese syntactic structure from natural language processing; a sentence with a subject-predicate-object structure is judged meaningful.
S303: for a sentence containing only English, check its special characters; if the special-character density of the sentence is below 0.3%, the sentence is judged meaningful.
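Steps S301 and S303 can be sketched as follows. Several details are assumptions, since the patent leaves them open: sentences are split at Chinese and Western sentence-ending punctuation and at line breaks, "special characters" are taken to be anything other than ASCII letters, digits, spaces, and a small set of common punctuation, and density is the share of such characters in the sentence. The syntactic scoring of S302 needs a parser and is not covered.

```python
import re

# Assumed sentence boundaries: after Chinese or Western end punctuation,
# after a period followed by whitespace, or at line breaks.
_BOUNDARY = re.compile(r"(?<=[。！？；!?;])\s*|(?<=\.)\s+|\n+")

# Assumed set of punctuation that does not count as "special".
_ORDINARY_PUNCT = set(" .,;:!?'\"()-")


def split_sentences(text):
    """S301 sketch: split text into trimmed, non-empty sentences."""
    return [s.strip() for s in _BOUNDARY.split(text) if s and s.strip()]


def special_char_density(sentence):
    """Share of characters that are neither ASCII alphanumerics nor ordinary punctuation."""
    if not sentence:
        return 0.0
    special = sum(
        1
        for ch in sentence
        if not (ch.isascii() and ch.isalnum()) and ch not in _ORDINARY_PUNCT
    )
    return special / len(sentence)


def passes_english_check(sentence, max_density=0.003):
    """S303 sketch: English-only sentence with density below 0.3%."""
    return sentence.isascii() and special_char_density(sentence) < max_density


sentences = split_sentences("我觉得咖啡很好喝。I think coffee is great. $$$###")
english_kept = [s for s in sentences if passes_english_check(s)]
```

With a 0.3% threshold, a single special character already disqualifies any sentence shorter than about 334 characters, so in practice the check keeps only English sentences free of such characters.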
Step S4: using the words associated with each category (thoughts, seeing, feelings, actions) in the expert lexicon, match the meaningful sentences obtained in step S3 against the categories, and classify each sentence according to its matching weight.
For example, words such as "think" and "feel" in the expert lexicon belong to the "user thoughts" category of the user portrait model, so sentences containing these words are classified into "user thoughts".
For example, sentences containing "impression" or "feeling" are assigned to the "user feelings" category; likewise, sentences that a natural language algorithm judges to carry emotion are classified into "user feelings".
For example, words such as "present" and "find" in the expert lexicon belong to the "user sees" category of the user portrait model, so sentences containing these words are classified into "user sees".
For example, sentences that a natural language algorithm finds to contain action verbs are classified into the "user actions" category.
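The weight-based matching in step S4 can be illustrated with a simple count of lexicon hits per category. This is a sketch under assumptions: the patent does not define the matching weight, so it is taken here to be the number of lexicon words a sentence contains per category; the sample `LEXICON` entries (including the action verbs "buy" and "drink") are illustrative; and tokenization is a plain English word split, whereas Chinese text would need a word segmenter. Emotion detection and verb tagging by a natural language algorithm are not covered.

```python
import re
from collections import Counter

# Illustrative lexicon: word -> portrait category. The action verbs are
# hypothetical examples; the patent only says action verbs map to actions.
LEXICON = {
    "think": "thoughts",
    "feel": "thoughts",
    "see": "seeing",
    "find": "seeing",
    "impression": "feelings",
    "feeling": "feelings",
    "buy": "actions",
    "drink": "actions",
}


def classify(sentence, lexicon=LEXICON):
    """Return the category with the most lexicon hits, or None if there are none."""
    weights = Counter()
    for token in re.findall(r"[a-z]+", sentence.lower()):
        category = lexicon.get(token)
        if category is not None:
            weights[category] += 1
    if not weights:
        return None
    return weights.most_common(1)[0][0]


def classify_all(sentences, lexicon=LEXICON):
    """Group sentences by category, dropping sentences that match nothing."""
    grouped = {}
    for sentence in sentences:
        category = classify(sentence, lexicon)
        if category is not None:
            grouped.setdefault(category, []).append(sentence)
    return grouped


grouped = classify_all([
    "I think this coffee is too sweet",
    "We see long queues and find new shops",
    "They buy and drink espresso daily",
    "Opening hours: 9 to 5",
])
```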
Step S5: the system extracts stems from the keyword provided by the user; stem extraction relies on a general Chinese-English dictionary and on the vocabulary in the expert lexicon. The system also looks up words similar to these entries in the expert lexicon. Among the classified sentences, it keeps all sentences that contain the keyword or one of its similar words, and removes all unmatched sentences.
For example, for "coffee consumer", three stems can be extracted: "coffee", "consumption", and "consumer".
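The filtering in step S5 can be sketched as a lookup followed by a substring filter. The tables below are assumptions made for illustration: `STEMS` hard-codes the "coffee consumer" example from the text in place of real dictionary-based stem extraction, and the entries in `SIMILAR_WORDS` are hypothetical, since the patent does not list any similar words.

```python
# Stand-in for dictionary-based stem extraction; only the example from the
# description is covered.
STEMS = {"coffee consumer": ["coffee", "consumption", "consumer"]}

# Hypothetical similar-word table; the patent looks these up in the expert lexicon.
SIMILAR_WORDS = {"coffee": ["espresso", "latte"]}


def search_terms(keyword):
    """Return the keyword stems plus their similar words, without duplicates."""
    stems = STEMS.get(keyword, keyword.split())
    terms = []
    for stem in stems:
        for term in [stem, *SIMILAR_WORDS.get(stem, [])]:
            if term not in terms:
                terms.append(term)
    return terms


def filter_sentences(classified, keyword):
    """Keep, per category, only sentences containing at least one search term."""
    terms = search_terms(keyword)
    return {
        category: [s for s in sentences if any(t in s.lower() for t in terms)]
        for category, sentences in classified.items()
    }


matched = filter_sentences(
    {
        "thoughts": ["I think coffee is too sweet", "I think the bus is late"],
        "actions": ["Consumers buy beans online"],
    },
    "coffee consumer",
)
```

Substring matching is the simplest choice and also matches inflected forms such as "consumers"; a production system would more likely match on segmented words.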
Step S6: push all matched, classified sentences to the Web server, generate a user portrait canvas, and fill the sentences of each category into the corresponding category of the canvas.
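The canvas in step S6 can be pictured as a structure with one slot per portrait category. The sketch below builds such a structure and serializes it to JSON; the field names `keyword` and `quadrants` and the function `build_canvas` are hypothetical, and the actual push to the Web server is omitted because the patent does not describe its interface.

```python
import json

CATEGORIES = ("thoughts", "seeing", "feelings", "actions")


def build_canvas(keyword, classified):
    """Return a canvas with all four quadrants; categories without sentences stay empty."""
    return {
        "keyword": keyword,
        "quadrants": {c: list(classified.get(c, [])) for c in CATEGORIES},
    }


canvas = build_canvas(
    "coffee consumer",
    {
        "thoughts": ["I think coffee is too sweet"],
        "actions": ["Consumers buy beans online"],
    },
)

# ensure_ascii=False keeps Chinese sentences readable in the serialized payload.
payload = json.dumps(canvas, ensure_ascii=False)
```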
It is worth noting that, on the premise of the above design, any insubstantial changes or refinements made on the basis of the present invention to solve the same technical problem still use a technical solution that is in essence the same as that of the present invention, and should therefore also fall within the protection scope of the present invention.
Claims (6)
- 1. A user portrait generation method based on web crawler acquisition technology, characterized by comprising the following steps: S1: obtain the keyword and target URL specified by the user; S2: fetch the data stream of the target URL and extract the text content; S3: denoise the text content, removing sentences that carry no practical meaning; S4: match the retained sentences against the words in the expert lexicon, each of which belongs to a specific user portrait category, and classify the sentences into the corresponding user portrait categories; S5: find words similar to the keyword in the expert lexicon, keep the classified sentences that contain the keyword or one of its similar words, and remove all unmatched sentences; S6: push all matched, classified sentences to the Web server, generate a user portrait canvas, and fill the sentences of each category into the corresponding category of the canvas.
- 2. The user portrait generation method based on web crawler acquisition technology according to claim 1, characterized in that the text content refers to all text information visible when the target URL is opened in a browser, including the text in the page title, menus, body, and sidebars.
- 3. The user portrait generation method based on web crawler acquisition technology according to claim 1, characterized in that, in step S3, whether a sentence is meaningful is determined as follows: S301: split the text into sentences based on spaces and a Chinese sentence-segmentation algorithm; S302: score the completeness of each sentence based on Chinese syntactic structure from natural language processing, a sentence with a subject-predicate-object structure being judged meaningful; S303: for a sentence containing only English, check its special characters, and if the special-character density of the sentence is below 0.3%, judge the sentence meaningful.
- 4. The user portrait generation method based on web crawler acquisition technology according to claim 1, characterized in that, in step S4, the user portrait categories are thoughts, seeing, feelings, and actions.
- 5. The user portrait generation method based on web crawler acquisition technology according to claim 1, characterized in that the expert lexicon is created by the system, users can define the attributes of words in the expert lexicon, and the attribute of a word is the user portrait category to which the word belongs.
- 6. The user portrait generation method based on web crawler acquisition technology according to claim 4, characterized in that a natural language algorithm can also judge whether a sentence carries emotion, and a sentence that carries emotion is assigned to the feelings category of the user portrait.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910177182.2A CN109918508A (en) | 2019-03-08 | 2019-03-08 | User's portrait generation method based on web crawlers acquisition technique |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910177182.2A CN109918508A (en) | 2019-03-08 | 2019-03-08 | User's portrait generation method based on web crawlers acquisition technique |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109918508A true CN109918508A (en) | 2019-06-21 |
Family
ID=66964002
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910177182.2A Withdrawn CN109918508A (en) | 2019-03-08 | 2019-03-08 | User's portrait generation method based on web crawlers acquisition technique |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109918508A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112989038A (en) * | 2021-02-08 | 2021-06-18 | 浙江连信科技有限公司 | Sentence-level user portrait generation method and device and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106339806A (en) * | 2016-08-24 | 2017-01-18 | 北京创业公社征信服务有限公司 | Industry holographic image constructing method and industry holographic image constructing system for enterprise information |
CN107038237A (en) * | 2017-04-18 | 2017-08-11 | 昆山数泰数据技术有限公司 | User's portrait system and portrait method based on big data |
CN107578292A (en) * | 2017-09-19 | 2018-01-12 | 上海财经大学 | A kind of user's portrait constructing system |
US20180032508A1 (en) * | 2016-07-28 | 2018-02-01 | Abbyy Infopoisk Llc | Aspect-based sentiment analysis using machine learning methods |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112989038A (en) * | 2021-02-08 | 2021-06-18 | 浙江连信科技有限公司 | Sentence-level user portrait generation method and device and storage medium |
CN112989038B (en) * | 2021-02-08 | 2022-06-21 | 浙江连信科技有限公司 | Sentence-level user portrait generation method and device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20190621 |