CN109918508A - User portrait generation method based on web crawler acquisition technology - Google Patents

User portrait generation method based on web crawler acquisition technology

Info

Publication number
CN109918508A
CN109918508A CN201910177182.2A CN201910177182A CN109918508A CN 109918508 A CN109918508 A CN 109918508A CN 201910177182 A CN201910177182 A CN 201910177182A CN 109918508 A CN109918508 A CN 109918508A
Authority
CN
China
Prior art keywords
user
sentence
portrait
classification
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910177182.2A
Other languages
Chinese (zh)
Inventor
陈加兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Field Technology Co Ltd
Original Assignee
Chengdu Field Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Field Technology Co Ltd
Priority to CN201910177182.2A
Publication of CN109918508A
Legal status: Withdrawn

Abstract

The invention discloses a user portrait generation method based on web crawler acquisition technology, comprising the following steps. S1: obtain the keyword and target URL specified by the user. S2: obtain the data stream of the target URL and extract the text content. S3: denoise the text content, eliminating sentences that carry no practical meaning. S4: match the retained sentences against the user portrait category to which each word in the expert vocabulary list belongs, and classify them into the corresponding user portrait categories. S5: find words similar to the keyword in the expert vocabulary list, select from the classified sentences all sentences containing the keyword or its similar words, and remove all unmatched sentences. S6: push all matched, classified sentences to the Web server, generate a user portrait canvas, and fill the sentences of each category into the corresponding category of the canvas.

Description

User portrait generation method based on web crawler acquisition technology
Technical field
The present invention relates to the technical field of software development, and in particular to a user portrait generation method based on web crawler acquisition technology.
Background art
A user portrait is a very important tool in product experience innovation and design. It helps us form a vivid understanding of the behavioral characteristics of target users and judge their needs. A user portrait is built on a deep understanding of real users and a highly accurate summary of the related data. It is a virtual representation of real users: first, it is grounded in reality without being any one specific person; second, target users are divided into types according to differences in their behavior and viewpoints, quickly grouped together, and each resulting type is then distilled into one type of user portrait. The details embodied in each user portrait should be true, built on actual user data collected through qualitative and quantitative research methods such as user interviews, focus groups, cultural probes, and questionnaire surveys.
When a user portrait is built without reference data, researchers or designers produce it only through brainstorming and meeting discussions. This approach wastes the time that should have been spent collecting information from real users, and it biases the resulting understanding of users.
Existing user portrait generation techniques that do draw on reference data usually extract feature labels from massive data (including transaction data, social media data, and so on). Because these labels are "signatures" of real users, they are typically used for precision marketing during product sales. They are harder to apply in product design and harder for designers to understand.
Specifically, the drawback of prior-art user portrait generation is that the massive data is obtained mainly by hand, after which several researchers meet and brainstorm, and only then is the user portrait produced. This approach has two main problems. First, it places demands on the people involved: only professionals can carry out the work. Second, the portrait is produced manually after a group discussion, so the results of different groups differ and the output is unstable.
Summary of the invention
To solve the above problems, the present invention adopts the following technical solution:
The present invention provides a user portrait generation method based on web crawler acquisition technology, comprising the following steps:
S1: obtain the keyword and target URL specified by the user;
S2: obtain the data stream of the target URL and extract the text content;
S3: denoise the text content, eliminating sentences that carry no practical meaning;
S4: match the retained sentences against the user portrait category to which each word in the expert vocabulary list belongs, and classify them into the corresponding user portrait categories;
S5: find words similar to the keyword in the expert vocabulary list, select from the classified sentences all sentences containing the keyword or its similar words, and remove all unmatched sentences;
S6: push all matched, classified sentences to the Web server, generate a user portrait canvas, and fill the sentences of each category into the corresponding category of the canvas.
As a preferred technical solution, the text content refers to all text information that can be viewed when the target URL is opened in a browser, including the text in the page title, menus, body text, and sidebars.
As a preferred technical solution, in step S3 above, the detailed process for judging whether a sentence is meaningful is as follows:
S301: split the text into sentences based on spaces and a Chinese punctuation algorithm;
S302: score the completeness of each sentence based on Chinese syntactic structure in natural language processing; a sentence containing a subject-predicate-object structure is judged to be meaningful;
S303: for a sentence containing only English, check its special characters; if the density of special characters in the sentence is less than 0.3%, the sentence is judged to be meaningful.
As a preferred technical solution, in step S4 above, the user portrait categories are idea, seeing, feeling, and action.
As a preferred technical solution, the expert vocabulary list is created by the system, and the user can define the attribute of each word in the expert vocabulary list; the attribute of a word is the user portrait category to which the word belongs.
As a preferred technical solution, a natural language algorithm can also judge whether a sentence carries emotion; a sentence that carries emotion is assigned to the feeling category of the user portrait.
Compared with the prior art, the present invention has the following advantages:
Existing user portraits are generally produced by professional consultants after consulting staff in the relevant field. This is labor-intensive and demands a high level of expertise, so it cannot be widely adopted across industries.
The steps of this design method are carried out on a software platform. The platform implements the process, information structure, and acceptance criteria provided by the method; with simple operations, data on the order of millions of records can be collected and organized quickly to generate a user portrait.
The present invention builds its user portrait model on user-insight methods from the product innovation process. All collected data is classified and aggregated through the expert vocabulary list into the corresponding concrete component of the model, so that user researchers and designers can understand it.
Brief description of the drawings
Fig. 1 is a flow diagram of the invention.
Fig. 2 is a schematic diagram of the user portrait.
Detailed description of the embodiments
The present invention is further described below through specific embodiments. The invention can also be described through other schemes that do not depart from its technical features, so all changes within the scope of the invention or within its equivalent scope are encompassed by the invention.
Embodiment
The user portrait generation method based on web crawler acquisition technology is implemented with a user portrait generation system. The system database holds an expert vocabulary list whose role is to define words; each definition specifies the user portrait category to which a word belongs. A user portrait comprises four quadrants (that is, four categories): idea, seeing, feeling, and action.
Every word in the expert vocabulary list has an associated user portrait category. The user can also define the attributes of words in the expert vocabulary list and add words to it. For example, the expert vocabulary list defines the attribute of words such as "think" and "feel" as the idea category; it defines the attribute of words such as "visible", "see", "saw", and "find" as the seeing category; it defines the attribute of words such as "impression" and "feeling" as the feeling category; and action verbs are defined as the action category.
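The expert vocabulary list described above can be pictured as a mapping from each word to one of the four categories. The following is a minimal sketch under that assumption; the class name `ExpertVocabulary`, its methods, and the English sample words are illustrative and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The four user portrait categories (quadrants) named in the description.
CATEGORIES = ("idea", "seeing", "feeling", "action")


@dataclass
class ExpertVocabulary:
    """Hypothetical container: maps each word to its user portrait category."""

    word_to_category: Dict[str, str] = field(default_factory=dict)

    def define(self, word: str, category: str) -> None:
        """Add a word, or redefine the category attribute of an existing word."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        self.word_to_category[word] = category

    def category_of(self, word: str) -> Optional[str]:
        """Return the category of a word, or None if the word is not listed."""
        return self.word_to_category.get(word)

    def words_in(self, category: str) -> List[str]:
        """List all words currently assigned to one category."""
        return sorted(w for w, c in self.word_to_category.items() if c == category)


# System-created entries, followed by a user-defined addition (sample words only).
vocabulary = ExpertVocabulary()
vocabulary.define("think", "idea")
vocabulary.define("see", "seeing")
vocabulary.define("find", "seeing")
vocabulary.define("impression", "feeling")
vocabulary.define("buy", "action")  # assumed example of an action verb
```

Keeping the category as a plain attribute of the word makes the user-editable part of the system a single `define` call, which matches the description's statement that users may define attributes and add words.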
Specifically, the user portrait generation method based on web crawler acquisition technology comprises the following steps:
Step S1: obtain the keyword and target URL specified by the user. The keyword may be any text entered by the user, in Chinese or English, and may include digits; the target URL must be a legal URL.
In practice, the user enters the keyword and target URL into the system, which is how the system obtains them. The keyword defines the subject of the user portrait. For example, if the portrait is for the group of stay-at-home men, the keyword entered into the system is "geek". The target URL is an address whose content relates to the keyword and which the user has found independently through a search engine. For example, if the user searches for "geek" with the Baidu search engine, the URL of the Baidu Baike entry for "geek" appears, and the user can enter that URL into the system. The user can also enter other URLs that they consider relevant to geeks; if a news page mentions "geek", the user can enter the URL of that news page as a target URL.
Specific example: to build a user portrait of a coffee consumer, the user can generate it from a keyword and one or several URLs containing relevant information. The operations are as follows: the user enters the URL http://www.yingxiao360.com/htm/2014313/10856.htm; the user enters the topic "coffee consumer"; the user clicks the start-crawling button provided by the user portrait generation system.
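Step S1 only requires a non-empty keyword and a legal URL. A minimal sketch of such a check is given below, assuming that "legal" means an http or https URL with a host name; the function name `validate_inputs` is hypothetical, and the patent does not specify how legality is tested.

```python
from typing import Tuple
from urllib.parse import urlparse


def validate_inputs(keyword: str, target_url: str) -> Tuple[str, str]:
    """Return the cleaned keyword and URL, or raise ValueError if either is unusable."""
    keyword = keyword.strip()
    if not keyword:
        raise ValueError("keyword must not be empty")
    parsed = urlparse(target_url.strip())
    # Assumption: only http(s) URLs that name a host are accepted as legal.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a legal http(s) URL: {target_url!r}")
    return keyword, parsed.geturl()


# The coffee consumer example from the description.
keyword, url = validate_inputs(
    "coffee consumer",
    "http://www.yingxiao360.com/htm/2014313/10856.htm",
)
```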
Step S2: obtain the data stream of the target URL and extract the text content.
In this step, the text content refers to all text information that can be viewed when the target URL is opened in a browser, including the text in the page title, menus, body text, and sidebars.
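The text extraction in step S2 can be sketched with the Python standard library. The example below assumes the page has already been downloaded into a string (for instance with `urllib.request.urlopen`) and keeps every text node except the contents of `script`, `style`, and `noscript` elements; the names `VisibleTextExtractor` and `extract_text` are illustrative, and the patent does not say which parser its system uses.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextExtractor(HTMLParser):
    """Collects text that a browser would show, skipping script and style content."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> List[str]:
    """Return the visible text fragments of an HTML document in document order."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Title, menu, body, and sidebar text all arrive as ordinary text nodes, so this sketch keeps them together; separating them would need tag-specific handling that the description does not detail.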
Step S3: denoise the text content, eliminating sentences that carry no practical meaning and retaining those that do. Denoising also includes removing irrelevant information on the page, such as menus, columns, and advertisements, which completes the text denoising.
In this step, the detailed process for judging whether a sentence is meaningful is as follows:
S301: split the text into sentences based on spaces and a Chinese punctuation algorithm.
S302: score the completeness of each sentence based on Chinese syntactic structure in natural language processing; a sentence containing a subject-predicate-object structure is judged to be meaningful.
S303: for a sentence containing only English, check its special characters; if the density of special characters in the sentence is less than 0.3%, the sentence is judged to be meaningful.
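Sub-steps S301 and S303 can be sketched as follows. The patent does not list which characters count as sentence breaks or as special characters, so both sets here are assumptions: breaks are Chinese and ASCII sentence-ending marks, whitespace after a period, and line breaks; special characters are anything other than letters, digits, whitespace, and common punctuation. S302 needs a syntactic parser and is not sketched.

```python
import re
from typing import List

# Assumed break points: Chinese/ASCII end marks, whitespace after ".", line breaks, tabs.
_SENTENCE_BREAK = re.compile(r"[。！？；!?]+|(?<=\.)\s+|[\n\r\t]+")

# Assumed set of punctuation that does NOT count as a special character.
_ORDINARY_PUNCTUATION = set(".,;:'\"-()!?")


def split_sentences(text: str) -> List[str]:
    """S301 sketch: split text into sentences and drop empty fragments."""
    return [part.strip() for part in _SENTENCE_BREAK.split(text) if part.strip()]


def special_char_density(sentence: str) -> float:
    """Fraction of characters that are neither alphanumeric, whitespace, nor ordinary punctuation."""
    if not sentence:
        return 0.0
    special = sum(
        1
        for ch in sentence
        if not (ch.isalnum() or ch.isspace() or ch in _ORDINARY_PUNCTUATION)
    )
    return special / len(sentence)


def is_meaningful_english(sentence: str, threshold: float = 0.003) -> bool:
    """S303 sketch: an English-only sentence passes if its density is below 0.3%."""
    return sentence.isascii() and special_char_density(sentence) < threshold
```

With a threshold of 0.3%, a sentence shorter than about 334 characters passes only if it contains no special character at all, so in practice this rule mainly rejects fragments of code or markup.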
Step S4: using the words associated with each category (idea, seeing, feeling, action) in the expert vocabulary list, perform category matching on the meaningful sentences obtained in step S3, and classify each sentence according to its matching weight.
For example, words such as "think" and "feel" in the expert vocabulary list have an attribute belonging to the "user idea" category of the user portrait model, so sentences containing these words are classified into "user idea".
For example, sentences containing "impression" or "feeling" are assigned to the "user feeling" category; alternatively, if natural language processing of a sentence determines that it carries emotion, the sentence is classified into "user feeling".
For example, words such as "presentation" and "find" in the expert vocabulary list have an attribute belonging to the "user seeing" category of the user portrait model, so sentences containing these words are classified into "user seeing".
For example, if natural language processing of a sentence determines that it contains an action verb, the sentence is classified into "user action".
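The matching in step S4 can be sketched as counting, per category, how many vocabulary words a sentence contains and picking the category with the highest count. The patent does not define "matching weight", so the count used here is an assumption, as are the English sample vocabulary, the `classify` function, and the optional `has_emotion` callback that stands in for the emotion-detecting natural language algorithm. Chinese text would first need word segmentation, which is not shown.

```python
import re
from collections import Counter
from typing import Callable, Dict, Optional

# Hypothetical English stand-ins for entries of the expert vocabulary list.
SAMPLE_VOCABULARY: Dict[str, str] = {
    "think": "idea",
    "see": "seeing",
    "find": "seeing",
    "impression": "feeling",
    "feeling": "feeling",
    "buy": "action",
    "drink": "action",
}


def classify(
    sentence: str,
    vocabulary: Dict[str, str] = SAMPLE_VOCABULARY,
    has_emotion: Optional[Callable[[str], bool]] = None,
) -> Optional[str]:
    """Return the category with the most matching words, or None if nothing matches."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    # Assumed weight: number of tokens in the sentence that belong to each category.
    weights = Counter(vocabulary[t] for t in tokens if t in vocabulary)
    if weights:
        # On a tie, the category whose word appears first in the sentence wins.
        return weights.most_common(1)[0][0]
    # Fallback from the description: a sentence that carries emotion goes to "feeling".
    if has_emotion is not None and has_emotion(sentence):
        return "feeling"
    return None
```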
Step S5: the system extracts stems from the keyword provided by the user; stem extraction relies on a general Chinese-English dictionary and on the words in the expert vocabulary list. At the same time, the system looks up similar words in the expert vocabulary list. From the classified sentences, it selects all sentences containing the keyword or words similar to the keyword, and removes all unmatched sentences.
For example, for "coffee consumer", three stems can be extracted: "coffee", "consumption", and "consumer".
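The screening in step S5 can be sketched as a substring filter over the classified sentences. The stems below are the three from the example above; the similar-word table and the function name `filter_by_keyword` are hypothetical, and stem extraction itself is assumed to have been done already.

```python
from typing import Dict, Iterable, List


def filter_by_keyword(
    classified: Dict[str, List[str]],
    stems: Iterable[str],
    similar_words: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Keep only sentences that contain a stem or one of its similar words."""
    stems = list(stems)
    terms = {s.lower() for s in stems}
    for stem in stems:
        terms.update(w.lower() for w in similar_words.get(stem, ()))
    return {
        category: [s for s in sentences if any(t in s.lower() for t in terms)]
        for category, sentences in classified.items()
    }


stems = ["coffee", "consumption", "consumer"]
similar = {"coffee": ["espresso"]}  # assumed similar-word entry, not from the patent
classified = {
    "idea": ["I think espresso is too bitter", "I think the weather is fine"],
    "action": ["She buys coffee daily"],
}
kept = filter_by_keyword(classified, stems, similar)
```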
Step S6: push all matched, classified sentences to the Web server, generate a user portrait canvas, and fill the sentences of each category into the corresponding category of the canvas.
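The canvas in step S6 can be sketched as a structure with one list per quadrant, serialized for transfer to the Web server. The field names, the function names `build_canvas` and `to_payload`, and the choice of JSON are assumptions; the patent does not describe the payload format or the transport.

```python
import json
from typing import Dict, List

QUADRANTS = ("idea", "seeing", "feeling", "action")


def build_canvas(keyword: str, classified: Dict[str, List[str]]) -> dict:
    """Fill each quadrant of an empty canvas with the sentences of its category."""
    canvas = {"keyword": keyword, "quadrants": {q: [] for q in QUADRANTS}}
    for category, sentences in classified.items():
        if category in canvas["quadrants"]:  # ignore categories outside the four quadrants
            canvas["quadrants"][category].extend(sentences)
    return canvas


def to_payload(canvas: dict) -> str:
    """Serialize the canvas; ensure_ascii=False keeps Chinese sentences readable."""
    return json.dumps(canvas, ensure_ascii=False)
```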
It is worth noting that, on the premise of the above design and in order to solve the same technical problem, even if some insubstantial changes or refinements are made to the present invention, the essence of the technical solution adopted remains the same as that of the present invention, and such variations should therefore also fall within the scope of protection of the present invention.

Claims (6)

  1. A user portrait generation method based on web crawler acquisition technology, characterized by comprising the following steps:
    S1: obtain the keyword and target URL specified by the user;
    S2: obtain the data stream of the target URL and extract the text content;
    S3: denoise the text content, eliminating sentences that carry no practical meaning;
    S4: match the retained sentences against the user portrait category to which each word in the expert vocabulary list belongs, and classify them into the corresponding user portrait categories;
    S5: find words similar to the keyword in the expert vocabulary list, select from the classified sentences all sentences containing the keyword or its similar words, and remove all unmatched sentences;
    S6: push all matched, classified sentences to the Web server, generate a user portrait canvas, and fill the sentences of each category into the corresponding category of the canvas.
  2. The user portrait generation method based on web crawler acquisition technology according to claim 1, characterized in that the text content refers to all text information that can be viewed when the target URL is opened in a browser, including the text in the page title, menus, body text, and sidebars.
  3. The user portrait generation method based on web crawler acquisition technology according to claim 1, characterized in that, in step S3 above, the detailed process for judging whether a sentence is meaningful is as follows:
    S301: split the text into sentences based on spaces and a Chinese punctuation algorithm;
    S302: score the completeness of each sentence based on Chinese syntactic structure in natural language processing; a sentence containing a subject-predicate-object structure is judged to be meaningful;
    S303: for a sentence containing only English, check its special characters; if the density of special characters in the sentence is less than 0.3%, the sentence is judged to be meaningful.
  4. The user portrait generation method based on web crawler acquisition technology according to claim 1, characterized in that, in step S4 above, the user portrait categories are idea, seeing, feeling, and action.
  5. The user portrait generation method based on web crawler acquisition technology according to claim 1, characterized in that the expert vocabulary list is created by the system, and the user can define the attribute of each word in the expert vocabulary list, the attribute of a word being the user portrait category to which the word belongs.
  6. The user portrait generation method based on web crawler acquisition technology according to claim 4, characterized in that a natural language algorithm can also judge whether a sentence carries emotion, and a sentence that carries emotion is assigned to the feeling category of the user portrait.
CN201910177182.2A 2019-03-08 2019-03-08 User portrait generation method based on web crawler acquisition technology Withdrawn CN109918508A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910177182.2A (CN109918508A) | 2019-03-08 | 2019-03-08 | User portrait generation method based on web crawler acquisition technology

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910177182.2A (CN109918508A) | 2019-03-08 | 2019-03-08 | User portrait generation method based on web crawler acquisition technology

Publications (1)

Publication Number | Publication Date
CN109918508A | 2019-06-21

Family

ID=66964002

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910177182.2A (Withdrawn; published as CN109918508A) | User portrait generation method based on web crawler acquisition technology | 2019-03-08 | 2019-03-08

Country Status (1)

Country Link
CN (1) CN109918508A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190621