CN107704572A - Method and device for mining creation angles of person entities - Google Patents

Method and device for mining creation angles of person entities

Info

Publication number
CN107704572A
Authority
CN
China
Prior art keywords
keyword
network
word
personage
people entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710914887.9A
Other languages
Chinese (zh)
Other versions
CN107704572B (en)
Inventor
马健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201710914887.9A
Publication of CN107704572A
Application granted
Publication of CN107704572B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology

Abstract

The invention provides a method and device for mining creation angles of person entities. The method includes: first, obtaining article metadata from a network data source, and parsing out the person entity words contained in the article metadata and the keywords related to those person entity words; then, building a person-keyword network with the person entity words and keywords as network nodes, and adding labels to the network nodes in the person-keyword network; and finally, dividing the network nodes in the person-keyword network into communities, using identical labels as the basis for division, and taking any network node in a community as a creation angle of the person entity word in that community. Embodiments of the invention can not only provide creators with more novel writing angles, but can also broaden their creative thinking and help them produce more diverse articles.

Description

Method and device for mining creation angles of person entities
Technical field
The present invention relates to the technical field of Internet applications, and in particular to a method and device for mining creation angles of person entities.
Background
With the development of the information society, information is emerging in large quantities, and people's demand for information is growing rapidly. At present, much of this information is presented as complex, fast-changing information feeds. Feed content mainly takes the form of text, pictures, and video; representative examples include Toutiao (Today's Headlines), NetEase News, and Tiantian Kuaibao (Daily Bulletin).
In addition, as individual users make deeper use of the Internet, self-media has emerged as a new kind of media. Self-media, also known as "citizen media" or "personal media", is a general term for new media through which privatized, popularized, generalized, and autonomous communicators use modern, electronic means to deliver normative and non-normative information to an unspecified majority or to specific individuals. Self-media has given a huge boost to the development of today's information feeds. However, self-media varies widely in form and quality. The best of it can inspire its audience in their lives or contribute to career success, and helps people discover the meaning and value of life. Most self-media, however, is simple "network transplanting": it records trivia, or even unhealthy content. Over time, self-media content has gradually come to lack new ideas and innovation. For a creator, the main approach to writing an article is to focus on one or more entities, approach them from several angles, add personal viewpoints and understanding, and finally organize the material into an article. But the fields a creator knows and the angles a creator can think of are limited, so the creator cannot form a complete framework for every entity. If the angles from which each entity has been written about could be aggregated into a form that is clear at a glance, a creator could easily see which aspects have already been written about, which have not yet been covered, and which are written about frequently. This would clearly help authors write from new angles, and would also give readers lasting reading value.
However, there is currently no method that can effectively provide creators with writing angles and ideas. Because a creation angle is a relatively abstract concept, how to find possible creation angles in an article, and how to find the most important angles of an entity among a large number of creation angles, remain major challenges.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a method and device for mining creation angles of person entities that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for mining creation angles of person entities is disclosed, including:
Obtaining article metadata from a network data source, and parsing out the person entity words contained in the article metadata and the keywords related to the person entity words;
Building a person-keyword network with the person entity words and the keywords as network nodes, and adding labels to the network nodes in the person-keyword network;
Dividing the network nodes in the person-keyword network into communities, using identical labels as the basis for division, and taking any network node in a community as a creation angle of the person entity word in that community.
Optionally, building the person-keyword network with the person entity words and the keywords as network nodes includes:
Building the person-keyword network with the person entity words and keywords as network nodes, and with the connection weights between person entity words and keywords and/or the connection weights between different person entity words as edges, wherein a connection weight represents the closeness between network nodes.
Optionally, adding labels to the network nodes in the person-keyword network includes:
Matching preset person entity words and/or keywords that already carry labels against the network nodes in the person-keyword network;
If a match succeeds, adding the corresponding label to the matched network node, and obtaining the network nodes whose connection weight with the matched network node reaches a preset threshold;
Adding the same label to the obtained network nodes whose connection weight reaches the preset threshold.
Optionally, adding labels to the network nodes in the person-keyword network includes:
Adding labels to the network nodes in the person-keyword network based on a preset algorithm.
Optionally, the preset algorithm includes the label propagation algorithm (LPA).
Optionally, the label includes at least one of the following:
Attribute information of the person entity word and/or the keyword;
Person information related to the person entity word and/or the keyword;
Event information related to the person entity word and/or the keyword.
Optionally, before building the person-keyword network with the person entity words and keywords as network nodes and with the connection weights between person entity words and keywords and/or between different person entity words as edges, the method further includes:
Computing the connection weights between the person entity words and keywords and/or the connection weights between different person entity words.
Optionally, computing the connection weights between the person entity words and keywords and/or between different person entity words includes:
Extracting the article titles from the article metadata, and parsing person entity words and keywords from the article titles;
Counting the co-occurrence word pairs that appear in the article titles, a co-occurrence word pair being a word pair formed by a person entity word and a keyword, or by different person entity words, that appear in the same article title;
Calculating the weight value of each co-occurrence word pair in each article, and summing the weight values of the same co-occurrence word pair across different articles, the sum serving as the connection weight in the person-keyword network between the person entity word and keyword, and/or between the different person entity words, corresponding to that co-occurrence word pair.
Optionally, calculating the weight value of the co-occurrence word pair in different articles includes:
Extracting the article body from the article metadata;
Calculating the weight value of the co-occurrence word pair according to the number of times the co-occurrence words contained in the pair appear in the article body, wherein a co-occurrence word is a person entity word and/or a keyword.
Optionally, parsing out the person entity words and keywords contained in the article metadata includes:
Extracting the article titles from the article metadata, and segmenting the extracted article titles into words;
Filtering out high-frequency words according to word frequency after segmentation, and performing part-of-speech tagging on the remaining words, wherein person nouns are tagged as person entity words and other words related to the person entity words are tagged as keywords.
According to another aspect of the present invention, a device for mining creation angles of person entities is also provided, including:
A parsing module, adapted to obtain article metadata from a network data source, and to parse out the person entity words contained in the article metadata and the keywords related to the person entity words;
A building module, adapted to build a person-keyword network with the person entity words and the keywords as network nodes, and to add labels to the network nodes in the person-keyword network;
A mining module, adapted to divide the network nodes in the person-keyword network into communities, using identical labels as the basis for division, and to take any network node in a community as a creation angle of the person entity word in that community.
Optionally, the building module is further adapted to: build the person-keyword network with the person entity words and keywords as network nodes, and with the connection weights between person entity words and keywords and/or the connection weights between different person entity words as edges, wherein a connection weight represents the closeness between network nodes.
Optionally, the building module is further adapted to: match preset person entity words and/or keywords that already carry labels against the network nodes in the person-keyword network;
If a match succeeds, add the corresponding label to the matched network node, and obtain the network nodes whose connection weight with the matched network node reaches a preset threshold;
Add the same label to the obtained network nodes whose connection weight reaches the preset threshold.
Optionally, the building module is further adapted to:
Add labels to the network nodes in the person-keyword network based on a preset algorithm.
Optionally, the preset algorithm includes the label propagation algorithm (LPA).
Optionally, the label includes at least one of the following:
Attribute information of the person entity word and/or the keyword;
Person information related to the person entity word and/or the keyword;
Event information related to the person entity word and/or the keyword.
Optionally, the device further includes: a statistics module, adapted to compute the connection weights between the person entity words and keywords and/or the connection weights between different person entity words before the building module builds the person-keyword network with the person entity words and keywords as network nodes and with those connection weights as edges.
Optionally, the statistics module is further adapted to: extract the article titles from the article metadata, and parse person entity words and keywords from the article titles;
Count the co-occurrence word pairs that appear in the article titles, a co-occurrence word pair being a word pair formed by a person entity word and a keyword, or by different person entity words, that appear in the same article title;
Calculate the weight value of each co-occurrence word pair in each article, and sum the weight values of the same co-occurrence word pair across different articles, the sum serving as the connection weight in the person-keyword network between the person entity word and keyword, and/or between the different person entity words, corresponding to that co-occurrence word pair.
Optionally, the statistics module is further adapted to: extract the article body from the article metadata;
Calculate the weight value of the co-occurrence word pair according to the number of times the co-occurrence words contained in the pair appear in the article body, wherein a co-occurrence word is a person entity word and/or a keyword.
Optionally, the parsing module is further adapted to: extract the article titles from the article metadata, and segment the extracted article titles into words;
Filter out high-frequency words according to word frequency after segmentation, and perform part-of-speech tagging on the remaining words, wherein person nouns are tagged as person entity words and other words related to the person entity words are tagged as keywords.
According to a further aspect of the present invention, a computer program is also provided, including computer-readable code which, when run on a computing device, causes the computing device to perform the community-division-based creation angle mining method described above.
According to a further aspect of the present invention, a computer-readable medium is also provided, in which the computer program described above is stored.
In embodiments of the present invention, article metadata is first obtained from a network data source, and the person entity words contained in the article metadata and the keywords related to the person entity words are parsed out. Then, a person-keyword network is built with the person entity words and keywords as network nodes, and labels are added to the network nodes in the person-keyword network. Finally, the network nodes in the person-keyword network are divided into communities, using identical labels as the basis for division, and any network node in a community is taken as a creation angle of the person entity word in that community. Thus, embodiments of the present invention mine a large amount of network data to extract the person entity words and keywords contained in article metadata, build a person-keyword network from these words, and then divide the network nodes into communities, so that the words in a community serve as creation angles of the person entity word in that community. This clusters the creation angles of person entity words and gathers them into a form that is clear at a glance. The scheme of the present invention can not only provide creators with more novel writing angles, but can also broaden their creative thinking and help them produce more diverse articles.
Further, using the scheme of the embodiments of the present invention to help creators write articles also provides lasting value to the readers of those articles.
The above is only an overview of the technical solution of the present invention. In order that the technical means of the invention may be understood more clearly and implemented according to the content of the specification, and in order that the above and other objects, features, and advantages of the invention may be more readily apparent, specific embodiments of the invention are set out below.
From the following detailed description of specific embodiments of the invention in conjunction with the accompanying drawings, those skilled in the art will better understand the above and other objects, advantages, and features of the invention.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, identical parts are denoted by the same reference numerals. In the drawings:
Fig. 1 shows a schematic flowchart of a method for mining creation angles of person entities according to an embodiment of the invention;
Fig. 2 shows a schematic diagram of the structure of a person-keyword network according to an embodiment of the invention;
Fig. 3 shows a schematic flowchart of a method for mining creation angles of person entities according to another embodiment of the invention;
Fig. 4 shows a schematic structural diagram of a device for mining creation angles of person entities according to an embodiment of the invention;
Fig. 5 shows a schematic structural diagram of a device for mining creation angles of person entities according to another embodiment of the invention;
Fig. 6 shows a block diagram of a computing device for performing the method for mining creation angles of person entities according to the invention; and
Fig. 7 shows a storage unit for holding or carrying program code that implements the method for mining creation angles of person entities according to the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
To solve the above technical problems, embodiments of the invention provide a method for mining creation angles of person entities. Fig. 1 shows a schematic flowchart of a method for mining creation angles of person entities according to an embodiment of the invention. Referring to Fig. 1, the method may include at least steps S102 to S106.
Step S102: obtain article metadata from a network data source, and parse out the person entity words contained in the article metadata and the keywords related to the person entity words.
In this step, the network data source may include self-media, for example Interesting History, Phoenix News, Tencent News, Toutiao (Today's Headlines), NetEase News, and Tiantian Kuaibao (Daily Bulletin), as well as Weibo posts, articles published by WeChat official accounts, blog articles, and so on. The embodiments of the invention place no specific restriction on the source of the network data.
Step S104: build a person-keyword network with the person entity words and keywords as network nodes, and add labels to the network nodes in the person-keyword network.
In this step, a label includes at least one of: attribute information of a person entity word and/or keyword, person information related to a person entity word and/or keyword, and event information related to a person entity word and/or keyword.
The attribute information of a person entity word includes the person's basic information, occupation, hobbies, and so on. For example, the attribute information of "Zhuge Liang" includes the occupations "military strategist" and "politician" and the hobby "reading". The attribute information of a keyword includes the type the keyword belongs to, the meaning it expresses, and so on. For example, the attribute information of the keyword "The Romance of the Three Kingdoms" includes that its genre is the historical novel.
Step S106: divide the network nodes in the person-keyword network into communities, using identical labels as the basis for division, and take any network node in a community as a creation angle of the person entity word in that community.
Embodiments of the invention mine a large amount of network data to extract the person entity words and keywords contained in article metadata, build a person-keyword network from these words, and then divide the nodes of the network into communities, so that the words in a community serve as creation angles of the person entity word in that community. This clusters the creation angles of person entity words and gathers them into a form that is clear at a glance. The scheme of the present invention can not only provide creators with more novel writing angles, but can also broaden their creative thinking and help them produce more diverse articles.
Further, using the scheme of the embodiments of the present invention to help creators write articles also provides lasting value to the readers of those articles.
Returning to step S102 above: in an embodiment of the invention, because the words in an article title are a high-level summary of the article, the person entity words and their related keywords are parsed from the title. Specifically, the article titles are extracted from the article metadata and segmented into words. High-frequency words are then filtered out according to word frequency after segmentation, and part-of-speech tagging is performed on the remaining words, wherein person nouns are tagged as person entity words and other words related to the person entity words are tagged as keywords.
In this embodiment, nouns and verbs may be taken as candidate keywords, and high-frequency words are then filtered out by word frequency; for example, high-frequency words such as "eat" and "live" are filtered out. The remaining nouns and verbs are then used as keywords, and person nouns as person entity words. Of course, words of other parts of speech obtained after segmentation may also be used as keywords; the embodiments of the invention place no specific restriction on this.
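The title-parsing step above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patent's implementation: the titles are assumed to be already segmented by some word segmenter, the `pos_of` dictionary is a hypothetical stand-in for a real part-of-speech tagger, and "high-frequency" is approximated by the share of titles in which a word appears (`max_doc_ratio` is an invented parameter). Person nouns are exempted from the frequency filter, which is also an assumption.

```python
from collections import Counter

def parse_titles(segmented_titles, pos_of, max_doc_ratio=0.5):
    """Return one (person_entity_words, keywords) tuple per title.

    segmented_titles: list of token lists (output of some word segmenter).
    pos_of: dict mapping a token to a tag such as 'person', 'noun', 'verb'.
    max_doc_ratio: a token appearing in more than this share of titles is
        treated as a high-frequency word and dropped (assumed criterion).
    """
    # Count in how many titles each token appears.
    doc_freq = Counter()
    for tokens in segmented_titles:
        doc_freq.update(set(tokens))
    limit = max_doc_ratio * len(segmented_titles)

    parsed = []
    for tokens in segmented_titles:
        persons, keywords = [], []
        for tok in tokens:
            tag = pos_of.get(tok)
            if tag == "person":
                persons.append(tok)        # person noun -> person entity word
            elif tag in ("noun", "verb") and doc_freq[tok] <= limit:
                keywords.append(tok)       # remaining noun/verb -> keyword
        parsed.append((persons, keywords))
    return parsed
```

With a real segmenter and tagger, only the construction of `segmented_titles` and `pos_of` would change; the filtering and tagging logic stays the same.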
Referring to Table 1, Table 1 lists article titles parsed from several entertainment and history articles, together with the person entity words contained in each title and the keywords related to those person entity words.
Table 1
Returning to step S104 above: in an embodiment of the invention, when building the person-keyword network with the person entity words and keywords as network nodes, the person entity words and keywords may be used as network nodes, and the connection weights between person entity words and keywords and/or the connection weights between different person entity words may be used as edges. A connection weight represents the closeness between network nodes: the larger the connection weight, the closer the relation between the two network nodes.
For example, Fig. 2 shows a person-keyword network built with the person entity words and keywords parsed in Table 1 as network nodes. In Fig. 2, the keyword "The Romance of the Three Kingdoms" is directly connected to four person entity words, namely "Zhuge Liang", "Guan Yu", "Liu Bei", and "Zhang Fei". The connection weight between "The Romance of the Three Kingdoms" and "Zhuge Liang" is the highest, at 214, and the connection weight with "Zhang Fei" is the lowest, at 69. It follows that "Zhuge Liang" is more closely related to "The Romance of the Three Kingdoms" than "Zhang Fei" is. Of course, the connection weight values between the network nodes shown in Fig. 2 are only illustrative; the embodiments of the invention place no specific restriction on them.
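As a minimal sketch of this data structure, the network can be held as a nested dictionary that maps each node to its neighbors and connection weights. The representation and the function name `build_network` are assumptions made for illustration; only the weights 214 and 69 come from the Fig. 2 description, and the other two values are made up.

```python
from collections import defaultdict

def build_network(weighted_pairs):
    """Build an undirected weighted graph as {node: {neighbor: weight}}."""
    graph = defaultdict(dict)
    for (a, b), weight in weighted_pairs.items():
        graph[a][b] = weight
        graph[b][a] = weight
    return dict(graph)

BOOK = "The Romance of the Three Kingdoms"
edges = {
    (BOOK, "Zhuge Liang"): 214,  # value given in the Fig. 2 description
    (BOOK, "Zhang Fei"): 69,     # value given in the Fig. 2 description
    (BOOK, "Guan Yu"): 150,      # made-up value for illustration
    (BOOK, "Liu Bei"): 120,      # made-up value for illustration
}
network = build_network(edges)

# The neighbor with the largest connection weight is the most closely related.
closest = max(network[BOOK], key=network[BOOK].get)
```

Storing each edge in both directions keeps neighbor lookups simple, at the cost of duplicating the weight.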
Continuing with step S104 above: in an embodiment of the invention, the specific process of adding labels to the network nodes in the person-keyword network may be as follows.
First, preset person entity words and/or keywords that already carry labels are matched against the network nodes in the person-keyword network. If a match succeeds, the corresponding label is added to the matched network node. Then, the network nodes whose connection weight with the matched network node reaches a preset threshold are obtained, and the same label is added to these network nodes. In this embodiment, the preset threshold may be set by the user; the embodiments of the invention place no specific restriction on the preset threshold.
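The matching-and-threshold procedure above might look like the following sketch. The graph layout `{node: {neighbor: weight}}`, the function name, and the label "Three Kingdoms figure" are all hypothetical; "reaches the preset threshold" is read here as `weight >= threshold`.

```python
def add_seed_labels(graph, seed_labels, threshold):
    """Label matched nodes, then copy each label to strongly connected neighbors.

    graph: {node: {neighbor: weight}} person-keyword network.
    seed_labels: {word: label} preset words that already carry a label.
    threshold: minimum connection weight for a neighbor to inherit the label.
    """
    labels = {}
    for node in graph:
        if node not in seed_labels:
            continue                      # no match for this node
        label = seed_labels[node]
        labels.setdefault(node, set()).add(label)
        for neighbor, weight in graph[node].items():
            if weight >= threshold:       # connection weight reaches the threshold
                labels.setdefault(neighbor, set()).add(label)
    return labels

# Made-up weights and label, for illustration only.
graph = {
    "Liu Bei": {"Guan Yu": 90, "straw sandals": 40, "tea": 5},
    "Guan Yu": {"Liu Bei": 90},
    "straw sandals": {"Liu Bei": 40},
    "tea": {"Liu Bei": 5},
}
labels = add_seed_labels(graph, {"Liu Bei": "Three Kingdoms figure"}, threshold=30)
```

Each node keeps a set of labels, so a node matched by several seeds can carry several labels at once.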
The preset person entity words and/or keywords that already carry labels may be the words contained in the network data of an existing person knowledge graph. A person knowledge graph may contain relations between persons built from encyclopedia data, such as important relations between spouses, children, and siblings. However, a person knowledge graph contains very few keywords. Therefore, the person knowledge graph may be used as the source of existing labels to predict the labels of the network nodes in the person-keyword network of the embodiments of the invention.
In addition, a network node (such as a person entity word or a keyword) may also take closely connected surrounding network nodes as its own labels. Each network node has a probability value with respect to each neighboring network node, and neighboring nodes whose probability exceeds a certain threshold are selected as labels of the node. For example, with reference to Fig. 2, among the connection weights between "Zhang San" and each network node connected to it, the connection weight between the person entity word "Zhang San" and the keyword "affair" is relatively high. Therefore, "affair" can be used as a label of "Zhang San".
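The source does not say how the probability value for a neighboring node is computed. One plausible reading, shown here purely as an assumption, is to normalize a node's connection weights so that they sum to 1 and keep the neighbors whose share reaches a threshold; the function name, `min_prob`, and the sample weights are invented for this sketch.

```python
def neighbor_labels(graph, node, min_prob):
    """Return the neighbors of `node` whose normalized weight is >= min_prob.

    Assumption: the "probability" of a neighbor is its connection weight
    divided by the sum of all connection weights of `node`.
    """
    neighbors = graph[node]
    total = sum(neighbors.values())
    if total == 0:
        return []
    return sorted(n for n, w in neighbors.items() if w / total >= min_prob)

# Made-up weights, for illustration only.
graph = {"Liu Bei": {"straw sandals": 60, "Guan Yu": 30, "tea": 10}}
strong = neighbor_labels(graph, "Liu Bei", min_prob=0.5)
```

Lowering `min_prob` admits more neighbors as labels, which in turn lets a node belong to more communities later on.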
In another embodiment of the invention, labels may be added to the network nodes in the person-keyword network based on a preset algorithm. The preset algorithm may include the label propagation algorithm (LPA). Label propagation is a graph-based semi-supervised learning method whose basic idea is to use the label information of labeled nodes to predict the label information of unlabeled nodes. A complete graph model is built from the relations between samples. In this graph, the nodes include both labeled and unlabeled data, each edge represents the closeness of two nodes, and a node's label is passed to other nodes according to similarity. The closeness referred to here is the connection weight of the foregoing embodiments: in the person-keyword network, a label can be passed to other network nodes according to the size of the connection weight on the edge directly connecting the network nodes. Labeled data acts like a source that can annotate unlabeled data, and the greater the similarity between nodes, the more easily a label propagates.
For example, with reference to Fig. 2 and the foregoing embodiments, the connection weight between the person entity word "Zhang San" and the person entity word "Little Rong" is also relatively high. Using the label propagation algorithm, the label "affair" of "Zhang San" can be passed to "Little Rong". In this way, a link is also established between keywords and person entity words that were not closely connected before.
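A simplified, deterministic variant of label propagation is sketched below to make the idea concrete. It is an assumption-laden illustration, not the patent's algorithm: seed labels are kept fixed, each unlabeled node adopts the label with the largest total connection weight among its labeled neighbors, ties are broken alphabetically, and nodes are visited in sorted order until nothing changes. The function name and `max_iter` are invented.

```python
def propagate_labels(graph, seeds, max_iter=20):
    """Weighted label propagation over {node: {neighbor: weight}}.

    seeds: {node: label} for nodes whose labels are known and stay fixed.
    Returns {node: label} for every node that a label could reach.
    """
    labels = dict(seeds)
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if node in seeds:
                continue                  # seed labels are never overwritten
            # Sum the connection weights per label over labeled neighbors.
            scores = {}
            for neighbor, weight in graph[node].items():
                if neighbor in labels:
                    lab = labels[neighbor]
                    scores[lab] = scores.get(lab, 0) + weight
            if not scores:
                continue                  # no labeled neighbor yet
            best = max(sorted(scores), key=scores.get)
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break                         # converged
    return labels
```

Classic LPA updates nodes in random order and breaks ties randomly; the fixed ordering used here only makes the sketch reproducible.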
Returning to step S106 above: in an embodiment of the invention, the network nodes in the person-keyword network shown in Fig. 2 are divided into communities, using identical labels as the basis for division, and any network node in a community is taken as a creation angle of the person entity word in that community. In the embodiment illustrated in Table 2, each person entity word together with its corresponding important angles represents one community after division.
Person entity word | Creation angles
Zhang San | affair; Little Rong; Li Si; divorce; stupid root
Liu Bei | straw sandals; Guan Yu; Zhang Fei
Guan Yu | Liu Bei; crossing five passes and slaying six generals; slaying Hua Xiong while the wine was still warm
Zhu Yuanzhang | killing meritorious officials; Empress Ma; monk; beggar; Liu Bowen
Little Rong | Zhang San; Li Si; division of property; affair
Table 2
In an embodiment of the present invention, a creation angle can be not only the content of a network node but also a label added to a network node. For example, some creation angles in Table 2, such as "divorce", "stupid root" and "forcing five passes and slaying six captains", do not appear in the person-keyword network shown in Fig. 2. These words are label words added when labels were added to the network nodes in the preceding step. Because a label is closely related to its network node, the label of a network node can also serve as an important creation angle of the person entity word in the community.
As described above, a network node can also take closely connected surrounding network nodes as its own labels. Therefore, in the communities obtained by division, some network nodes are not assigned to only one specific community. For example, in Table 2 the keyword "derailed" is assigned to both the community of "Zhang San" and the community of "little Rong".
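A minimal sketch of community division by shared label, with overlapping membership, might look like the following. The function name `divide_communities` and the example labels are hypothetical. The sketch assumes each node carries a set of labels and that a community is simply the set of nodes sharing one label.

```python
from collections import defaultdict


def divide_communities(node_labels):
    """Group nodes by shared label; a node may join several communities.

    node_labels: {node: set of labels attached to that node}.
    Returns {label: set of nodes carrying that label}.
    """
    communities = defaultdict(set)
    for node, labels in node_labels.items():
        for label in labels:
            communities[label].add(node)
    return dict(communities)


# Hypothetical labels: each keyword is labeled with the closely connected
# person entity words around it.
example_node_labels = {
    "derailed": {"Zhang San", "little Rong"},
    "divorce": {"Zhang San"},
    "property division": {"little Rong"},
}
example_communities = divide_communities(example_node_labels)
```

Because "derailed" carries two labels here, it lands in both the "Zhang San" community and the "little Rong" community, which mirrors the overlap seen in Table 2.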
The embodiment of the present invention further provides another creation angle mining method for person entities. Fig. 3 shows a schematic flowchart of the creation angle mining method for person entities according to another embodiment of the invention. Referring to Fig. 3, the method can include at least steps S302 to S310.
Step S302: obtain article metadata from a network data source, and parse out the person entity words contained in the article metadata and the keywords related to the person entity words.
In this step, a person entity word can be a noun, and a keyword can be a noun, a verb, and so on. The embodiment of the present invention does not limit the specific parts of speech of person entity words and keywords.
Step S304: count the connection weights between person entity words and keywords and/or the connection weights between different person entity words.
In this step, the process of counting the connection weights between person entity words and keywords and/or between different person entity words can include:
First, the article title in the article metadata is extracted, and person entity words and keywords are parsed from the article title.
Then, the co-occurrence word pairs appearing in the article title are counted. A co-occurrence word pair can be a word pair formed by a person entity word and a keyword appearing in the same article title, or a word pair formed by different person entity words.
Finally, the weight value of each co-occurrence word pair in each article is calculated, and the weight values of the same co-occurrence word pair across different articles are summed. The sum serves as the connection weight, in the person-keyword network, between the person entity word and keyword (and/or between the different person entity words) corresponding to that co-occurrence word pair.
In an embodiment of the present invention, when calculating the weight value of a co-occurrence word pair in different articles, the article body text in the article metadata can first be extracted. The weight value of the co-occurrence word pair is then calculated from the number of times the co-occurring words of the pair appear in the article body text, where a co-occurring word is a person entity word and/or a keyword.
To illustrate the connection weight of the embodiment of the present invention more clearly, the counting process is described below with a specific example.
For example, segmenting the title of article A yields four words: w1, w2, w3 and w4 (assume that w1 and w2 are person entity words and that w3 and w4 are words of other classes, i.e. keywords). Because w1, w2, w3 and w4 appear in one title at the same time, these words are considered to co-occur with each other. Next, the co-occurrence word pairs are counted. Note that a co-occurrence word pair must contain a person entity word and that the pair is unordered. The co-occurrence word pairs counted are therefore: <w1,w2>, <w1,w3>, <w1,w4>, <w2,w3>, <w2,w4>.
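The pair enumeration in this example can be sketched in a few lines. The function name `cooccurrence_pairs` is an assumption made for illustration, and the title is assumed to be already segmented into words.

```python
from itertools import combinations


def cooccurrence_pairs(title_words, person_words):
    """Return unordered title word pairs that contain a person entity word."""
    unique_words = list(dict.fromkeys(title_words))  # drop repeats, keep order
    pairs = []
    for first, second in combinations(unique_words, 2):
        if first in person_words or second in person_words:
            # Sort inside the pair so <a,b> and <b,a> are the same key.
            pairs.append(tuple(sorted((first, second))))
    return pairs


pairs_for_article_a = cooccurrence_pairs(["w1", "w2", "w3", "w4"], {"w1", "w2"})
```

The pair <w3,w4> is left out because neither word is a person entity word, which leaves the five pairs listed above.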
In the embodiment of the present invention, a co-occurrence is counted only once per article. To make the connection weights between network nodes more accurate, each co-occurrence word pair is multiplied by a weight value, which is calculated as follows.
Assume that the weight value of <w1,w2> in article A is denoted weightA<w1,w2>. Then:
weightA<w1,w2> = min(c(w1), c(w2)) / max(c(w1), c(w2), c(w3), c(w4)).
In this formula, c(w) denotes the number of times word w appears in the article. The calculated weight value is a number less than or equal to 1.
If w1 and w2 are both network nodes in the person-keyword network, the connection weight of <w1,w2> is the sum of the weight values over all articles whose titles contain both network nodes. The formula is: weight<w1,w2> = ∑ weightA<w1,w2>, where the sum runs over those articles A.
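As a worked illustration of the two formulas, the sketch below computes weightA for one article and sums it over articles. The function names and the word counts are made up for the example. Each article is assumed to be a pair of already segmented title words and body words.

```python
from collections import Counter


def article_pair_weight(pair, title_words, body_words):
    """weightA<w1,w2>: min count of the pair over max count of any title word."""
    counts = Counter(body_words)
    largest = max(counts[word] for word in title_words)
    if largest == 0:  # no title word appears in the body text
        return 0.0
    return min(counts[pair[0]], counts[pair[1]]) / largest


def connection_weight(pair, articles):
    """Sum weightA over every article whose title contains both words."""
    total = 0.0
    for title_words, body_words in articles:
        if pair[0] in title_words and pair[1] in title_words:
            total += article_pair_weight(pair, title_words, body_words)
    return total


# Made-up counts: in article A, c(w1)=2, c(w2)=4, c(w3)=1, c(w4)=3.
article_a = (["w1", "w2", "w3", "w4"],
             ["w1"] * 2 + ["w2"] * 4 + ["w3"] + ["w4"] * 3)
# In article B, c(w1)=3 and c(w2)=3.
article_b = (["w1", "w2"], ["w1"] * 3 + ["w2"] * 3)
weight_w1_w2 = connection_weight(("w1", "w2"), [article_a, article_b])
```

With these counts, article A contributes min(2, 4) / max(2, 4, 1, 3) = 0.5 and article B contributes 3 / 3 = 1, so the connection weight of <w1,w2> is 1.5.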
Step S306: build the person-keyword network with the person entity words and keywords as network nodes and with the connection weights between person entity words and keywords and/or between different person entity words as edges.
In this step, the connection weight represents the closeness between network nodes.
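One simple way to hold such a network in memory is a nested dictionary, sketched below. The representation, the function names `build_network` and `closest_neighbours`, and the weights are assumptions for illustration. The source does not prescribe a data structure.

```python
def build_network(edge_weights):
    """Build an undirected weighted graph as a nested dict.

    edge_weights: {(word_a, word_b): connection_weight}.
    Returns {node: {neighbour: connection_weight}}.
    """
    network = {}
    for (word_a, word_b), weight in edge_weights.items():
        network.setdefault(word_a, {})[word_b] = weight
        network.setdefault(word_b, {})[word_a] = weight
    return network


def closest_neighbours(network, node, top_n=3):
    """List a node's neighbours from highest to lowest connection weight."""
    ranked = sorted(network.get(node, {}).items(),
                    key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


# Illustrative weights only.
example_network = build_network({
    ("Liu Bei", "Guan Yu"): 4.5,
    ("Liu Bei", "Zhang Fei"): 4.0,
    ("Liu Bei", "straw sandals"): 1.5,
})
```

Ranking neighbours by connection weight gives a direct reading of closeness: in this toy network the two nodes closest to "Liu Bei" are "Guan Yu" and "Zhang Fei".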
Step S308: add labels to the network nodes in the person-keyword network based on a preset algorithm.
Step S310: divide the network nodes in the person-keyword network into communities using the same label as the basis for division, and take any network node in a community as a creation angle of the person entity word in that community.
After multiple communities are obtained by division according to the foregoing embodiments, any network node in each community can serve as a creation angle of the person entity word in that community. The embodiment of the present invention can recommend creation angles to a creator. Specifically, when the creator writes an article using a writing app or website, creation angles related to the creator's previous creation records are recommended to the creator.
For example, a creator, Xiao Hong, often publishes articles on a blog. Because Xiao Hong is a celebrity fan, the articles Xiao Hong publishes on the blog are usually about celebrities. Xiao Hong is a die-hard fan of the film and television star "Wang Xiaoxiao" and has published the most articles about this star. In the communities divided according to the embodiment of the present invention, the community containing "Wang Xiaoxiao" also contains the keyword "cooking". In fact, besides making films and television programs, "Wang Xiaoxiao" often cooks, but few people know that she enjoys cooking. The embodiment of the present invention can therefore recommend "cooking" to Xiao Hong as a new creation angle for "Wang Xiaoxiao", giving Xiao Hong a new idea for writing.
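The recommendation in this example can be sketched as a set difference between the members of a person's communities and the angles the creator has already covered. The function name `recommend_angles`, the community identifier and the data are hypothetical. How creation records are stored is not specified in the source.

```python
def recommend_angles(person, communities, written_angles):
    """Suggest community members the creator has not yet written about.

    communities:    {community_id: set of network nodes}.
    written_angles: angles already covered in the creator's past articles.
    """
    suggestions = set()
    for members in communities.values():
        if person in members:
            suggestions |= members - {person}
    return sorted(suggestions - set(written_angles))


# Hypothetical community and creation record for the example above.
example_communities = {
    "c1": {"Wang Xiaoxiao", "film and television", "cooking"},
}
new_angles = recommend_angles(
    "Wang Xiaoxiao", example_communities, {"film and television"}
)
```

Here the creator has already written about "film and television", so the only angle left to recommend is "cooking".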
In the embodiment of the present invention, creation angles can be recommended to the creator in various forms. For example, multiple creation angles for a person entity can be recommended in a pop-up window on the creator's current writing interface, or in a scroll bar at any position of the current writing interface. Of course, when multiple creation angles for a person entity are actually recommended to the creator, the recommended content needs to be displayed in a prominent place so as to attract the creator's attention.
Meanwhile, so as not to disturb the creator's writing, a display time can be set for the recommended content. For example, the display time of the recommended creation angles is set to 20 seconds, 30 seconds, 40 seconds, etc. Alternatively, the recommended creation angles are updated every 30 seconds, i.e. new creation angles are recommended every 30 seconds. The embodiment of the present invention does not specifically limit the form in which the mined creation angles are recommended.
Based on the same inventive concept, the embodiment of the present invention further provides a creation angle mining device for person entities. Fig. 4 shows a schematic structural diagram of the creation angle mining device for person entities according to an embodiment of the invention. Referring to Fig. 4, the creation angle mining device 400 for person entities can include at least a parsing module 410, a building module 420 and a mining module 430.
The functions of the components of the creation angle mining device 400 for person entities and the connections between them are now described:
Parsing module 410, adapted to obtain article metadata from a network data source and to parse out the person entity words contained in the article metadata and the keywords related to the person entity words;
Building module 420, coupled to the parsing module 410 and adapted to build a person-keyword network with the person entity words and keywords as network nodes and to add labels to the network nodes in the person-keyword network;
wherein the label includes at least one of: attribute information of the person entity word and/or keyword; person information related to the person entity word and/or keyword; and event information related to the person entity word and/or keyword.
Mining module 430, coupled to the building module 420 and adapted to divide the network nodes in the person-keyword network into communities using the same label as the basis for division, and to take any network node in a community as a creation angle of the person entity word in that community.
In an embodiment of the present invention, the building module 420 is further adapted to build the person-keyword network with the person entity words and keywords as network nodes and with the connection weights between person entity words and keywords and/or between different person entity words as edges, wherein the connection weight represents the closeness between network nodes.
In an embodiment of the present invention, the building module 420 is further adapted to match person entity words and/or keywords to which labels have been added in advance against the network nodes in the person-keyword network. If the matching succeeds, the corresponding label is added to the successfully matched network node, and the network nodes whose connection weight with the successfully matched network node reaches a preset threshold are obtained. The same label is then added to the obtained network nodes whose connection weight reaches the preset threshold.
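The matching-and-threshold step can be sketched as below. This is an illustrative reading, assuming the network is a nested dictionary of connection weights and that "reaches the threshold" means greater than or equal to it. The function name `add_preset_labels` and the example data are hypothetical.

```python
def add_preset_labels(network, preset_labels, threshold):
    """Attach preset labels to matching nodes and to their close neighbours.

    network:       {node: {neighbour: connection_weight}}.
    preset_labels: {word: label} prepared in advance.
    """
    node_labels = {}
    for word, label in preset_labels.items():
        if word not in network:
            continue  # the preset word matches no network node
        node_labels.setdefault(word, set()).add(label)
        for neighbour, weight in network[word].items():
            if weight >= threshold:  # connection weight reaches the threshold
                node_labels.setdefault(neighbour, set()).add(label)
    return node_labels


# Illustrative network and preset labels.
example_network = {
    "Zhang San": {"little Rong": 8.0, "Li Si": 1.0},
    "little Rong": {"Zhang San": 8.0},
    "Li Si": {"Zhang San": 1.0},
}
example_presets = {"Zhang San": "derailed", "Liu Bei": "straw sandals"}
example_node_labels = add_preset_labels(example_network, example_presets, 5.0)
```

With a threshold of 5.0, the label "derailed" is copied from "Zhang San" to "little Rong" but not to "Li Si", and the preset word "Liu Bei" is ignored because it matches no node in this toy network.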
In an embodiment of the present invention, the building module 420 is further adapted to add labels to the network nodes in the person-keyword network based on a preset algorithm, wherein the preset algorithm includes the label propagation algorithm (LPA).
In an embodiment of the present invention, the parsing module 410 is further adapted to extract the article title in the article metadata and segment the extracted article title into words, to filter out high-frequency words according to the word frequencies after segmentation, and to perform part-of-speech tagging on the remaining words, wherein person nouns are tagged as person entity words and other words related to the person entity words are tagged as keywords.
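A rough sketch of this title-parsing pipeline follows. It rests on several stand-in assumptions that are not in the source: whitespace splitting replaces a real Chinese word segmenter, a word counts as high-frequency when it appears in more than a given share of titles, and a lookup in a hypothetical `person_lexicon` replaces part-of-speech tagging. All non-person words that survive filtering are treated as keywords.

```python
from collections import Counter


def parse_titles(titles, person_lexicon, max_title_ratio=0.5):
    """Split titles, drop high-frequency words, and tag the remaining words."""
    segmented = [title.split() for title in titles]
    # Count how many titles each word occurs in.
    title_counts = Counter(word for words in segmented for word in set(words))
    limit = max_title_ratio * len(titles)
    parsed = []
    for words in segmented:
        kept = [word for word in words if title_counts[word] <= limit]
        parsed.append({
            "persons": [w for w in kept if w in person_lexicon],
            "keywords": [w for w in kept if w not in person_lexicon],
        })
    return parsed


# Toy titles with underscores joining multi-word names.
example_titles = [
    "the Liu_Bei sells straw_sandals",
    "the Guan_Yu slays Hua_Xiong",
    "the Zhang_Fei drinks wine",
]
example_lexicon = {"Liu_Bei", "Guan_Yu", "Zhang_Fei", "Hua_Xiong"}
example_parsed = parse_titles(example_titles, example_lexicon)
```

In this toy input the word "the" occurs in every title and is filtered out, while each name is tagged as a person entity word and the remaining words become keywords.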
The embodiment of the present invention further provides another creation angle mining device for person entities. Fig. 5 shows a schematic structural diagram of the creation angle mining device for person entities according to another embodiment of the invention. Referring to Fig. 5, in addition to the modules defined in the above embodiment, the creation angle mining device 400 for person entities can further include a statistics module 440.
Statistics module 440, coupled to the parsing module 410 and the building module 420 respectively, and adapted to count the connection weights between person entity words and keywords and/or between different person entity words before the building module 420 builds the person-keyword network with the person entity words and keywords as network nodes and with those connection weights as edges.
In an embodiment of the present invention, the statistics module 440 is further adapted to extract the article title in the article metadata and parse person entity words and keywords from the article title, and to count the co-occurrence word pairs appearing in the article title, where a co-occurrence word pair is a word pair formed by a person entity word and a keyword, or by different person entity words, appearing in the same article title. It then calculates the weight value of each co-occurrence word pair in each article and sums the weight values of the same co-occurrence word pair across different articles, the sum serving as the connection weight, in the person-keyword network, between the person entity word and keyword (and/or between the different person entity words) corresponding to that co-occurrence word pair.
In an embodiment of the present invention, the statistics module 440 is further adapted to extract the article body text in the article metadata and to calculate the weight value of a co-occurrence word pair from the number of times the co-occurring words of the pair appear in the article body text, where a co-occurring word is a person entity word and/or a keyword.
According to any one of the above preferred embodiments or a combination of several of them, the embodiment of the present invention can achieve the following beneficial effects:
In the embodiment of the present invention, article metadata is first obtained from a network data source, and the person entity words contained in the article metadata and the keywords related to the person entity words are parsed out. Then, a person-keyword network is built with the person entity words and keywords as network nodes, and labels are added to the network nodes in the person-keyword network. Finally, the network nodes in the person-keyword network are divided into communities using the same label as the basis for division, and any network node in a community is taken as a creation angle of the person entity word in that community. The embodiment of the present invention thus mines a large amount of network data to extract the person entity words and keywords contained in article metadata, builds the person-keyword network from these words, and divides the nodes of the network into communities, so that the words in a community serve as creation angles of the person entity word in that community. This clusters the creation angles of each person entity word and gathers them into a form that can be taken in at a glance. The scheme of the present invention can not only provide creators with more novel writing angles but also broaden their thinking and help them create more diverse articles.
Further, using the scheme of the embodiment of the present invention to help creators write articles also has long-term, sustainable value for the articles' audience.
Numerous specific details are set forth in the specification provided here. It should be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together into a single embodiment, figure or description thereof in the above description of exemplary embodiments of the invention. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of the embodiments can be combined into one module or unit or component, and can furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed can be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) can be replaced by an alternative feature serving the same, equivalent or similar purpose.
In addition, those skilled in the art will appreciate that although some embodiments described herein include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention can be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) can be used in practice to implement some or all of the functions of some or all of the components of the creation angle mining device for person entities according to an embodiment of the invention. The invention can also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention can be stored on a computer-readable medium or can take the form of one or more signals. Such a signal can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
The embodiment of the present invention further provides a computer program including computer-readable code which, when run on a computing device, causes the computing device to perform the community-division-based creation angle mining method described above. A computer-readable medium storing the computer program described above is also provided.
For example, Fig. 6 shows a computing device that can implement the creation angle mining method for person entities. The computing device conventionally includes a processor 610 and a computer program product or computer-readable medium in the form of a memory 620. The memory 620 can be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk or a ROM. The memory 620 has a storage space 630 storing program code 631 for performing any of the method steps of the above method. For example, the storage space 630 can contain individual program codes 631, each implementing one of the steps of the above method. These program codes can be read from or written to one or more computer program products. These computer program products include program code carriers such as a hard disk, a compact disc (CD), a memory card or a floppy disk. Such a computer program product is typically a portable or fixed storage unit as shown, for example, in Fig. 7. The storage unit can have storage segments, storage space and so on arranged similarly to the memory 620 in the computing device of Fig. 6. The program code can, for example, be compressed in a suitable form. Typically, the storage unit includes computer-readable code 631' for performing the method steps of the invention, that is, code that can be read by a processor such as 610. When run by the computing device, this code causes the computing device to perform the steps of the method described above.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order. These words can be interpreted as names.
Those skilled in the art will thus recognize that, although multiple exemplary embodiments of the invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the invention can still be directly determined or derived from the present disclosure without departing from the spirit and scope of the invention. The scope of the invention should therefore be understood and deemed to cover all such other variations or modifications.

Claims (10)

1. A creation angle mining method for person entities, comprising:
obtaining article metadata from a network data source, and parsing out the person entity words contained in the article metadata and the keywords related to the person entity words;
building a person-keyword network with the person entity words and the keywords as network nodes, and adding labels to the network nodes in the person-keyword network;
dividing the network nodes in the person-keyword network into communities using the same label as the basis for community division, and taking any network node in a community as a creation angle of the person entity word in that community.
2. The method according to claim 1, wherein building the person-keyword network with the person entity words and the keywords as network nodes comprises:
building the person-keyword network with the person entity words and keywords as network nodes and with the connection weights between person entity words and keywords and/or the connection weights between different person entity words as edges, wherein the connection weight represents the closeness between network nodes.
3. The method according to claim 1 or 2, wherein adding labels to the network nodes in the person-keyword network comprises:
matching person entity words and/or keywords to which labels have been added in advance against the network nodes in the person-keyword network;
if the matching succeeds, adding the corresponding label to the successfully matched network node, and obtaining the network nodes whose connection weight with the successfully matched network node reaches a preset threshold;
adding the same label to the obtained network nodes whose connection weight reaches the preset threshold.
4. The method according to any one of claims 1-3, wherein adding labels to the network nodes in the person-keyword network comprises:
adding labels to the network nodes in the person-keyword network based on a preset algorithm.
5. The method according to any one of claims 1-4, wherein the preset algorithm comprises the label propagation algorithm (LPA).
6. The method according to any one of claims 1-5, wherein the label comprises at least one of:
attribute information of the person entity word and/or the keyword;
person information related to the person entity word and/or the keyword;
event information related to the person entity word and/or the keyword.
7. The method according to any one of claims 1-6, wherein, before building the person-keyword network with the person entity words and keywords as network nodes and with the connection weights between person entity words and keywords and/or the connection weights between different person entity words as edges, the method further comprises:
counting the connection weights between the person entity words and keywords and/or the connection weights between different person entity words.
8. A creation angle mining device for person entities, comprising:
a parsing module, adapted to obtain article metadata from a network data source and to parse out the person entity words contained in the article metadata and the keywords related to the person entity words;
a building module, adapted to build a person-keyword network with the person entity words and the keywords as network nodes and to add labels to the network nodes in the person-keyword network;
a mining module, adapted to divide the network nodes in the person-keyword network into communities using the same label as the basis for community division, and to take any network node in a community as a creation angle of the person entity word in that community.
9. A computer program comprising computer-readable code which, when run on a computing device, causes the computing device to perform the community-division-based creation angle mining method according to any one of claims 1-7.
10. A computer-readable medium storing the computer program according to claim 9.
CN201710914887.9A 2017-09-30 2017-09-30 Method and device for mining creation angle of character entity Active CN107704572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710914887.9A CN107704572B (en) 2017-09-30 2017-09-30 Method and device for mining creation angle of character entity


Publications (2)

Publication Number Publication Date
CN107704572A true CN107704572A (en) 2018-02-16
CN107704572B CN107704572B (en) 2021-07-13

Family

ID=61183245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710914887.9A Active CN107704572B (en) 2017-09-30 2017-09-30 Method and device for mining creation angle of character entity

Country Status (1)

Country Link
CN (1) CN107704572B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104520A (en) * 2019-11-21 2020-05-05 新华智云科技有限公司 Figure entity linking method based on figure identity
CN112685534A (en) * 2020-12-23 2021-04-20 上海掌门科技有限公司 Method and apparatus for generating context information of authored content during authoring process
CN113220901A (en) * 2021-05-11 2021-08-06 中国科学院自动化研究所 Writing concept auxiliary system and network system based on enhanced intelligence

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7624081B2 (en) * 2006-03-28 2009-11-24 Microsoft Corporation Predicting community members based on evolution of heterogeneous networks using a best community classifier and a multi-class community classifier
CN102393843A (en) * 2011-06-29 2012-03-28 广州市动景计算机科技有限公司 Method and system for establishing relational network of user by using communication information of mobile terminal
CN103327092A (en) * 2012-11-02 2013-09-25 中国人民解放军国防科学技术大学 Cell discovery method and system on information networks
CN103744887A (en) * 2013-12-23 2014-04-23 北京百度网讯科技有限公司 Method and device for people search and computer equipment
CN103942189A (en) * 2014-03-19 2014-07-23 百度在线网络技术(北京)有限公司 Method and device for determining keywords of compositions
CN106372239A (en) * 2016-09-14 2017-02-01 电子科技大学 Social network event correlation analysis method based on heterogeneous network
WO2017026303A1 (en) * 2015-08-12 2017-02-16 国立研究開発法人情報通信研究機構 Future scenario generation device and method, and computer program
CN106682169A (en) * 2016-12-27 2017-05-17 北京奇虎科技有限公司 Application label mining method and device, and application searching method and server
CN106682142A (en) * 2016-12-21 2017-05-17 兰州交通大学 Method for excavating user emotions and analyzing propagation features under specific event situation
CN107016072A (en) * 2017-03-23 2017-08-04 成都市公安科学技术研究所 Knowledge-based inference system and method based on social networks knowledge mapping
CN107133877A (en) * 2017-06-06 2017-09-05 安徽师范大学 The method for digging of overlapping corporations in network
CN107194818A (en) * 2017-04-13 2017-09-22 天津科技大学 Label based on pitch point importance propagates community discovery algorithm


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
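BOUTEMINE, OUALID et al.: "Mining Community Structures in Multidimensional Networks", 《ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA》 *
KIANIAN, SAHAR et al.: "Semantic community detection using label propagation algorithm", 《JOURNAL OF INFORMATION SCIENCE》 *
CAI Guoyong et al.: "Research on a label propagation algorithm for community discovery in the social semantic web", 《Computer Science》 *
HUANG Pan: "Topic-based community discovery", 《China Master's Theses Full-text Database (Electronic Journal)》 *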

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104520A (en) * 2019-11-21 2020-05-05 新华智云科技有限公司 Figure entity linking method based on figure identity
CN112685534A (en) * 2020-12-23 2021-04-20 上海掌门科技有限公司 Method and apparatus for generating context information of authored content during authoring process
CN113220901A (en) * 2021-05-11 2021-08-06 中国科学院自动化研究所 Writing concept auxiliary system and network system based on enhanced intelligence

Also Published As

Publication number Publication date
CN107704572B (en) 2021-07-13

Similar Documents

Publication Publication Date Title
Kanakaraj et al. Performance analysis of Ensemble methods on Twitter sentiment analysis using NLP techniques
Rangel Pardo et al. Overview of the 3rd Author Profiling Task at PAN 2015
Kaati et al. Detecting multipliers of jihadism on twitter
CN104991899B (en) Method and device for recognizing user attributes
CN105069021B (en) Domain-based Chinese short text sentiment classification method
CN112015949A (en) Video generation method and device, storage medium and electronic equipment
CN105824923A (en) Movie and video resource recommendation method and device
Lee et al. Emotion in code-switching texts: Corpus construction and analysis
CN107704572A (en) The creation angle method for digging and device of people entities
CN107203520A (en) The method for building up of hotel's sentiment dictionary, the sentiment analysis method and system of comment
CN106294473B (en) Entity word mining method, information recommendation method and device
Pöschko Exploring twitter hashtags
CN107924398B (en) System and method for providing a review-centric news reader
US20170046312A1 (en) Using content structure to socially connect users
CN105447144B (en) Microblogging forwarding visual analysis method and system based on big data analysis technology
CN108536676B (en) Data processing method and device, electronic equipment and storage medium
Lv et al. Understanding the users and videos by mining a novel danmu dataset
JP2014153977A (en) Content analysis device, content analysis method, content analysis program, and content reproduction system
Andriotis et al. Smartphone message sentiment analysis
CN107169011A (en) Artificial intelligence-based method, device, and storage medium for recognizing original webpages
CN110162793A (en) Named entity recognition method and related device
JP6446987B2 (en) Video selection device, video selection method, video selection program, feature amount generation device, feature amount generation method, and feature amount generation program
Kutuzov et al. Cross-Lingual Trends Detection for Named Entities in News Texts with Dynamic Neural Embedding Models.
Phuvipadawat et al. Detecting a multi-level content similarity from microblogs based on community structures and named entities
Handler et al. Relational summarization for corpus analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant