CN111737590A - Social relationship mining method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111737590A
Authority
CN
China
Prior art keywords
conversation
queue
real
dialog
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010442783.4A
Other languages
Chinese (zh)
Other versions
CN111737590B (en)
Inventor
王鹏
刘春阳
张丽
张旭
张翔宇
陈志鹏
解峥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center filed Critical National Computer Network and Information Security Management Center
Priority to CN202010442783.4A priority Critical patent/CN111737590B/en
Publication of CN111737590A publication Critical patent/CN111737590A/en
Application granted granted Critical
Publication of CN111737590B publication Critical patent/CN111737590B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a social relationship mining method and device, electronic equipment and a storage medium. The method comprises the following steps: acquiring conversation flow data of a group; dividing the conversation flow data into a plurality of conversation queues, wherein the time span of each conversation queue is smaller than or equal to a time threshold value; determining a conversation queue forming a real conversation scene according to the context correlation degree of the conversation information in each conversation queue; and extracting the user corresponding to the conversation queue forming the real conversation scene as the user with social relation. Based on the method and the device, the conversation scene can be restored, the conversation users can be mapped more accurately, and the social relationship of the users is mined.

Description

Social relationship mining method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data mining technologies, and in particular, to a social relationship mining method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of the internet, group chat has become the dominant way for multiple people to hold interactive discussions. The resulting stream of conversation information is usually not a linear discussion of a single topic but a multi-threaded conversation stream in which multiple topics are interleaved. These conversation streams carry rich semantic information and social relationship information. However, the messages in them are heavily interleaved, often short and incomplete, and mixed with irrelevant information, which makes it difficult to extract valuable information directly and efficiently. This poses a major challenge to accurately mapping conversation users and mining their social relationships.
Disclosure of Invention
An object of the present invention is to solve at least the above problems and/or disadvantages and to provide at least the advantages described hereinafter.
The invention provides a social relationship mining method and device, and based on the method and device, a dialogue user can be mapped more accurately, and the social relationship of the user is mined.
In a first aspect, a social relationship mining method is provided, including:
acquiring conversation flow data of a group;
dividing the conversation flow data into a plurality of conversation queues, wherein the time span of each conversation queue is smaller than or equal to a time threshold value;
determining a conversation queue forming a real conversation scene according to the context correlation degree of the conversation information in each conversation queue;
and extracting the user corresponding to the conversation queue forming the real conversation scene as the user with social relation.
Optionally, before dividing the dialog flow data into a plurality of dialog queues, the method further includes:
determining a conversation period according to the distribution of the conversation information in the conversation flow data over time;
wherein the value of the time threshold is determined according to the conversation period.
Optionally, the determining a conversation period according to the distribution of the conversation information in the conversation flow data over time includes:
determining at least two conversation boundaries according to the distribution of the conversation information in the conversation flow data over time, wherein the at least two conversation boundaries comprise at least one group of a conversation start boundary and a conversation end boundary;
determining the conversation period based on the time span between the conversation start boundary and the conversation end boundary of the at least one group.
Optionally, the determining at least two conversation boundaries according to the distribution of the conversation information in the conversation flow data over time includes:
calculating a minimum inflection point of the publishing rate of the conversation information in the conversation flow data, and taking the calculated minimum inflection point as a conversation boundary.
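The boundary and period determination above can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes that the "minimum inflection point" can be approximated by a local minimum of an hourly message count, and the function names `dialog_boundaries` and `dialog_period` are hypothetical.

```python
def dialog_boundaries(counts):
    """Return indices where the count is a local minimum.

    Assumption: a point is a boundary if it is no larger than both
    neighbours and strictly smaller than at least one of them.
    """
    boundaries = []
    for h in range(1, len(counts) - 1):
        left, here, right = counts[h - 1], counts[h], counts[h + 1]
        if here <= left and here <= right and (here < left or here < right):
            boundaries.append(h)
    return boundaries


def dialog_period(boundaries):
    """Time span between the first start boundary and the next end boundary."""
    if len(boundaries) < 2:
        return None
    return boundaries[1] - boundaries[0]


# Hypothetical hourly message counts: quiet, a burst of chat, quiet again.
counts = [0, 0, 3, 8, 3, 0, 0, 0]
boundaries = dialog_boundaries(counts)
period = dialog_period(boundaries)
print(boundaries, period)
```

In this toy series the burst starts after hour 1 and ends at hour 5, so the sketch reports a period of 4 hours, which could then serve as the time threshold.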
Optionally, the dividing the dialog flow data into a plurality of dialog queues includes:
selecting the conversation flow data by using a sliding time window, and forming each conversation queue by using the conversation flow data selected each time by using the sliding time window;
wherein the length of the sliding time window coincides with the time threshold.
Optionally, the determining a dialog queue constituting a real dialog scene according to the contextual relevance of the dialog information in each dialog queue includes:
matching the dialogue information in each dialogue queue pairwise into dialogue pairs;
and determining the conversation queue forming the real conversation scene according to the context correlation degree between the conversation pairs in the conversation queues.
Optionally, the determining a dialog queue constituting a real dialog scene according to the contextual relevance of the dialog information in each dialog queue includes:
after the dialogue information in each dialogue queue is matched pairwise into dialogue pairs, removing from the dialogue queue the dialogue pairs formed by dialogue information published by the same user.
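As an illustration of the pairwise matching and same-user filtering, the following sketch pairs every two messages of one queue and drops pairs whose two messages come from the same user. The `Message` type and the function name `make_dialog_pairs` are hypothetical and only serve the example.

```python
from itertools import combinations
from typing import NamedTuple


class Message(NamedTuple):
    user: str
    time: float  # publishing time in seconds
    text: str


def make_dialog_pairs(queue):
    """Pair every two messages in a queue, dropping same-user pairs."""
    return [(a, b) for a, b in combinations(queue, 2) if a.user != b.user]


queue = [
    Message("alice", 0.0, "anyone tried the new build?"),
    Message("bob", 4.0, "yes, the new build works"),
    Message("alice", 9.0, "great"),
]
pairs = make_dialog_pairs(queue)
print(len(pairs))
```

Of the three possible pairs, the one formed by the two messages from "alice" is removed, leaving two candidate pairs.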
Optionally, the determining a dialog queue constituting a real dialog scene according to the contextual relevance between dialog pairs in each dialog queue includes:
determining a real conversation pair according to the content similarity of two pieces of conversation information contained in each conversation pair in each conversation queue;
and determining the conversation queue forming the real conversation scene according to the topic distribution condition of the real conversation pair in each conversation queue.
Optionally, the determining a real dialog pair according to the content similarity of two pieces of dialog information included in each dialog pair in each dialog queue includes:
performing topic identification on the two pieces of dialogue information contained in each dialogue pair;
and if the topics of the two pieces of dialogue information contained in any dialogue pair are the same, judging that the dialogue pair is a real dialogue pair.
Optionally, the determining, according to the topic distribution of the real conversation pair in each conversation queue, a conversation queue constituting a real conversation scene includes:
and if the frequency of occurrence of the topic of one of the real conversation pairs in any conversation queue exceeds a frequency threshold, removing the real conversation pairs of other topics from the conversation queue, and taking the conversation queue formed by the real conversation pairs of the same topic as a conversation queue forming a real conversation scene.
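The two filtering stages above (same-topic pairs, then a dominant topic per queue) might be sketched as below. The sketch assumes that each message already carries a topic label produced by some topic identifier, and it interprets "frequency" as the share of real pairs that belong to the most common topic; both are assumptions, as are the names `real_pairs` and `real_scene`.

```python
from collections import Counter


def real_pairs(pairs):
    """Keep pairs whose two messages carry the same topic label."""
    return [p for p in pairs if p[0]["topic"] == p[1]["topic"]]


def real_scene(pairs, freq_threshold):
    """Return the real pairs of the dominant topic, or [] if none dominates."""
    real = real_pairs(pairs)
    if not real:
        return []
    topic, count = Counter(p[0]["topic"] for p in real).most_common(1)[0]
    if count / len(real) <= freq_threshold:
        return []
    return [p for p in real if p[0]["topic"] == topic]


# Hypothetical dialog pairs of one queue, already labelled with topics.
pairs = [
    ({"user": "alice", "topic": "build"}, {"user": "bob", "topic": "build"}),
    ({"user": "bob", "topic": "build"}, {"user": "carol", "topic": "build"}),
    ({"user": "dave", "topic": "lunch"}, {"user": "erin", "topic": "lunch"}),
    ({"user": "alice", "topic": "build"}, {"user": "dave", "topic": "lunch"}),
]
scene = real_scene(pairs, freq_threshold=0.5)
print(len(scene))
```

Here three of the four pairs are real pairs, two of them on the topic "build", so with a threshold of 0.5 the queue is reduced to those two pairs.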
Optionally, after extracting the user corresponding to the conversation queue constituting the real conversation scene as the user having the social relationship, the method further includes:
extracting an interactive user group in the conversation queue forming the real conversation scene, wherein the interactive user group is two users corresponding to at least one real conversation pair in the conversation queue forming the real conversation scene;
and calculating the average value of the issuing time difference of the at least one real conversation pair corresponding to the interactive user group according to the issuing time difference between the two pieces of conversation information in each real conversation pair corresponding to the interactive user group, and taking the average value as the relationship distance between the two users in the interactive user group.
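A minimal sketch of the relationship distance: for each interacting user group, the publishing time differences of its real conversation pairs are averaged. The use of the absolute time difference, the tuple layout of a pair, and the function name `relationship_distances` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def relationship_distances(real_pairs):
    """Map each unordered user pair to the mean time gap of its real pairs."""
    gaps = defaultdict(list)
    for (user_a, time_a), (user_b, time_b) in real_pairs:
        gaps[frozenset((user_a, user_b))].append(abs(time_b - time_a))
    return {users: mean(values) for users, values in gaps.items()}


# Hypothetical real pairs: ((user, publishing time in seconds), (user, time)).
pairs = [
    (("alice", 0.0), ("bob", 4.0)),
    (("bob", 10.0), ("alice", 18.0)),
    (("bob", 20.0), ("carol", 21.0)),
]
distances = relationship_distances(pairs)
print(distances[frozenset(("alice", "bob"))])
```

A smaller value means that the two users tend to answer each other more quickly, which this sketch treats as a closer relationship.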
Optionally, the method further comprises:
and extracting, as users with a social relationship, a user who publishes dialog information containing an @ symbol in the dialog flow data and the other user to whom the @ symbol points.
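The @-based extraction could look like the following sketch. It assumes that a mention has the form `@name` with the name made of word characters, which may not match the actual format of a given instant messaging application; the pattern `MENTION` and the function `mention_edges` are hypothetical.

```python
import re

# Assumed mention format: "@" followed by word characters (letters, digits,
# underscore, including CJK characters in Python 3 strings).
MENTION = re.compile(r"@(\w+)")


def mention_edges(messages):
    """Return (publisher, mentioned user) pairs found in the messages."""
    edges = set()
    for user, text in messages:
        for target in MENTION.findall(text):
            if target != user:  # ignore self-mentions
                edges.add((user, target))
    return edges


messages = [
    ("alice", "@bob did you see this?"),
    ("bob", "@alice @carol yes"),
]
edges = mention_edges(messages)
print(sorted(edges))
```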
Optionally, after the user with the social relationship is extracted, the method further includes:
and constructing a social network based on the extracted users with social relations.
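One simple way to represent such a social network is an undirected adjacency map built from the extracted user pairs. The sketch below is only one possible representation; the function name `build_social_network` is hypothetical and the source does not prescribe a data structure.

```python
from collections import defaultdict


def build_social_network(user_pairs):
    """Build an undirected adjacency map from pairs of related users."""
    graph = defaultdict(set)
    for a, b in user_pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


network = build_social_network([("alice", "bob"), ("bob", "carol")])
print(sorted(network["bob"]))
```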
In a second aspect, a social relationship mining apparatus is provided, including:
the conversation flow data acquisition module is used for acquiring conversation flow data of the group;
the conversation queue dividing module is used for dividing the conversation flow data into a plurality of conversation queues, wherein the time span of each conversation queue is less than or equal to a time threshold;
the real conversation scene determining module is used for determining a conversation queue forming a real conversation scene according to the context correlation degree of the conversation information in each conversation queue;
and the first user extraction module is used for extracting the user corresponding to the conversation queue forming the real conversation scene as the user with the social relation.
Optionally, the apparatus further comprises:
the conversation period determining module is used for determining a conversation period according to the distribution of the conversation information in the conversation flow data over time, wherein the value of the time threshold is determined according to the conversation period.
Optionally, the dialog period determination module includes:
the conversation boundary determining submodule is used for determining at least two conversation boundaries according to the distribution condition of the conversation information in the conversation flow data along with time, wherein the at least two conversation boundaries comprise at least one group of conversation starting boundary and conversation ending boundary;
and the conversation period determining submodule is used for determining the conversation period according to the time span between the at least one group of conversation starting boundaries and the conversation ending boundaries.
Optionally, the dialog boundary determining submodule is specifically configured to calculate a minimum inflection point of a dialog information distribution rate in the dialog flow data, and use the calculated minimum inflection point as the dialog boundary.
Optionally, the session queue dividing module is specifically configured to select the session stream data by using a sliding time window, and form each session queue by using the session stream data selected each time by using the sliding time window; wherein the length of the sliding time window coincides with the dialog period.
Optionally, the real dialog scenario determination module includes:
the dialogue pair matching submodule is used for matching the dialogue information in each dialogue queue pairwise into dialogue pairs;
and the real conversation scene determining submodule is used for determining the conversation queue forming the real conversation scene according to the context correlation between the conversation pairs in each conversation queue.
Optionally, the real dialog scenario determination module includes:
and the dialogue pair removing module is used for removing, from each dialogue queue, the dialogue pairs formed by dialogue information published by the same user after the dialogue information in the dialogue queue has been matched pairwise into dialogue pairs.
Optionally, the real dialog scene determination submodule includes:
the real conversation pair determining unit is used for determining a real conversation pair according to the content similarity of two pieces of conversation information contained in each conversation pair in each conversation queue;
and the real conversation scene determining unit determines the conversation queue forming the real conversation scene according to the topic distribution condition of the real conversation pair in each conversation queue.
Optionally, the real dialog pair determining unit includes:
the topic identification unit is used for performing topic identification on the two pieces of dialogue information contained in each dialogue pair;
and the real conversation pair judging unit is used for judging that any conversation pair is a real conversation pair if the topics of the two pieces of conversation information contained in the conversation pair are the same.
Optionally, the real dialog scene determining unit includes:
the real dialogue pair topic counting unit is used for counting the topics of the real dialogue pairs in each dialogue queue;
and the real conversation scene construction unit is used for removing real conversation pairs of other topics from the conversation queue if the frequency of occurrence of the topic of one of the real conversation pairs in any conversation queue exceeds a frequency threshold value, and taking the conversation queue formed by the real conversation pairs of the same topic as a conversation queue forming a real conversation scene.
Optionally, the apparatus further comprises:
the interactive user group extraction module is used for extracting an interactive user group in the conversation queue forming the real conversation scene, wherein the interactive user group comprises two users corresponding to at least one real conversation pair in the conversation queue forming the real conversation scene;
and the relationship distance calculation module is used for calculating the average value of the issuing time difference of the at least one real conversation pair corresponding to the interactive user group according to the issuing time difference between two pieces of conversation information in each real conversation pair corresponding to the interactive user group, and the average value is used as the relationship distance between two users in the interactive user group.
Optionally, the apparatus further comprises:
and the second user extraction module is used for extracting the user who publishes the @ symbol and the other user to which the @ symbol points in the dialog flow data as users with social relations.
Optionally, the apparatus further comprises:
and the social network building module is used for building a social network based on the extracted users with social relations.
In a third aspect, an electronic device is provided, including: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method described above.
In a fourth aspect, a storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the method described above.
The invention at least comprises the following beneficial effects:
the social relationship mining method and device provided by the embodiments of the invention divide the acquired conversation flow data of a group into a plurality of conversation queues such that the time span of each conversation queue is less than or equal to a time threshold, then determine the conversation queues forming a real conversation scene according to the context correlation degree of the conversation information in each conversation queue, and extract the users corresponding to those conversation queues as users with a social relationship. Based on the method and the device, the conversation scene can be restored, the conversation users can be mapped more accurately, and the social relationship of the users is mined.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
Fig. 1 is a schematic view of an application scenario of a social relationship mining method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a social relationship mining method according to an embodiment of the present invention;
FIG. 3 is a user-group profile provided by one embodiment of the present invention;
Fig. 4(a) and Fig. 4(b) are distribution diagrams of the number of published dialog messages over time according to an embodiment of the present invention;
FIG. 5 is a flowchart of a social relationship mining method according to another embodiment of the present invention;
FIG. 6 is a diagram illustrating a social relationship mining method according to another embodiment of the present invention;
FIG. 7 is a schematic diagram of a social network provided by yet another embodiment of the present invention;
fig. 8 is a schematic structural diagram of a social relationship mining device according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention is further described in detail below with reference to the attached drawings so that those skilled in the art can implement the invention by referring to the description text.
Group chat data (namely conversation stream data) is strongly interleaved: the conversation information in it appears in an interleaved manner, and adjacent pieces of conversation information may discuss different topics. If user relationships are organized and analyzed only according to the time order of the conversation stream data, the text semantics are ignored and the user relationships cannot be extracted accurately. The social relationship mining method and device provided by the embodiments of the invention mine and analyze the social relationships of users based on the conversation scene, so that conversation users can be mapped more accurately and their social relationships can be mined.
Fig. 1 is a schematic view of an application scenario of a social relationship mining method according to an embodiment of the present invention. The implementation scenario includes the terminal 110, the terminal 120, the terminal 130, and the server 140, which perform data communication through a communication network. Optionally, the communication network may be a wired network or a wireless network, and may be at least one of a local area network, a metropolitan area network, and a wide area network.
The terminals 110, 120, and 130 are electronic devices installed with an instant messaging application, and the electronic devices may be a smart phone, a tablet computer, a personal portable computer, and the like, which is not limited in the present invention. The instant messaging application may be a native instant messaging application of the system in the terminal 110, the terminal 120, or the terminal 130, or a third party instant messaging application downloaded through a network, and the source of the instant messaging application is not limited in the present invention.
The server 140 may be implemented as one server, or may be implemented as a server cluster formed by a group of servers, which may be physical servers or cloud servers. In some embodiments, server 140 is a background server for the instant messaging application in terminal 110, terminal 120, or terminal 130.
In some embodiments, different users conduct a group chat using the instant messaging applications in the terminals 110, 120, and 130, respectively. The instant messaging applications transmit the group chat data (i.e., conversation stream data) input in real time to the server 140, and the server 140 receives the conversation stream data. Specifically, the server 140 collects the conversation flow data of the target groups every day and distributes each day's conversation flow data evenly across the ports of the server using the push-pull pattern of the ZMQ (ZeroMQ) transport layer. After the server receives the conversation flow data, the conversation flow data of each group is stored in a corresponding table, that is, each group has its own conversation flow data storage table. The database may be an Elasticsearch distributed storage database, which enables fast processing and storage of large-scale data. The server 140 then analyzes the received conversation stream data, restores the conversation scene, maps the conversation users more accurately, and mines the social relationships of the users.
Fig. 2 is a flowchart of a social relationship mining method according to an embodiment of the present invention, where the method is executed by a system with processing capability, a server, or a social relationship mining device. As shown in fig. 2, the method includes:
step 210, obtaining conversation flow data of the group.
The conversation stream data of the group may be QQ group chat data, WeChat group chat data, or other instant messaging group chat data, which is not limited by the invention.
Fig. 3 is a user-group distribution diagram according to an embodiment of the present invention. The distribution of the user-groups is analyzed in connection with fig. 3 below.
Group chat data of nine target groups (QQ groups) from May 2019 to August 2019 is obtained first. Statistics on the chat data of the nine target groups show 8497 users in total, an average of 944 users per group, 18066 messages in total, and about 2 messages per user on average. Statistical analysis of the groups to which users belong shows that 90% of the users join only one group (as shown in fig. 3). Therefore, when analyzing the chat data, it is sufficient to consider only the users and their conversation information within the same group in order to mine the social relationships of the users. Based on the above analysis, conversation flow data of the same group is preferably selected for analyzing and mining the user social relationships.
For the acquired conversation stream data of the group, each piece of conversation stream data at least comprises the group where the conversation stream data (namely conversation information) is located, a publishing user, a publishing time and text content. In order to improve the processing efficiency of the conversation flow data in the subsequent steps and improve the mining efficiency of the social relationship of the user on the whole, the acquired conversation flow data is preprocessed. The preprocessing includes word segmentation, removal of stop words, non-Chinese characters and single words, and removal of link URLs and emoticons using regular expressions.
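The cleaning part of this preprocessing might be sketched as follows. The sketch assumes that emoticons appear as short bracketed tags such as `[微笑]`, uses a tiny made-up stop-word list, and defaults to whitespace splitting in place of a real word segmenter; the names `preprocess`, `STOP_WORDS` and the `segment` parameter are hypothetical. A real Chinese word-segmentation function could be passed as `segment`.

```python
import re

URL = re.compile(r"https?://\S+")
EMOTICON = re.compile(r"\[[^\[\]\s]{1,8}\]")   # assumed "[tag]" emoticon format
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")  # anything outside basic CJK
STOP_WORDS = {"的", "了", "我们"}               # hypothetical stop-word list


def preprocess(text, segment=str.split):
    """Strip URLs, emoticons and non-Chinese characters, then filter tokens."""
    text = URL.sub(" ", text)
    text = EMOTICON.sub(" ", text)
    text = NON_CHINESE.sub(" ", text)
    tokens = segment(text)
    # Drop single-character words and stop words.
    return [tok for tok in tokens if len(tok) > 1 and tok not in STOP_WORDS]


tokens = preprocess("我们 今天 天气 的 好 http://t.cn/abc [微笑] hello")
print(tokens)
```

URLs are removed before the non-Chinese filter so that fragments of a link cannot survive as tokens.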
Step 220, dividing the dialog flow data into a plurality of dialog queues, wherein the time span of each dialog queue is less than or equal to the time threshold.
Observation and analysis of a large amount of group conversation stream data reveals that group chat messages cluster in their sending time. That is, no conversation information appears in the group for most of the time, while a large amount of group chat information appears within a small portion of the time. A piece of conversation information that appears later is often a response to one that appeared earlier. Therefore, two pieces of conversation information that are adjacent in time order in the group are generally considered likely to involve information interaction, that is, they may form a real conversation and belong to a real conversation scene. In addition, two pieces of conversation information that are not adjacent in time order, that is, separated by a certain time interval, may still involve information interaction if the interval falls within a certain time range, and so may also form a real conversation and belong to a real conversation scene. Based on the above analysis, dividing the dialog flow data into a plurality of dialog queues whose time spans are each less than or equal to the time threshold yields a plurality of dialog queues, each of which may constitute a real dialog scene.
Since the conversation stream data has a time-series characteristic, the dialog queues divided from it also have a time-series characteristic: the pieces of dialog information contained in a dialog queue are ordered by publishing time. The time span of a dialog queue can be understood as the difference in publishing time between the last and the first piece of dialog information contained in the dialog queue. In addition, the publishing time difference between any two pieces of dialog information can be understood as the dialog distance between them. The publishing time difference between any two pieces of dialog information is calculated by the following formula:
$$\mathrm{Distance}_{l_1 l_2} = t_{l_1} - t_{l_2}$$
where $t_{l_1}$ is the time of issuance of the dialog information $l_1$, and $t_{l_2}$ is the time of issuance of the dialog information $l_2$.
In some embodiments, dividing the conversation stream data into a plurality of conversation queues includes: selecting conversation flow data by using a sliding time window, and forming each conversation queue from the conversation flow data selected each time by the sliding time window, wherein the length of the sliding time window is consistent with the time threshold. Each time the sliding time window stays at a position, the publishing time difference between the first and the last piece of dialog information inside the window is necessarily less than or equal to the length of the window. Therefore, by sliding the time window across all the dialog flow data, the data can be progressively divided into a plurality of dialog queues, each with a time span less than or equal to the time threshold. Dividing the dialog flow data with a sliding time window helps to improve the efficiency of the division.
The step size of the sliding time window may be set. For example, suppose there are 10 pieces of dialog information, one published every 5 seconds, and the length of the sliding time window is 20 seconds. When the step size is 1, that is, the window advances by one piece of dialog information each time, the dialog queue formed by the 1st selection contains the 1st to 5th pieces of dialog information and the dialog queue formed by the 2nd selection contains the 2nd to 6th pieces, so two adjacent dialog queues share some dialog information. When the step size is set to 5, the dialog queue formed by the 1st selection contains the 1st to 5th pieces and the dialog queue formed by the 2nd selection contains the 6th to 10th pieces, so the two dialog queues share no dialog information. The smaller the step size, the more dialog queues are produced, which makes it easier to find real dialog scenes but also increases the amount of computation.
It is understood that, since dialog information is not published at uniform time intervals, the number of pieces of dialog information selected by the sliding time window may differ from one selection to the next, so different dialog queues may contain different numbers of pieces of dialog information.
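The sliding-window division with a configurable step size can be sketched as below, reproducing the numbers of the example (10 messages, one every 5 seconds, a 20-second window). The function name `split_into_queues` is hypothetical, and the step is counted in messages, as in the example; message indices are 0-based, so index 0 is the 1st message.

```python
def split_into_queues(times, window, step=1):
    """Divide sorted publishing times into queues of message indices.

    Each queue starts at a message and extends while the time span from
    that first message stays within `window` seconds.
    """
    queues = []
    for start in range(0, len(times), step):
        end = start
        while end < len(times) and times[end] - times[start] <= window:
            end += 1
        queues.append(list(range(start, end)))
    return queues


times = [5 * i for i in range(10)]  # 0, 5, ..., 45 seconds
overlapping = split_into_queues(times, window=20, step=1)
disjoint = split_into_queues(times, window=20, step=5)
print(overlapping[0], overlapping[1])
print(disjoint)
```

With a step of 1 the first two queues overlap in four messages; with a step of 5 the data falls into two disjoint queues of five messages each.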
In some embodiments, before dividing the conversation stream data into a plurality of conversation queues, the social relationship mining method further comprises: determining a conversation period according to the release condition of the conversation information in the conversation flow data along with the time; wherein, the value of the time threshold is determined according to the conversation period.
Further, determining the conversation period according to the release of the conversation information in the conversation flow data over time includes:
Step 1) Determining at least two conversation boundaries according to the release of the conversation information in the conversation flow data over time, wherein the at least two conversation boundaries comprise at least one group of a conversation start boundary and a conversation end boundary.
Fig. 4(a) and Fig. 4(b) are plots of the number of conversation information releases (which may also be referred to as the number of utterances) against time according to an embodiment of the present invention. When analyzing the user-group distribution, group chat data of nine target groups (QQ groups) from May 2019 to August 2019 were obtained. Here, the analysis of the distribution of the number of conversation information releases is also performed based on the above data. The number of conversation information releases per hour within the 24 hours of a natural day is counted for each group, wherein Fig. 4(a) shows the hourly number of releases of one group on May 14, 2019, and Fig. 4(b) shows the hourly number of releases of one group on May 9. The analysis of Fig. 4(a) and Fig. 4(b) shows that the conversations in a group have a life cycle that conforms to the general rule of development, i.e., there is a relatively obvious beginning-climax-ending pattern. This characteristic is quantified by the number of conversation information releases: the number of releases gradually increases to a maximum over time and then gradually decreases after a period of time. Based on this analysis, the boundaries of a conversation can be determined from the release of the conversation information in the conversation flow data over time.
For the conversation flow data F of a group, let the function
Figure BDA0002504756370000111
denote the number of pieces of conversation information issued in the conversation flow data F from its start up to time t, and let the function
Figure BDA0002504756370000112
denote the conversation information release rate of the conversation flow data F up to time t. According to the life cycle rule of the conversations in the group, a conversation boundary can be defined as a minimum inflection point of the conversation information release rate in the conversation flow data of the group, and can be calculated by the following formula:
Figure BDA0002504756370000113
where
Figure BDA0002504756370000114
is the minimum inflection point of the conversation information release rate of the conversation flow data F up to time t.
Based on the above calculation process, the determined conversation boundaries include conversation start boundaries and conversation end boundaries. A conversation start boundary is the start point of the curve formed by the number of conversation information releases over time, and correspondingly, a conversation end boundary is the end point of the same curve. Further, by counting the number of conversation information releases and/or the conversation information release rate before and after each minimum inflection point, it can be determined whether each minimum inflection point is a conversation start boundary or a conversation end boundary. Specifically, when the number of releases before a minimum inflection point is 0, that minimum inflection point is the conversation start boundary of the current conversation, and the next minimum inflection point is its conversation end boundary. If the number of releases after a minimum inflection point determined as a conversation end boundary is greater than 0, that minimum inflection point simultaneously serves as the conversation start boundary of the next conversation. Generally, among the minimum inflection points calculated from the conversation flow data of one natural day of the group, the first one is the conversation start boundary of the first conversation. However, the conversation start boundary and the conversation end boundary may also be determined by other mathematical methods, and the present invention is not limited thereto.
Step 2) determining a dialog period based on the time span between at least one set of dialog start boundaries and dialog end boundaries.
Based on each set of session start and end boundaries, a time span between the two can be derived. As can be seen from fig. 4(a) and 4(b), the time span of each session is not exactly the same. In view of this, in order to relatively accurately reflect the life cycle rule of the conversations in the group, the counted time spans between the conversation start boundary and the conversation end boundary are summed up, and the average value of the time spans is calculated as the conversation period. In one specific embodiment, if the calculated dialog period is 3 hours, the time threshold is determined to be 3 hours. Correspondingly, when the dialog flow data is selected by adopting the sliding time window, the length of the sliding time window is also 3 hours.
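The two steps above (finding conversation boundaries, then averaging the spans between them) can be illustrated with a simplified sketch. It assumes the release rate is approximated by hourly release counts, treats a "minimum inflection point" as a local minimum of that hourly series, and pairs consecutive boundaries that have activity between them; the function names `find_boundaries` and `dialog_period` are hypothetical, and the exact formula of the embodiment is not reproduced here.

```python
from typing import List, Optional


def find_boundaries(hourly_counts: List[int]) -> List[int]:
    """Return the hour indices that are local minima of the release rate.

    Points strictly inside a flat stretch (for example a long run of
    zeros) are skipped, so only the edges of a quiet period count.
    """
    n = len(hourly_counts)
    boundaries = []
    for i, c in enumerate(hourly_counts):
        prev = hourly_counts[i - 1] if i > 0 else c
        nxt = hourly_counts[i + 1] if i < n - 1 else c
        if c <= prev and c <= nxt and (c < prev or c < nxt):
            boundaries.append(i)
    return boundaries


def dialog_period(hourly_counts: List[int]) -> Optional[float]:
    """Average span (in hours) between a start boundary and its end boundary."""
    b = find_boundaries(hourly_counts)
    # A pair of consecutive boundaries encloses a conversation only if
    # some messages were released between them.
    spans = [end - start for start, end in zip(b, b[1:])
             if sum(hourly_counts[start + 1:end]) > 0]
    return sum(spans) / len(spans) if spans else None


# Two conversations back to back: hours 2-7 and hours 7-11.
counts = [0, 0, 0, 2, 5, 9, 4, 1, 3, 8, 2, 0, 0]
period = dialog_period(counts)
```

Here the boundary at hour 7 serves both as the end of the first conversation and as the start of the second, matching the rule that an end boundary followed by further releases is also a start boundary. The resulting period would then be used as the time threshold and as the sliding-window length.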
Step 230, determining a dialog queue constituting a real dialog scene according to the context correlation of the dialog information in each dialog queue.
In practical application, the dialog information in the real dialog scene not only meets a certain time condition (the release time difference between two pieces of dialog information in a dialog queue is less than or equal to a time threshold, that is, the time span of the dialog queue is less than or equal to the time threshold), but also needs to have semantic level correlation. Based on this, in this step, each dialog queue is analyzed from the semantic level, that is, the contextual relevance of the dialog information in each dialog queue is analyzed, so as to determine the dialog queue constituting the real dialog scene.
In some embodiments, determining a dialog queue that constitutes a real dialog scene according to the contextual relevance of the dialog information in each dialog queue comprises:
(1) Matching the dialog information in each dialog queue pairwise into dialog pairs.
In practical applications, a real dialog scene is established on the basis of dialogs between users. Therefore, based on the dialog pairs formed from pairs of pieces of dialog information in a dialog queue, whether that dialog queue constitutes a real dialog scene can be determined more quickly and effectively by measuring the contextual relevance between the dialog pairs in the queue.
Further, after the dialog information in each dialog queue is matched pairwise into dialog pairs, the dialog pairs formed by dialog information issued by the same user are removed from the dialog queue. Dialog pairs formed by dialog information from the same user are removed in order to mine the social relationships between different users more quickly. It should be understood that, after this operation, the dialog queue contains only dialog pairs composed of dialog information issued by different users, so the subsequent analysis of the contextual relevance between the dialog pairs in each dialog queue is likewise performed only on dialog pairs composed of dialog information issued by different users.
(2) Determining the dialog queue constituting a real dialog scene according to the contextual relevance between the dialog pairs in each dialog queue.
In some examples, determining a conversation queue comprising a real conversation scenario according to contextual relevance between pairs of conversations in each conversation queue includes:
Step 1) Determining real conversation pairs according to the content similarity of the two pieces of conversation information contained in each conversation pair in each conversation queue.
In practical applications, two pieces of dialogue information that genuinely interact are necessarily related in topic, so real dialogue pairs are identified based on the topics of the two pieces of dialogue information in each dialogue pair. Specifically, topic identification is performed on the two pieces of dialogue information contained in each dialogue pair; if the topics of the two pieces of dialogue information contained in a dialogue pair are the same, the dialogue pair is judged to be a real dialogue pair. Correspondingly, if the two pieces of dialogue information contained in a dialogue pair have different topics, the dialogue pair is judged not to be a real dialogue pair and can be removed from the dialogue queue.
A short-text topic recognition method can be adopted to perform topic identification on the dialog information, and other methods can also be adopted; the invention is not limited herein. It should be understood that the analysis of the content similarity of the dialog information is actually performed on the text content of the dialog information.
Step 2) Determining the conversation queue constituting a real conversation scene according to the topic distribution of the real conversation pairs in each conversation queue.
In some embodiments, determining the dialog queue constituting a real dialog scene according to the topic distribution of the real dialog pairs in each dialog queue includes: counting the topics of the real dialog pairs in each dialog queue; and if the occurrence frequency of the topic of one of the real dialog pairs in any dialog queue exceeds a frequency threshold, removing the real dialog pairs of other topics from that dialog queue, and taking the dialog queue formed by the real dialog pairs of the same topic as the dialog queue constituting a real dialog scene.
Specifically, the topic shared by the two pieces of dialog information in a real dialog pair is taken as the topic of that real dialog pair, and the topics of the real dialog pairs in the dialog queue are counted. Since one dialog queue includes a plurality of real dialog pairs, and the topics of different real dialog pairs may be the same or different, the number of occurrences of each topic and the total number of topics (which is equal to the number of real dialog pairs) can be counted. The ratio of the number of occurrences of a topic to the total number of topics is then calculated; this ratio represents the occurrence frequency of the topic in the dialog queue. If the ratio for one topic in the dialog queue exceeds a set frequency threshold (such a topic is referred to hereinafter as the dialog scene topic), the topic distribution in the dialog queue is considered concentrated: the real dialog pairs whose topics differ from the dialog scene topic are removed from the dialog queue, and the real dialog pairs corresponding to the dialog scene topic are retained, thereby obtaining the dialog queue of a real dialog scene. If no topic in the dialog queue has an occurrence frequency exceeding the set frequency threshold, the topic distribution in the dialog queue is considered dispersed, and the queue does not constitute a real dialog scene. The frequency threshold may be set according to actual conditions and is not particularly limited herein.
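The pairing and topic-filtering procedure above can be sketched as follows. The topic recognizer is deliberately left as a caller-supplied function `topic_of` (a stand-in for whatever short-text topic method is used); that parameter, the message layout `(timestamp, user, text)`, and the function name `real_scene_pairs` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, List, Optional, Tuple

# Hypothetical message layout: (release time in seconds, user, text).
Message = Tuple[float, str, str]
Pair = Tuple[Message, Message]


def real_scene_pairs(queue: List[Message],
                     topic_of: Callable[[str], str],
                     freq_threshold: float
                     ) -> Optional[Tuple[str, List[Pair]]]:
    """Return (scene topic, retained real pairs), or None if the queue
    does not constitute a real dialog scene."""
    # (1) Pairwise matching, dropping pairs issued by the same user.
    pairs = [(a, b) for a, b in combinations(queue, 2) if a[1] != b[1]]
    # Step 1) A pair is "real" when both messages share the same topic.
    real = []
    for a, b in pairs:
        topic_a, topic_b = topic_of(a[2]), topic_of(b[2])
        if topic_a == topic_b:
            real.append((topic_a, (a, b)))
    if not real:
        return None
    # Step 2) Frequency of the most common topic among the real pairs.
    counts = Counter(topic for topic, _ in real)
    topic, n = counts.most_common(1)[0]
    if n / len(real) <= freq_threshold:
        return None  # topics are dispersed: not a real dialog scene
    return topic, [pair for t, pair in real if t == topic]


queue = [
    (0.0, "alice", "football match tonight"),
    (5.0, "bob", "football score?"),
    (10.0, "carol", "lunch anyone"),
    (12.0, "alice", "football again"),
    (15.0, "dave", "lunch sounds good"),
]
first_word = lambda text: text.split()[0]  # toy topic recognizer
scene = real_scene_pairs(queue, first_word, freq_threshold=0.5)
```

In this toy queue there are three real pairs, two about "football" and one about "lunch", so "football" occurs with frequency 2/3. It exceeds a threshold of 0.5 and the queue is kept with the two football pairs; with a threshold of 0.7 the queue would be rejected.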
Step 240, extracting the users corresponding to the conversation queue forming the real conversation scene as the users with social relationship.
The users corresponding to the conversation queues constituting real conversation scenes are extracted; these users are considered to be in the same real conversation scene and to have social relationships.
In some implementations, users with social relationships may also be extracted from the conversation flow data based on the "@" relationship. Specifically, the user who publishes the @ symbol and another user to which the @ symbol points in the dialog flow data are extracted as users having a social relationship. It should be noted that two users are considered to be users having a social relationship as long as an "@" relationship exists between the two users.
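A minimal sketch of the "@"-based extraction is given below. It assumes that a mention appears in the message text as `@` followed directly by the target user's name, and that messages use the layout `(timestamp, user, text)`; real group-chat exports may encode mentions differently, so both the pattern and the function name `mention_relations` are hypothetical.

```python
import re
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical message layout: (release time in seconds, user, text).
Message = Tuple[float, str, str]

# Assumed mention syntax: "@" immediately followed by a user name.
MENTION = re.compile(r"@(\w+)")


def mention_relations(messages: Iterable[Message]) -> Set[FrozenSet[str]]:
    """Collect unordered user pairs linked by at least one "@" mention."""
    relations: Set[FrozenSet[str]] = set()
    for _, user, text in messages:
        for target in MENTION.findall(text):
            if target != user:  # ignore self-mentions
                relations.add(frozenset((user, target)))
    return relations


chat = [
    (0.0, "alice", "@bob have you seen this?"),
    (3.0, "bob", "yes, thanks @alice"),
    (9.0, "carol", "good morning"),
]
pairs = mention_relations(chat)
```

Because a single mention is enough to establish the relationship, the sketch stores each pair once as an unordered set, regardless of how many times or in which direction the mention occurs.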
Further, based on the extracted users having social relationships, a social network may be constructed. In a social network, a user is represented by a network node, and an edge connecting two network nodes represents that a social relationship exists between the two users. It should be understood that the "extracted users having social relationships" here include both the users extracted from the conversation queues constituting real conversation scenes (which are determined from among the conversation queues obtained by dividing the conversation flow data) and the users having social relationships extracted based on the "@" relationship.
Fig. 5 is a flowchart of a social relationship mining method according to another embodiment of the present invention. The social relationship mining method comprises the following steps:
Step 510, obtaining conversation flow data of the group.
Step 520, dividing the dialog flow data into a plurality of dialog queues, wherein the time span of each dialog queue is less than or equal to a time threshold.
Step 530, determining a dialog queue forming a real dialog scene according to the context correlation of the dialog information in each dialog queue.
Step 540, extracting the users corresponding to the conversation queues forming the real conversation scene as the users with social relations.
Step 550, extracting an interactive user group in the dialog queue forming the real dialog scene, wherein the interactive user group is two users corresponding to at least one real dialog pair in the dialog queue forming the real dialog scene.
When two users correspond to at least one real conversation pair in the conversation queue forming the real conversation scene, the two users are indicated to have direct conversation.
Step 560, calculating, according to the issuing time differences between the two pieces of conversation information in each real conversation pair corresponding to the interactive user group, an average value of the issuing time differences of the at least one real conversation pair corresponding to the interactive user group, and taking the average value as a relationship distance between the two users in the interactive user group.
In this step, a relationship distance between the two interacting users is calculated to characterize the strength of the social relationship between them. Specifically, the issuing time difference between the two pieces of conversation information in each real conversation pair is calculated first, and then the average value of all the issuing time differences is calculated. It is understood that, when the interactive user group has only one real conversation pair in the conversation queue, the issuing time difference between the two pieces of conversation information in that real conversation pair is equal to the average value of the issuing time differences. The smaller the calculated average issuing time difference, the smaller the relationship distance between the two users in the interactive user group; this shows that one user responds to the other faster during the conversation and indicates that the social relationship between the two users is stronger.
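The relationship distance can be sketched as a per-user-pair average of issuing time differences. The input is assumed to be a list of real conversation pairs between different users, each message using the hypothetical layout `(timestamp, user, text)`; the function name `relationship_distances` is likewise an assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical message layout: (release time in seconds, user, text).
Message = Tuple[float, str, str]
Pair = Tuple[Message, Message]


def relationship_distances(real_pairs: List[Pair]) -> Dict[FrozenSet[str], float]:
    """Average issuing time difference for each interacting user pair."""
    diffs = defaultdict(list)
    for (t1, u1, _), (t2, u2, _) in real_pairs:
        # Each unordered user pair is one "interactive user group".
        diffs[frozenset((u1, u2))].append(abs(t2 - t1))
    return {users: mean(values) for users, values in diffs.items()}


real_pairs = [
    ((0.0, "alice", "a"), (5.0, "bob", "b")),
    ((20.0, "bob", "c"), (35.0, "alice", "d")),
    ((0.0, "alice", "a"), (12.0, "carol", "e")),
]
distances = relationship_distances(real_pairs)
```

Here alice and bob have two real pairs with time differences of 5 and 15 seconds, giving a distance of 10; alice and carol have a single pair, so their distance equals that pair's time difference of 12 seconds.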
Fig. 6 is a flowchart of a social relationship mining method according to yet another embodiment of the present invention. The social relationship mining method comprises the following steps:
Step 610, obtaining conversation flow data of the group.
Step 620, dividing the dialog flow data into a plurality of dialog queues, wherein the time span of each dialog queue is less than or equal to a time threshold.
Step 630, determining the dialog queue constituting the real dialog scene according to the context correlation of the dialog information in each dialog queue.
Step 640, extracting the users corresponding to the conversation queues forming the real conversation scene as the users with social relations.
Step 650, extracting an interactive user group in the conversation queue forming the real conversation scene, wherein the interactive user group is two users corresponding to at least one real conversation pair in the conversation queue forming the real conversation scene.
Step 660, calculating an average value of the issuing time differences of the at least one real conversation pair corresponding to the interactive user group according to the issuing time differences between the two pieces of conversation information in each real conversation pair corresponding to the interactive user group, and taking the average value as a relationship distance between the two users in the interactive user group.
Step 670, building a social network based on the extracted users with social relationships.
A social network is constructed based on the relationship distances calculated in step 660. Specifically, all the calculated relationship distances may be normalized, and the social network may then be constructed based on the normalized relationship distances. In the social network, users are represented by network nodes, and the length of each edge is determined by the normalized relationship distance.
Fig. 7 is a schematic diagram of a social network according to another embodiment of the present invention. Fig. 7 is an undirected graph with users as network nodes and edges representing user relationships, where the length of an edge represents the strength of the social relationship between the two users. The shorter the edge, the stronger the social relationship between the two users; conversely, the longer the edge, the weaker the social relationship.
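The network construction can be sketched with a plain adjacency dictionary. The embodiment does not specify the normalization; dividing every distance by the largest distance is an assumption made here for illustration, as are the input format (a mapping from user pairs to relationship distances) and the function name `build_social_network`.

```python
from typing import Dict, Tuple


def build_social_network(distances: Dict[Tuple[str, str], float]
                         ) -> Dict[str, Dict[str, float]]:
    """Build an undirected weighted graph from relationship distances.

    Edge lengths are the distances divided by the largest distance
    (an assumed normalization), so they fall in the range [0, 1].
    """
    graph: Dict[str, Dict[str, float]] = {}
    if not distances:
        return graph
    largest = max(distances.values())
    for (u, v), d in distances.items():
        length = d / largest if largest else 0.0
        # Undirected graph: store the edge in both directions.
        graph.setdefault(u, {})[v] = length
        graph.setdefault(v, {})[u] = length
    return graph


network = build_social_network({("alice", "bob"): 10.0,
                                ("alice", "carol"): 20.0})
```

In the resulting graph the alice-bob edge has length 0.5 and the alice-carol edge has length 1.0, so alice and bob are drawn closer together, reflecting the stronger relationship.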
In summary, in the social relationship mining method provided in one or more embodiments of the present invention, the obtained conversation flow data of a group is divided into a plurality of conversation queues such that the time span of each conversation queue is less than or equal to a time threshold; the conversation queues constituting real conversation scenes are then determined according to the contextual relevance of the conversation information in each conversation queue, and the users corresponding to those conversation queues are extracted as users having social relationships. With this method, conversations can be extracted from both the structural level and the semantic level of the conversation flow data: at the structural level, the time-sequence characteristics of conversations in the same scene are considered, and at the semantic level, the contextual relevance of conversations in the same conversation scene is considered, so as to measure the possibility that pieces of conversation information form a conversation relationship. By fusing the structural and semantic levels, the conversation scene can be restored, the conversing users can be mapped more accurately, and the user relationships can be extracted.
Fig. 8 is a schematic structural diagram of a social relationship mining device according to an embodiment of the present invention. As shown in fig. 8, the social relationship mining device 8000 includes: a conversation flow data acquiring module 8010, configured to acquire conversation flow data of a group; a conversation queue dividing module 8020, configured to divide the conversation flow data into a plurality of conversation queues, where a time span of each conversation queue is less than or equal to a time threshold; a real conversation scene determining module 8030, configured to determine a conversation queue constituting a real conversation scene according to context correlation of conversation information in each conversation queue; the first user extracting module 8040 is configured to extract a user corresponding to the conversation queue forming the real conversation scene as a user with a social relationship.
In some embodiments, the apparatus further comprises: a conversation period determining module, used for determining a conversation period according to the release of the conversation information in the conversation flow data over time, wherein the value of the time threshold is determined according to the conversation period.
In some embodiments, the dialog period determination module comprises: the conversation boundary determining submodule is used for determining at least two conversation boundaries according to the distribution condition of the conversation information in the conversation flow data along with time, wherein the at least two conversation boundaries comprise at least one group of conversation starting boundary and conversation ending boundary; and the conversation period determining submodule is used for determining the conversation period according to the time span between the at least one group of conversation starting boundaries and the conversation ending boundaries.
In some embodiments, the dialog boundary determining submodule is specifically configured to calculate a minimum inflection point of a dialog information distribution rate in the dialog flow data, and use the calculated minimum inflection point as the dialog boundary.
In some embodiments, the conversation queue dividing module is specifically configured to select the conversation flow data by using a sliding time window, and form each conversation queue from the conversation flow data selected by the sliding time window each time; wherein the length of the sliding time window is consistent with the conversation period.
In some embodiments, the real dialog scenario determination module comprises: the dialogue pair matching submodule is used for matching the dialogue information in each dialogue queue into a dialogue pair pairwise; and the real conversation scene determining submodule is used for determining the conversation queue forming the real conversation scene according to the context correlation between the conversation pairs in each conversation queue.
In some embodiments, the real dialog scenario determination module further comprises: a dialogue pair removing module, used for removing dialogue pairs formed by dialogue information issued by the same user from the dialogue queue after the dialogue information in each dialogue queue is matched pairwise into dialogue pairs.
In some embodiments, the real dialog scenario determination submodule includes: a real conversation pair determining unit, used for determining real conversation pairs according to the content similarity of the two pieces of conversation information contained in each conversation pair in each conversation queue; and a real conversation scene determining unit, used for determining the conversation queue constituting a real conversation scene according to the topic distribution of the real conversation pairs in each conversation queue.
In some embodiments, the real dialog pair determination unit includes: a topic identification unit, used for performing topic identification on the two pieces of dialogue information contained in each dialogue pair; and a real conversation pair judging unit, used for judging that a conversation pair is a real conversation pair if the topics of the two pieces of conversation information contained in that conversation pair are the same.
In some embodiments, the real dialog scenario determination unit includes: a real dialogue pair topic counting unit, used for counting the topics of the real dialogue pairs in each dialogue queue; and a real conversation scene construction unit, used for removing the real conversation pairs of other topics from the conversation queue if the occurrence frequency of the topic of one of the real conversation pairs in any conversation queue exceeds a frequency threshold, and taking the conversation queue formed by the real conversation pairs of the same topic as the conversation queue constituting a real conversation scene.
In some embodiments, the apparatus further comprises: the interactive user group extraction module is used for extracting an interactive user group in the conversation queue forming the real conversation scene, wherein the interactive user group comprises two users corresponding to at least one real conversation pair in the conversation queue forming the real conversation scene; and the relationship distance calculation module is used for calculating the average value of the issuing time difference of the at least one real conversation pair corresponding to the interactive user group according to the issuing time difference between two pieces of conversation information in each real conversation pair corresponding to the interactive user group, and the average value is used as the relationship distance between two users in the interactive user group.
In some embodiments, the apparatus further comprises: a second user extraction module, used for extracting the user who publishes an @ symbol and the other user to whom the @ symbol points in the dialog flow data as users with social relations.
In some embodiments, the apparatus further comprises: a social network building module, used for building a social network based on the extracted users with social relations.
Fig. 9 shows an electronic device of an embodiment of the invention. As shown in fig. 9, an electronic device 9000 includes: at least one processor 9010, and a memory 9020 communicatively coupled to the at least one processor 9010, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method.
Specifically, the memory 9020 and the processor 9010 are connected together via a bus 9030, and can be a general memory and a processor, which are not specifically limited herein, and when the processor 9010 executes a computer program stored in the memory 9020, the operations and functions described in the embodiments of the present invention in conjunction with fig. 1 to 8 can be performed.
An embodiment of the present invention further provides a storage medium, on which a computer program is stored, which, when executed by a processor, implements the method. For specific implementation, reference may be made to the method embodiment, which is not described herein again.
While embodiments of the invention have been disclosed above, it is not intended to be limited to the uses set forth in the specification and examples. It can be applied to all kinds of fields suitable for the present invention. Additional modifications will readily occur to those skilled in the art. It is therefore intended that the invention not be limited to the exact details and illustrations described and illustrated herein, but fall within the scope of the appended claims and equivalents thereof.

Claims (16)

1. A social relationship mining method, comprising:
acquiring conversation flow data of a group;
dividing the conversation flow data into a plurality of conversation queues, wherein the time span of each conversation queue is smaller than or equal to a time threshold value;
determining a conversation queue forming a real conversation scene according to the context correlation degree of the conversation information in each conversation queue;
and extracting the user corresponding to the conversation queue forming the real conversation scene as the user with social relation.
2. The social relationship mining method of claim 1, wherein prior to said partitioning the conversation stream data into a plurality of conversation queues, the method further comprises:
determining a conversation period according to the release condition of the conversation information in the conversation flow data along with the time;
and the value of the time threshold is determined according to the conversation period.
3. The social relationship mining method according to claim 2, wherein the determining a conversation period according to the release situation of the conversation information in the conversation flow data over time includes:
determining at least two conversation boundaries according to the release condition of the conversation information in the conversation flow data along with the time, wherein the at least two conversation boundaries comprise at least one group of conversation starting boundaries and conversation ending boundaries;
determining the dialog period based on a time span between the at least one set of dialog start boundaries and dialog end boundaries.
4. The social relationship mining method of claim 3, wherein determining at least two conversation boundaries based on the posting of conversation information in the conversation stream data over time comprises:
and calculating a minimum inflection point of the dialog information issuing rate in the dialog flow data, and taking the calculated minimum inflection point as the dialog boundary.
5. The social relationship mining method of claim 1, wherein said partitioning the conversation stream data into a plurality of conversation queues comprises:
selecting the conversation flow data by using a sliding time window, and forming each conversation queue by using the conversation flow data selected each time by using the sliding time window;
wherein the length of the sliding time window coincides with the time threshold.
6. The social relationship mining method of claim 1, wherein determining the conversation queue constituting the real conversation scene according to the contextual relevance of the conversation information in each conversation queue comprises:
matching the dialogue information in each dialogue queue pairwise into dialogue pairs;
and determining the conversation queue forming the real conversation scene according to the context correlation degree between the conversation pairs in the conversation queues.
7. The social relationship mining method of claim 6, wherein determining the conversation queue constituting the real conversation scene according to the contextual relevance of the conversation information in each conversation queue comprises:
after the dialogue information in each dialogue queue is matched into a dialogue pair in pairs, the dialogue pairs formed by the dialogue information issued by the same user are removed from the dialogue queue.
8. The social relationship mining method of claim 6, wherein determining the conversation queue constituting the real conversation scene according to the contextual relevance between the conversation pairs in each conversation queue comprises:
determining a real conversation pair according to the content similarity of two pieces of conversation information contained in each conversation pair in each conversation queue;
and determining the conversation queue forming the real conversation scene according to the topic distribution condition of the real conversation pair in each conversation queue.
9. The social relationship mining method according to claim 8, wherein the determining a real conversation pair according to the content similarity of two pieces of conversation information included in each conversation pair in each conversation queue comprises:
performing topic identification on the two pieces of conversation information contained in each conversation pair in each conversation queue;
and if the topics of the two pieces of conversation information contained in any conversation pair are the same, judging that the conversation pair is a real conversation pair.
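The topic-equality test of claim 9 can be illustrated with a deliberately simple stand-in for topic identification. The keyword table `TOPIC_KEYWORDS` and the keyword-overlap classifier are hypothetical; the claim does not say which topic identification technique is used, and a real system would likely substitute a trained topic model.

```python
# Hypothetical keyword table standing in for a real topic model.
TOPIC_KEYWORDS = {
    "sports": {"match", "goal", "team"},
    "food": {"dinner", "noodles", "restaurant"},
}


def identify_topic(text):
    """Return the topic with the most keyword hits, or None if nothing matches."""
    words = set(text.lower().split())
    best_topic, best_hits = None, 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_topic, best_hits = topic, hits
    return best_topic


def is_real_pair(text_a, text_b):
    """A pair counts as real when both messages share an identified topic."""
    topic_a = identify_topic(text_a)
    return topic_a is not None and topic_a == identify_topic(text_b)
```

Messages with no identifiable topic are treated as not matching, which is a design choice of this sketch rather than something the claim states.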
10. The social relationship mining method of claim 8, wherein the determining a conversation queue constituting a real conversation scene according to the distribution of the topics of the real conversation pairs in each conversation queue comprises:
counting the topics of the real conversation pairs in each conversation queue;
and if the occurrence frequency of the topic of one real conversation pair in any conversation queue exceeds a frequency threshold, removing the real conversation pairs of other topics from the conversation queue, and taking the conversation queue formed by the real conversation pairs of the same topic as the conversation queue forming a real conversation scene.
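A sketch of the topic-distribution filter in claim 10. It assumes the "occurrence frequency" is the share of real conversation pairs in the queue that carry the most common topic; the claim does not say whether the frequency is relative or an absolute count, and the `(pair_id, topic)` layout is hypothetical.

```python
from collections import Counter


def filter_real_scene(real_pairs, frequency_threshold):
    """Keep only pairs of the dominant topic if it is frequent enough.

    real_pairs: list of (pair_id, topic) tuples (hypothetical layout).
    frequency_threshold: assumed to be a fraction between 0 and 1.
    Returns the filtered list, or None when no topic dominates the queue.
    """
    if not real_pairs:
        return None
    counts = Counter(topic for _, topic in real_pairs)
    topic, count = counts.most_common(1)[0]
    if count / len(real_pairs) > frequency_threshold:
        # Pairs on other topics are removed from the queue.
        return [pair for pair in real_pairs if pair[1] == topic]
    return None
```

Returning `None` keeps "this queue is not a real conversation scene" distinct from an empty but valid result.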
11. The social relationship mining method according to claim 8, wherein after extracting the users corresponding to the conversation queue constituting the real conversation scene as the users having the social relationship, the method further comprises:
extracting an interactive user group in the conversation queue forming the real conversation scene, wherein the interactive user group consists of the two users corresponding to at least one real conversation pair in the conversation queue forming the real conversation scene;
and calculating, from the publishing time difference between the two pieces of conversation information in each real conversation pair corresponding to the interactive user group, the average publishing time difference of the at least one real conversation pair, and taking the average as the relationship distance between the two users in the interactive user group.
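The relationship distance of claim 11 is an average of time differences, which can be sketched as below. The input layout and the use of the absolute difference are assumptions for the example.

```python
from collections import defaultdict
from statistics import mean


def relationship_distances(real_pairs):
    """Map each interactive user group to its mean publishing time difference.

    real_pairs: list of ((user_a, ts_a), (user_b, ts_b)) with timestamps in
    seconds (hypothetical layout). Each group is keyed by a frozenset so
    that (alice, bob) and (bob, alice) are treated as the same group.
    """
    diffs = defaultdict(list)
    for (user_a, ts_a), (user_b, ts_b) in real_pairs:
        diffs[frozenset((user_a, user_b))].append(abs(ts_b - ts_a))
    return {group: mean(values) for group, values in diffs.items()}
```

Under this reading, a smaller value means the two users tend to reply to each other more quickly.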
12. The social relationship mining method of claim 1, further comprising:
extracting, as users having a social relationship, a user who publishes conversation information containing an @ symbol in the conversation flow data and the other user to whom the @ symbol points.
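The @-mention extraction of claim 12 can be sketched with a regular expression. The mention syntax `@name` (an @ followed by word characters) and the `(user, text)` layout are assumptions; real platforms differ in which characters a user name may contain.

```python
import re

# Assumed mention syntax: "@" followed by one or more word characters.
MENTION = re.compile(r"@(\w+)")


def mention_relations(messages):
    """Return the set of (author, mentioned_user) relations.

    messages: iterable of (user, text) tuples (hypothetical layout).
    """
    relations = set()
    for author, text in messages:
        for mentioned in MENTION.findall(text):
            # A user mentioning themselves carries no relationship.
            if mentioned != author:
                relations.add((author, mentioned))
    return relations
```

This pattern would also match the part after the @ in an e-mail address, so production use would need a stricter pattern or the platform's own mention metadata.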
13. The social relationship mining method of claim 1, 10 or 11, wherein after extracting users having social relationships, the method further comprises:
constructing a social network based on the extracted users having social relationships.
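One simple way to represent the social network of claim 13 is an undirected adjacency map. The representation and the function name are assumptions; the claim does not prescribe a data structure.

```python
from collections import defaultdict


def build_social_network(user_pairs):
    """Build an undirected graph as {user: set of related users}.

    user_pairs: iterable of (user_a, user_b) relations extracted earlier.
    """
    graph = defaultdict(set)
    for a, b in user_pairs:
        if a == b:
            continue  # ignore self-relations
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)
```

Edge weights such as a relationship distance could be stored by replacing each set with a dictionary keyed by the neighbouring user.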
14. A social relationship mining device, comprising:
a conversation flow data acquisition module, configured to acquire conversation flow data of a group;
a conversation queue dividing module, configured to divide the conversation flow data into a plurality of conversation queues, wherein the time span of each conversation queue is less than or equal to a time threshold;
a real conversation scene determining module, configured to determine a conversation queue forming a real conversation scene according to the contextual relevance of the conversation information in each conversation queue;
and a first user extraction module, configured to extract the users corresponding to the conversation queue forming the real conversation scene as users having a social relationship.
15. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of any of claims 1-12.
16. A storage medium on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1-12.
CN202010442783.4A 2020-05-22 2020-05-22 Social relation mining method and device, electronic equipment and storage medium Active CN111737590B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010442783.4A CN111737590B (en) 2020-05-22 2020-05-22 Social relation mining method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010442783.4A CN111737590B (en) 2020-05-22 2020-05-22 Social relation mining method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111737590A (en) 2020-10-02
CN111737590B (en) 2023-09-12

Family

ID=72648129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010442783.4A Active CN111737590B (en) 2020-05-22 2020-05-22 Social relation mining method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111737590B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663047A (en) * 2012-03-29 2012-09-12 中国科学院计算技术研究所 Method and device for mining social relationship during mobile reading
US20130073389A1 (en) * 2011-09-15 2013-03-21 Stephan HEATH System and method for providing sports and sporting events related social/geo/promo link promotional data sets for end user display of interactive ad links, promotions and sale of products, goods, gambling and/or services integrated with 3d spatial geomapping, company and local information for selected worldwide locations and social networking
CN103631949A (en) * 2013-12-11 2014-03-12 中国科学院计算技术研究所 Data acquisition method and system for social network
CN105407034A (en) * 2015-10-23 2016-03-16 曾劲柏 Social network system based on address list exchange
CN105447179A (en) * 2015-12-14 2016-03-30 清华大学 Microblog social network based topic automated recommendation method and system
CN109034661A (en) * 2018-08-28 2018-12-18 腾讯科技(深圳)有限公司 User identification method, device, server and storage medium
CN110019286A (en) * 2017-07-19 2019-07-16 中国移动通信有限公司研究院 A kind of expression recommended method and device based on user social contact relationship

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
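Leman Akoglu et al.: "Graph based anomaly detection and description: a survey", Data Mining and Knowledge Discovery, pages 626 - 688 *
Tang Xiaoyue: "A spatial context-aware mention target recommendation method", Journal of Software, vol. 31, no. 4, pages 1189 - 1211 *
Wang Bingyu; Wu Zhenyu; Shen Subin; Chen Jiaying: "A survey of social media event detection research", Computer Technology and Development, vol. 28, no. 09, pages 105 - 111 *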

Also Published As

Publication number Publication date
CN111737590B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
Nguyen et al. Real-time event detection for online behavioral analysis of big social data
CN103336766A (en) Short text garbage identification and modeling method and device
CN109684446B (en) Text semantic similarity calculation method and device
KR20110115543A (en) Method for calculating entity similarities
CN104376010B (en) User recommendation method and device
KR20110115542A (en) Method for calculating semantic similarities between messages and conversations based on enhanced entity extraction
CN106202031B (en) System and method for associating group members based on group chat data
CN105893484A (en) Microblog Spammer recognition method based on text characteristics and behavior characteristics
CN113127746B (en) Information pushing method based on user chat content analysis and related equipment thereof
CN104077417A (en) Figure tag recommendation method and system in social network
CN111061837A (en) Topic identification method, device, equipment and medium
CN111444349A (en) Information extraction method and device, computer equipment and storage medium
CN110390109B (en) Method and device for analyzing association relation among multiple group chat messages
CN111597821A (en) Method and device for determining response probability
RU2612608C2 (en) Social circle formation system and method and computer data carrier
CN111061838A (en) Text feature keyword determination method and device and storage medium
CN113055751A (en) Data processing method and device, electronic equipment and storage medium
CN111737590A (en) Social relationship mining method and device, electronic equipment and storage medium
CN111026835B (en) Chat subject detection method, device and storage medium
JP2016197292A (en) Feeling identifying method, feeling identifying apparatus, and program
CN114724072A (en) Intelligent question pushing method, device, equipment and storage medium
CN104580234B (en) The guard method of behavioural characteristic in a kind of social networks
CN112650595B (en) Communication content processing method and related device
CN113779237B (en) Method, system, mobile terminal and readable storage medium for constructing social behavior sequence diagram
CN105512303A (en) Content presentation method based on big data analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant