CN112148979B - Event-associated user identification method, device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN112148979B
Authority
CN
China
Prior art keywords
event
user
candidate
behavior data
network behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011034605.4A
Other languages
Chinese (zh)
Other versions
CN112148979A (en)
Inventor
石逸轩
戴明洋
刘子祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011034605.4A
Publication of CN112148979A
Application granted
Publication of CN112148979B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a device for identifying event-associated users, an electronic device and a storage medium, and relates to the technical field of data processing, in particular to big data and natural language processing. The specific implementation scheme is as follows: acquiring the theme of an event to be identified and network behavior data of each candidate user; analyzing the network behavior data of each candidate user to obtain keywords corresponding to each candidate user; obtaining the matching degree between the keywords corresponding to each candidate user and the theme; and extracting the associated users of the event to be identified from the candidate users according to the matching degrees. In this way, the users associated with an event can be identified rapidly from their network behavior data, which assists decision-making in event analysis, allows emergencies to be handled better, and saves manpower and material resources.

Description

Event-associated user identification method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing, in particular to big data and natural language processing technologies, and more specifically to a method and apparatus for identifying event-associated users, an electronic device, and a storage medium.
Background
Various events may occur in daily life, and some of them may harm people's lives and property, the ecological environment, and so on. After an event occurs, it is important to quickly identify the users associated with the event in order to predict how the event will develop and determine the next strategy.
Disclosure of Invention
The application provides an event-associated user identification method, an event-associated user identification device, electronic equipment and a storage medium.
According to an aspect of the present application, there is provided a method for identifying event-associated users, including:
acquiring the theme of an event to be identified and network behavior data of each candidate user;
analyzing the network behavior data of each candidate user to obtain keywords corresponding to each candidate user;
obtaining the matching degree of the keywords corresponding to each candidate user and the theme;
and extracting the associated users of the event to be identified from the candidate users according to each matching degree.
According to another aspect of the present application, there is provided an identification apparatus of an event-associated user, including:
the first acquisition module is used for acquiring the theme of the event to be identified and the network behavior data of each candidate user;
The analysis module is used for analyzing the network behavior data of each candidate user so as to acquire keywords corresponding to each candidate user;
the second acquisition module is used for acquiring the matching degree of the keywords corresponding to each candidate user and the theme;
and the extraction module is used for extracting the associated users of the event to be identified from the candidate users according to each matching degree.
According to another aspect of the present application, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of identifying event-associated users as described in the embodiments of the above aspect.
According to another aspect of the present application, there is provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing the computer to execute the method for identifying event-associated users according to the embodiment of the above aspect.
Other effects of the above alternative will be described below in connection with specific embodiments.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
Fig. 1 is a flowchart of a method for identifying event-associated users according to an embodiment of the present application;
Fig. 2 is a flowchart of another method for identifying event-associated users according to an embodiment of the present application;
Fig. 3 is a flowchart of another method for identifying event-associated users according to an embodiment of the present application;
Fig. 4 is a flowchart of another method for identifying event-associated users according to an embodiment of the present application;
Fig. 5 is a flowchart of another method for identifying event-associated users according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a device for identifying event-associated users according to an embodiment of the present application;
Fig. 7 is a block diagram of an electronic device for implementing the method for identifying event-associated users according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The following describes an event-associated user identification method, an event-associated user identification device, an electronic device and a storage medium according to embodiments of the present application with reference to the accompanying drawings.
Fig. 1 is a flowchart of an identification method of an event-associated user according to an embodiment of the present application.
The method for identifying event-associated users may be executed by the device for identifying event-associated users provided by the embodiments of the present application. The device may be configured in an electronic device so as to quickly identify the associated users of the event to be identified according to the network behavior data of users.
As shown in fig. 1, the method for identifying the event-associated user includes:
Step 101, obtaining the theme of the event to be identified and the network behavior data of each candidate user.
In this embodiment, the event to be identified may be a public health event, a natural disaster event, a social security event, an accident disaster event, or the like. The theme (topic) of the event to be identified may be received by the electronic device, or may be extracted by the electronic device from news information related to the event to be identified.
The electronic device may further determine the geographic area to which the location of the event to be identified belongs and obtain the network behavior data of the network users in that area. These network users are the candidate users, so this yields the network behavior data of each candidate user.
Network behavior data is data on the behavior of a network user on the network, including but not limited to browsing web pages, posting comments, purchasing tickets, posting status updates, and the like.
For example, if an event occurs in a city of province A, the network behavior data of network users in province A can be obtained.
Step 102, analyzing the network behavior data of each candidate user to obtain the keywords corresponding to each candidate user.
In this embodiment, the network behavior data of each candidate user may be parsed, so as to obtain a keyword corresponding to each candidate user.
Specifically, each candidate user may have a plurality of pieces of network behavior data, each piece of network behavior data of each candidate user may be parsed to obtain a keyword corresponding to each piece of network behavior data, and the keyword corresponding to each candidate user may be obtained according to the keyword corresponding to each piece of network behavior data of each candidate user. Here, the keywords corresponding to each candidate user may be keywords corresponding to all network behavior data of each candidate user.
When each piece of network behavior data is analyzed, word segmentation processing can be performed on each piece of network behavior data to obtain each word segment of each piece of network behavior data, and then each word segment is screened to obtain keywords.
In the screening, stop words, auxiliary words, interjections and the like in each piece of network behavior data can be removed, and the remaining word segments are used as keywords. Further, the word segments related to the event can be selected according to a preset event-related lexicon, so as to obtain the keywords corresponding to the network behavior data.
For example, the web page contents browsed by the candidate users are analyzed, and keywords are extracted from the web page contents, so that keywords corresponding to the network behavior data of the browsed web page are obtained.
For another example, the purchase order of the candidate user is analyzed to obtain the name of the purchased item, the recipient, the delivery address and the like corresponding to the network behavior data of the purchase.
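The parsing in Step 102 can be sketched as follows. This is a minimal illustration and not the implementation of the application: the regular-expression tokenizer stands in for a real word segmenter, and the names `STOP_WORDS`, `extract_keywords`, `keywords_for_user` and the `event_lexicon` parameter are hypothetical.

```python
import re

# Hypothetical stop-word list (stop words, auxiliary words, interjections).
# A real system would use a complete list for its own language.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "oh", "wow"}


def extract_keywords(record, event_lexicon=None):
    """Segment one piece of network behavior data and screen the segments."""
    # Assumption: lowercase regex tokenization stands in for word segmentation.
    segments = re.findall(r"[a-z0-9]+", record.lower())
    # Remove stop words, auxiliary words and interjections.
    kept = [s for s in segments if s not in STOP_WORDS]
    # Optionally keep only the segments found in a preset event-related lexicon.
    if event_lexicon is not None:
        kept = [s for s in kept if s in event_lexicon]
    return kept


def keywords_for_user(records, event_lexicon=None):
    """Collect the keywords of all behavior records of one candidate user."""
    keywords = []
    for record in records:
        keywords.extend(extract_keywords(record, event_lexicon))
    return keywords
```

Duplicate keywords are deliberately kept at this stage; merging them is a separate fusion step.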
Step 103, obtaining the matching degree of the keywords corresponding to each candidate user and the theme.
In this embodiment, the matching degree between each keyword corresponding to each candidate user and the topic may be calculated according to the vector corresponding to the keyword and the vector corresponding to the topic, and the weighted sum of the matching degrees of the keywords is used as the matching degree between the keywords corresponding to the candidate user and the topic.
The higher the matching degree of the keywords corresponding to the candidate users and the subject, the higher the association degree of the candidate users and the event to be identified can be considered.
For example, if the keywords corresponding to a candidate user are k1, k2 and k3, the matching degree between each of k1, k2 and k3 and the topic can be calculated, and the weighted sum of the three matching degrees is then taken as the matching degree between the keywords corresponding to the candidate user and the topic.
The weights of the keywords can all be set to the same value. Alternatively, the weight of each keyword may be determined according to the number of times the keyword occurs among the keywords corresponding to the network behavior data of all the candidate users, with a keyword that occurs more often receiving a larger weight.
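The weighted sum in Step 103 can be sketched as follows. The text only says that the matching degree is calculated from the keyword vector and the topic vector; cosine similarity is used here as an assumption, and the function names and the normalization of the frequency weights are likewise hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity, assumed here as the per-keyword matching degree."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def frequency_weights(all_users_keywords):
    """Weight each keyword by its share of occurrences across all candidates."""
    counts = Counter(k for kws in all_users_keywords for k in kws)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def user_matching_degree(keywords, topic_vec, vectors, weights=None):
    """Weighted sum of the per-keyword matching degrees for one candidate user."""
    if not keywords:
        return 0.0
    if weights is None:
        # Equal weights: the result is the mean matching degree.
        weights = {k: 1.0 / len(keywords) for k in keywords}
    return sum(weights.get(k, 0.0) * cosine(vectors[k], topic_vec) for k in keywords)
```

Here `vectors` maps each keyword to its vector; how those vectors are produced (for example, by a word-embedding model) is outside the sketch.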
Step 104, extracting the associated users of the event to be identified from the candidate users according to each matching degree.
In practical applications, the number of candidate users is relatively large, and there may be some candidate users with a low degree of association with the event to be identified. Based on this, in one embodiment of the present application, the associated user of the event to be identified may be extracted from each candidate user according to the matching degree of the keyword corresponding to each candidate user and the topic.
Specifically, the candidate users can be ranked by the matching degree between their corresponding keywords and the topic, and a preset number of candidate users with the highest matching degrees are selected from the ranking result as the associated users of the event to be identified.
For example, if the number of candidate users is 10000, the 9000 candidate users with the highest matching degrees can be extracted from them as associated users.
It should be noted that the preset number may be set according to actual needs, or may be determined according to other manners.
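The ranking and selection in Step 104 can be sketched as follows, assuming the matching degrees are held in a dictionary keyed by a user identifier; the function name `extract_associated_users` is hypothetical.

```python
def extract_associated_users(matching_degrees, preset_number):
    """Return the preset number of candidate users with the highest matching degree."""
    # Rank candidate users from the highest matching degree to the lowest.
    ranked = sorted(matching_degrees.items(), key=lambda item: item[1], reverse=True)
    # Keep only the user identifiers of the top-ranked candidates.
    return [user for user, _ in ranked[:preset_number]]
```

If fewer candidates exist than the preset number, all of them are returned.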
In this embodiment, the candidate users with high relevance to the event to be identified may be screened from the candidate users according to the matching degree of the keyword corresponding to each candidate user and the topic, so as to identify the relevant users of the event.
In the related art, the population associated with an event is typically determined offline. However, determining the associated population offline requires a large amount of manpower and resources and often requires long-term adherence to the relevant policies. In the present application, the associated users of an event can be identified rapidly according to the network behavior data of users. Compared with the offline approach, identification is faster, which helps predict the direction in which the event will develop, so that the initiative can be seized in determining the next strategy, the emergency can be handled better, and manpower and material resources are saved.
In the embodiment of the application, the theme of the event to be identified and the network behavior data of each candidate user are obtained; analyzing the network behavior data of each candidate user to obtain keywords corresponding to each candidate user; obtaining the matching degree of the keywords corresponding to each candidate user and the theme; and extracting the associated users of the event to be identified from the candidate users according to each matching degree. Therefore, the user associated with the event can be rapidly identified according to the network behavior data of the user, so that people can be assisted in making event analysis decisions, sudden events can be better handled, and the cost of manpower and material resources is saved.
In practical application, analyzing multiple pieces of network behavior data of the same network user may obtain word segments with the same semantics. Based on this, in order to improve accuracy of the keywords corresponding to each candidate user and improve recognition speed, in an embodiment of the present application, the manner shown in fig. 2 may be used. Fig. 2 is a flowchart of another method for identifying event-related users according to an embodiment of the present application.
As shown in fig. 2, the above-mentioned parsing the network behavior data of each candidate user to obtain the keywords corresponding to each candidate user includes:
Step 201, parsing each piece of network behavior data corresponding to each candidate user to obtain each candidate word corresponding to each candidate user.
In this embodiment, word segmentation processing may be performed on each piece of network behavior data of each candidate user, so as to obtain each word segment corresponding to each piece of network behavior data.
After the word segments corresponding to each piece of network behavior data are obtained, stop words, interjections, auxiliary words and the like can be removed to obtain the candidate words corresponding to each piece of network behavior data. The candidate words corresponding to all pieces of network behavior data of a candidate user are then taken as the candidate words corresponding to that candidate user, so that the candidate words corresponding to each candidate user are obtained.
Step 202, fusing the candidate words corresponding to each candidate user to obtain the keywords corresponding to each candidate user.
In practical applications, a network user may perform multiple network behaviors concerning the same event, for example, commenting on the same event multiple times. As a result, when the network behavior data of the same network user is analyzed, the candidate words corresponding to multiple pieces of network behavior data may contain identical candidate words or candidate words with the same semantics.
For example, a network user browses news about event B and posts opinions about event B on an interactive platform. When these two pieces of network behavior data are analyzed, the candidate words corresponding to them contain identical candidate words.
In this embodiment, after each candidate word corresponding to each candidate user is obtained, semantic similarity between the candidate words may be obtained, and the candidate words with similarity exceeding a similarity threshold are fused according to the semantic similarity, so that the unfused candidate words and the fused words are used as keywords corresponding to each candidate user.
For example, candidate words with the same semantics can be merged under a common upper-level word (hypernym) with their weights combined, and identical candidate words can be de-duplicated with their weights combined.
For example, the candidate words corresponding to a candidate user are { w1, w2, w1, w3, w4}, wherein two candidate words w1 have corresponding weights of q11 and q12, respectively, the sum of q11 and q12 is calculated to be q1, q1 is used as the weight of the candidate word w1, and the keywords corresponding to the candidate user are { w1, w2, w3, w4}, which are obtained through fusion.
In the embodiment of the application, fusing the candidate words corresponding to each candidate user prevents identical or semantically identical keywords from appearing among the keywords corresponding to the candidate user. This improves the accuracy of the keywords and hence the accuracy of identifying associated users; it also avoids repeatedly calculating the matching degree between identical or semantically identical keywords and the topic, which reduces the amount of calculation.
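The fusion in Step 202 can be sketched as follows. The greedy strategy (each candidate word is merged into the first already-kept word whose similarity exceeds the threshold) and the summing of weights are assumptions for illustration; `fuse_candidates`, the `similarity` callback and the default threshold are hypothetical.

```python
def fuse_candidates(candidates, similarity, threshold=0.8):
    """Fuse identical or semantically similar candidate words, summing weights.

    candidates: list of (word, weight) pairs for one candidate user.
    similarity: function (word_a, word_b) -> semantic similarity score.
    """
    fused = {}  # insertion-ordered: keyword -> accumulated weight
    for word, weight in candidates:
        target = None
        for kept in fused:
            # Identical words always fuse; others fuse above the threshold.
            if kept == word or similarity(kept, word) > threshold:
                target = kept
                break
        if target is None:
            fused[word] = weight
        else:
            fused[target] += weight
    return fused
```

Applied to the candidate words { w1, w2, w1, w3, w4 } from the example above, the two occurrences of w1 collapse into one keyword whose weight is q11 + q12.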
In one embodiment of the present application, when each piece of network behavior data corresponding to each candidate user is parsed, the parsing may be performed according to the attribute of the network behavior data. Fig. 3 is a flow chart of another method for identifying event-related users according to an embodiment of the present application.
As shown in fig. 3, the above-mentioned parsing the network behavior data of each candidate user to obtain the keywords corresponding to each candidate user includes:
Step 301, determining the weight of each analysis object in each piece of network behavior data according to the behavior attribute corresponding to each piece of network behavior data corresponding to each candidate user.
Network users can perform various network behaviors, such as uploading videos, commenting, posting status updates, browsing web pages, interacting with other network users, and the like, and the focus of analysis differs for network behavior data with different behavior attributes.
In this embodiment, subject-predicate-object (SPO) triples may be extracted from the network behavior data, where S is the subject, P is the predicate, and O is the object.
Specifically, the behavior attribute corresponding to each piece of network behavior data may be determined according to each piece of network behavior data, and the weight of each analysis object in each piece of network behavior data may be determined according to the behavior attribute corresponding to each piece of network behavior data.
For example, if the behavior attribute of the network behavior data is uploading a video and the corresponding SPO is user-upload-video, then the weight of O is the largest among the analysis objects in the SPO, that is, the weight corresponding to the video is the largest. As another example, if the behavior attribute of the network behavior data is commenting, the weight of the comment text is the largest.
Step 302, according to the weight of each analysis object in each piece of network behavior data, each piece of network behavior data is analyzed to determine the candidate word corresponding to each piece of network behavior data.
After the weight of each analysis object in each piece of network behavior data is determined, the analysis object with the largest weight can be analyzed to determine the candidate word corresponding to each piece of network behavior data.
For example, for the behavior of uploading a video, if the weight corresponding to the video is the largest, the uploaded video is analyzed with emphasis; if the behavior attribute is commenting and the weight of the comment text is the largest, the comment text is analyzed with emphasis.
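Steps 301 and 302 can be sketched as follows. The record layout, the weight tables and the numeric weights are all hypothetical values chosen for illustration; the text only states that the weight of each analysis object depends on the behavior attribute and that the object with the largest weight can be analyzed.

```python
# Hypothetical weights of the analysis objects for each behavior attribute.
ATTRIBUTE_WEIGHTS = {
    "upload_video": {"subject": 0.1, "predicate": 0.1, "object": 0.8},
    "comment": {"subject": 0.1, "predicate": 0.1, "object": 0.2, "text": 0.6},
}
# Fallback for behavior attributes without a dedicated table (an assumption).
DEFAULT_WEIGHTS = {"subject": 1 / 3, "predicate": 1 / 3, "object": 1 / 3}


def object_weights(record):
    """Step 301: look up the analysis-object weights by behavior attribute."""
    return ATTRIBUTE_WEIGHTS.get(record["attribute"], DEFAULT_WEIGHTS)


def object_to_parse(record):
    """Step 302: pick the analysis object with the largest weight."""
    weights = object_weights(record)
    present = [name for name in record["objects"] if name in weights]
    if not present:
        return None
    best = max(present, key=lambda name: weights[name])
    return best, record["objects"][best]
```

The selected content would then be passed to whatever parser suits it (video analysis, text segmentation, and so on) to produce the candidate words.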
In the embodiment of the present application, when analyzing each piece of network behavior data corresponding to each candidate user to obtain each candidate word corresponding to each candidate user, the weight of each analysis object in each piece of network behavior data may be determined according to the behavior attribute corresponding to each piece of network behavior data corresponding to each candidate user, and then each piece of network behavior data may be analyzed according to the weight of each analysis object in each piece of network behavior data, so as to determine the candidate word corresponding to each piece of network behavior data. Therefore, the network behavior data is analyzed according to the weight of each analysis object in the network behavior data, the candidate words corresponding to the network behavior data are obtained, and the accuracy of obtaining the candidate words is improved.
In order to improve the accuracy of the identified associated users, in one embodiment of the present application, the number of associated users extracted from each candidate user may be determined according to the domain to which the event to be identified belongs, so as to extract the associated users. Fig. 4 is a flow chart of another method for identifying event-related users according to an embodiment of the present application.
As shown in fig. 4, the above-mentioned extracting, according to each matching degree, the associated user of the event to be identified from the candidate users includes:
Step 401, determining the domain to which the event to be identified belongs.
In this embodiment, the domain to which the event to be identified belongs may be determined according to the topic of the event to be identified and the characteristics of each domain. The domains to which an event may belong include, but are not limited to, public health, natural disasters, social security, accident disasters, and the like.
Step 402, determining the target number of the associated users to be extracted according to the domain to which the event to be identified belongs.
After the domain to which the event to be identified belongs is determined, the target number of associated users to be extracted can be determined according to the scope of impact, the degree of harm and the like of events in that domain.
For example, a natural disaster event affects a particular geographic area, so the number of network users in the area affected by the event can be used as the target number of associated users to be extracted.
As another example, public safety events may involve a relatively wide range of people, and the number of associated users to be extracted may be determined based on the range of involvement.
Step 403, extracting the associated users matching the target number from the candidate users according to each matching degree.
After the matching degree between the keywords corresponding to each candidate user and the topic of the event to be identified and the target number of associated users to be extracted are determined, the candidate users can be ranked by matching degree, and the target number of candidate users with the highest matching degrees are used as the associated users of the event to be identified.
In the embodiment of the present application, when the associated users of the event to be identified are extracted from the candidate users according to the matching degrees, the domain to which the event to be identified belongs may be determined first, the target number of associated users to be extracted may then be determined according to that domain, and the associated users matching the target number may be extracted from the candidate users according to the matching degrees. Therefore, the number of associated users to be extracted is determined according to the domain of the event to be identified, which improves the identification accuracy of the associated users.
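Steps 401 to 403 can be sketched as follows. How the target number is derived from a domain is not specified beyond the examples above, so the per-domain ratios, the default ratio and the function names are purely hypothetical; only the natural-disaster branch follows the example of using the number of network users in the affected area.

```python
# Hypothetical share of candidates to extract for each event domain.
DOMAIN_RATIOS = {"public_health": 0.5, "social_security": 0.3, "accident_disaster": 0.2}
DEFAULT_RATIO = 0.1  # assumed fallback for domains not listed above


def target_number(domain, candidate_count, users_in_affected_area=None):
    """Step 402: derive the target number of associated users from the domain."""
    if domain == "natural_disaster" and users_in_affected_area is not None:
        # Use the number of network users in the affected area, as in the example.
        return min(users_in_affected_area, candidate_count)
    return int(candidate_count * DOMAIN_RATIOS.get(domain, DEFAULT_RATIO))


def extract_by_domain(matching_degrees, domain, users_in_affected_area=None):
    """Step 403: take the target number of users with the highest matching degree."""
    n = target_number(domain, len(matching_degrees), users_in_affected_area)
    ranked = sorted(matching_degrees, key=matching_degrees.get, reverse=True)
    return ranked[:n]
```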
In practical applications, some users associated with an event may not directly perform network behaviors related to the event, or their keywords may have a relatively low matching degree with the topic, so the number of associated users extracted from the candidate users according to the matching degrees may be less than the target number. Based on this, in one embodiment of the present application, when the number of associated users extracted from the candidate users according to the matching degrees is smaller than the target number, incremental users may be acquired and further associated users extracted from them, so that the number of associated users of the event to be identified reaches the target number.
Specifically, the network behavior data of the user is usually SPO data, and when the number of associated users extracted from each candidate user is less than the target number according to the matching degree of the keyword corresponding to each candidate user and the subject of the event to be identified, the incremental user corresponding to each associated user can be obtained according to the SPO data corresponding to the network behavior data of each associated user, and the incremental users corresponding to all the associated users form an incremental user set.
That is, the incremental user set is a set of incremental users that are obtained from network behavior data of all associated users.
For example, the SPO data corresponding to the network behavior data of user a is interaction data between user a and user b. If user a is an associated user, user b is found according to the SPO data corresponding to user a, so user b can be regarded as an incremental user.
After the incremental user set is obtained, the network behavior data of each incremental user can be obtained, the network behavior data of each incremental user is analyzed, and the keyword corresponding to each incremental user is obtained.
After the keywords corresponding to each incremental user are obtained, the matching degree of the keywords corresponding to each incremental user and the topics of the events to be identified can be calculated, and the associated users of the events to be identified are extracted from the incremental user set according to the sequence of the matching degree from high to low so that the number of the associated users of the events to be identified is equal to the target number.
In the embodiment of the present application, when the number of associated users extracted from each candidate user is less than the target number according to each matching degree, an incremental user set may be obtained according to network behavior data of each associated user, and according to the matching degree of keywords corresponding to each incremental user and a topic, the associated users of the event to be identified may be extracted from the incremental user set. Therefore, when the number of the associated users extracted from each candidate user is smaller than the target number, the incremental users can be determined according to the network behavior data of the associated users, and the associated users of the event to be identified are continuously extracted from the incremental users, so that the number of the associated users of the event to be identified is equal to the target number.
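The incremental-user procedure can be sketched as follows. The data layout (a mapping from user to a list of SPO triples) and the `known_users` set used to tell users apart from other objects such as videos are assumptions; `incremental_user_set` and `top_up` are hypothetical names.

```python
def incremental_user_set(associated, spo_records, known_users):
    """Collect users who appear in the SPO data of the associated users.

    spo_records: user -> list of (subject, predicate, object) triples.
    known_users: set of identifiers that denote users (an assumption, needed
                 to distinguish users from non-user objects).
    """
    associated_set = set(associated)
    increment = []
    for user in associated:
        for s, _p, o in spo_records.get(user, []):
            for other in (s, o):
                if (other in known_users and other not in associated_set
                        and other not in increment):
                    increment.append(other)
    return increment


def top_up(associated, increment_degrees, target):
    """Add incremental users in descending matching degree up to the target."""
    missing = max(target - len(associated), 0)
    ranked = sorted(increment_degrees, key=increment_degrees.get, reverse=True)
    return list(associated) + ranked[:missing]
```

Here `increment_degrees` holds the matching degree between each incremental user's keywords and the topic, computed in the same way as for the candidate users.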
In order to improve the recognition accuracy of the associated users, in one embodiment of the present application, the candidate users themselves, together with their network behavior data, may be obtained from the network users when the network behavior data is acquired. Fig. 5 is a flow chart of another method for identifying event-related users according to an embodiment of the present application.
As shown in fig. 5, obtaining the theme of the event to be identified and the network behavior data of each candidate user includes:
Step 501, obtaining a theme of an event to be identified.
In this embodiment, the subject of the event to be identified may be received by the electronic device, or may be extracted from news information related to the event to be identified by the electronic device.
Step 502, determining the domain to which the event to be identified belongs according to the subject of the event to be identified.
In this embodiment, the domain to which the event to be identified belongs may be determined according to the subject of the event to be identified and the characteristics of each domain. The domains to which an event may belong include, but are not limited to, public health, natural disasters, social security, accident disasters, and the like.
Step 503, determining a screening policy corresponding to the candidate user according to the domain to which the event to be identified belongs.
With the development of the internet, the number of network users keeps growing, and usually only some of the network users are related to a given event. Candidate users can therefore be screened out from the network users first, and the associated users of the event to be identified can then be extracted from the candidate users.
In this embodiment, different domains correspond to different screening policies for candidate users. After the domain to which the event to be identified belongs is determined, the screening policy corresponding to the candidate users may be determined according to that domain.
For example, for a social security event, the users associated with the event are typically network users in the geographic area where the event occurred, so the geographic area may be taken as the screening policy.
Step 504, according to the screening policy, obtaining each candidate user and the network behavior data of each candidate user from the network users.
In this embodiment, after determining the screening policy corresponding to the candidate users, each candidate user may be obtained from the network users according to the screening policy, and after obtaining the occurrence time of the event to be identified, the network behavior data of each candidate user may be obtained.
For example, if the screening policy corresponding to the candidate users is a geographic area, the network users in the geographic area where the event to be identified occurred can be taken as the candidate users, and the network behavior data of each candidate user can be obtained.
In the embodiment of the application, when the theme of the event to be identified and the network behavior data of each candidate user are acquired, the theme of the event to be identified is acquired, the domain to which the event to be identified belongs is determined according to the theme of the event to be identified, the screening strategy corresponding to the candidate user is determined according to the domain to which the event to be identified belongs, and then the candidate users and the network behavior data of each candidate user are acquired from the network users according to the screening strategy. Therefore, the network users are screened through the screening strategy, the screened candidate users and the network behavior data thereof are obtained, the network behavior data of all the network users are prevented from being obtained, the calculated amount is reduced, and the identification accuracy of the associated users is improved.
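Steps 501 to 504 can be sketched roughly as below. The feature words in `DOMAIN_FEATURES`, the `NetworkUser` record, and the word-overlap rule in `determine_domain` are hypothetical placeholders; the source does not specify how domain characteristics are represented or compared.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical feature words per domain; the domain names come from the text.
DOMAIN_FEATURES: Dict[str, set] = {
    "public health": {"outbreak", "infection", "vaccine"},
    "natural disasters": {"flood", "earthquake", "typhoon"},
    "social security": {"robbery", "riot", "assault"},
    "accident disasters": {"explosion", "collision", "fire"},
}


@dataclass
class NetworkUser:
    user_id: str
    region: str
    behavior: List[str]  # raw network behavior data, e.g. posts or comments


def determine_domain(topic: str) -> str:
    """Step 502: pick the domain whose feature words overlap the topic most.

    Ties (including zero overlap) fall back to the first domain listed.
    """
    words = set(topic.lower().split())
    return max(DOMAIN_FEATURES, key=lambda d: len(words & DOMAIN_FEATURES[d]))


def screening_policy(domain: str, event_region: str) -> Callable[[NetworkUser], bool]:
    """Step 503: map the domain to a predicate over network users."""
    if domain == "social security":
        # Geographic-area policy from the social security example in the text.
        return lambda user: user.region == event_region
    return lambda user: True  # assumption: no screening for other domains here


def get_candidates(topic: str, event_region: str,
                   network_users: List[NetworkUser]) -> Dict[str, List[str]]:
    """Steps 501-504: return candidate users and their network behavior data."""
    domain = determine_domain(topic)
    keep = screening_policy(domain, event_region)
    return {u.user_id: u.behavior for u in network_users if keep(u)}
```

Only the behavior data of users that pass the predicate is ever collected, which is where the reduction in computation comes from.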
In one embodiment of the present application, the screening policies may include at least one of the following policies: screening period, user type, data type, etc.
For example, for a public health event, the screening period may be determined based on the possible latency of the event. For a social security event, the corresponding screening period may differ from that of a public health event.
In practical applications, an event may target a specific group of people, and candidate users may then be selected from the network users according to user type. For example, for an accident security event, the user type may be taken as the screening policy according to the gender, age, and the like of the crowd targeted by the event.
As another example, for a public health event, the data type (network behavior data that is comment-type data or published demand-type data) and the screening period may together be taken as the screening policy. Alternatively, the candidate users may be screened by the data type, the screening period, and the user type targeted by the event.
In the embodiment of the present application, the screening policy determined for the candidate users according to the domain to which the event to be identified belongs includes at least one of a screening period, a user type, and a data type. The candidate users can be screened according to this policy, and the associated users of the event can then be extracted from the candidate users, thereby improving the identification accuracy.
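A screening policy that combines a screening period, user types, and data types might be represented as in the following sketch. The `BehaviorRecord` fields and the convention that an unset field means "no restriction" are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set, Tuple


@dataclass
class BehaviorRecord:
    user_id: str
    user_type: str      # hypothetical label, e.g. an age group
    data_type: str      # e.g. "comment" or "demand"
    timestamp: datetime
    text: str


@dataclass
class ScreeningPolicy:
    # Each field is optional; None means that dimension is not screened.
    period: Optional[Tuple[datetime, datetime]] = None
    user_types: Optional[Set[str]] = None
    data_types: Optional[Set[str]] = None

    def accepts(self, record: BehaviorRecord) -> bool:
        if self.period is not None:
            start, end = self.period
            if not (start <= record.timestamp <= end):
                return False
        if self.user_types is not None and record.user_type not in self.user_types:
            return False
        if self.data_types is not None and record.data_type not in self.data_types:
            return False
        return True


def screen(records: List[BehaviorRecord], policy: ScreeningPolicy) -> List[BehaviorRecord]:
    """Keep only the behavior records that satisfy every configured dimension."""
    return [r for r in records if policy.accepts(r)]
```

For a public health event, for instance, `period` could span the assumed latency window and `data_types` could be limited to comment-type and demand-type data.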
In order to achieve the above embodiment, the embodiment of the present application further provides an identification device for event-related users. Fig. 6 is a schematic structural diagram of an identification device of an event-related user according to an embodiment of the present application.
As shown in fig. 6, the event-associated user identification apparatus 600 includes: the first obtaining module 610, the parsing module 620, the second obtaining module 630, and the extracting module 640.
a first obtaining module 610, configured to obtain a topic of an event to be identified and network behavior data of each candidate user;
a parsing module 620, configured to parse the network behavior data of each candidate user to obtain a keyword corresponding to each candidate user;
a second obtaining module 630, configured to obtain a matching degree between the keywords corresponding to each candidate user and the topic; and
an extraction module 640, configured to extract the associated users of the event to be identified from the candidate users according to each matching degree.
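The division of work among the four modules can be pictured with the sketch below, in which each method stands in for one module of apparatus 600. The whitespace tokenization and the Jaccard-style matching degree are assumptions chosen only to keep the example self-contained; the source does not prescribe either.

```python
from typing import Dict, List, Set, Tuple


class EventUserIdentifier:
    """Illustrative stand-in for apparatus 600; one method per module."""

    def obtain(self, topic: str, behavior_by_user: Dict[str, List[str]]
               ) -> Tuple[Set[str], Dict[str, List[str]]]:
        # First obtaining module 610: topic plus candidate behavior data.
        return set(topic.lower().split()), behavior_by_user

    def parse(self, behavior_by_user: Dict[str, List[str]]) -> Dict[str, Set[str]]:
        # Parsing module 620: here, every distinct word counts as a keyword.
        return {user: {w for text in texts for w in text.lower().split()}
                for user, texts in behavior_by_user.items()}

    def match(self, keywords: Dict[str, Set[str]], topic_terms: Set[str]
              ) -> Dict[str, float]:
        # Second obtaining module 630: Jaccard overlap as a matching degree.
        degrees = {}
        for user, kw in keywords.items():
            union = kw | topic_terms
            degrees[user] = len(kw & topic_terms) / len(union) if union else 0.0
        return degrees

    def extract(self, degrees: Dict[str, float], count: int) -> List[str]:
        # Extraction module 640: highest matching degree first, zero excluded.
        ranked = sorted(degrees, key=lambda u: (-degrees[u], u))
        return [u for u in ranked[:count] if degrees[u] > 0]

    def identify(self, topic: str, behavior_by_user: Dict[str, List[str]],
                 count: int) -> List[str]:
        topic_terms, data = self.obtain(topic, behavior_by_user)
        return self.extract(self.match(self.parse(data), topic_terms), count)
```

Calling `identify` chains the modules in the order 610, 620, 630, 640.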
In one embodiment of the present application, the parsing module 620 is configured to:
analyzing each piece of network behavior data corresponding to each candidate user to obtain each candidate word corresponding to each candidate user;
and fusing the candidate words corresponding to each candidate user to obtain the keywords corresponding to each candidate user.
In one embodiment of the present application, the parsing module 620 is configured to:
determining the weight of each analysis object in each piece of network behavior data according to the behavior attribute corresponding to each piece of network behavior data corresponding to each candidate user;
and analyzing each piece of network behavior data according to the weight of each analysis object in each piece of network behavior data so as to determine candidate words corresponding to each piece of network behavior data.
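One way to read the weighting and fusion performed by the parsing module is sketched below. The behavior attributes (`post`, `comment`), the analysis objects (`title`, `body`, `comment`), and the weight values are all hypothetical; the source states only that the weight of each analysis object depends on the behavior attribute of the piece of data.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical weights: behavior attribute -> analysis object -> weight.
ATTRIBUTE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "post": {"title": 2.0, "body": 1.0},
    "comment": {"comment": 2.0, "title": 0.5},
}


def candidate_words(piece: dict, top_k: int = 2) -> Dict[str, float]:
    """Score the words of one piece of behavior data by object weight."""
    weights = ATTRIBUTE_WEIGHTS[piece["attribute"]]
    scores: Dict[str, float] = defaultdict(float)
    for object_name, text in piece["objects"].items():
        for word in text.lower().split():
            scores[word] += weights.get(object_name, 0.0)
    # Sort by score, then alphabetically, so ties are resolved deterministically.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return dict(ranked[:top_k])


def fuse(pieces: List[dict], top_n: int = 2) -> List[str]:
    """Fuse per-piece candidate words into one user's keywords by summing scores."""
    total: Dict[str, float] = defaultdict(float)
    for piece in pieces:
        for word, score in candidate_words(piece).items():
            total[word] += score
    return sorted(total, key=lambda w: (-total[w], w))[:top_n]
```

Summing scores is only one possible fusion rule; a union of candidate words or a vote across pieces would fit the description equally well.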
In one embodiment of the present application, the extraction module 640 is configured to:
determining the domain of the event to be identified;
determining the target number of the associated users to be extracted according to the field of the event to be identified;
And extracting the associated users matched with the target number from the candidate users according to each matching degree.
In one embodiment of the present application, the apparatus further comprises:
the third acquisition module is used for acquiring an incremental user set according to the network behavior data of each associated user under the condition that the number of the associated users extracted from each candidate user is less than the target number according to each matching degree;
the extraction module 640 is further configured to extract, from the set of incremental users, the associated user of the event to be identified according to the matching degree of the keyword corresponding to each incremental user and the topic.
In one embodiment of the present application, the first obtaining module 610 is configured to:
acquiring a theme of an event to be identified;
determining the domain of the event to be identified according to the subject of the event to be identified;
determining a screening strategy corresponding to the candidate user according to the field of the event to be identified;
and acquiring each candidate user and network behavior data of each candidate user from the network users according to the screening strategy.
In one embodiment of the present application, the screening policies include at least one of the following policies: screening period, user type, data type.
It should be noted that the explanation of the foregoing embodiments of the method for identifying event-related users also applies to the event-related user identification device of this embodiment, and is not repeated here.
According to the event associated user identification device, the theme of the event to be identified and the network behavior data of each candidate user are obtained; analyzing the network behavior data of each candidate user to obtain keywords corresponding to each candidate user; obtaining the matching degree of the keywords corresponding to each candidate user and the theme; and extracting the associated users of the event to be identified from the candidate users according to each matching degree. Therefore, the user associated with the event can be rapidly identified according to the network behavior data of the user, so that people can be assisted in making event analysis decisions, sudden events can be better handled, and the cost of manpower and material resources is saved.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 7 is a block diagram of an electronic device for the event-associated user identification method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 7, the electronic device includes: one or more processors 701, memory 702, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories and multiple types of memory. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 701 is illustrated in fig. 7.
Memory 702 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for identifying event-associated users provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of identifying event-associated users provided herein.
The memory 702 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the first acquisition module 610, the parsing module 620, the second acquisition module 630, and the extraction module 640 shown in fig. 6) corresponding to the identification method of the event-related user in the embodiments of the present application. The processor 701 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 702, that is, implements the event-related user identification method in the above-described method embodiment.
Memory 702 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created through use of the electronic device for identifying event-associated users, and the like. In addition, the memory 702 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 702 optionally includes memory remotely located with respect to processor 701, and such remote memory may be connected via a network to the electronic device for the event-associated user identification method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the event-associated user identification method may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703, and the output device 704 may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 7.
The input device 703 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the event-associated user identification method. Examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, and a joystick. The output device 704 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of high management difficulty and weak service scalability found in traditional physical hosts and VPS (Virtual Private Server) services.
According to the technical scheme, the method and the device can be used for quickly identifying the user associated with the event according to the network behavior data of the user, assisting people in making event analysis decisions, better coping with the emergency and saving manpower and material cost.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (10)

1. A method of identifying event-associated users, comprising:
acquiring a theme of an event to be identified; determining the domain to which the event to be identified belongs according to the subject of the event to be identified; determining a screening strategy corresponding to the candidate user according to the domain to which the event to be identified belongs; according to the screening strategy, acquiring each candidate user and network behavior data of each candidate user from network users;
Analyzing the network behavior data of each candidate user to obtain keywords corresponding to each candidate user;
obtaining the matching degree of the keywords corresponding to each candidate user and the theme;
determining the domain to which the event to be identified belongs; determining the target number of the associated users to be extracted according to the field of the event to be identified; extracting associated users matched with the target quantity from the candidate users according to each matching degree;
acquiring an incremental user set according to the network behavior data of each associated user under the condition that the number of the associated users extracted from each candidate user is smaller than the target number according to each matching degree;
and extracting the associated users of the event to be identified from the incremental user set according to the matching degree of the keywords corresponding to each incremental user and the theme.
2. The method of claim 1, wherein the parsing the network behavior data of each candidate user to obtain the keyword corresponding to each candidate user comprises:
analyzing each piece of network behavior data corresponding to each candidate user to obtain each candidate word corresponding to each candidate user;
And fusing the candidate words corresponding to each candidate user to obtain the keywords corresponding to each candidate user.
3. The method of claim 2, wherein said parsing each piece of network behavior data corresponding to each of the candidate users to obtain each candidate word corresponding to each of the candidate users comprises:
determining the weight of each analysis object in each piece of network behavior data according to the behavior attribute corresponding to each piece of network behavior data corresponding to each candidate user;
and analyzing each piece of network behavior data according to the weight of each analysis object in each piece of network behavior data so as to determine candidate words corresponding to each piece of network behavior data.
4. The method of claim 1, wherein the screening policy comprises at least one of the following policies: screening period, user type, data type.
5. An identification device of event-associated users, comprising:
the first acquisition module is used for acquiring the theme of the event to be identified; determining the domain to which the event to be identified belongs according to the subject of the event to be identified; determining a screening strategy corresponding to the candidate user according to the domain to which the event to be identified belongs; according to the screening strategy, acquiring each candidate user and network behavior data of each candidate user from network users;
The analysis module is used for analyzing the network behavior data of each candidate user so as to acquire keywords corresponding to each candidate user;
the second acquisition module is used for acquiring the matching degree of the keywords corresponding to each candidate user and the theme;
the extraction module is used for determining the domain to which the event to be identified belongs; determining the target number of the associated users to be extracted according to the field of the event to be identified; extracting associated users matched with the target quantity from the candidate users according to each matching degree;
the third acquisition module is used for acquiring an incremental user set according to the network behavior data of each associated user under the condition that the number of the associated users extracted from each candidate user is smaller than the target number according to each matching degree;
the extraction module is further configured to extract, from the set of incremental users, the associated users of the event to be identified according to the matching degree of the keyword corresponding to each incremental user and the topic.
6. The apparatus of claim 5, wherein the parsing module is configured to:
analyzing each piece of network behavior data corresponding to each candidate user to obtain each candidate word corresponding to each candidate user;
And fusing the candidate words corresponding to each candidate user to obtain the keywords corresponding to each candidate user.
7. The apparatus of claim 6, wherein the parsing module is configured to:
determining the weight of each analysis object in each piece of network behavior data according to the behavior attribute corresponding to each piece of network behavior data corresponding to each candidate user;
and analyzing each piece of network behavior data according to the weight of each analysis object in each piece of network behavior data so as to determine candidate words corresponding to each piece of network behavior data.
8. The apparatus of claim 5, wherein the screening policy comprises at least one of: screening period, user type, data type.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of identifying event-associated users of any of claims 1-4.
10. A non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of identifying event-associated users of any of claims 1-4.
CN202011034605.4A 2020-09-27 2020-09-27 Event-associated user identification method, device, electronic equipment and storage medium Active CN112148979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011034605.4A CN112148979B (en) 2020-09-27 2020-09-27 Event-associated user identification method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011034605.4A CN112148979B (en) 2020-09-27 2020-09-27 Event-associated user identification method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112148979A CN112148979A (en) 2020-12-29
CN112148979B true CN112148979B (en) 2023-08-01

Family

ID=73894709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011034605.4A Active CN112148979B (en) 2020-09-27 2020-09-27 Event-associated user identification method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112148979B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881851B (en) * 2023-09-04 2023-12-19 成都无声讯通科技有限责任公司 Internet of things data processing method and device based on machine learning and server

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202488A (en) * 2016-07-19 2016-12-07 西北工业大学 Estimation user is to the method for physical event distance
CN108023768A (en) * 2017-12-01 2018-05-11 中国联合网络通信集团有限公司 Network event chain establishment method and network event chain establish system
CN109829089A (en) * 2018-12-12 2019-05-31 中国科学院计算技术研究所 Social network user method for detecting abnormality and system based on association map
CN110415131A (en) * 2019-07-19 2019-11-05 上海连尚网络科技有限公司 A kind of method and apparatus for realizing social interaction between author and reader
CN111105064A (en) * 2018-10-26 2020-05-05 阿里巴巴集团控股有限公司 Method and device for determining suspected information of fraud event
CN111353455A (en) * 2020-03-06 2020-06-30 网易(杭州)网络有限公司 Video content determination method and device, storage medium and electronic equipment
CN111414487A (en) * 2020-03-20 2020-07-14 北京百度网讯科技有限公司 Method, device, equipment and medium for relevant expansion of event theme

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10832348B2 (en) * 2013-11-08 2020-11-10 International Business Machines Corporation Topic recommendation in a social network environment
US20180047038A1 (en) * 2016-08-10 2018-02-15 International Business Machines Corporation Leveraging hashtags to dynamically scope a target audience for a social network message

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
基于FP-Growth的社交好友推荐方法研究;熊才权;陈曦;;湖北工业大学学报(01);全文 *
基于社区发现和关键词共现的网络舆情潜在主题发现研究――以新浪微博魏则西事件为例;丁晟春;王鹏鹏;龚思兰;;情报科学(07);全文 *

Also Published As

Publication number Publication date
CN112148979A (en) 2020-12-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant