CN111385136B - Method and device for determining user communication identifier - Google Patents


Info

Publication number
CN111385136B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811653353.6A
Other languages
Chinese (zh)
Other versions
CN111385136A (en)
Inventor
杜奎然
张睿
李易学
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technical Service Co Ltd
Original Assignee
Huawei Technical Service Co Ltd

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/52 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail for supporting social networking services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Telephonic Communication Services (AREA)

Abstract

An embodiment of the invention discloses a method and a device for determining a user communication identifier. The method comprises the following steps: N pieces of message attribute information corresponding to N historical messages issued by a target account on a target social platform are obtained. N time windows are determined according to the release time corresponding to each historical message, and the ticket attribute information of any one of M target users on any time window is determined according to the target call ticket data in that time window, so as to obtain N pieces of ticket attribute information of each target user on the N time windows. Information features are extracted based on the N pieces of message attribute information and the N pieces of ticket attribute information of each target user on the N time windows, to obtain M target feature sets corresponding to the M target users. A target user communication identifier uniquely associated with the target account is determined according to the target feature set corresponding to each target user. Embodiments of the invention can improve the complaint feedback efficiency and the user experience of the communication network.

Description

Method and device for determining user communication identifier
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a user communication identifier.
Background
With the continuous development of computer network technology and communication technology, more and more people can enjoy the convenience brought by internet technology. In particular, the rise of social networks has brought great changes to people's daily life. Social networks have penetrated every aspect of our lives and have broken with the information propagation model of traditional media, so that people can freely publish information content that they consider valuable on a social network in the form of text or multimedia (such as pictures or videos). However, while the rise of social networks brings convenience to people's lives, it also brings challenges to network providers. Because people can freely publish or browse messages on a social network, when they find that network service quality is poor, they often choose to publish network fault reports or network complaint information, such as "the user experience of network provider xxx is poor", on the social network. Such complaint information is public and easily seen by many other users, so the brand reputation of the network provider is adversely affected. Therefore, how to accurately locate the users who publish complaint information, and to improve the network quality experience of those users in a targeted manner, has become a research hotspot in the field of network maintenance.
In the prior art, a network provider can accurately locate the user corresponding to a certain social account in its network communication system, and thereby improve user experience in a targeted manner, only if the network provider has a fixed cooperative relationship with the social network platform. If the network provider has no fixed cooperative relationship with a social network platform, the network provider cannot analyze and process the complaint information issued by users of that platform, which results in low complaint feedback efficiency and poor user experience of the communication network provided by the network provider.
Disclosure of Invention
The embodiment of the invention provides a method and a device for determining a user communication identifier, which can enable a communication network to accurately locate a network user corresponding to a social account, and then carry out fault analysis and resolution in a targeted manner, so that the network complaint feedback efficiency of the communication network can be improved, and the user experience of the communication network can be improved.
In a first aspect, an embodiment of the present invention provides a method for determining a user communication identifier. First, N pieces of message attribute information corresponding to N historical messages issued by a target account on a target social platform may be obtained. Here, one historical message corresponds to one piece of message attribute information. Then, N time windows may be determined according to the publishing time corresponding to each historical message, and the ticket attribute information of any one of M target users on any time window may be determined according to the target call ticket data in that time window, so as to obtain N pieces of ticket attribute information of each of the M target users on the N time windows. Here, a target user is a communication network user having service interaction with the target social platform within a time window, and the target call ticket data is call ticket data associated with the target users. Next, information features are extracted based on the N pieces of message attribute information and the N pieces of ticket attribute information of each of the M target users on the N time windows, to obtain M target feature sets corresponding to the M target users. Finally, a target probability corresponding to each target user is determined according to the target feature set corresponding to that target user, the target user uniquely associated with the target account is determined according to the target probabilities, and the user communication identifier corresponding to that target user is determined as the target user communication identifier. Here, a target probability indicates the degree of association between a target user and the target account.
In the embodiment of the invention, after N pieces of message attribute information corresponding to N historical messages issued by a target social account on a social platform are acquired, together with the ticket attribute information of M users on the time window corresponding to each historical message, feature extraction can be performed on the message attribute information and the ticket attribute information to obtain M target feature sets corresponding to the M users. Then, according to the M target feature sets, M target probabilities are determined, each indicating the degree of association between a target user and the target social account. Finally, the target user uniquely associated with the target social account is determined from the M target users according to the M target probabilities, and the user communication identifier corresponding to that target user in the communication system is determined as the target user communication identifier. The degree of association between a target user and the target social account is determined through comparison and statistics between the message attributes of the historical messages and the ticket attribute information of that user on the time windows corresponding to the historical messages, and the target user communication identifier uniquely associated with the target social account is then determined. In this way, the communication network can accurately locate the network user corresponding to the social account and then perform fault analysis and resolution in a targeted manner, which improves the network complaint feedback efficiency and the user experience of the communication network.
In some possible embodiments, a preset time period threshold t may be obtained. A time window TDi corresponding to any historical message i is then determined according to the preset time period threshold t and the release time Ti corresponding to that historical message, so as to obtain N time windows corresponding to the N historical messages. Here, TDi = [Ti - t, Ti + t]. Because the time windows are determined with the release time of each historical message as the reference, the call ticket data acquired in each time window is ensured to contain the call ticket data corresponding to the service that published the historical message, so that the subsequent information feature extraction process and the process of determining the target user communication identifier are reasonable and effective.
In some feasible implementation manners, the following information feature extraction operations on the message attribute information and the ticket attribute information can be performed for any target user i of the M target users. First, a comparison feature set of the target user i on any time window is determined through comparison and statistics between the ticket attribute information of the target user i on that time window and the message attribute information of the historical message corresponding to that time window, so as to obtain N comparison feature sets corresponding to the target user i on the N time windows. Here, one comparison feature set includes S different kinds of comparison features. Then, feature fusion is performed on the N comparison feature sets corresponding to the target user i on the N time windows to obtain a target feature set corresponding to the target user i. Finally, M target feature sets corresponding to the M target users are determined according to the information feature extraction results of the message attribute information and the ticket attribute information corresponding to each target user. The comparison feature sets obtained for each target user on the N time windows through comparison and statistics represent the degree of matching between the ticket attribute information and the message attribute information of the historical messages. Fusing them yields a target feature set that represents the degree of association between the target user and the target account that issued the historical messages. The method is easy to implement, reasonable and effective, and can improve the efficiency of determining the user communication identifier.
In some possible embodiments, the S different comparison features include one or more of the following: a flag indicating whether the originating terminal type is the same, a service occurrence time difference, a ticket count, an uplink traffic size, a relative downlink traffic size, a historical message size, a multimedia data flag, and a multimedia data size.
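As an illustration only, the comparison of one historical message with one target user's ticket attribute information in the same time window might be sketched as follows. The field names, the dictionary layout, and the exact definition of each feature (for example, taking the relative downlink traffic size as downlink divided by uplink) are assumptions made for this sketch and are not specified by the embodiment.

```python
from datetime import datetime


def comparison_features(msg: dict, ticket: dict, ticket_count: int) -> dict:
    """Build one comparison feature set (S = 8 here) for one user in one time window.

    All keys are hypothetical names chosen for this sketch.
    """
    uplink = ticket["uplink_bytes"]
    return {
        # 1 when the publishing terminal type matches the ticket's terminal type
        "same_terminal_type": int(msg["terminal_type"] == ticket["terminal_type"]),
        # absolute difference between ticket start time and publishing time, in seconds
        "time_diff_s": abs((ticket["start_time"] - msg["publish_time"]).total_seconds()),
        # number of tickets the user produced in this window
        "ticket_count": ticket_count,
        "uplink_bytes": uplink,
        # assumed definition: downlink traffic relative to uplink traffic
        "downlink_relative": ticket["downlink_bytes"] / uplink if uplink else 0.0,
        # size of the message text in bytes
        "message_bytes": len(msg["content"].encode("utf-8")),
        "has_multimedia": int(msg["media_bytes"] > 0),
        "media_bytes": msg["media_bytes"],
    }


msg = {"publish_time": datetime(2018, 12, 29, 13, 55, 0), "terminal_type": "android",
       "content": "network is slow", "media_bytes": 0}
ticket = {"start_time": datetime(2018, 12, 29, 13, 55, 20), "terminal_type": "android",
          "uplink_bytes": 2000, "downlink_bytes": 500}
features = comparison_features(msg, ticket, ticket_count=2)
```

A small time difference together with a matching terminal type would suggest that the ticket plausibly corresponds to the publishing of the message, which is the kind of matching degree the comparison feature set is meant to capture.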
In some possible embodiments, U feature groups to be fused may be determined from the N comparison feature sets corresponding to the target user i on the N time windows. Here, one feature group to be fused includes one or more kinds of comparison features of the target user i on each time window. A target feature value corresponding to any feature group to be fused is determined according to the feature fusion result of the comparison features included in that feature group, so as to obtain U target feature values corresponding to the U feature groups to be fused. The target feature set corresponding to the target user i is then determined according to the U target feature values. Fusing one kind of comparison feature, or several kinds of comparison features, across the time windows is a simple process, and the target feature set obtained through fusion can fully reflect the degree of association between the target user and the target account.
In some possible embodiments, the U feature groups to be fused include a first feature group to be fused, and the first feature group to be fused includes a first comparison feature of the target user i on each time window. The average of the feature values of the first comparison feature over the time windows can be calculated, and the average is determined as the target feature value corresponding to the first feature group to be fused.
In some possible embodiments, the U feature groups to be fused include a second feature group to be fused, and the second feature group to be fused includes a second comparison feature and a third comparison feature of the target user i on each time window. A similarity value between the second comparison features on the time windows and the third comparison features on the time windows can be calculated, and the similarity value is determined as the target feature value corresponding to the second feature group to be fused.
In some possible embodiments, the U feature groups to be fused include a third feature group to be fused, and the third feature group to be fused includes a fourth comparison feature of the target user i on each time window. The sum of the feature values of the fourth comparison feature over the time windows can be calculated, and the ratio of that sum to the number N of historical messages is determined as the target feature value corresponding to the third feature group to be fused.
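The three fusion rules described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and cosine similarity is used only as one possible choice, since the embodiment speaks of a "similarity value" without naming a specific measure.

```python
import math
from typing import Sequence


def fuse_mean(values: Sequence[float]) -> float:
    """First feature group: average of one comparison feature over the time windows."""
    return sum(values) / len(values)


def fuse_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Second feature group: similarity between two per-window feature series.

    Cosine similarity is an assumption of this sketch.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuse_sum_ratio(values: Sequence[float], n_messages: int) -> float:
    """Third feature group: sum of one comparison feature divided by N."""
    return sum(values) / n_messages
```

If a feature is missing on some windows (for example because the ticket attribute information is null there) and only the available values are passed in, `fuse_mean` and `fuse_sum_ratio` give different results, since the latter always divides by the number N of historical messages. How null windows are handled is not stated in this section, so that behaviour is an assumption of the sketch.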
In some feasible embodiments, the target feature sets corresponding to the target users may be sequentially input into a preset classification model, and the target probability corresponding to each target user is determined based on the classification result that the classification model produces for the target feature set corresponding to that target user. Determining the target probability corresponding to each target user through a preset, trained classification model can improve the validity of the target probability.
In some possible embodiments, the target user corresponding to the highest target probability among the target probabilities corresponding to the target users may be determined as the target user uniquely associated with the target account.
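A minimal sketch of the scoring and selection steps is given below. The embodiment does not specify the classification model, so a logistic function with fixed, made-up weights stands in for the preset trained model; the weights, the bias, the user labels, and the function names are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def target_probability(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Stand-in for the preset classification model: a logistic score in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def pick_target_user(feature_sets: Dict[str, List[float]],
                     weights: Sequence[float],
                     bias: float) -> Tuple[str, Dict[str, float]]:
    """Score every target user and return the one with the highest target probability."""
    probs = {user: target_probability(f, weights, bias) for user, f in feature_sets.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical target feature sets (U = 2 fused values per user)
feature_sets = {"user-A": [1.0, 0.5], "user-B": [0.2, 1.0], "user-C": [0.5, 0.5]}
best, probs = pick_target_user(feature_sets, weights=[2.0, -1.0], bias=0.0)
```

The user communication identifier of `best` would then be taken as the target user communication identifier.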
In the embodiment of the invention, the degree of association between a target user and the target social account is determined through comparison and statistics between the message attributes of the historical messages and the ticket attribute information of that user on the time windows corresponding to the historical messages, and the target user communication identifier uniquely associated with the target social account is then determined. The communication network can thus accurately locate the network user corresponding to the social account and then perform fault analysis and resolution in a targeted manner, which improves the network complaint feedback efficiency and the user experience of the communication network.
In a second aspect, an embodiment of the present invention provides a device for determining a user communication identifier. The device includes units configured to perform the method for determining a user communication identifier provided in any one of the possible implementations of the first aspect, and can therefore also achieve the beneficial effects (or advantages) of the method for determining a user communication identifier provided in the first aspect.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a processor and a memory, where the processor and the memory are connected to each other. The memory is used for storing a computer program, the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the method for determining a user communication identifier provided by the first aspect, so as to achieve the beneficial effects of the method for determining a user communication identifier provided by the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium in which instructions are stored. When the instructions are run on a computer, they cause the computer to perform the method for determining a user communication identifier provided in any one of the foregoing possible implementation manners of the first aspect, so that the beneficial effects of the method for determining a user communication identifier provided in the first aspect can also be achieved.
In a fifth aspect, an embodiment of the present invention provides a chip system, where the chip system includes a processor, configured to support a terminal device to implement the functions referred to in the foregoing first aspect, for example, to generate or process information referred to in the method for determining a user communication identifier provided in the foregoing first aspect. In one possible design, the chip system further includes a memory for storing program instructions and data necessary for the terminal. The chip system may be formed by a chip, and may also include a chip and other discrete devices.
In a sixth aspect, an embodiment of the present invention provides a computer program product including instructions. When the computer program product runs on a computer, it enables the computer to execute the method for determining a user communication identifier provided in the first aspect, so that the beneficial effects of the method for determining a user communication identifier provided in the first aspect can also be achieved.
Drawings
Fig. 1 is a flowchart illustrating a method for determining a user communication identifier according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating a corresponding relationship between a comparison feature set and each time window according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an apparatus for determining a user communication identifier according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The method for determining the user communication identifier according to the embodiment of the present invention may be executed by a terminal device with data processing capability, such as a desktop computer or a laptop computer, which is not limited herein. In the embodiments of the present invention, the terms "first" and "second" in, for example, "first feature group to be fused" and "second feature group to be fused" are used only to distinguish different feature groups to be fused and impose no other limitation. Likewise, the terms "first" and "second" in the subsequent "first comparison feature" and "second comparison feature" impose no other limitation.
Example one
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for determining a user communication identifier according to an embodiment of the present invention. For convenience of understanding and description, the present embodiment describes the method for determining the user communication identifier by using the terminal device as an execution subject. The determination method comprises the following steps:
S101, acquiring N pieces of message attribute information corresponding to N pieces of historical messages issued by a target account on a target social platform.
In some feasible embodiments, the terminal device may acquire N pieces of message attribute information corresponding to N history messages issued by the target account on the target social platform in a certain past period. Here, the target account is a valid account that has already been registered on the target social platform. One history message corresponds to one piece of message attribute information.
In specific implementation, the terminal device may obtain the historical data of the target social platform for a certain past time period by using methods such as web data capture, through an Application Programming Interface (API) provided by the target social platform. The historical data is then screened to obtain the N history messages issued by the target account within that past period and the associated information corresponding to each of the N history messages. Here, the associated information corresponding to a history message is information that is associated with the publisher of the history message and is disclosed on the target social platform, for example, the terminal device used to publish the history message. Finally, the terminal device can determine the N pieces of message attribute information corresponding to the N history messages from the N history messages and their associated information according to a preset first information type. Optionally, the first information type may include a publishing time, a message content, a publishing terminal type, and attribute information of multimedia data included in the message. Here, the multimedia data includes picture data and video data, and is not limited herein. The multimedia data attribute information indicates whether the history message contains multimedia data and the size of the contained multimedia data. It can likewise be understood that the message attribute information corresponding to a history message includes at least the publishing time of the history message, the content of the history message, the type of the terminal that published the history message, and the attribute information of the multimedia data included in the history message.
Take obtaining the message attribute information corresponding to a history message L among the N history messages as an example. After the terminal device obtains the historical data of the target social platform through the API provided by the target social platform, the terminal device may screen out the history message L published by the target account from the historical data, and at the same time obtain the associated information corresponding to the history message L, such as the message publishing time and the publishing terminal type. Then, according to the preset first information type, data conforming to that information type is selected from the history message L and its associated information, and is determined as the message attribute information corresponding to the history message L.
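The screening and selection described for step S101 can be sketched as follows. This is only an illustrative sketch: the record layout returned by the platform API, the key names, and the `MessageAttributes` type are hypothetical, and fetching the historical data itself is outside the scope of the sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class MessageAttributes:
    """Message attribute information of the preset first information type."""
    publish_time: datetime
    content: str
    terminal_type: str
    has_multimedia: bool
    multimedia_bytes: int  # 0 when the message carries no picture or video


def extract_message_attributes(history_data: List[dict], target_account: str) -> List[MessageAttributes]:
    """Screen the platform's historical data for the target account's messages."""
    result = []
    for record in history_data:
        if record.get("account") != target_account:
            continue  # published by another account
        media_bytes = record.get("media_bytes", 0)
        result.append(MessageAttributes(
            publish_time=record["publish_time"],
            content=record["content"],
            terminal_type=record.get("terminal_type", "unknown"),
            has_multimedia=media_bytes > 0,
            multimedia_bytes=media_bytes,
        ))
    return result


history_data = [
    {"account": "acct-1", "publish_time": datetime(2018, 12, 29, 13, 55),
     "content": "network is slow", "terminal_type": "android"},
    {"account": "acct-2", "publish_time": datetime(2018, 12, 29, 14, 0),
     "content": "hello", "terminal_type": "ios"},
    {"account": "acct-1", "publish_time": datetime(2018, 12, 29, 18, 30),
     "content": "still slow", "terminal_type": "android", "media_bytes": 204800},
]
attrs = extract_message_attributes(history_data, "acct-1")
```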
S102, determining N time windows according to the release time corresponding to each historical message, and determining the ticket attribute information of any target user in M target users on any time window according to the target ticket data in any time window to obtain N ticket attribute information of each target user in M target users on N time windows.
In some feasible embodiments, after the terminal device obtains the N history messages, N time windows (one time window is a fixed time period) may be determined according to the publishing times corresponding to the N history messages. The target call ticket data in each of the N time windows is then acquired. The target call ticket data is the call ticket data associated with the target social platform in each time window. Then, the terminal device can determine the ticket attribute information corresponding to one or more target users in any time window according to the target call ticket data in that time window, and repeat this operation to finally obtain N pieces of ticket attribute information of each of the M target users on the N time windows. Here, a target user is a communication network user having service interaction with the target social platform in a time window.
Optionally, in a specific implementation, after the terminal device acquires the N history messages, it may determine the N publishing times corresponding to the N history messages. Here, one history message corresponds to one publishing time. Then, a preset time period threshold t is obtained, and the time window TDi corresponding to any history message i is determined according to the preset time period threshold t and the publishing time Ti corresponding to that history message. Here, TDi = [Ti - t, Ti + t]. For example, if the preset time period threshold t is 1 minute and the publishing time corresponding to the history message i is 13:55, the time window corresponding to the history message i is [13:54, 13:56]. The terminal device performs this operation for each of the N publishing times to obtain the N time windows corresponding to the N history messages. Because the time windows are determined with the publishing time of each history message as the reference, the call ticket data acquired in each time window is ensured to contain the call ticket data corresponding to the service that published the history message, so that the subsequent information feature extraction process and the process of determining the target user communication identifier are reasonable and effective.
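The window computation can be sketched in a few lines. The function names are hypothetical, and the calendar date in the usage example is arbitrary; only the 13:55 publishing time and the 1 minute threshold come from the example above.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def time_window(publish_time: datetime, t: timedelta) -> Tuple[datetime, datetime]:
    """Return the closed window TDi = [Ti - t, Ti + t] for one publishing time Ti."""
    return (publish_time - t, publish_time + t)


def time_windows(publish_times: List[datetime], t: timedelta) -> List[Tuple[datetime, datetime]]:
    """One window per history message, in the same order as the publishing times."""
    return [time_window(ti, t) for ti in publish_times]


# Publishing time 13:55 with t = 1 minute gives the window [13:54, 13:56]
start, end = time_window(datetime(2018, 12, 29, 13, 55), timedelta(minutes=1))
```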
After the N time windows are determined, the terminal device may obtain, through a Deep Packet Inspection (DPI) technique, DPI data in each of the N time windows. Then, the terminal device can extract call ticket data associated with the target social platform from the DPI data in each time window to determine target call ticket data corresponding to each time window in the N time windows. Here, it should be noted that the ticket data associated with the target social platform in a certain time window is the ticket data corresponding to the service request initiated by one or more target users in the time window and associated with the target social platform. Different service types generate different call ticket data, for example, a target user browses a webpage of a target social platform and the target user publishes a message on the target social platform can generate different call ticket data, so that one target user can correspond to one or more call ticket data in one time window. For example, in a time window TDi, the terminal device determines that data interaction exists between three users, namely a target user 1, a target user 2 and a target user 3, and the target social contact platform through DPI detection, and then the terminal device can determine all call ticket data of the target user 1 in the time window TDi, all call ticket data of the target user 2 in the time window TDi and all call ticket data of the target user 3 in the time window TDi as target call ticket data corresponding to the time window TDi. And then, the terminal equipment determines N pieces of ticket attribute information corresponding to each target user in the M target users on the N time windows from the target ticket data corresponding to each time window according to a preset second information type. 
Here, the second information type may include a terminal type corresponding to the ticket, a ticket start time, a user communication identifier included in the ticket, an uplink traffic size corresponding to the ticket, and a downlink traffic size corresponding to the ticket. Specifically, take an example of determining a ticket attribute information corresponding to the target user i in the time window time TDi. After acquiring the target call ticket data corresponding to the time window TDi, the terminal device may extract one or more call ticket data corresponding to the target user i, and determine the call ticket data with the maximum uplink flow in the one or more call ticket data as the suspected call ticket data corresponding to the target user i. Then, the terminal device may extract the ticket attribute information of the second information type according to the symbol extracted from the suspected ticket data, and determine the ticket attribute information corresponding to the suspected ticket data as the ticket attribute information corresponding to the target user i over the time window TDi, for example, the terminal type corresponding to the ticket, the start time of the ticket, the user communication identifier included in the ticket, the uplink traffic size corresponding to the ticket, and the downlink traffic size corresponding to the ticket. It should be further noted that the M target users are all target users that appear in the N time windows. 
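Selecting the suspected call ticket and extracting its attributes can be sketched as follows. The ticket record layout and key names are hypothetical; the rule of taking the ticket with the largest uplink traffic follows the description above.

```python
from typing import List, Optional

# Fields of the preset second information type (key names are assumptions of this sketch)
SECOND_INFO_TYPE = ("terminal_type", "start_time", "user_id", "uplink_bytes", "downlink_bytes")


def suspected_ticket(window_tickets: List[dict], user_id: str) -> Optional[dict]:
    """Among one user's tickets in a window, pick the one with the largest uplink traffic."""
    own = [t for t in window_tickets if t["user_id"] == user_id]
    if not own:
        return None  # the user does not appear in this window
    return max(own, key=lambda t: t["uplink_bytes"])


def ticket_attributes(window_tickets: List[dict], user_id: str) -> Optional[dict]:
    """Ticket attribute information of one user on one window, or None (null)."""
    best = suspected_ticket(window_tickets, user_id)
    if best is None:
        return None
    return {key: best[key] for key in SECOND_INFO_TYPE}


window_tickets = [
    {"user_id": "u1", "terminal_type": "android", "start_time": "13:54:40",
     "uplink_bytes": 300, "downlink_bytes": 9000, "service": "browse"},
    {"user_id": "u1", "terminal_type": "android", "start_time": "13:55:05",
     "uplink_bytes": 2500, "downlink_bytes": 400, "service": "publish"},
    {"user_id": "u2", "terminal_type": "ios", "start_time": "13:55:30",
     "uplink_bytes": 100, "downlink_bytes": 5000, "service": "browse"},
]
u1_attrs = ticket_attributes(window_tickets, "u1")
u3_attrs = ticket_attributes(window_tickets, "u3")
```

Choosing the ticket with the largest uplink traffic presumably reflects that publishing a message sends data upstream, whereas browsing is dominated by downlink traffic; this rationale is an inference and is not stated in this section.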
For example, assume that N is 2, that target user 1, target user 2, target user 3 and target user 4 appear in the time window TD1, and that target user 1, target user 2 and target user 4 appear in the time window TD2. After obtaining the target call ticket data corresponding to the 2 time windows, the terminal device can determine the ticket attribute information corresponding to a total of 4 target users, namely target user 1, target user 2, target user 3 and target user 4 (the ticket attribute information corresponding to target user 3 on the time window TD2 is null).
The following illustrates the process by which the terminal device obtains the N pieces of ticket attribute information of each of the M target users on the N time windows. Suppose that the terminal device determines 4 time windows, namely time window TD1, time window TD2, time window TD3 and time window TD4. Through the DPI probe, the terminal device can obtain first target call ticket data corresponding to the time window TD1, second target call ticket data corresponding to the time window TD2, third target call ticket data corresponding to the time window TD3 and fourth target call ticket data corresponding to the time window TD4. Here, it is assumed that the first target call ticket data includes call ticket data corresponding to target user 1, target user 2, target user 3 and target user 4; the second target call ticket data includes call ticket data corresponding to target user 1, target user 3 and target user 4; the third target call ticket data includes call ticket data corresponding to target user 1, target user 2, target user 3 and target user 4; and the fourth target call ticket data includes call ticket data corresponding to target user 1, target user 2 and target user 3.
Then, according to the first, second, third and fourth target call ticket data, the terminal device can determine the first, second, third and fourth suspected call tickets corresponding to target user 1 on the time windows TD1, TD2, TD3 and TD4, and then extract the terminal type, start time, user communication identifier, uplink traffic size and downlink traffic size of each of these four suspected call tickets, so as to obtain the 4 pieces of ticket attribute information corresponding to target user 1 on the time windows TD1, TD2, TD3 and TD4. Similarly, the terminal device may determine the 4 pieces of ticket attribute information corresponding to each of target user 2, target user 3 and target user 4 on the time windows TD1, TD2, TD3 and TD4, where the ticket attribute information corresponding to target user 2 on the time window TD2 may be null, and the ticket attribute information corresponding to target user 4 on the time window TD4 may be null. In this way, the terminal device can determine the ticket attribute information corresponding to each of the 4 target users on each time window.
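The selection of suspected call tickets described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the `Ticket` record, its field names, and the representation of a time window as a `(start, end)` pair are hypothetical and do not come from the embodiment; a missing user in a window is represented by `None`, corresponding to the null ticket attribute information mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    # Hypothetical record holding the "second information type" fields.
    user_id: str         # user communication identifier carried in the ticket
    terminal_type: str
    start_time: float    # ticket start time, e.g. seconds since epoch
    uplink_bytes: int
    downlink_bytes: int


def suspected_ticket(tickets: list[Ticket]) -> Optional[Ticket]:
    """Return the ticket with the largest uplink traffic, or None if empty."""
    return max(tickets, key=lambda t: t.uplink_bytes, default=None)


def ticket_attributes(windows, tickets):
    """Map each user to N suspected tickets, one per window (None if absent)."""
    per_window = [defaultdict(list) for _ in windows]
    for t in tickets:
        for k, (lo, hi) in enumerate(windows):
            if lo <= t.start_time <= hi:
                per_window[k][t.user_id].append(t)
    # The M target users are all users that appear in at least one window.
    users = sorted({u for w in per_window for u in w})
    return {u: [suspected_ticket(w.get(u, [])) for w in per_window]
            for u in users}
```

Each value in the returned mapping has exactly N entries, so downstream steps can iterate over windows by index.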
S103, information feature extraction is carried out on the basis of the N message attribute information and the N ticket attribute information of each target user in the M target users on the N time windows, so that M target feature sets corresponding to the M target users are obtained.
In some feasible implementation manners, after acquiring the N message attribute information and the N ticket attribute information of each target user in the M target users on the N time windows, the terminal device may perform information feature extraction based on the N message attribute information and the N ticket attribute information of each target user in the M target users on the N time windows, so as to obtain M target feature sets corresponding to the M target users. Here, one target user corresponds to one target feature set. For convenience of understanding and description, a process of determining M target feature sets corresponding to M target users by the terminal device is described below by taking a process of determining a target feature set corresponding to a target user i as an example.
Optionally, in a specific implementation, after determining the ticket attribute information corresponding to the target user i on each time window, the terminal device may, for any time window TDi, compare and perform statistics on the message attribute information corresponding to the time window TDi and the ticket attribute information corresponding to the target user i on that time window, so as to obtain the comparison feature set corresponding to the target user i on the time window TDi. Here, the comparison feature set includes S types of comparison features. Optionally, the comparison feature set may specifically include 8 comparison features, namely a flag V1 indicating whether the originating terminal types are the same, a service occurrence time difference V2, a number of call tickets V3, an uplink traffic size V4, a relative downlink traffic size V5, a historical message size V6, a multimedia data flag V7, and a multimedia data size V8. The following description takes these 8 comparison features as an example.
Suppose that the time window TDi corresponds to the historical message Z and corresponds to the message attribute information Zi, wherein the message attribute information Zi includes the message Z publishing time t1, the terminal type of the publishing message Z, the content size Q1 of the message Z, and the multimedia attribute information of the message Z. The ticket attribute information corresponding to the target user i comprises a ticket terminal type, a ticket starting time t2, a user communication identifier in a ticket, a ticket uplink flow rate Q2 and a ticket downlink flow rate Q3.
The terminal device can compare whether the terminal type used to publish the message Z is consistent with the call ticket terminal type. If so, the value of the comparison feature V1 is determined to be 1; otherwise, the value of the comparison feature V1 is determined to be 0. The terminal device can calculate the difference between the publishing time t1 of the message Z and the call ticket start time t2, and determine the value of the comparison feature V2 to be t1-t2. The terminal device can determine the value of the comparison feature V3 according to the number of call tickets of the target user i in the time window TDi. The terminal device can determine the call ticket uplink traffic size Q2 in the ticket attribute information corresponding to the target user i as the value of the comparison feature V4. The terminal device can determine the relative downlink traffic size of the suspected call ticket of the target user i on the time window TDi as the value of the comparison feature V5. Specifically, the terminal device can compare the downlink traffic size of the suspected call ticket of the target user i on the time window TDi (i.e., Q3) with the downlink traffic sizes of the two call tickets whose occurrence times are adjacent to that of the suspected call ticket, so as to determine the relative downlink traffic size of the suspected call ticket. For example, assume that the downlink traffic size of the suspected call ticket of the target user i on the time window TDi is y1, that the downlink traffic sizes of the two call tickets immediately before and after it in occurrence time are y0 and y2 respectively, and that the preset relative sizes are 1, 2 and 3.
The terminal device can compare y0, y1 and y2 to obtain the rank of y1 among the three values. If y1 is greater than y0 and less than y2, the terminal device can determine that the relative downlink traffic size of the suspected call ticket of the target user i is 2, that is, the value of the comparison feature V5 is 2. The terminal device may also determine the content size Q1 of the historical message Z as the value of the comparison feature V6. The terminal device can also judge, according to the multimedia attribute information of the message Z, whether the message Z contains multimedia data. If so, the value of the comparison feature V7 is determined to be 1; otherwise, the value of the comparison feature V7 is determined to be 0. The terminal device may further determine the value of the comparison feature V8 according to the size of the multimedia data indicated in the multimedia attribute information of the message Z. Finally, the terminal device may form the comparison feature set corresponding to the target user i on the time window TDi from the comparison features V1 to V8. Similarly, the terminal device performs the above comparison and statistical operations on the message attribute information on each time window and the ticket attribute information corresponding to the target user i, so as to obtain the N comparison feature sets corresponding to the target user i on the N time windows. Each comparison feature set includes the above 8 comparison features.
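To make the computation of V1 to V8 concrete, the following sketch derives one comparison feature set for one time window. It is an illustration only: the `MessageAttrs` and `TicketAttrs` records and their field names are hypothetical, the rank for V5 is assumed to be taken in ascending order (the text only fixes that the middle value maps to 2), and tie handling is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MessageAttrs:
    # Hypothetical container for the message attribute information Zi.
    publish_time: float     # t1
    terminal_type: str
    content_size: int       # Q1
    has_multimedia: bool
    multimedia_size: int


@dataclass
class TicketAttrs:
    # Hypothetical container for the suspected ticket's attributes.
    terminal_type: str
    start_time: float       # t2
    uplink: int             # Q2
    downlink: int           # Q3


def relative_rank(y0, y1, y2):
    """Rank of y1 among (y0, y1, y2) in ascending order: 1, 2 or 3.

    Ascending order and tie handling are assumptions of this sketch.
    """
    return 1 + sum(1 for y in (y0, y2) if y < y1)


def comparison_features(msg, ticket, ticket_count, prev_downlink, next_downlink):
    """Build the comparison feature set V1..V8 for one time window."""
    return {
        "V1": 1 if msg.terminal_type == ticket.terminal_type else 0,
        "V2": msg.publish_time - ticket.start_time,          # t1 - t2
        "V3": ticket_count,
        "V4": ticket.uplink,
        "V5": relative_rank(prev_downlink, ticket.downlink, next_downlink),
        "V6": msg.content_size,
        "V7": 1 if msg.has_multimedia else 0,
        "V8": msg.multimedia_size if msg.has_multimedia else 0,
    }
```

Returning a dictionary keyed by feature name keeps the later fusion step independent of feature order.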
Here, it is easily understood that, the terminal device performs the above comparison and statistics operations on each target user of the M target users, that is, may obtain a comparison feature set corresponding to each target user of the M target users in each time window. For convenience of understanding, please refer to fig. 2, and fig. 2 is a schematic diagram illustrating a correspondence relationship between a comparison feature set and each time window according to an embodiment of the present invention. As can be seen from the figure, any one of the M target users corresponds to one comparison feature set in a certain time window, that is, one target user corresponds to N comparison feature sets in N time windows (for example, the target user 1 corresponds to the comparison feature set 1 to the comparison feature set N, which are N comparison feature sets in total), and the M users correspond to M × N comparison feature sets in total in the N time windows.
In some feasible embodiments, after the terminal device determines N comparison feature sets corresponding to N time windows in the target user i, the terminal device may perform feature fusion on the N comparison feature sets to obtain a target feature set corresponding to the target user i. Specifically, the terminal device may determine U feature groups to be fused according to the N comparison feature sets. Here, one feature group to be fused may include one or more comparison features of the target user i in each time window. For example, a certain feature group to be fused may include N comparison features V1 of the target user i in N time windows. A certain feature group to be fused may include N comparison features V4 and N comparison features V6 of the target user i in N time windows. Then, the terminal device can fuse the comparison features included in the U feature groups to be fused to obtain U target feature values. Here, a target feature value is obtained by fusing a feature group to be fused. Finally, the terminal device may combine the U target feature values into a target feature set corresponding to the target user i.
In a specific implementation, it is optionally assumed that the U feature groups to be fused may include a first feature group to be fused, where the first feature group to be fused includes first comparison features of the target user i in the time windows. The terminal equipment can calculate the average value of the characteristic values of the first comparison characteristics on each time window, and determines the average value as the target characteristic value corresponding to the first characteristic group to be fused. Optionally, it is assumed that the U feature groups to be fused include a second feature group to be fused, and the second feature group to be fused includes a second comparison feature and a third comparison feature of the target user i in each time window. The terminal equipment can calculate the similarity value between the second comparison characteristic on each time window and the third comparison characteristic on each time window, and determines the similarity value as the target characteristic value corresponding to the second feature group to be fused. Optionally, it is assumed that the U feature groups to be fused include a third feature group to be fused, and the third feature group to be fused includes fourth comparison features of the target user i in each time window. The terminal equipment can calculate the sum of the characteristic values of the fourth comparison characteristic on each time window, and determine the ratio of the sum of the characteristic values to the number N of the historical messages as the target characteristic value corresponding to the third characteristic group to be fused.
For example, suppose that 8 feature groups to be fused are preset. The first feature group to be fused includes the N comparison features V1 of the target user i on the N time windows. The terminal device can calculate the ratio B' of the sum of the feature values of the N comparison features V1 to the number N of historical messages, and determine log10(B') as the target feature value corresponding to the first feature group to be fused. The second feature group to be fused includes the N comparison features V2 of the target user i on the N time windows, and the terminal device may calculate the average of the feature values of the N comparison features V2 and determine the average as the target feature value corresponding to the second feature group to be fused. The third feature group to be fused includes the N comparison features V3 of the target user i on the N time windows, and the terminal device may calculate the average of the feature values of the N comparison features V3 and determine the average as the target feature value corresponding to the third feature group to be fused. The fourth feature group to be fused includes the N comparison features V4 and the N comparison features V6 of the target user i on the N time windows, and the terminal device may calculate a similarity value between a first sequence composed of the feature values of the N comparison features V4 and a second sequence composed of the feature values of the N comparison features V6, and determine the similarity value as the target feature value corresponding to the fourth feature group to be fused. Here, optionally, the terminal device may calculate the Pearson correlation coefficient of the first sequence and the second sequence, and determine the coefficient as the target feature value corresponding to the fourth feature group to be fused.
The fifth feature group to be fused includes the N comparison features V5 of the target user i on the N time windows, and the terminal device may calculate the mean of the N comparison features V5 and determine the mean as the target feature value corresponding to the fifth feature group to be fused. The sixth feature group to be fused includes the N comparison features V7 of the target user i on the N time windows, and the terminal device may calculate the sum of the feature values of the N comparison features V7 and determine the ratio of this sum to the number N of historical messages as the target feature value corresponding to the sixth feature group to be fused. The seventh feature group to be fused includes the N comparison features V4, the N comparison features V7 and the N comparison features V8 corresponding to the target user i on the N time windows. The terminal device may calculate the ratio of each of the N comparison features V8 to the corresponding one of the N comparison features V4 to obtain N ratios. For example, assume that there are 4 comparison features V8, namely V81, V82, V83 and V84, and 4 comparison features V4, namely V41, V42, V43 and V44. The terminal device can then calculate the 4 ratios V81/V41, V82/V42, V83/V43 and V84/V44. Then, the terminal device may determine the sum of the N ratios and the sum of the feature values of the N comparison features V7, and finally determine the ratio of the sum of the N ratios to the sum of the feature values of the N comparison features V7 as the target feature value of the seventh feature group to be fused. The eighth feature group to be fused includes the N comparison features V3 of the target user i on the N time windows. The terminal device may determine N logical values according to the N comparison features V3, where the logical value is 1 when the value of the comparison feature V3 is not 0, and the logical value is 0 when the value of the comparison feature V3 is 0.
The terminal device may calculate the sum of the N logical values, and determine a ratio of the sum of the N logical values to the number N of the historical messages as the target feature value of the eighth feature group to be fused. Finally, the terminal device may combine the 8 target feature values corresponding to the first to-be-fused feature group to the eighth to-be-fused feature group into a target feature value set corresponding to the target user i.
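The eight fusion rules in this example can be summarized in a short sketch. It assumes that each comparison feature set is a dictionary with keys `"V1"` to `"V8"` and that every time window has a comparison feature set; both are assumptions of this illustration. The guards for a zero ratio B', a zero V4 and a zero sum of V7 are also assumptions, since the text does not say how these cases are handled.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant sequence."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def fuse(feature_sets):
    """Fuse N comparison feature sets into 8 target feature values."""
    n = len(feature_sets)

    def col(key):
        return [fs[key] for fs in feature_sets]

    def mean(xs):
        return sum(xs) / len(xs)

    b = sum(col("V1")) / n                       # ratio B'
    v7_sum = sum(col("V7"))
    # Windows with V4 == 0 are skipped (assumption) to avoid division by zero.
    ratios = [v8 / v4 for v8, v4 in zip(col("V8"), col("V4")) if v4]
    return [
        math.log10(b) if b > 0 else float("-inf"),    # group 1: log10(B')
        mean(col("V2")),                              # group 2
        mean(col("V3")),                              # group 3
        pearson(col("V4"), col("V6")),                # group 4
        mean(col("V5")),                              # group 5
        v7_sum / n,                                   # group 6
        sum(ratios) / v7_sum if v7_sum else 0.0,      # group 7
        sum(1 for v in col("V3") if v != 0) / n,      # group 8
    ]
```

The result is a fixed-length vector, which is the form a classification model would typically expect as input.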
Here, it is easily understood that the terminal device may repeat the above feature fusion operation on the N comparison feature sets of each of the M target users, so as to obtain the M target feature sets corresponding to the M target users.
S104, determining a target probability corresponding to each target user according to the target feature set corresponding to each target user, determining a target user uniquely associated with the target account according to the target probability corresponding to each target user, and determining a user communication identifier corresponding to the target user as a target user communication identifier.
In some feasible embodiments, after acquiring M target feature sets corresponding to the M target users, the terminal device may perform data analysis and processing on each target feature set in the M target feature sets to obtain a target probability corresponding to each target user. Then, the terminal device can determine a target user uniquely associated with the target account according to the target probability corresponding to each target user, and determine a user communication identifier corresponding to the target user as a target user communication identifier. The user communication identification corresponding to the target user can be determined by the ticket data of the target user. A target probability is used to indicate a degree of association between a target user and the target account.
Taking the process of determining the target probability corresponding to the target user i according to the target feature set j corresponding to the target user i as an example, the process of determining the target probability corresponding to each target user by the terminal device according to the target feature set corresponding to each target user is described below.
Optionally, in a specific implementation, the terminal device may input the target feature set j into a classification model trained in advance, and then determine a target probability corresponding to the target user i according to an output result of the classification model. Here, the classification model may include a classification model based on a random forest machine learning algorithm, a classification model based on a neural network algorithm, and the like, which is not limited herein. Similarly, the terminal device sequentially inputs the target feature sets corresponding to each target user of the M target users into the classification model, that is, the M target probabilities corresponding to the M target users can be determined according to the output result of the classification model.
After the terminal device obtains the M target probabilities corresponding to the M users, the terminal device may determine, according to the target probabilities corresponding to the target users, a target user uniquely associated with the target account. Optionally, the terminal device may determine a maximum target probability among the M target probabilities, and determine a target user corresponding to the maximum target probability as a target user uniquely associated with the target account. And finally, the terminal equipment can extract the user communication identification corresponding to the target user from the ticket data corresponding to the target user, and determines the user communication identification corresponding to the target user as the target user communication identification uniquely matched with the target account.
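The selection of the uniquely associated target user can be sketched as an arg-max over the M target probabilities. The sketch below is illustrative only: `pick_target_user` and `toy_model` are hypothetical names, and `toy_model` is a stand-in logistic scorer with made-up weights, not the trained random forest or neural network classification model mentioned above.

```python
import math
from typing import Callable, Sequence


def pick_target_user(
    feature_sets: dict[str, Sequence[float]],
    predict_proba: Callable[[Sequence[float]], float],
) -> tuple[str, dict[str, float]]:
    """Score every target user and return the one with the highest probability.

    Keys of feature_sets are user communication identifiers; predict_proba is
    any trained classifier wrapped as a function returning a value in [0, 1].
    """
    probs = {user: predict_proba(fs) for user, fs in feature_sets.items()}
    best = max(probs, key=probs.get)
    return best, probs


def toy_model(features: Sequence[float]) -> float:
    """Stand-in scorer with hypothetical weights, used only for illustration."""
    weights = [0.5, -0.1]
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

For instance, `pick_target_user({"id-A": [1.0, 2.0], "id-B": [4.0, 1.0]}, toy_model)` selects `"id-B"`, because its score under the stand-in weights is higher.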
Optionally, the terminal device may further train a preset classification model to be trained to obtain the trained classification model. Specifically, the terminal device may obtain the positive sample target feature sets and the negative sample target feature sets corresponding to E positive sample users. In the following, the process by which the terminal device acquires the positive sample target feature set and the negative sample target feature sets corresponding to a positive sample user c is taken as an example. The terminal device can first acquire the message attribute information corresponding to F historical messages published by the positive sample user c on the target social platform through a sample account in a past preset time period, and then determine F time windows based on the F publishing times corresponding to the F historical messages. For the specific process, reference may be made to the process of determining the N time windows described above, and details are not repeated here. Then, the terminal device may obtain the F pieces of ticket attribute information corresponding to the positive sample user c on the F time windows, and determine the F sample comparison feature sets corresponding to the positive sample user c on the F time windows based on these F pieces of ticket attribute information and the message attribute information corresponding to the F historical messages. Feature fusion of the F sample comparison feature sets then yields the positive sample target feature set corresponding to the positive sample user c. Meanwhile, the terminal device may further obtain the negative sample target feature set corresponding to each of one or more negative sample users other than the positive sample user c. Similarly, by repeating the above operations, the terminal device may obtain the E positive sample target feature sets corresponding to the E positive sample users and the negative sample target feature sets corresponding to a plurality of negative sample users.
Then, the terminal device may label E positive sample target feature sets corresponding to the E positive sample users and negative sample target feature sets corresponding to the negative sample users. The label is used for indicating whether a sample user associated with the corresponding sample target feature set is uniquely matched with the sample account. For example, a positive sample target feature set may be labeled as 1 and a negative sample target feature set may be labeled as 0. And finally, the terminal equipment can sequentially input the E labeled positive sample target feature sets and the negative sample target feature sets corresponding to the negative sample users into the classification model to be trained, and repeatedly train the classification model to be trained until the model parameters of the classification model to be trained are converged, so that the trained classification model can be obtained.
Optionally, in practical applications, in order to expand the number of positive samples or negative samples, after obtaining the F sample comparison feature sets corresponding to the positive sample user c, the terminal device may randomly sample one or more sample comparison feature sets from the F sample comparison feature sets. For example, the terminal device can sample randomly F times, where 1 sample comparison feature set is sampled the first time, 2 sample comparison feature sets are sampled the second time, and so on, until F sample comparison feature sets are sampled the F-th time. Feature fusion is then performed on the combination of sample comparison feature sets obtained in each of the F samplings, so as to obtain F positive sample target feature sets corresponding to the positive sample user. Similarly, the negative sample target feature sets may also be expanded by using the above sampling process.
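The sample expansion described above can be sketched as follows. This is a minimal illustration: `augment` is a hypothetical helper, sampling is assumed to be without replacement, and `fuse` stands for whatever feature fusion function turns a list of sample comparison feature sets into one target feature set.

```python
import random


def augment(sample_sets, fuse, rng=None):
    """Return F fused target feature sets from F sample comparison feature sets.

    The k-th draw (k = 1..F) samples k sets without replacement and fuses them.
    """
    rng = rng or random.Random()
    f = len(sample_sets)
    return [fuse(rng.sample(sample_sets, k)) for k in range(1, f + 1)]
```

Passing a seeded `random.Random` instance as `rng` makes the expanded training set reproducible.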
In the embodiment of the invention, after acquiring N message attribute information corresponding to N historical messages issued by a target social account on a social platform and call ticket attribute information of M users on a time window corresponding to each historical message, feature extraction can be performed on the message attribute information and the call ticket attribute information to obtain M target feature sets corresponding to M users. Then, according to the M feature sets, M target probabilities which can be used for indicating the degree of association between the target user and the target social account are determined. And finally, determining a target user uniquely associated with the target social account from the M target users according to the M target probabilities, and determining a user communication identifier corresponding to the target user in a communication system as a target user communication identifier. The association degree of a certain target user and a target social account is determined through information comparison and statistics between the message attribute of the historical message and ticket attribute information of a certain user on a time window corresponding to the historical message, and a target user communication identifier uniquely associated with the target social account is further determined, so that a network user corresponding to the social account can be accurately positioned by the communication network, then fault analysis and solution are performed in a targeted manner, the network complaint feedback efficiency of the communication network can be improved, and the user experience of the communication network is improved.
Example two
Referring to fig. 3, fig. 3 is a schematic structural diagram of a device for determining a user communication identifier according to an embodiment of the present invention. The determination device includes:
the message attribute information determining unit 10 is configured to acquire N pieces of message attribute information corresponding to N pieces of history messages issued by a target account on a target social platform. Here, one history message corresponds to one message attribute information.
The ticket attribute information determining unit 20 is configured to determine N time windows according to the release time corresponding to each historical message acquired by the message attribute information determining unit 10, and determine ticket attribute information of any one of M target users on any one time window according to target ticket data in any one time window, so as to obtain N ticket attribute information of each target user of the M target users on the N time windows. Here, the target user is a communication network user having service interaction with the target social platform within each time window, and the target call ticket data is call ticket data associated with the target user.
An information feature extraction unit 30, configured to perform information feature extraction based on the N pieces of message attribute information determined by the message attribute information determination unit 10 and the N pieces of ticket attribute information of each target user of the M target users on the N time windows determined by the ticket attribute information determination unit 20, so as to obtain M target feature sets corresponding to the M target users.
A user communication identifier determining unit 40, configured to determine a target probability corresponding to each target user according to the target feature set corresponding to each target user determined by the information feature extracting unit 30. And determining the target user uniquely associated with the target account according to the target probability corresponding to each target user, and determining the user communication identifier corresponding to the target user as the target user communication identifier. Here, a target probability is used to indicate the degree of association between a target user and the target account.
In some possible embodiments, the ticket attribute information determining unit 20 is configured to:
acquire a preset time period threshold t, and determine, according to the preset time period threshold t and the publishing time Ti corresponding to any historical message i, the time window TDi corresponding to the historical message i, so as to obtain the N time windows corresponding to the N historical messages. Here, TDi = [Ti - t, Ti + t].
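As a small illustration of the relation TDi = [Ti - t, Ti + t], the sketch below builds the N time windows from the N publishing times. The function name and the use of plain numbers (for example, seconds) for times are assumptions of this sketch.

```python
def time_windows(publish_times, t):
    """Return one (start, end) window [Ti - t, Ti + t] per publishing time Ti."""
    return [(ti - t, ti + t) for ti in publish_times]
```

For example, with publishing times 100 and 500 and a threshold t of 30, the windows are (70, 130) and (470, 530).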
In some possible embodiments, the information feature extraction unit 30 is configured to:
perform the following information feature extraction operation on the message attribute information and the ticket attribute information for any target user i of the M target users: determine the comparison feature set of the target user i on any time window by comparing and performing statistics on the ticket attribute information of the target user i on that time window, as determined by the ticket attribute information determining unit 20, and the message attribute information on that time window, as determined by the message attribute information determining unit 10, so as to obtain the N comparison feature sets corresponding to the target user i on the N time windows. Here, one comparison feature set includes S different kinds of comparison features. Perform feature fusion on the N comparison feature sets corresponding to the target user i on the N time windows to obtain the target feature set corresponding to the target user i. Determine the M target feature sets corresponding to the M target users according to the information feature extraction results of the message attribute information and the ticket attribute information corresponding to each target user.
In some possible embodiments, the S different kinds of comparison features include one or more of the following: a flag indicating whether the originating terminal types are the same, a service occurrence time difference, a number of call tickets, an uplink traffic size, a relative downlink traffic size, a historical message size, a multimedia data flag, and a multimedia data size.
In some possible embodiments, the information feature extraction unit 30 is configured to:
Determine U feature groups to be fused from the N comparison feature sets corresponding to the target user i in the N time windows, where one feature group to be fused includes one or more comparison features of the target user i in each time window. Determine the target feature value corresponding to any feature group to be fused according to the feature fusion result of the comparison features included in that feature group, so as to obtain U target feature values corresponding to the U feature groups to be fused. Determine the target feature set corresponding to the target user i according to the U target feature values.
In some possible embodiments, the U feature groups to be fused include a first feature group to be fused, where the first feature group to be fused includes a first comparison feature of the target user i in each time window. The information feature extraction unit 30 is configured to: calculate the average of the feature values of the first comparison feature over the time windows, and determine the average as the target feature value corresponding to the first feature group to be fused.
In some possible embodiments, the U feature groups to be fused include a second feature group to be fused, where the second feature group to be fused includes a second comparison feature and a third comparison feature of the target user i in each time window. The information feature extraction unit 30 is configured to: calculate a similarity value between the second comparison features over the time windows and the third comparison features over the time windows, and determine the similarity value as the target feature value corresponding to the second feature group to be fused.
In some possible embodiments, the U feature groups to be fused include a third feature group to be fused, where the third feature group to be fused includes a fourth comparison feature of the target user i in each time window. The information feature extraction unit 30 is configured to: calculate the sum of the feature values of the fourth comparison feature over the time windows, and determine the ratio of this sum to the number N of historical messages as the target feature value corresponding to the third feature group to be fused.
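The three fusion rules above (average, similarity, and sum divided by N) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and cosine similarity is used only as one plausible similarity measure, since the text does not name a specific one.

```python
import math
from typing import Sequence


def fuse_mean(values: Sequence[float]) -> float:
    """First group: average of one comparison feature over the N windows."""
    return sum(values) / len(values)


def fuse_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Second group: similarity between two per-window feature series.

    Cosine similarity is an assumption made for this sketch.
    """
    if len(a) != len(b):
        raise ValueError("both series must cover the same time windows")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero series as having no similarity
    return dot / (norm_a * norm_b)


def fuse_ratio(values: Sequence[float], n_messages: int) -> float:
    """Third group: sum over the windows divided by the message count N."""
    return sum(values) / n_messages
```

For example, a same-terminal-type flag of `[1, 0, 1, 1]` over four windows would fuse to 0.75 under `fuse_ratio`, that is, the share of historical messages whose window contained a matching terminal type.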
In some possible embodiments, the user communication identity determining unit 40 is configured to:
Sequentially input the target feature sets corresponding to the target users, as determined by the information feature extraction unit, into a preset classification model, and determine the target probability corresponding to each target user based on the classification result produced by the classification model for the target feature set corresponding to that target user.
In some possible embodiments, the user communication identity determining unit 40 is configured to:
Determine the target user corresponding to the maximum target probability among the target probabilities corresponding to the target users as the target user uniquely associated with the target account.
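A minimal sketch of this selection step follows. The text only states that a preset classification model is used, so the logistic scoring function, its weights, and the user identifiers below are placeholders assumed for illustration; any model that outputs a probability per target user would fit the same pattern.

```python
import math
from typing import Dict, List, Sequence, Tuple


def logistic_score(features: Sequence[float],
                   weights: Sequence[float],
                   bias: float) -> float:
    """Stand-in for the preset classification model (an assumption)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def pick_target_user(feature_sets: Dict[str, List[float]],
                     weights: Sequence[float],
                     bias: float) -> Tuple[str, Dict[str, float]]:
    """Score every target user and return the one with the highest probability."""
    probs = {
        user_id: logistic_score(features, weights, bias)
        for user_id, features in feature_sets.items()
    }
    best_user = max(probs, key=probs.get)
    return best_user, probs


# Hypothetical target feature sets for two target users.
best, probabilities = pick_target_user(
    {"user_a": [0.2, 0.1], "user_b": [0.9, 0.8]},
    weights=[2.0, 1.0],
    bias=-1.0,
)
```

The communication identifier of the returned user would then be taken as the target user communication identifier.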
In some possible embodiments, the message attribute information determining unit 10 may obtain N pieces of message attribute information corresponding to N historical messages published by the target account on the target social platform, where one historical message corresponds to one piece of message attribute information. For the specific process, reference may be made to the process of obtaining N pieces of message attribute information corresponding to N historical messages described in step S101 in the first embodiment, and details are not repeated here. The ticket attribute information determining unit 20 may be configured to determine N time windows according to the release time corresponding to each historical message acquired by the message attribute information determining unit 10; for the specific process, reference may be made to the process of determining N time windows by using the traffic data described in step S102 in the first embodiment, and details are not repeated here. Then, the ticket attribute information determining unit 20 may determine the ticket attribute information of any one of the M target users in any time window according to the target ticket data in that time window, so as to obtain N pieces of ticket attribute information of each of the M target users in the N time windows. For the specific process, reference may be made to the process of determining the ticket attribute information described in step S102 in the first embodiment, and details are not repeated here.
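The time-window step can be sketched as follows, using the symmetric window TDi = [Ti - t, Ti + t] given in claim 2. The function names and the use of plain numeric timestamps (seconds) are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

Window = Tuple[float, float]


def time_windows(release_times: Sequence[float], t: float) -> List[Window]:
    """Return one window [Ti - t, Ti + t] per historical message."""
    if t < 0:
        raise ValueError("the time period threshold t must be non-negative")
    return [(ti - t, ti + t) for ti in release_times]


def tickets_in_window(ticket_times: Sequence[float], window: Window) -> List[float]:
    """Keep the ticket timestamps that fall inside the closed window."""
    low, high = window
    return [ts for ts in ticket_times if low <= ts <= high]


# Two messages released at 100 s and 250 s, with a threshold t of 30 s.
windows = time_windows([100.0, 250.0], 30.0)
```

Filtering each user's ticket data with `tickets_in_window` for every window gives the per-window records from which the ticket attribute information would be derived.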
Then, the information feature extraction unit 30 may perform information feature extraction based on the N pieces of message attribute information determined by the message attribute information determination unit 10 and the N pieces of ticket attribute information of each of the M target users in the N time windows determined by the ticket attribute information determination unit 20, so as to obtain M target feature sets corresponding to the M target users. For the specific process, reference may be made to the process of determining M target feature sets corresponding to M target users described in step S103 in the first embodiment, and details are not repeated here. Finally, the user communication identifier determining unit 40 may determine the target probability corresponding to each target user according to the target feature set corresponding to each target user determined by the information feature extracting unit 30. It then determines the target user uniquely associated with the target account according to the target probability corresponding to each target user, and determines the user communication identifier corresponding to that target user as the target user communication identifier. For the specific process, reference may be made to the process of determining the target user communication identifier described in step S104 in the first embodiment, and details are not repeated here.
In the embodiments of the present invention, the degree of association between a target user and a target social account is determined through comparison and statistics between the message attributes of a historical message and the ticket attribute information of that user in the time window corresponding to the historical message, and the target user communication identifier uniquely associated with the target social account is then determined. The communication network can thus accurately locate the network user corresponding to the social account and carry out targeted fault analysis and resolution, which can improve the efficiency of handling network complaints and the user experience of the communication network.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device provided by the embodiment of the present invention includes a processor 401, a memory 402, and a bus system 403. The processor 401 and the memory 402 are connected by a bus system 403.
The memory 402 is used for storing programs. Specifically, a program may include program code, and the program code includes computer operation instructions. The memory 402 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM). Only one memory is shown in fig. 4, but multiple memories may be provided as needed.
The memory 402 may also be a memory in the processor 401, which is not limited herein.
The memory 402 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof:
Operation instructions: including various operation instructions for implementing various operations.
Operating system: including various system programs for implementing various basic services and for handling hardware-based tasks.
The processor 401 controls the operation of the electronic device, and the processor 401 may be one or more Central Processing Units (CPUs). In the case where the processor 401 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
In a specific application, the components of the electronic device are coupled together by the bus system 403, where the bus system 403 may include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus. For clarity of illustration, however, the various buses are all labeled as the bus system 403 in fig. 4, where the bus system is drawn only schematically.
The method for determining a user communication identifier disclosed in the embodiments of the present invention may be applied to the processor 401, or may be implemented by the processor 401. The processor 401 may be an integrated circuit chip having signal processing capabilities.
An embodiment of the present invention provides a computer-readable storage medium, which stores instructions that, when executed on a computer, implement a method for determining a user communication identifier described in the first embodiment.
The computer-readable storage medium may be an internal storage unit of the apparatus for determining a user communication identifier according to the first embodiment. The computer-readable storage medium may also be an external storage device of the terminal device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the terminal device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the terminal device. The computer-readable storage medium stores the computer program and other programs and data required by the terminal device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
Those skilled in the art can understand that all or part of the processes in the methods of the embodiments described above can be implemented by a computer program. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Claims (22)

1. A method for determining a user communication identity, the method comprising:
acquiring N pieces of message attribute information corresponding to N pieces of historical messages issued by a target account on a target social platform, wherein one piece of historical message corresponds to one piece of message attribute information;
determining N time windows according to the release time corresponding to each historical message, and determining the ticket attribute information of any target user in M target users on any time window according to the target ticket data in any time window to obtain the N ticket attribute information of each target user in M target users on the N time windows, wherein the target user is a communication network user with service interaction with the target social platform in each time window, and the target ticket data is ticket data associated with the target user;
performing information feature extraction based on the N message attribute information and N ticket attribute information of each target user in the M target users on the N time windows to obtain M target feature sets corresponding to the M target users;
determining a target probability corresponding to each target user according to the target feature set corresponding to each target user, determining a target user uniquely associated with the target account according to the target probability corresponding to each target user, and determining a user communication identifier corresponding to the target user as a target user communication identifier, wherein one target probability is used for indicating the association degree between one target user and the target account.
2. The method of claim 1, wherein the determining N time windows according to the publishing time corresponding to each historical message comprises:
acquiring a preset time period threshold t;
determining a time window TDi corresponding to any historical message i according to the preset time period threshold t and the release time Ti corresponding to any historical message i to obtain N time windows corresponding to N historical messages;
wherein TDi = [Ti - t, Ti + t].
3. The method of claim 2, wherein the performing feature extraction based on the N message attribute information and the N ticket attribute information of each target user in the M target users over the N time windows to obtain M target feature sets corresponding to the M target users comprises:
carrying out the following information feature extraction operation on the message attribute information and the ticket attribute information for any target user i of the M target users:
determining a comparison feature set of the target user i on any time window according to comparison and statistics of the ticket attribute information of the target user i on any time window and the message attribute information of the target user i on any time window so as to obtain N comparison feature sets corresponding to the target user i on N time windows, wherein one comparison feature set comprises S comparison features of different types;
performing feature fusion on N comparison feature sets corresponding to the target user i on N time windows to obtain a target feature set corresponding to the target user i;
and determining M target feature sets corresponding to the M target users according to the information feature extraction results of the message attribute information and the ticket attribute information corresponding to each target user.
4. The method of claim 3, wherein the S different kinds of comparison features comprise one or more of: a flag indicating whether the initiating terminal type is the same, a service occurrence time difference, a number of tickets, an uplink traffic size, a downlink traffic relative size, a historical message size, a multimedia data flag, and a multimedia data size.
5. The method according to claim 4, wherein the performing feature fusion on the N comparison feature sets corresponding to the target user i over N time windows to obtain the target feature set corresponding to the target user i comprises:
determining U feature groups to be fused from N comparison feature sets corresponding to the target user i on N time windows, wherein one feature group to be fused comprises one or more comparison features of the target user i on each time window;
determining a target characteristic value corresponding to any feature group to be fused according to a feature fusion result of the comparison features included in the feature group to be fused to obtain U target characteristic values corresponding to the U feature groups to be fused;
and determining a target feature set corresponding to the target user i according to the U target feature values.
6. The method according to claim 5, wherein the U feature groups to be fused include a first feature group to be fused, and the first feature group to be fused includes a first comparison feature of the target user i in each time window;
the determining, according to the feature fusion result of the comparison features included in any feature group to be fused, a target feature value corresponding to any feature group to be fused includes:
and calculating the average value of the characteristic values of the first comparison characteristics on each time window, and determining the average value as the target characteristic value corresponding to the first characteristic group to be fused.
7. The method according to claim 5 or 6, wherein the U feature groups to be fused include a second feature group to be fused, and the second feature group to be fused includes a second comparison feature and a third comparison feature of the target user i in each time window;
the determining, according to the feature fusion result of the compared features included in any feature group to be fused, a target feature value corresponding to any feature group to be fused includes:
and calculating similarity values between the second comparison features on the time windows and the third comparison features on the time windows, and determining the similarity values as target feature values corresponding to the second feature group to be fused.
8. The method according to claim 5 or 6, wherein the U feature groups to be fused include a third feature group to be fused, and the third feature group to be fused includes a fourth comparison feature of the target user i in each time window;
the determining, according to the feature fusion result of the comparison features included in any feature group to be fused, a target feature value corresponding to any feature group to be fused includes:
and calculating the sum of the feature values of the fourth comparison features on each time window, and determining the ratio of the sum of the feature values to the number N of the historical messages as a target feature value corresponding to the third feature group to be fused.
9. The method of claim 8, wherein the determining the target probability corresponding to each target user according to the target feature set corresponding to each target user comprises:
sequentially inputting the target feature sets corresponding to the target users into a preset classification model, and determining the target probability corresponding to each target user based on the classification result produced by the classification model for the target feature set corresponding to that target user.
10. The method of claim 9, wherein the determining the target user uniquely associated with the target account according to the target probability corresponding to each target user comprises:
and determining the target user corresponding to the maximum target probability in the target probabilities corresponding to the target users as the target user uniquely associated with the target account.
11. An apparatus for determining a user communication identity, the apparatus comprising:
a message attribute information determining unit, configured to acquire N pieces of message attribute information corresponding to N historical messages issued by a target account on a target social platform, wherein one historical message corresponds to one piece of message attribute information;
a ticket attribute information determining unit, configured to determine N time windows according to release moments corresponding to the historical messages, and determine ticket attribute information of any one of M target users on any one time window according to target ticket data in any one time window, so as to obtain N ticket attribute information of each target user in the M target users on the N time windows, where the target user is a communication network user having service interaction with the target social platform in each time window, and the target ticket data is ticket data associated with the target user;
an information feature extraction unit, configured to perform information feature extraction based on the N pieces of message attribute information determined by the message attribute information determination unit and the N pieces of ticket attribute information of each target user of the M target users on the N time windows determined by the ticket attribute information determination unit, so as to obtain M target feature sets corresponding to the M target users;
and the user communication identifier determining unit is used for determining target probabilities corresponding to the target users according to the target feature sets corresponding to the target users determined by the information feature extracting unit, determining target users uniquely associated with the target account according to the target probabilities corresponding to the target users, and determining the user communication identifiers corresponding to the target users as target user communication identifiers, wherein one target probability is used for indicating the association degree between one target user and the target account.
12. The apparatus of claim 11, wherein the ticket attribute information determining unit is configured to:
acquiring a preset time period threshold t;
determining a time window TDi corresponding to any historical message i according to the preset time period threshold t and the release time Ti corresponding to any historical message i to obtain N time windows corresponding to N historical messages;
wherein TDi = [Ti - t, Ti + t].
13. The determination apparatus according to claim 12, wherein the information feature extraction unit is configured to:
performing the following information feature extraction operation on the message attribute information and the ticket attribute information for any target user i of the M target users:
determining a comparison feature set of the target user i on any time window according to comparison and statistics of the ticket attribute information of the target user i on any time window determined by the ticket attribute information determination unit and the message attribute information of the target user i on any time window determined by the message attribute information determination unit, so as to obtain N comparison feature sets corresponding to the target user i on N time windows, wherein one comparison feature set comprises S comparison features of different types;
performing feature fusion on N comparison feature sets corresponding to the target user i on N time windows to obtain a target feature set corresponding to the target user i;
and determining M target feature sets corresponding to the M target users according to the information feature extraction results of the message attribute information and the ticket attribute information corresponding to each target user.
14. The apparatus according to claim 13, wherein the S different kinds of comparison features comprise one or more of: a flag indicating whether the initiating terminal type is the same, a service occurrence time difference, a number of tickets, an uplink traffic size, a downlink traffic relative size, a historical message size, a multimedia data flag, and a multimedia data size.
15. The determination apparatus according to claim 14, wherein the information feature extraction unit is configured to:
determining U feature groups to be fused from N comparison feature sets corresponding to the target user i on N time windows, wherein one feature group to be fused comprises one or more comparison features of the target user i on each time window;
determining a target characteristic value corresponding to any feature group to be fused according to a feature fusion result of the comparison features included in the feature group to be fused to obtain U target characteristic values corresponding to the U feature groups to be fused;
and determining a target feature set corresponding to the target user i according to the U target feature values.
16. The apparatus according to claim 15, wherein the U feature groups to be fused include a first feature group to be fused, and the first feature group to be fused includes a first comparison feature of the target user i in each time window;
the information feature extraction unit is configured to: and calculating the average value of the characteristic values of the first comparison characteristics on each time window, and determining the average value as the target characteristic value corresponding to the first characteristic group to be fused.
17. The apparatus according to claim 15 or 16, wherein the U feature groups to be fused include a second feature group to be fused, and the second feature group to be fused includes a second comparison feature and a third comparison feature of the target user i in each time window;
the information feature extraction unit is configured to: and calculating similarity values between the second comparison features on each time window and the third comparison features on each time window, and determining the similarity values as target feature values corresponding to the second feature group to be fused.
18. The apparatus according to claim 15 or 16, wherein the U feature groups to be fused include a third feature group to be fused, and the third feature group to be fused includes a fourth comparison feature of the target user i in each time window;
the information feature extraction unit is configured to: and calculating the sum of the feature values of the fourth comparison features on each time window, and determining the ratio of the sum of the feature values to the number N of the historical messages as a target feature value corresponding to the third feature group to be fused.
19. The apparatus according to claim 18, wherein the user communication identification determining unit is configured to:
sequentially inputting the target feature sets corresponding to the target users, as determined by the information feature extraction unit, into a preset classification model, and determining the target probability corresponding to each target user based on the classification result produced by the classification model for the target feature set corresponding to that target user.
20. The apparatus according to claim 19, wherein the user communication identification determining unit is configured to:
and determining the target user corresponding to the maximum target probability in the target probabilities corresponding to the target users as the target user uniquely associated with the target account.
21. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1-10.
22. An electronic device, comprising a memory configured to store program code, and a processor configured to invoke the program code stored by the memory to perform the method of any of claims 1-10.
CN201811653353.6A 2018-12-29 2018-12-29 Method and device for determining user communication identifier Active CN111385136B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811653353.6A CN111385136B (en) 2018-12-29 2018-12-29 Method and device for determining user communication identifier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811653353.6A CN111385136B (en) 2018-12-29 2018-12-29 Method and device for determining user communication identifier

Publications (2)

Publication Number Publication Date
CN111385136A CN111385136A (en) 2020-07-07
CN111385136B true CN111385136B (en) 2023-01-06

Family

ID=71221249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811653353.6A Active CN111385136B (en) 2018-12-29 2018-12-29 Method and device for determining user communication identifier

Country Status (1)

Country Link
CN (1) CN111385136B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107800608A (en) * 2016-09-05 2018-03-13 腾讯科技(深圳)有限公司 A kind of processing method and processing device of user profile

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108076018A (en) * 2016-11-16 2018-05-25 阿里巴巴集团控股有限公司 Identity authorization system, method, apparatus and account authentication method
CN108171519A (en) * 2016-12-07 2018-06-15 阿里巴巴集团控股有限公司 The processing of business datum, account recognition methods and device, terminal
CN107665442B (en) * 2017-05-10 2020-03-27 平安科技(深圳)有限公司 Method and device for acquiring target user
CN107656918B (en) * 2017-05-10 2019-07-05 平安科技(深圳)有限公司 Obtain the method and device of target user
CN107563429B (en) * 2017-07-27 2020-11-10 国家计算机网络与信息安全管理中心 Method and device for classifying network user groups

Also Published As

Publication number Publication date
CN111385136A (en) 2020-07-07

Similar Documents

Publication Publication Date Title
CN112861648B (en) Character recognition method, character recognition device, electronic equipment and storage medium
KR102002024B1 (en) Method for processing labeling of object and object management server
CN113127633B (en) Intelligent conference management method and device, computer equipment and storage medium
CN112200067B (en) Intelligent video event detection method, system, electronic equipment and storage medium
US20150161278A1 (en) Method and apparatus for identifying webpage type
CN107153716B (en) Webpage content extraction method and device
CN111107423A (en) Video service playing card pause identification method and device
CN101183458A (en) Picture validation code generating method and device
CN104052737A (en) Network data message processing method and device
CN112989348A (en) Attack detection method, model training method, device, server and storage medium
CN112770129A (en) Live broadcast-based group chat establishing method, related device, equipment and medium
US9665574B1 (en) Automatically scraping and adding contact information
CN110807050B (en) Performance analysis method, device, computer equipment and storage medium
CN115883187A (en) Method, device, equipment and medium for identifying abnormal information in network traffic data
CN111586695A (en) Short message identification method and related equipment
CN109698798B (en) Application identification method and device, server and storage medium
CN104317847A (en) Method and system for identifying languages in network text information
CN112286815A (en) Interface test script generation method and related equipment thereof
CN111444364B (en) Image detection method and device
CN111385136B (en) Method and device for determining user communication identifier
CN108985059B (en) Webpage backdoor detection method, device, equipment and storage medium
CN116192527A (en) Attack flow detection rule generation method, device, equipment and storage medium
CN115168755A (en) Abnormal data processing method and system based on URL (Uniform resource locator) characteristics
CN115774762A (en) Instant messaging information processing method, device, equipment and storage medium
CN111800391B (en) Port scanning attack detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant