CN113366469A - Data classification method and related product

Info

Publication number: CN113366469A
Application number: CN201980089586.4A
Authority: CN
Inventors: 郭子亮
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd; Shenzhen Huantai Technology Co Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd; Shenzhen Huantai Technology Co Ltd
Priority date: 2019-06-29
Filing date: 2019-06-29
Publication date: 2021-09-07
Also published as: WO2021000084A1

Abstract

The embodiments of the present application disclose a data classification method and a related product. The method includes: acquiring application data of a target application of a target object, and acquiring a target user ID of the target object; performing ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID; performing bucketing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, wherein each bucket includes the associated data of at least one user ID; and dividing the plurality of user IDs into groups based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups. The embodiments of the present application can improve friend classification efficiency and thereby improve user experience.

Description

Data classification method and related product

Technical Field

The present application relates to the field of communications technologies, and in particular, to a data classification method and a related product.

Background

With the widespread use of electronic devices (such as mobile phones, tablet computers, etc.), the electronic devices have more and more applications and more powerful functions, and the electronic devices are developed towards diversification and personalization, and become indispensable electronic products in the life of users.

At present, social applications are widely used on mobile phones. In use, however, the user has to classify friends, which is inefficient and degrades the user experience.

Disclosure of Invention

The embodiment of the application provides a data classification method and a related product, which can improve friend classification efficiency and improve user experience.

In a first aspect, a data classification method in an embodiment of the present application includes:

acquiring application data of a target application of a target object, and acquiring a target user ID of the target object;

performing ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID;

performing bucketing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, wherein each bucket includes the associated data of at least one user ID;

and dividing the plurality of user IDs into groups based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups.

In a second aspect, an embodiment of the present application provides a data classification apparatus, where the apparatus includes:

an acquisition unit, configured to acquire application data of a target application of a target object and to acquire a target user ID of the target object;

an extraction unit, configured to perform ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID;

a bucket dividing processing unit, configured to perform bucketing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, wherein each bucket includes the associated data of at least one user ID;

and a dividing unit, configured to divide the plurality of user IDs into groups based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for executing the steps in the first aspect of the embodiment of the present application.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program enables a computer to perform some or all of the steps described in the first aspect of the embodiment of the present application.

In a fifth aspect, embodiments of the present application provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform some or all of the steps as described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.

Drawings

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1A is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;

FIG. 1B is a schematic flow chart diagram illustrating a data classification method disclosed in an embodiment of the present application;

FIG. 1C is a schematic diagram illustrating a data classification method disclosed in an embodiment of the present application;

FIG. 1D is a schematic diagram illustrating a user portrait configuration according to an embodiment of the present disclosure;

FIG. 1E is a schematic diagram illustrating a locality sensitive hashing algorithm disclosed in an embodiment of the present application;

FIG. 1F is another schematic illustration of a locality sensitive hashing algorithm disclosed in an embodiment of the present application;

FIG. 2 is a schematic flow chart diagram of another data classification method disclosed in the embodiments of the present application;

FIG. 3 is a schematic flow chart diagram of another data classification method disclosed in the embodiments of the present application;

FIG. 4 is a schematic structural diagram of another electronic device disclosed in the embodiments of the present application;

FIG. 5A is a schematic structural diagram of a data classification apparatus disclosed in an embodiment of the present application;

FIG. 5B is a schematic structural diagram of another data classification apparatus disclosed in the embodiments of the present application;

FIG. 5C is a schematic structural diagram of another data classification apparatus disclosed in the embodiment of the present application.

Detailed Description

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

The electronic device related to the embodiment of the present application may include various handheld devices, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a wireless modem, and various forms of User Equipment (UE), Mobile Stations (MS), smart home devices (smart tv, smart air conditioner, smart range hood, smart fan, smart wheelchair, smart dining table, etc.), and the like. For convenience of description, the above-mentioned devices are collectively referred to as electronic devices, and the electronic devices may also be servers, service platforms, and the like.

The following describes embodiments of the present application in detail.

Referring to FIG. 1A, FIG. 1A is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application. The electronic device 100 may include a control circuit, which may include a storage and processing circuit 110. The storage in the storage and processing circuit 110 may be a memory, such as a hard drive memory, a non-volatile memory (e.g., flash memory or other electronically programmable read-only memory used to form a solid state drive, etc.), a volatile memory (e.g., static or dynamic random access memory, etc.), and the like, and the embodiments of the present application are not limited thereto. The processing circuitry in the storage and processing circuit 110 may be used to control the operation of the electronic device 100. The processing circuitry may be implemented based on one or more microprocessors, microcontrollers, baseband processors, power management units, audio codec chips, application specific integrated circuits, display driver integrated circuits, and the like.

The storage and processing circuitry 110 may be used to run software in the electronic device 100, such as an internet browsing application, a Voice Over Internet Protocol (VOIP) telephone call application, an email application, a media playing application, operating system functions, and so forth. Such software may be used to perform control operations such as, for example, camera-based image capture, ambient light measurement based on an ambient light sensor, proximity sensor measurement based on a proximity sensor, information display functionality based on status indicators such as status indicator lights of light emitting diodes, touch event detection based on a touch sensor, functionality associated with displaying information on multiple (e.g., layered) displays, operations associated with performing wireless communication functions, operations associated with collecting and generating audio signals, control operations associated with collecting and processing button press event data, and other functions in the electronic device 100, and the like, without limitation of embodiments of the present application.

The electronic device 100 may also include input-output circuitry 150. The input-output circuit 150 may be used to enable the electronic device 100 to input and output data, i.e., to allow the electronic device 100 to receive data from an external device and also to allow the electronic device 100 to output data from the electronic device 100 to the external device. The input-output circuit 150 may further include a sensor 170. The sensors 170 may include ambient light sensors, proximity sensors based on light and capacitance, touch sensors (e.g., based on optical touch sensors and/or capacitive touch sensors, where the touch sensors may be part of a touch display screen or used independently as a touch sensor structure), acceleration sensors, gravity sensors, and other sensors, among others.

Input-output circuitry 150 may also include one or more displays, such as display 130. Display 130 may include one or a combination of liquid crystal displays, organic light emitting diode displays, electronic ink displays, plasma displays, displays using other display technologies. Display 130 may include an array of touch sensors (i.e., display 130 may be a touch display screen). The touch sensor may be a capacitive touch sensor formed by a transparent touch sensor electrode (e.g., an Indium Tin Oxide (ITO) electrode) array, or may be a touch sensor formed using other touch technologies, such as acoustic wave touch, pressure sensitive touch, resistive touch, optical touch, and the like, and the embodiments of the present application are not limited thereto.

The audio component 140 may be used to provide audio input and output functionality for the electronic device 100. The audio components 140 in the electronic device 100 may include a speaker, a microphone, a buzzer, a tone generator, and other components for generating and detecting sound.

The communication circuit 120 may be used to provide the electronic device 100 with the capability to communicate with external devices. The communication circuit 120 may include analog and digital input-output interface circuits, and wireless communication circuits based on radio frequency signals and/or optical signals. The wireless communication circuitry in communication circuitry 120 may include radio-frequency transceiver circuitry, power amplifier circuitry, low noise amplifiers, switches, filters, and antennas. For example, the wireless communication circuitry in communication circuitry 120 may include circuitry to support Near Field Communication (NFC) by transmitting and receiving near field coupled electromagnetic signals. For example, the communication circuit 120 may include a near field communication antenna and a near field communication transceiver. The communications circuitry 120 may also include a cellular telephone transceiver and antenna, a wireless local area network transceiver circuitry and antenna, and so forth.

The electronic device 100 may further include a battery, power management circuitry, and other input-output units 160. The input-output unit 160 may include buttons, joysticks, click wheels, scroll wheels, touch pads, keypads, keyboards, cameras, light emitting diodes and other status indicators, and the like.

A user may input commands through input-output circuitry 150 to control the operation of electronic device 100, and may use output data of input-output circuitry 150 to enable receipt of status information and other outputs from electronic device 100.

Referring to FIG. 1B, FIG. 1B is a schematic flowchart of a data classification method according to an embodiment of the present application. The data classification method described in this embodiment is applied to the electronic device shown in FIG. 1A and includes:

101. Acquire application data of a target application of a target object, and acquire a target user ID of the target object.

The target object may be understood as the owner of the electronic device or another user. The target application may be at least one of: video applications, social applications, instant messaging applications, shopping applications, payment applications, gaming applications, navigation applications, photography applications, financial applications, and the like, without limitation. The target application may be a single application or a class of applications, and may be a third-party application or a system application. In the embodiments of the present application, the application data may include at least one of: application registration data, application cache data, instant messaging data, and the like, which are not limited herein. For example, the application data may include user IDs that identify a user, such as a cookie of the user, an APP-side browsing behavior identification ID, and an account ID; such a user ID may be a device hardware ID or a character identifier.

Of course, the electronic device may be used by multiple persons. A multi-dimensional feature layer and an ID-mapping relation layer may be constructed by integrating the device IMEI, SSOID, openid, user location data, internet behavior data, and the like, and a natural person identification layer then identifies the natural person by using a multi-code relation trusted identification filtering algorithm and a graph connectivity algorithm. In this way the owner can be accurately identified, since the owner is, after all, the person who uses the electronic device most of the time.

In one possible example, the step 101 of obtaining the application data of the target application of the target object may include the following steps:

11. acquiring at least one user ID of a target application of the target object;

12. and acquiring application data of the target application of the target object in a preset time period from a preset database according to the at least one user ID.

The preset time period may be set by the user or defaulted by the system. It may be understood as a recent period during which the electronic device was used, or as the period from the registration of any one of the at least one user ID to the current time. The target object may be the owner. In this embodiment, the user ID may be at least one of the following: a phone number, an Integrated Circuit Card Identity (ICCID), an International Mobile Equipment Identity (IMEI), a Single Sign On ID (SSOID), an ID of a third-party application, an openid, and the like, which are not limited herein.

Further, the electronic device may acquire at least one user ID of a target application used on the electronic device, and then determine the target application data on the electronic device according to the at least one user ID. Of course, all data related to the target application, for example cache data and application running state data, may be stored in the electronic device.
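
Steps 11 and 12 can be sketched as a filter over stored records. This is only an illustration: the in-memory `RECORDS` list standing in for the preset database, the record layout, and the function name `fetch_application_data` are all assumptions, not part of the disclosed method.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the "preset database":
# each record is (user_id, timestamp, payload).
RECORDS = [
    ("imei:860000000000001", datetime(2019, 6, 1, 9, 0), "chat with alice"),
    ("tel:13800000000", datetime(2019, 6, 20, 18, 30), "chat with bob"),
    ("tel:13900000000", datetime(2019, 6, 21, 8, 0), "unrelated user"),
    ("tel:13800000000", datetime(2018, 1, 1, 0, 0), "too old"),
]

def fetch_application_data(records, user_ids, start, end):
    """Return payloads whose user ID is wanted and whose timestamp is in [start, end]."""
    wanted = set(user_ids)
    return [payload for uid, ts, payload in records
            if uid in wanted and start <= ts <= end]

# Preset time period: the 30 days before an assumed "current time".
end = datetime(2019, 6, 29)
start = end - timedelta(days=30)
recent = fetch_application_data(
    RECORDS, ["imei:860000000000001", "tel:13800000000"], start, end
)
```

Here `recent` holds only the two payloads that belong to the target object's IDs and fall inside the window.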

Further, when the at least one user ID is a natural person ID, before the step 101, the following steps may be further included:

A1. acquiring historical usage data of the target application of the electronic device corresponding to the target object;

A2. constructing a multi-dimensional feature layer and an ID-mapping relation layer according to the historical usage data;

A3. determining a natural person ID according to the multi-dimensional feature layer and the ID-mapping relation layer, and taking the natural person ID as the target user ID.

The historical usage data may be understood as the usage data accumulated from the first time the user used the target application on the electronic device up to the current time, or as all usage data corresponding to at least one user ID of the target object. The historical usage data may include at least one of the following: application registration data, application cache data, instant messaging data, and the like, which are not limited herein. For example, it may include user IDs that identify a user, such as a cookie of the user, an APP-side browsing behavior identification ID, and an account ID; such a user ID may be a device hardware ID or a character identifier. The historical usage data may also include at least one of the following: CPU working frequency, number of CPU cores, CPU working mode, GPU frame rate, GPU resolution, device brightness, device sound, and some or all of the memory parameters.

In a specific implementation, as shown in FIG. 1C, the electronic device may acquire historical usage data of the target application corresponding to the target object. The historical usage data may be obtained from a data source, and the data source may include at least one of the following: browsers, software stores, account systems, grand data, shopping data, communication data, gaming data, social data, office data, smart home data, and the like, which are not limited herein. ID-mapping relation layer data may be obtained from the historical usage data and may include at least one of: SSOID <-> IMEI (a mapping relationship between SSOID and IMEI), TEL <-> IMEI, openid <-> ICCID, and the like, which are not limited herein. Multi-dimensional feature layer data may also be obtained from the historical usage data and may include at least one of: device features, APP features, location features, and the like, which are not limited herein. According to the multi-dimensional feature layer and the ID-mapping relation layer, each natural person ID may correspond to a user portrait, as shown in FIG. 1D, and the user portrait may include at least one of: demographic attributes, geographic relationships, hobbies, device attributes, asset conditions, business interests, and the like, which are not limited herein.

Additionally, the device features may include at least one of: device attributes (e.g., device daily dotting, model configuration, activation date, etc.), network connection conditions (e.g., WIFI connection, network IP, base station, connectivity distribution, etc.), ID attributes (e.g., ID format, character length, etc.), and the like, which are not limited herein. The APP features may include at least one of: APP installation, launch, and uninstallation, APP type preferences (e.g., games, applications), periods in which the APP is regularly active (weekdays, holidays, etc.), and the like, which are not limited herein. The location features may include at least one of: location attributes (e.g., home or business, resident business, frequent), travel preferences (e.g., mode of travel, time of travel, frequency of travel, trajectory of travel, etc.), and POI preferences (POI arrival, POI search).
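
One simple way to turn ID-mapping pairs into natural person IDs is to treat every identifier as a graph node and every mapping as an edge, then take connected components. The sketch below uses union-find for that purpose. It is an assumption made for illustration: the source names a trusted identification filtering algorithm and a graph algorithm without giving details, and the identifier strings and the function name `resolve_natural_persons` are hypothetical.

```python
from collections import defaultdict

def resolve_natural_persons(id_pairs):
    """Group identifiers linked by ID-mapping pairs into connected components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in id_pairs:
        union(a, b)

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return list(groups.values())

# Hypothetical ID-mapping relation layer data (SSOID <-> IMEI, TEL <-> IMEI, ...).
pairs = [
    ("ssoid:1001", "imei:A"),
    ("tel:138", "imei:A"),
    ("openid:x9", "iccid:77"),
]
persons = resolve_natural_persons(pairs)
```

Each resulting set collects the identifiers that would be attributed to one natural person; a real system would presumably filter untrusted mappings first, which this sketch does not attempt.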

102. Perform ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID.

The application data may include a plurality of user IDs, that is, when the user uses the device to communicate with another user, the application data may record the user ID of the another user. The association data may be at least one of: user ratings, point consumption, liveness, preference type, time online, operating habits, number of communications, time of communications, user ID, and the like.

In a possible example, step 102 of performing ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID may include the following steps:

21. searching the application data according to preset ID keywords to obtain a plurality of IDs;

22. integrating the plurality of IDs to obtain the plurality of user IDs, wherein each user ID corresponds to a natural person;

23. and acquiring the associated data corresponding to the user IDs from the application data to obtain the associated data corresponding to each user ID in the user IDs.

The preset ID keyword may be understood as a keyword in a specific format. For example, in "user name: xxx", xxx is the keyword; the specific format may be defaulted by the system. In a specific implementation, the electronic device may search the application data according to the preset ID keyword to obtain a plurality of IDs, and may then integrate the plurality of IDs to obtain a plurality of user IDs. The specific integration algorithm may be a clustering algorithm, a locality sensitive hashing algorithm, or the like, which is not limited herein. The electronic device may then acquire, from the application data, the associated data corresponding to the plurality of user IDs, where the associated data may be understood as data related to the user IDs, so as to obtain the associated data corresponding to each user ID in the plurality of user IDs.
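
A minimal sketch of steps 21 and 23, assuming the preset ID keyword format is the literal text `user name: <id>` and that application data is available as lines of text. Both assumptions, the pattern `ID_PATTERN`, and the function name are hypothetical; the integration of step 22 (merging IDs that belong to the same natural person) is omitted.

```python
import re
from collections import defaultdict

# Hypothetical preset ID keyword format: "user name: <id>".
ID_PATTERN = re.compile(r"user name:\s*(\w+)")

def extract_ids_with_data(lines):
    """Map each extracted user ID to the lines (associated data) that mention it."""
    associated = defaultdict(list)
    for line in lines:
        for user_id in ID_PATTERN.findall(line):
            associated[user_id].append(line)
    return dict(associated)

app_data = [
    "user name: alice | messages: 42",
    "user name: bob | messages: 3",
    "user name: alice | last seen: 21:05",
    "system log without an ID",
]
by_user = extract_ids_with_data(app_data)
```

Lines that match no keyword are simply ignored, so `by_user` contains entries only for the IDs that actually occur in the data.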

103. Perform bucketing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, wherein each bucket includes the associated data of at least one user ID.

Locality sensitive hashing (LSH) is one of the most popular approximate nearest neighbor search algorithms; it has a solid theoretical basis and performs well in high-dimensional data spaces. The LSH algorithm rests on the assumption that if two texts are similar in the original data space, they remain highly similar after each is transformed by a hash function; conversely, if they are dissimilar, they should remain dissimilar after the transformation.

In a specific implementation, the electronic device may perform bucketing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, where each bucket corresponds to the associated data of at least one user ID.

For example, the basic idea of Locality-Sensitive Hashing (LSH) is to hash data into buckets with a series of functions, such that data points close to each other are highly likely to fall into the same bucket, while data points far from each other are likely to fall into different buckets. Take calculating the intimacy between users as an example: users with higher intimacy are assigned to the same bucket with high probability. As shown in FIG. 1E, 1, 2, 3, 4, and 5 may represent five different user IDs. As shown in FIG. 1F, after bucketing, 1 and 2 may be placed in one bucket, 3 and 4 in another bucket, and 5 in a bucket of its own. The "user set" containing each user is then relatively small, so intimacy only needs to be calculated within each bucket, which reduces the computational complexity; the user's friends can then be classified according to the corresponding intimacy through sorting and some rules.
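
The bucketing idea can be illustrated with random-hyperplane LSH, one common LSH family suited to cosine similarity. The source does not say which hash family is used, so this choice, the feature vectors, and the names `make_hyperplanes`, `signature`, and `bucketize` are assumptions made for the sketch.

```python
import random
from collections import defaultdict

def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals; the seed is fixed only for repeatability."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def bucketize(features_by_user, hyperplanes):
    """Group user IDs whose feature vectors share the same signature."""
    buckets = defaultdict(list)
    for user_id, vector in features_by_user.items():
        buckets[signature(vector, hyperplanes)].append(user_id)
    return dict(buckets)

# Hypothetical feature vectors derived from each user's associated data.
users = {
    "u1": [1.0, 2.0, 0.5],
    "u2": [2.0, 4.0, 1.0],     # same direction as u1
    "u3": [-1.0, 0.5, 3.0],
    "u4": [-2.0, 1.0, 6.0],    # same direction as u3
    "u5": [-1.0, -2.0, -0.5],  # opposite direction to u1
}
planes = make_hyperplanes(num_planes=8, dim=3)
buckets = bucketize(users, planes)
```

Vectors pointing in the same direction always share a signature, so u1 and u2 share a bucket, as do u3 and u4, while u5 never shares one with u1. Whether u3 and u4 land apart from the other users depends on the randomly drawn hyperplanes; using more hyperplanes makes accidental collisions less likely.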

104. Divide the plurality of user IDs into groups based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups.

Because the user IDs in each bucket have certain relevance, the user IDs can be further divided, and accurate groups can be obtained.

In a possible example, step 104 of dividing the plurality of user IDs into groups based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups may include the following steps:

41. determining the association degree between each user ID and the target user ID according to the associated data of each user ID in an ith bucket to obtain a plurality of association degrees, wherein the ith bucket is any one of the plurality of buckets;

42. selecting the association degrees larger than a preset threshold value from the association degrees to obtain at least one target association degree;

43. and taking the user ID corresponding to the at least one target association degree as a group.

The preset threshold may be set by the user or defaulted by the system. Taking the ith bucket as an example, where the ith bucket is any one of the plurality of buckets, the electronic device may determine the association degree between each user ID and the target user ID according to the associated data of each user ID in the ith bucket to obtain a plurality of association degrees, select from them the association degrees greater than the preset threshold to obtain at least one target association degree, and take the user IDs corresponding to the at least one target association degree as a group. That is, the user IDs in each bucket that have a high association degree form a group.
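
Steps 42 and 43 amount to a threshold filter over the association degrees computed for one bucket. A minimal sketch, in which the degree values, the threshold of 0.6, and the function name `select_group` are illustrative assumptions:

```python
def select_group(association_by_user, threshold):
    """Keep user IDs whose association degree with the target user exceeds the threshold."""
    return [uid for uid, degree in association_by_user.items() if degree > threshold]

# Hypothetical association degrees for the user IDs in the ith bucket.
bucket_i = {"u1": 0.92, "u2": 0.35, "u3": 0.71}
group = select_group(bucket_i, threshold=0.6)
```

With these values, `group` contains u1 and u3; u2 falls below the threshold and is left out of the group.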

Further optionally, step 41 of determining the association degree between each user ID and the target user ID according to the associated data of each user ID in the ith bucket to obtain a plurality of association degrees may include the following steps:

411. acquiring associated data of a first user ID, wherein the first user ID is any one user ID in the ith bucket;

412. performing feature extraction on the associated data of the first user ID to obtain a target feature set;

413. and determining the association degree between the first user ID and the target user ID according to the target feature set.

Taking a first user ID as an example, where the first user ID is any one user ID in the ith bucket, the electronic device may acquire the associated data of the first user ID and perform feature extraction on it to obtain a target feature set. The target feature set may include at least one of: geographic location, communication time period, communication content, number of communications, and the like, which are not limited herein; the feature of each dimension may be represented by a feature value. Further, the association degree between the first user ID and the target user ID may be determined according to the target feature set. For example, a weight value corresponding to each feature in the target feature set may be determined, and a weighting operation may then be performed on each feature and its corresponding weight value to obtain the association degree between the first user ID and the target user ID.

In one possible example, the set of target features includes feature values for a plurality of dimensions; the step 413 of determining the association degree between the first user ID and the target user ID according to the target feature set may include the following steps:

B1. determining a weight value corresponding to each dimension in the feature values of the multiple dimensions to obtain the weight values of the multiple dimensions;

B2. performing a weighting operation according to the feature values of the multiple dimensions and the weight values of the multiple dimensions to obtain the association degree between the first user ID and the target user ID.

The target feature set may include feature values of multiple dimensions, and a weight value corresponding to the feature value of each dimension may be stored in the electronic device in advance. The electronic device may therefore determine the weight value corresponding to each dimension to obtain the weight values of the multiple dimensions, and then perform a weighting operation according to the feature values of the multiple dimensions and the weight values of the multiple dimensions to obtain the association degree between the first user ID and the target user ID.
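
The weighting operation of steps B1 and B2 can be read as a weighted sum over dimensions. The sketch below assumes feature values normalized to the range 0 to 1 and pre-stored weights that sum to 1; the dimension names, the numbers, and the function name `association_degree` are hypothetical.

```python
def association_degree(feature_values, weights):
    """Weighted sum of per-dimension feature values."""
    return sum(weights[name] * value for name, value in feature_values.items())

# Hypothetical feature values of the first user ID and pre-stored weights.
features = {"geo_proximity": 0.5, "communication_count": 1.0, "shared_time_period": 0.25}
weights = {"geo_proximity": 0.5, "communication_count": 0.25, "shared_time_period": 0.25}
degree = association_degree(features, weights)  # 0.25 + 0.25 + 0.0625 = 0.5625
```

Keeping the weights normalized means the resulting degree stays on the same scale as the feature values, which makes a single preset threshold easier to choose.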

In one possible example, the step 413 of determining the association degree between the first user ID and the target user ID according to the target feature set may include the following steps:

C1, separating a first feature set corresponding to the first user ID and a second feature set corresponding to the target user ID according to the target feature set;

C2, determining the association degree between the first user ID and the target user ID according to the first feature set and the second feature set.

The target feature set includes both the features of the first user ID and the features of the target user ID, so the target feature set can be separated to obtain a first feature set corresponding to the first user ID and a second feature set corresponding to the target user ID. An intersection may exist between the first feature set and the second feature set, and each of them may include at least one of the following features: user rating, point consumption, activity, preference type, online time, operation habit, number of communications, communication time, user ID, and the like, which is not limited herein. The association degree between the first user ID and the target user ID may then be determined according to the first feature set and the second feature set.

For example, the association degree between the first user ID and the target user ID may be calculated by the Euclidean distance as follows:

$$d(p, q) = \sqrt{\sum_{i} (p_i - q_i)^2}$$

where p represents the first feature set, q represents the second feature set, and i indexes any one dimension.

For another example, the association degree between the first user ID and the target user ID may be calculated by the Jaccard distance, which is specifically as follows:

$$d_J(p, q) = 1 - \frac{|p \cap q|}{|p \cup q|}$$

where p represents the first feature set and q represents the second feature set.

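For another example, the association degree between the first user ID and the target user ID may be calculated by the cosine distance, which is specifically as follows:

$$\cos(p, q) = \frac{\sum_{i} p_i q_i}{\sqrt{\sum_{i} p_i^2}\,\sqrt{\sum_{i} q_i^2}}$$

where p represents the first feature set, q represents the second feature set, and i indexes any one dimension; the cosine distance is 1 - cos(p, q).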
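The three measures named above (Euclidean distance, Jaccard distance, and cosine distance) can be sketched as follows. This is an illustrative sketch using the standard textbook definitions; it assumes that numeric feature sets are given as equal-length vectors and that categorical feature sets are given as Python sets, neither of which is specified by the embodiment.

```python
import math
from typing import Sequence, Set


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of the summed squared per-dimension differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def jaccard_distance(p: Set[str], q: Set[str]) -> float:
    """One minus the ratio of shared features to all features."""
    union = p | q
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(p & q) / len(union)


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def cosine_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """One minus the cosine similarity."""
    return 1.0 - cosine_similarity(p, q)
```

A smaller distance indicates a closer pair, so a distance has to be converted (for example, negated or inverted) before it is compared against an association-degree threshold; how that conversion is done is left open here.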

In a possible example, each user ID corresponds to at least one tag, and after the step 104, the following steps may be further included:

D1, acquiring a label corresponding to the user ID in a group j to obtain a plurality of labels, wherein the group j is any one of the groups;

D2, taking the label with the most occurrence times in the plurality of labels as the group name of the group j.

In a specific implementation, each user ID may correspond to at least one tag, and the tag may be a user portrait tag. The electronic device may obtain the tags corresponding to the user IDs in group j to obtain a plurality of tags, where group j is any one of the plurality of groups, and may then take the tag that occurs most often among the plurality of tags as the group name of group j.
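The group-naming step can be sketched as a simple frequency count. The function name, the user IDs, and the tag values below are hypothetical; how ties between equally frequent tags are broken is not stated in the embodiment, and this sketch simply keeps the tag counted first.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def group_name(group_user_ids: Iterable[str],
               tags_by_user: Dict[str, List[str]]) -> Optional[str]:
    """Return the tag that occurs most often among the users of one group."""
    counts = Counter(
        tag for uid in group_user_ids for tag in tags_by_user.get(uid, [])
    )
    if not counts:
        return None  # no tags available for this group
    return counts.most_common(1)[0][0]


# Hypothetical tags for three user IDs in group j.
tags_by_user = {
    "u1": ["colleague", "friend"],
    "u2": ["colleague"],
    "u3": ["family", "colleague"],
}
name = group_name(["u1", "u2", "u3"], tags_by_user)
```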

In a specific implementation, in the internet era it is welcome for any data platform to have a large number of users and the data they generate, but a large volume of data also brings difficulties. On one hand, a large amount of data requires a large amount of storage space; on the other hand, converting the data into real data assets requires a large amount of computing resources, and if resources cannot be used reasonably, the enterprise suffers a significant loss. In the embodiment of the application, whether for user identification or user data classification, calculating the similarity or intimacy of a large number of users requires a large amount of computing resources; by using the locality sensitive hashing algorithm, the required computing resources and time can be significantly reduced compared with a conventional method.

Furthermore, the locality sensitive hashing algorithm is not limited to user identification and user data classification; it can also be applied to many fields that require similarity calculation, such as friend recommendation and document similarity.

It can be seen that the data classification method described in the foregoing embodiment of the present application acquires application data of a target application of a target object and acquires a target user ID of the target object; performs ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID; performs bucket division processing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, each bucket including the associated data of at least one user ID; and performs group division on the plurality of user IDs based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups. In this way, the plurality of user IDs can be extracted from the application data, divided into buckets through the locality sensitive hashing algorithm, and finally divided into groups based on the IDs in each bucket, which reduces the computational complexity, saves the corresponding time and computing resources, and improves the data classification efficiency.
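The bucket division step can be illustrated with one common locality sensitive hashing family, random-hyperplane hashing for cosine similarity. This is a sketch under stated assumptions: this passage does not specify which hash family the embodiment uses, and the function names, the fixed seed, and the assumption that each user's associated data has already been turned into a numeric feature vector are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def make_hyperplanes(num_bits: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw num_bits random hyperplanes; each one contributes one signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector: Sequence[float],
              hyperplanes: List[List[float]]) -> Tuple[int, ...]:
    """One bit per hyperplane: which side of the hyperplane the vector lies on."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def bucketize(vectors: Dict[str, Sequence[float]],
              hyperplanes: List[List[float]]) -> Dict[Tuple[int, ...], List[str]]:
    """Place user IDs with equal signatures into the same bucket."""
    buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for uid, vec in vectors.items():
        buckets[signature(vec, hyperplanes)].append(uid)
    return dict(buckets)


planes = make_hyperplanes(num_bits=8, dim=3)
vectors = {
    "u1": [1.0, 2.0, 3.0],
    "u2": [2.0, 4.0, 6.0],     # same direction as u1
    "u3": [-1.0, -2.0, -3.0],  # opposite direction to u1
}
buckets = bucketize(vectors, planes)
```

Only user IDs that share a bucket need to be compared pairwise afterwards, which is where the saving in computing resources comes from; more signature bits give smaller, purer buckets at the cost of separating some similar users.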

In accordance with the above, please refer to fig. 2, fig. 2 is a schematic flow chart of another data classification method provided in the embodiment of the present application, and the data classification method described in the embodiment is applied to the electronic device shown in fig. 1A, and the method may include the following steps:

201. Acquire application data of a target application of a target object, and acquire a target user ID of the target object.

202. Perform ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID.

203. Perform bucket division processing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, wherein each bucket includes the associated data of at least one user ID.

204. Perform group division on the plurality of user IDs based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups.

205. Acquire the labels corresponding to the user IDs in a group j to obtain a plurality of labels, wherein the group j is any one of the plurality of groups.

206. Take the label with the largest number of occurrences among the plurality of labels as the group name of the group j.

The specific implementation process of the steps 201-206 can refer to the corresponding description in the method shown in fig. 1B, and will not be described herein again.

It can be seen that the data classification method described in the above embodiment of the present application acquires application data of a target application of a target object and acquires a target user ID of the target object; performs ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID; performs bucket division processing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, each bucket including the associated data of at least one user ID; performs group division on the plurality of user IDs based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups; acquires the tags corresponding to the user IDs in a group j to obtain a plurality of tags, the group j being any one of the plurality of groups; and takes the tag with the largest number of occurrences among the plurality of tags as the group name of the group j. In this way, the plurality of user IDs can be extracted from the application data, divided into buckets through the locality sensitive hashing algorithm, and finally divided into groups based on the IDs in each bucket, and each group can be named, which reduces the computational complexity, saves the corresponding time and computing resources, and improves the data classification efficiency.

In accordance with the above, please refer to fig. 3, which is a schematic flow chart of another data classification method according to an embodiment of the present application, where the data classification method described in the present embodiment is applied to the electronic device shown in fig. 1A, and the method includes the following steps:

301. Acquire historical usage data of the target application of the electronic device corresponding to the target object.

302. Construct a multi-dimensional feature layer and an ID-mapping relation layer according to the historical usage data.

303. Determine a natural person ID according to the multi-dimensional feature layer and the ID-mapping relation layer, and take the natural person ID as the target user ID.

304. Acquire application data of a target application of the target object, and acquire a target user ID of the target object.

305. Perform ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID.

306. Perform bucket division processing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, wherein each bucket includes the associated data of at least one user ID.

307. Perform group division on the plurality of user IDs based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups.

The specific implementation process of steps 301-307 can refer to the corresponding description in the method shown in fig. 1B, and is not described herein again.

It can be seen that the data classification method described in the embodiment of the present application may first obtain the natural person ID of the target object and acquire the application data of the target application of the target object based on the natural person ID; perform ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID; perform bucket division processing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, each bucket including the associated data of at least one user ID; and perform group division on the plurality of user IDs based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups. In this way, the plurality of user IDs can be extracted from the application data, divided into buckets through the locality sensitive hashing algorithm, and finally divided into groups based on the IDs in each bucket, which reduces the computational complexity, saves the corresponding time and computing resources, and improves the data classification efficiency.

In accordance with the above, please refer to fig. 4, in which fig. 4 is an electronic device according to an embodiment of the present application, including: a processor and a memory; and one or more programs stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps of:

It can be seen that the electronic device described in the embodiment of the present application acquires application data of a target application of a target object and acquires a target user ID of the target object; performs ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID; performs bucket division processing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, each bucket including the associated data of at least one user ID; and performs group division on the plurality of user IDs based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups. In this way, the plurality of user IDs can be extracted from the application data, divided into buckets through the locality sensitive hashing algorithm, and finally divided into groups based on the IDs in each bucket, which reduces the computational complexity, saves the corresponding time and computing resources, and improves the data classification efficiency.

In one possible example, in terms of performing ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID, the program includes instructions for performing the following steps:

searching the application data according to preset ID keywords to obtain a plurality of IDs;

integrating the plurality of IDs to obtain the plurality of user IDs, wherein each user ID corresponds to a natural person;

and acquiring the associated data corresponding to the user IDs from the application data to obtain the associated data corresponding to each user ID in the user IDs.
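The three extraction steps above can be sketched as a keyword search followed by a lookup that folds several raw IDs into one user ID per natural person. Everything concrete in this sketch is an assumption made for illustration: the `key=value` record format, the keyword list, and the `id_mapping` table that says which raw IDs belong to the same person.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical preset ID keywords.
ID_KEYWORDS = ("uid", "phone", "email")
ID_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, ID_KEYWORDS)) + r")=(\S+)"
)


def associated_data(records: List[str],
                    id_mapping: Dict[Tuple[str, str], str]) -> Dict[str, List[str]]:
    """Search records for IDs, merge them per natural person, and collect the records."""
    data: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        # Step 1: keyword search yields (keyword, value) pairs.
        raw_ids = ID_PATTERN.findall(record)
        # Step 2: integrate raw IDs; unmapped IDs fall back to their own value.
        user_ids = {id_mapping.get(raw, raw[1]) for raw in raw_ids}
        # Step 3: attach the record to every user ID it mentions.
        for uid in sorted(user_ids):
            data[uid].append(record)
    return dict(data)


records = ["uid=alice msg=hello", "phone=555 msg=call", "email=bob@x.org msg=hi"]
id_mapping = {("phone", "555"): "alice"}  # hypothetical: this phone belongs to alice
data = associated_data(records, id_mapping)
```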

In one possible example, in terms of performing group division on the plurality of user IDs based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups, the program includes instructions for performing the following steps:

determining the association degree between each user ID and the target user ID according to the association data of each user ID in an ith bucket to obtain a plurality of association degrees, wherein the ith bucket is any one of the buckets;

selecting the association degrees larger than a preset threshold value from the association degrees to obtain at least one target association degree;

and taking the user ID corresponding to the at least one target association degree as a group.
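Within one bucket, the selection described above reduces to a threshold filter. The following minimal sketch assumes the association degrees for the ith bucket have already been computed; the function name, the degree values, and the threshold are hypothetical.

```python
from typing import Dict, List


def select_group(degrees: Dict[str, float], threshold: float) -> List[str]:
    """Keep the user IDs whose association degree exceeds the preset threshold."""
    return [uid for uid, degree in degrees.items() if degree > threshold]


# Hypothetical association degrees between each user ID in one bucket and the target user ID.
bucket_degrees = {"a": 0.9, "b": 0.4, "c": 0.7}
group = select_group(bucket_degrees, threshold=0.5)
```

The comparison is strict, matching "larger than a preset threshold", so a user ID whose degree equals the threshold is left out of the group.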

In one possible example, in terms of determining the association degree between each user ID in the ith bucket and the target user ID according to the associated data of that user ID to obtain a plurality of association degrees, the program includes instructions for performing the following steps:

acquiring associated data of a first user ID, wherein the first user ID is any one user ID in the ith bucket;

performing feature extraction on the associated data of the first user ID to obtain a target feature set;

and determining the association degree between the first user ID and the target user ID according to the target feature set.

In one possible example, the set of target features includes feature values for a plurality of dimensions;

in said determining a degree of association between said first user ID and said target user ID from said set of target features, said program comprises instructions for:

determining a weight value corresponding to each dimension in the characteristic values of the multiple dimensions to obtain the weight values of the multiple dimensions;

and performing weighting operation according to the characteristic values of the dimensions and the weight values of the dimensions to obtain the association degree between the first user ID and the target user ID.

In one possible example, in terms of determining the association degree between the first user ID and the target user ID according to the target feature set, the program includes instructions for performing the following steps:

separating a first feature set corresponding to the first user ID and a second feature set corresponding to the target user ID according to the target feature set;

and determining the association degree between the first user ID and the target user ID according to the first feature set and the second feature set.

In one possible example, in connection with the obtaining application data for a target application for a target object, the program includes instructions for performing the steps of:

acquiring at least one user ID of a target application of the target object;

and acquiring application data of the target application of the target object in a preset time period from a preset database according to the at least one user ID.
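Fetching the application data for a preset time period can be sketched with an in-memory SQLite database standing in for the preset database. The table name `app_data`, its columns, and the integer timestamps are assumptions made for this illustration only.

```python
import sqlite3
from typing import List, Sequence, Tuple


def fetch_application_data(conn: sqlite3.Connection, user_ids: Sequence[str],
                           start_ts: int, end_ts: int) -> List[Tuple[str, int, str]]:
    """Return rows of the given user IDs whose timestamp lies in [start_ts, end_ts]."""
    if not user_ids:
        return []
    placeholders = ",".join("?" for _ in user_ids)
    query = (
        "SELECT user_id, ts, payload FROM app_data "
        f"WHERE user_id IN ({placeholders}) AND ts BETWEEN ? AND ? ORDER BY ts"
    )
    return conn.execute(query, (*user_ids, start_ts, end_ts)).fetchall()


# Hypothetical database contents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_data (user_id TEXT, ts INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO app_data VALUES (?, ?, ?)",
    [("u1", 50, "old"), ("u1", 150, "chat"), ("u2", 160, "other"), ("u1", 250, "new")],
)
rows = fetch_application_data(conn, ["u1"], 100, 200)
```

Only the placeholders are formatted into the query text; the user IDs and the time bounds are passed as bound parameters.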

In one possible example, when the at least one user ID is a natural person ID, the program further includes instructions for performing the steps of:

acquiring historical use data of the target application of the electronic equipment corresponding to the target object;

constructing a multi-dimensional feature layer and an ID-mapping relation layer according to the historical use data;

and determining a natural person ID according to the multi-dimensional feature layer and the ID-mapping relation layer, and taking the natural person ID as the target user ID.

In one possible example, each user ID corresponds to at least one tag, the program further comprising instructions for:

acquiring a label corresponding to a user ID in a group j to obtain a plurality of labels, wherein the group j is any one of the groups;

and taking the label with the largest occurrence number in the plurality of labels as the group name of the group j.

The above description has introduced the solution of the embodiment of the present application mainly from the perspective of the method-side implementation process. It is understood that the electronic device comprises corresponding hardware structures and/or software modules for performing the respective functions in order to realize the above-mentioned functions. Those of skill in the art will readily appreciate that the present application is capable of hardware or a combination of hardware and computer software implementing the various illustrative elements and algorithm steps described in connection with the embodiments provided herein. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiment of the present application, the electronic device may be divided into the functional units according to the method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.

Referring to fig. 5A, fig. 5A is a schematic structural diagram of a data classification apparatus provided in the present embodiment. The data classification apparatus is applied to the electronic device shown in fig. 1A, and comprises an obtaining unit 501, an extracting unit 502, a bucket dividing processing unit 503 and a dividing unit 504, wherein,

an obtaining unit 501, configured to obtain application data of a target application of a target object, and obtain a target user ID of the target object;

an extracting unit 502, configured to perform ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID;

a bucket dividing processing unit 503, configured to perform bucket dividing processing on the associated data of the multiple user IDs through a locality sensitive hash algorithm to obtain multiple buckets, where each bucket includes associated data of at least one user ID;

a dividing unit 504, configured to divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets, so as to obtain multiple groups.

It can be seen that the data classification apparatus described in the embodiment of the present application acquires application data of a target application of a target object and acquires a target user ID of the target object; performs ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID; performs bucket division processing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, each bucket including the associated data of at least one user ID; and performs group division on the plurality of user IDs based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups. In this way, the plurality of user IDs can be extracted from the application data, divided into buckets through the locality sensitive hashing algorithm, and finally divided into groups based on the IDs in each bucket, which reduces the computational complexity, saves the corresponding time and computing resources, and improves the data classification efficiency.

In a possible example, in terms of performing ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID, the extracting unit 502 is specifically configured to:

In a possible example, in terms of dividing the plurality of user IDs into groups based on the association data of the user IDs in the plurality of buckets to obtain a plurality of groups, the dividing unit 504 is specifically configured to:

In a possible example, in the aspect that the association degree between the user ID and the target user ID is determined according to the association data of each user ID in the ith bucket, so as to obtain multiple association degrees, the dividing unit 504 is specifically configured to:

In one possible example, the target feature set includes feature values of multiple dimensions; in terms of determining the association degree between the first user ID and the target user ID according to the target feature set, the dividing unit 504 is specifically configured to:

In one possible example, in the aspect of determining the association degree between the first user ID and the target user ID according to the target feature set, the dividing unit 504 is specifically configured to:

In one possible example, in terms of acquiring application data of a target application of a target object, the obtaining unit 501 is specifically configured to:

acquiring at least one user ID of a target application of the target object;

In one possible example, as shown in fig. 5B, fig. 5B is a further modified structure of the data classification apparatus shown in fig. 5A, which, compared with fig. 5A, further includes an establishing unit 505 and a determining unit 506, specifically as follows:

the obtaining unit 501 is further configured to obtain historical usage data of the target application of the electronic device corresponding to the target object;

the establishing unit 505 is configured to establish a multi-dimensional feature layer and an ID-mapping relationship layer according to the historical usage data;

the determining unit 506 is configured to determine a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and use the natural person ID as the target user ID.

In a possible example, each user ID corresponds to at least one tag. As shown in fig. 5C, fig. 5C is a further modified structure of the data classification apparatus shown in fig. 5A, which, compared with fig. 5A, further includes a selecting unit 507, specifically as follows:

the obtaining unit 501 is configured to obtain a tag corresponding to a user ID in a group j, to obtain multiple tags, where the group j is any one of the multiple groups;

the selecting unit 507 is configured to use a label with the largest occurrence number in the plurality of labels as the group name of the group j.

It can be understood that the functions of each program module of the data classification apparatus in this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.

Embodiments of the present application also provide a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the data classification methods described in the above method embodiments.

Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any one of the data classification methods described in the above method embodiments.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.

The integrated unit, if implemented in the form of a software program module and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash disk, ROM, RAM, magnetic or optical disk, and the like.

The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

A method of data classification, comprising:

acquiring application data of a target application of a target object, and acquiring a target user ID of the target object;

performing ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID;

performing bucket division processing on the associated data of the plurality of user IDs through a locality sensitive hashing algorithm to obtain a plurality of buckets, wherein each bucket comprises the associated data of at least one user ID;

and carrying out group division on the plurality of user IDs based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups.
The method of claim 1, wherein the performing ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID comprises:

searching the application data according to preset ID keywords to obtain a plurality of IDs;

integrating the plurality of IDs to obtain the plurality of user IDs, wherein each user ID corresponds to a natural person;

and acquiring the associated data corresponding to the user IDs from the application data to obtain the associated data corresponding to each user ID in the user IDs.
The method of claim 1 or 2, wherein the grouping the plurality of user IDs based on the association data of the user IDs in the plurality of buckets into a plurality of groups comprises:

determining the association degree between each user ID and the target user ID according to the association data of each user ID in an ith bucket to obtain a plurality of association degrees, wherein the ith bucket is any one of the buckets;

selecting the association degrees larger than a preset threshold value from the association degrees to obtain at least one target association degree;

and taking the user ID corresponding to the at least one target association degree as a group.
The method of claim 3, wherein the determining the association degree between each user ID and the target user ID according to the association data of each user ID in the ith bucket to obtain a plurality of association degrees comprises:

acquiring associated data of a first user ID, wherein the first user ID is any one user ID in the ith bucket;

performing feature extraction on the associated data of the first user ID to obtain a target feature set;

and determining the association degree between the first user ID and the target user ID according to the target feature set.
The method of claim 4, wherein the set of target features comprises feature values for a plurality of dimensions;

the determining the degree of association between the first user ID and the target user ID according to the target feature set includes:

determining a weight value corresponding to each dimension in the characteristic values of the multiple dimensions to obtain the weight values of the multiple dimensions;

and performing weighting operation according to the characteristic values of the dimensions and the weight values of the dimensions to obtain the association degree between the first user ID and the target user ID.
The method of claim 4, wherein the determining the association degree between the first user ID and the target user ID according to the target feature set comprises:

separating a first feature set corresponding to the first user ID and a second feature set corresponding to the target user ID according to the target feature set;

and determining the association degree between the first user ID and the target user ID according to the first feature set and the second feature set.
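The claim does not state how the two feature sets are compared. The sketch below assumes, purely for illustration, that each feature in the target feature set is keyed by its owner and that the association degree is the cosine similarity of the two separated sets; `split_feature_set` and `cosine_similarity` are hypothetical helper names.

```python
import math
from typing import Dict, Tuple

FeatureSet = Dict[str, float]


def split_feature_set(
    target_feature_set: Dict[Tuple[str, str], float],
    first_user_id: str,
    target_user_id: str,
) -> Tuple[FeatureSet, FeatureSet]:
    """Separate features keyed by (owner, feature name) into the two per-user sets."""
    first = {n: v for (owner, n), v in target_feature_set.items() if owner == first_user_id}
    second = {n: v for (owner, n), v in target_feature_set.items() if owner == target_user_id}
    return first, second


def cosine_similarity(a: FeatureSet, b: FeatureSet) -> float:
    """Assumed comparison: cosine similarity over the union of feature names."""
    dims = set(a) | set(b)
    dot = sum(a.get(d, 0.0) * b.get(d, 0.0) for d in dims)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical feature names and values.
target_feature_set = {
    ("u1", "music"): 3.0,
    ("u1", "sports"): 4.0,
    ("target", "music"): 3.0,
    ("target", "sports"): 4.0,
}
first_set, second_set = split_feature_set(target_feature_set, "u1", "target")
degree = cosine_similarity(first_set, second_set)
```

Any other similarity measure over the two sets would fit the claim language equally well; cosine similarity is used only because it is simple to state.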
The method according to any one of claims 1-6, wherein the obtaining application data of the target application of the target object comprises:

acquiring at least one user ID of a target application of the target object;

and acquiring application data of the target application of the target object in a preset time period from a preset database according to the at least one user ID.
The method of claim 7, wherein when the at least one user ID is a natural person ID, the method further comprises:

acquiring historical use data of the target application of the electronic equipment corresponding to the target object;

constructing a multi-dimensional feature layer and an ID-mapping relation layer according to the historical use data;

and determining a natural person ID according to the multi-dimensional feature layer and the ID-mapping relation layer, and taking the natural person ID as the target user ID.
The method of any one of claims 1-8, wherein each user ID corresponds to at least one tag, the method further comprising:

acquiring a label corresponding to a user ID in a group j to obtain a plurality of labels, wherein the group j is any one of the groups;

and taking the label with the largest occurrence number in the plurality of labels as the group name of the group j.
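Naming a group after its most frequent label can be sketched with a simple counter. The helper name `group_name` and the sample labels are hypothetical; the claim does not say how ties are resolved, so the sketch simply keeps the label encountered first.

```python
from collections import Counter
from typing import Dict, List, Optional


def group_name(group_user_ids: List[str], labels_by_user: Dict[str, List[str]]) -> Optional[str]:
    """Return the label that occurs most often among the users of one group."""
    counts = Counter(
        label
        for user_id in group_user_ids
        for label in labels_by_user.get(user_id, [])
    )
    if not counts:
        return None  # no user in the group carries a label
    # most_common orders equal counts by first occurrence (assumed tie-break).
    return counts.most_common(1)[0][0]


labels_by_user = {
    "u1": ["colleague", "friend"],
    "u2": ["colleague"],
    "u3": ["family"],
}
name = group_name(["u1", "u2", "u3"], labels_by_user)
```

With these sample labels, "colleague" occurs twice and becomes the group name.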
An apparatus for classifying data, the apparatus comprising:

an acquisition unit configured to acquire application data of a target application of a target object and acquire a target user ID of the target object;

the extraction unit is used for extracting the ID of the application data to obtain a plurality of user IDs and associated data corresponding to each user ID;

the bucket dividing processing unit is used for carrying out bucket dividing processing on the associated data of the user IDs through a locality sensitive hash algorithm to obtain a plurality of buckets, and each bucket comprises the associated data of at least one user ID;

and the dividing unit is used for dividing the plurality of user IDs into groups based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups.
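The bucket dividing step can be illustrated with one common family of locality sensitive hashes, random hyperplane signatures. The claims do not name a specific hash family, so this choice is an assumption, as are the helper names and the representation of associated data as numeric vectors. For brevity the buckets here hold user IDs rather than the associated data itself.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Signature = Tuple[int, ...]


def make_hyperplanes(num_bits: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw num_bits random hyperplanes; the seed keeps the bucketing reproducible."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector: Sequence[float], hyperplanes: List[List[float]]) -> Signature:
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def bucketize(
    vectors_by_user: Dict[str, Sequence[float]], hyperplanes: List[List[float]]
) -> Dict[Signature, List[str]]:
    """Place users with equal signatures in the same bucket."""
    buckets: Dict[Signature, List[str]] = defaultdict(list)
    for user_id, vector in vectors_by_user.items():
        buckets[lsh_signature(vector, hyperplanes)].append(user_id)
    return dict(buckets)


# Hypothetical vectors: u2 points in the same direction as u1, u3 in the opposite one.
vectors_by_user = {
    "u1": [1.0, 2.0, 0.5],
    "u2": [2.0, 4.0, 1.0],
    "u3": [-1.0, -2.0, -0.5],
}
planes = make_hyperplanes(num_bits=8, dim=3)
buckets = bucketize(vectors_by_user, planes)
```

Vectors that point in similar directions tend to share a signature and therefore a bucket, which limits the later association degree computation to users within one bucket.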
The apparatus according to claim 10, wherein, in the aspect of performing ID extraction on the application data to obtain a plurality of user IDs and associated data corresponding to each user ID, the extraction unit is specifically configured to:

searching the application data according to preset ID keywords to obtain a plurality of IDs;

integrating the plurality of IDs to obtain the plurality of user IDs, wherein each user ID corresponds to a natural person;

and acquiring the associated data corresponding to the user IDs from the application data to obtain the associated data corresponding to each user ID in the user IDs.
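The keyword search and integration steps above can be sketched as follows. The keyword list, the `key=value` text layout of the application data, and the lookup table that maps raw IDs to natural persons are all assumptions made for the example; the claims do not specify them.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical preset ID keywords.
ID_KEYWORDS = ("user_id", "phone", "email")


def find_ids(application_data: str, keywords: Sequence[str] = ID_KEYWORDS) -> List[str]:
    """Search the application data for 'keyword=value' pairs and return raw IDs."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keywords)) + r")=(\S+)")
    return [f"{key}:{value}" for key, value in pattern.findall(application_data)]


def integrate_ids(raw_ids: Sequence[str], person_of: Dict[str, str]) -> List[str]:
    """Merge raw IDs that belong to the same natural person, keeping first-seen order."""
    user_ids: List[str] = []
    for raw_id in raw_ids:
        person = person_of.get(raw_id, raw_id)  # unknown IDs stand for themselves
        if person not in user_ids:
            user_ids.append(person)
    return user_ids


application_data = "user_id=alice01 said hi; phone=13800000000 called; user_id=bob02 replied"
# Hypothetical mapping from raw IDs to natural persons.
person_of = {
    "user_id:alice01": "person-1",
    "phone:13800000000": "person-1",
    "user_id:bob02": "person-2",
}
raw_ids = find_ids(application_data)
user_ids = integrate_ids(raw_ids, person_of)
```

In this example three raw IDs are found and integrated into two user IDs, one per natural person.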
The apparatus according to claim 10 or 11, wherein, in the aspect of grouping the plurality of user IDs based on the associated data of the user IDs in the plurality of buckets to obtain a plurality of groups, the dividing unit is specifically configured to:

determining the association degree between each user ID and the target user ID according to the associated data of each user ID in an ith bucket to obtain a plurality of association degrees, wherein the ith bucket is any one of the plurality of buckets;

selecting the association degrees larger than a preset threshold value from the association degrees to obtain at least one target association degree;

and taking the user ID corresponding to the at least one target association degree as a group.
The apparatus according to claim 12, wherein, in the aspect of determining the association degree between each user ID and the target user ID according to the associated data of each user ID in the ith bucket to obtain a plurality of association degrees, the dividing unit is specifically configured to:

acquiring associated data of a first user ID, wherein the first user ID is any one user ID in the ith bucket;

performing feature extraction on the associated data of the first user ID to obtain a target feature set;

and determining the association degree between the first user ID and the target user ID according to the target feature set.
The apparatus of claim 13, wherein the target feature set comprises feature values of a plurality of dimensions;

in the aspect of determining the association degree between the first user ID and the target user ID according to the target feature set, the dividing unit is specifically configured to:

determining a weight value corresponding to each dimension of the feature values of the plurality of dimensions to obtain weight values of the plurality of dimensions;

and performing a weighting operation according to the feature values of the plurality of dimensions and the weight values of the plurality of dimensions to obtain the association degree between the first user ID and the target user ID.
The apparatus according to claim 13, wherein, in the aspect of determining the association degree between the first user ID and the target user ID according to the target feature set, the dividing unit is specifically configured to:

separating a first feature set corresponding to the first user ID and a second feature set corresponding to the target user ID according to the target feature set;

and determining the association degree between the first user ID and the target user ID according to the first feature set and the second feature set.
The apparatus according to any one of claims 10-15, wherein, in the aspect of acquiring application data of the target application of the target object, the acquisition unit is specifically configured to:

acquiring at least one user ID of a target application of the target object;

and acquiring application data of the target application of the target object in a preset time period from a preset database according to the at least one user ID.
The apparatus of claim 16, wherein when the at least one user ID is a natural person ID, the apparatus further comprises an establishing unit and a determining unit, wherein:

the acquisition unit is further configured to acquire historical use data of the target application of the electronic device corresponding to the target object;

the establishing unit is used for establishing a multi-dimensional feature layer and an ID-mapping relation layer according to the historical use data;

and the determining unit is used for determining a natural person ID according to the multi-dimensional feature layer and the ID-mapping relation layer, and taking the natural person ID as the target user ID.
An electronic device comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any of claims 1-9.
A computer-readable storage medium, characterized in that it stores a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 1-9.
A computer program product, characterized in that the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform the method according to any one of claims 1-9.