CN112115305A - Group identification method and device and computer readable storage medium - Google Patents


Info

Publication number
CN112115305A
Authority
CN
China
Prior art keywords
target
item set
potential
determining
items
Legal status
Granted
Application number
CN201910541204.9A
Other languages
Chinese (zh)
Other versions
CN112115305B (en
Inventor
周武
俞颖晔
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910541204.9A
Publication of CN112115305A
Application granted
Publication of CN112115305B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a group identification method, a group identification device and a computer-readable storage medium, and belongs to the technical field of information processing. The method comprises: determining a plurality of frequent item sets based on a plurality of candidate item sets, each item set including one or more reference elements; acquiring an auxiliary probability, determined by a binary classification model, for each reference element in a frequent item set, wherein the auxiliary probability is the probability that the reference element is a target element, and a target element indicates an individual in a target group; determining the weighted average of the auxiliary probabilities of all reference elements in the frequent item set as the target probability of the frequent item set, wherein the target probability is the probability that the frequent item set is a target item set, and a target item set indicates a target group; determining a target item set among the plurality of frequent item sets according to the target probability of each frequent item set; and determining a target group according to the target item set. The method and the device address the problem that item set mining is poorly targeted, and are used for identifying target groups.

Description

Group identification method and device and computer readable storage medium
Technical Field
The present application relates to the field of information technology, and in particular, to a group identification method, an apparatus, and a computer-readable storage medium.
Background
With the expansion of data size and the increasing demand of data processing, mining potentially valuable information from massive amounts of data becomes more and more important for data processing.
In the related art, frequent item sets are usually mined from candidate item sets by constructing a frequent pattern tree (FP-tree). An item set is a set of several elements (also called items); a frequent item set is an item set whose support is greater than or equal to a support threshold (min_sup); and the support of an item set is the frequency with which it occurs across all candidate item sets.
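As a minimal illustration of these definitions, the Python sketch below computes support as the fraction of candidate item sets that contain a given item set and applies the min_sup test. The function names and the toy data are hypothetical; the description does not prescribe an implementation.

```python
from typing import FrozenSet, Iterable, List

def support(itemset: Iterable[str], candidates: List[FrozenSet[str]]) -> float:
    """Fraction of candidate item sets that contain every element of `itemset`."""
    wanted = frozenset(itemset)
    if not candidates:
        return 0.0
    hits = sum(1 for candidate in candidates if wanted <= candidate)
    return hits / len(candidates)

def is_frequent(itemset: Iterable[str], candidates: List[FrozenSet[str]], min_sup: float) -> bool:
    """An item set is frequent when its support reaches the threshold min_sup."""
    return support(itemset, candidates) >= min_sup

# Toy candidate item sets (hypothetical data).
candidates = [frozenset("ABC"), frozenset("AB"), frozenset("BD"), frozenset("ACD"), frozenset("AB")]
print(support({"A", "B"}, candidates))           # 0.6
print(is_frequent({"A", "B"}, candidates, 0.5))  # True
print(is_frequent({"C", "D"}, candidates, 0.5))  # False
```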
A mined frequent item set only shows that its items occur together frequently. However, the role a frequent item set plays varies greatly between data processing requirements, so current item set mining is poorly targeted.
Disclosure of Invention
The application provides a group identification method, a group identification device and a computer-readable storage medium, which can solve the problem that item set mining is poorly targeted. The technical solution is as follows:
In one aspect, a group identification method is provided, and the method includes:
determining a plurality of frequent item sets based on a plurality of candidate item sets, each item set including one or more reference elements;
acquiring an auxiliary probability, determined by a binary classification model, for each reference element in the frequent item set, wherein the auxiliary probability is the probability that the reference element is a target element, and the target element is used for indicating an individual in a target group;
determining a weighted average value of auxiliary probabilities corresponding to reference elements in the frequent item set as a target probability corresponding to the frequent item set, wherein the target probability is the probability that the frequent item set is a target item set, and the target item set is used for indicating a target group;
determining a target item set in the multiple frequent item sets according to the target probability corresponding to each frequent item set;
and determining a target group according to the target item set.
Optionally, the determining a target item set in the multiple frequent item sets according to the target probability corresponding to each frequent item set includes:
determining one or more potential item sets according to the target probability corresponding to each frequent item set;
determining the target set of items among the one or more sets of potential items.
Optionally, determining the target set of items in the one or more potential sets of items comprises:
determining a weighted average of a plurality of filtering parameter values of the potential item set, the filtering parameter values of the potential item set being: the support of the potential item set, the number of target elements included in the potential item set, or the proportion of target elements among the reference elements included in the potential item set;
and determining the potential item set with the weighted average value of the plurality of filtering parameter values in the one or more potential item sets larger than the filtering threshold value as the target item set.
Optionally, determining the target set of items in the one or more potential sets of items comprises:
determining the target set of items from characteristics of the set of potential items, the characteristics of the set of potential items including: one or more of attributes of reference elements comprised by the set of potential items and relationships of the reference elements comprised by the set of potential items to the target element.
Optionally, before the determining a plurality of frequent item sets based on the plurality of candidate item sets, the method further comprises:
constructing the plurality of candidate item sets based on spatiotemporal data.
In another aspect, there is provided a group identification apparatus including:
a first determining module, configured to determine a plurality of frequent item sets based on a plurality of candidate item sets, each item set comprising one or more reference elements;
an obtaining module, configured to obtain an auxiliary probability, determined by a binary classification model, for each reference element in the frequent item set, where the auxiliary probability is the probability that the reference element is a target element, and the target element is used to indicate an individual in a target group;
a second determining module, configured to determine a weighted average of auxiliary probabilities corresponding to reference elements in the frequent item set as a target probability corresponding to the frequent item set, where the target probability is a probability that the frequent item set is a target item set, and the target item set is used to indicate a target group;
a third determining module, configured to determine a target item set in the multiple frequent item sets according to a target probability corresponding to each frequent item set;
and the fourth determining module is used for determining a target group according to the target item set.
Optionally, the third determining module includes:
the first determining submodule is used for determining one or more potential item sets according to the target probability corresponding to each frequent item set;
a second determination submodule to determine the target set of items in the one or more sets of potential items.
Optionally, the second determining submodule is further configured to:
determining a weighted average of a plurality of filtering parameter values of the potential item set, the filtering parameter values of the potential item set being: the support of the potential item set, the number of target elements included in the potential item set, or the proportion of target elements among the reference elements included in the potential item set;
and determining the potential item set with the weighted average value of the plurality of filtering parameter values in the one or more potential item sets larger than the filtering threshold value as the target item set.
Optionally, the second determining submodule is further configured to:
determining the target set of items from characteristics of the set of potential items, the characteristics of the set of potential items including: one or more of attributes of reference elements comprised by the set of potential items and relationships of the reference elements comprised by the set of potential items to the target element.
Optionally, the group identification apparatus further includes:
a construction module for constructing the plurality of candidate item sets based on spatiotemporal data.
In still another aspect, a group identification apparatus is provided, which includes a processor and a memory, where the memory stores at least one instruction, and the at least one instruction, when executed by the processor, implements the group identification method described above.
In yet another aspect, a computer-readable storage medium is provided, having at least one instruction stored therein, which, when executed, implements the group identification method described above.
The technical solutions provided in this application bring at least the following beneficial effects:
According to the technical solutions, frequent item sets can be mined from the candidate item sets, target item sets can be identified among the frequent item sets, and the target group can then be determined, so that item set mining is highly targeted.
Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a group identification method provided by an embodiment of the present application;
FIG. 2 is a flow chart of another group identification method provided by the embodiments of the present application;
FIG. 3 is a schematic structural diagram of a group identification device according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a third determining module provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of another group identification apparatus provided in the embodiments of the present application;
FIG. 6 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
FIG. 1 is a flowchart of a group identification method according to an embodiment of the present application. Optionally, the group identification method may be used in a group identification device, which may be a terminal, a server, or other electronic equipment; this is not limited in the embodiments of the present application. As shown in FIG. 1, the group identification method includes:
Step 101: Determine a plurality of frequent item sets based on a plurality of candidate item sets, each item set comprising one or more reference elements.
Step 102: Acquire an auxiliary probability, determined by a binary classification model, for each reference element in the frequent item set, wherein the auxiliary probability is the probability that the reference element is a target element, and the target element is used for indicating an individual in a target group.
Step 103: Determine the weighted average of the auxiliary probabilities of all reference elements in the frequent item set as the target probability of the frequent item set, wherein the target probability is the probability that the frequent item set is a target item set, and the target item set is used for indicating a target group.
Step 104: Determine a target item set among the plurality of frequent item sets according to the target probability of each frequent item set.
Step 105: Determine a target group according to the target item set.
In summary, in the group identification method provided in the embodiments of the present application, frequent item sets may be mined from the candidate item sets, and target item sets are identified among the frequent item sets to determine a target group, so that item set mining is more targeted.
FIG. 2 is a flowchart of another group identification method provided in the embodiments of the present application. Optionally, the group identification method may be used in a group identification device, which may be a terminal, a server, or other electronic equipment; this is not limited in the embodiments of the present application. As shown in FIG. 2, the group identification method includes:
Step 201: Construct a plurality of candidate item sets based on spatiotemporal data.
Spatiotemporal data is data that has both a temporal dimension and a spatial dimension. An item set includes one or more reference elements, which may also be called items. Optionally, each reference element in a candidate item set may be an entity, and a candidate item set may be the combination of entities appearing within a certain time range and spatial range. The entity may be a person or another object, which is not limited in this application.
Optionally, a plurality of candidate item sets may be constructed from a plurality of groups of spatiotemporal data, each group being used to construct one candidate item set. A group of spatiotemporal data may include one or more pieces of data characterizing entities. The identifier of the entity characterized by each piece of data can be determined, and the set of identifiers of the one or more entities so determined is taken as one candidate item set.
Illustratively, the spatiotemporal data includes images captured by a target camera over a period of time. These images may include multiple pieces of data characterizing persons, and the identifier of the person characterized by each piece can be determined. For example, if the identifier is a person's name, the set of names determined from this surveillance data may be taken as one candidate item set.
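One possible way to turn such data into candidate item sets is sketched below in Python. It assumes, hypothetically, that recognition has already produced `(camera_id, timestamp, person_id)` records and that the time range is a fixed window of `window_seconds`; each (camera, window) bucket stands for one group of spatiotemporal data and yields one candidate item set.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical record format: (camera_id, timestamp in seconds, person identifier).
Record = Tuple[str, int, str]

def build_candidate_sets(records: List[Record], window_seconds: int) -> List[FrozenSet[str]]:
    """Group identifiers seen by the same camera within the same time window."""
    buckets: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for camera_id, ts, person_id in records:
        # Integer division assigns each record to a fixed-length time window.
        buckets[(camera_id, ts // window_seconds)].add(person_id)
    # Sorting the bucket keys makes the output order deterministic.
    return [frozenset(buckets[key]) for key in sorted(buckets)]

records = [
    ("cam1", 10, "Zhang"), ("cam1", 50, "Li"), ("cam1", 130, "Wang"),
    ("cam2", 20, "Zhang"), ("cam2", 40, "Li"),
]
candidate_sets = build_candidate_sets(records, window_seconds=60)
print(candidate_sets)
```

Here cam1 yields two candidate item sets (its first window and the window containing second 130), and cam2 yields one; the camera and window length are stand-ins for the "time range and spatial range" of the description.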
Step 202: Determine a plurality of frequent item sets based on the plurality of candidate item sets.
Wherein the frequent item set may include one or more reference elements.
Optionally, a support threshold for frequent item sets may be determined according to the data processing requirement, and the item sets whose support is greater than or equal to the support threshold are obtained through a mining algorithm.
The support of an item set characterizes how frequently the item set occurs and can be expressed as a percentage. The support threshold is used to determine frequent item sets: the support of a frequent item set must be greater than or equal to the support threshold.
Illustratively, a support of 20% for an item set means that the item set occurs in 20% of all candidate item sets. Optionally, data processing requirements may represent different data processing scenarios, and different requirements may correspond to different support thresholds. For example, if the data processing requirement is to find a criminal group, the corresponding support threshold may be 20%; if the requirement is to determine users' shopping intent, the support threshold may be 10%.
Optionally, the mining algorithm may be an Apriori algorithm, a frequent pattern tree algorithm, or another mining algorithm, which is not limited in this embodiment.
For example, obtaining the frequent item sets whose support is greater than or equal to the support threshold through the mining algorithm may include:
S1: Scan the plurality of candidate item sets to obtain 1-item sets, that is, item sets containing only one reference element, and count all 1-item sets.
S2: Determine the frequent 1-item sets among all scanned 1-item sets. The support of a frequent 1-item set is greater than or equal to the support threshold, and the reference element in each frequent 1-item set is a frequent item.
S3: Scan the plurality of candidate item sets again, remove the infrequent items from each candidate item set, and sort the remaining reference elements of each candidate item set in descending order of the support of their 1-item sets, obtaining a plurality of auxiliary item sets.
S4: Read the reference elements of each auxiliary item set in turn and insert them into the FP tree.
S5: Starting from the frequent item with the lowest support and working upward through the FP tree, find the conditional pattern base of each frequent item. To obtain a conditional pattern base, the reference element to be mined is taken as a leaf node, the FP subtree corresponding to that leaf node is determined, the count of each node in the subtree is set to the count of the leaf node, and nodes whose count is below the support threshold are deleted. The frequent item sets of each frequent item are then mined recursively from its conditional pattern base.
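Steps S1 to S5 follow the classic FP-growth algorithm. The compact Python sketch below is one way to implement them, not the applicant's implementation. It assumes the support threshold has been converted to an absolute count (`min_count`) and breaks ties in support alphabetically, an arbitrary choice. Each prefix path in a conditional pattern base is weighted by the count of the leaf node, and items whose total count in the base falls below `min_count` are dropped when the conditional tree is built.

```python
from collections import defaultdict
from math import ceil
from typing import Dict, FrozenSet, List, Optional, Tuple

Transactions = List[Tuple[List[str], int]]  # (items, count) pairs

class Node:
    """FP-tree node: an item, its count, its parent, and children keyed by item."""
    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "Node"] = {}

def build_tree(transactions: Transactions, min_count: int):
    """S1-S4: count items, drop infrequent ones, sort by support, insert into an FP tree."""
    counts: Dict[str, int] = defaultdict(int)
    for items, n in transactions:
        for item in set(items):
            counts[item] += n
    frequent = {item: c for item, c in counts.items() if c >= min_count}
    root = Node(None, None)
    header: Dict[str, List[Node]] = defaultdict(list)  # item -> tree nodes holding it
    for items, n in transactions:
        kept = sorted((i for i in set(items) if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in kept:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n
            node = child
    return frequent, header

def fp_growth(transactions: Transactions, min_count: int,
              suffix: FrozenSet[str] = frozenset()) -> Dict[FrozenSet[str], int]:
    """S5: mine from the least frequent item upward via conditional pattern bases."""
    frequent, header = build_tree(transactions, min_count)
    result: Dict[FrozenSet[str], int] = {}
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = suffix | {item}
        result[itemset] = frequent[item]
        base: Transactions = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))  # prefix path weighted by the leaf's count
        result.update(fp_growth(base, min_count, itemset))
    return result

candidates = [["A", "B", "C"], ["A", "B"], ["B", "D"], ["A", "C", "D"], ["A", "B"]]
min_count = ceil(0.4 * len(candidates))  # min_sup = 40% -> 2 of 5 candidate item sets
itemsets = fp_growth([(c, 1) for c in candidates], min_count)
```

With these five toy candidate item sets, the sketch returns {A}, {B}, {C}, {D}, {A, B} and {A, C} with counts 4, 4, 2, 2, 3 and 2, which matches a brute-force count over the same data.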
Step 203: Determine the target group attribute according to the data processing requirement.
The target group attribute is a feature that the target group to be identified needs to satisfy; whether a group is a target group can be decided by whether it satisfies the target group attribute. For example, if the reference elements in the item sets are persons and the data processing requirement is to find a theft gang, the target group attribute may be "theft gang". As another example, if the reference elements in the item sets are vehicles and the data processing requirement is to find a medical team, the target group attribute may be "medical team".
Optionally, a target element and a target item set may be determined according to the target group attribute, the target element indicating one individual in the target group and the target item set indicating the target group. Optionally, the target element is the identifier of an individual in the target group, and the target item set is an item set composed of identifiers of individuals in the target group. From the probability that each reference element in an item set is a target element, the probability that the item set is a target item set can be determined, and hence the probability that the group formed by the individuals indicated by those reference elements is the target group.
For example, if the target group attribute is a theft group, and each individual in the theft group is a theft suspect, the individual indicated by the target element may be the theft suspect and the target item set may be a set of identities of the theft suspects. If the target group attribute is a medical team and each individual in the medical team is a medical vehicle, the individual indicated by the target element may be a medical vehicle and the target item set may be a set of identifications of medical vehicles.
Step 204: According to the target group attribute, acquire the auxiliary probability, determined by a binary classification model, of each reference element in the frequent item sets.
Wherein the auxiliary probability is the probability that the reference element in the frequent item set is the target element.
Optionally, a binary classification model may be established in advance, and feature data of a plurality of sample elements, each known to be or not to be a target element, may be acquired. The feature data of the sample elements are preprocessed and input into the binary classification model to train it; the trained model can then output, from the feature data of a reference element, the probability that the reference element is a target element. Among the acquired sample elements, the difference between the number of target elements and the number of non-target elements must be smaller than a difference threshold. This keeps the two classes balanced, so that the binary classification model trained on these samples can accurately determine the probability that a reference element is a target element.
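The balance condition can be checked before training. The Python sketch below labels target elements 1 and non-target elements 0 and tests whether the class gap is below a threshold. The `undersample` helper, which discards random majority-class samples, is an assumed remedy that the application does not describe, and all names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, int]  # (hypothetical feature-data key, label: 1 = target element, 0 = not)

def class_gap(labels: Sequence[int]) -> int:
    """Absolute difference between the number of target and non-target samples."""
    positives = sum(labels)
    return abs(positives - (len(labels) - positives))

def is_balanced(labels: Sequence[int], max_gap: int) -> bool:
    """Balance condition from the description: the gap must be smaller than the threshold."""
    return class_gap(labels) < max_gap

def undersample(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Assumed remedy (not from the application): keep equally many samples of each class."""
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    k = min(len(pos), len(neg))
    rng = random.Random(seed)
    return rng.sample(pos, k) + rng.sample(neg, k)

samples = [("e1", 1), ("e2", 0), ("e3", 0), ("e4", 0), ("e5", 1), ("e6", 0)]
labels = [y for _, y in samples]
print(is_balanced(labels, max_gap=2))  # False: 2 targets vs 4 non-targets
balanced = undersample(samples)
print(is_balanced([y for _, y in balanced], max_gap=2))  # True
```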
Optionally, a binary classification model may correspond to one or more target elements. For example, for a single target element A, the model may be used only to determine the probability that the reference element corresponding to the input feature data is A. For multiple target elements A, B and C, the model may be used to determine, respectively, the probability that the reference element corresponding to the input feature data is target element A, target element B, and target element C.
Optionally, when a binary classification model can only determine the probability that a reference element is one particular target element, a binary classification model meeting the data processing requirement may be selected before step 204, and in step 204 the probabilities it outputs for the reference elements are obtained directly; these probabilities are the auxiliary probabilities. When a binary classification model can determine the probability that a reference element is each of a plurality of target elements, step 204 needs to pick out, from the model's output, the auxiliary probabilities that correspond to the target element meeting the data processing requirement.
Illustratively, the target element determined from the target group attribute is A. If a binary classification model can only determine the probability that a reference element is one particular target element, a model that determines the probability that a reference element is target element A may be selected before step 204, and in step 204 this model directly determines the auxiliary probability of each reference element in the frequent item set. If a binary classification model can determine the probability that a reference element is any one of a plurality of target elements, for example target elements A, B and C respectively, then in step 204 the auxiliary probability (that is, the probability that the reference element is target element A) needs to be selected from the multiple probabilities the model outputs for each reference element in the frequent item set.
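For the multi-target case, selecting the auxiliary probabilities amounts to reading one column of the model output. The sketch assumes, hypothetically, that the model's output has already been collected into a dictionary mapping each reference element to a dictionary of per-target probabilities; a single-target model is the special case with one key.

```python
from typing import Dict

# Hypothetical output format: reference element -> {target element: probability}.
ModelOutput = Dict[str, Dict[str, float]]

def auxiliary_probabilities(output: ModelOutput, target: str) -> Dict[str, float]:
    """For each reference element, keep only the probability of being `target`."""
    return {element: probs[target] for element, probs in output.items()}

multi_target = {
    "F": {"A": 0.9, "B": 0.1, "C": 0.3},
    "D": {"A": 0.2, "B": 0.7, "C": 0.5},
}
print(auxiliary_probabilities(multi_target, "A"))  # {'F': 0.9, 'D': 0.2}

single_target = {"F": {"A": 0.9}, "D": {"A": 0.2}}
print(auxiliary_probabilities(single_target, "A"))  # {'F': 0.9, 'D': 0.2}
```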
Step 205: Determine the weighted average of the auxiliary probabilities of the reference elements in the frequent item set as the target probability of the frequent item set.
The target probability is the probability that the frequent item set is a target item set. A target item set may include one or more target elements, so the auxiliary probabilities of the reference elements in an item set (that is, the probabilities that the respective reference elements are target elements) reflect the probability that the item set is a target item set.
Optionally, the weights of the reference elements in the frequent item set may be all 1, or different weights may be set for the reference elements in combination with a specific application scenario of data mining.
For example, assume a frequent item set {F, B, D}, where the auxiliary probabilities of reference elements F, B and D are 1, 0.6 and 0.5 respectively. If the weight of each reference element is 1, the target probability of the frequent item set is (1 + 0.6 + 0.5)/3 = 0.7. If the weights of F, B and D are 0.6, 0.2 and 0.2, the target probability is (1 × 0.6 + 0.6 × 0.2 + 0.5 × 0.2)/(0.6 + 0.2 + 0.2) = 0.82.
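The weighted average in step 205 takes only a few lines of Python. This sketch (function name hypothetical) defaults every weight to 1 and reproduces both numbers from the example.

```python
from typing import Dict, Optional

def target_probability(aux: Dict[str, float], weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of auxiliary probabilities; all weights default to 1."""
    if weights is None:
        weights = {element: 1.0 for element in aux}
    total = sum(weights[element] for element in aux)
    return sum(aux[element] * weights[element] for element in aux) / total

aux = {"F": 1.0, "B": 0.6, "D": 0.5}
print(round(target_probability(aux), 2))                                  # 0.7
print(round(target_probability(aux, {"F": 0.6, "B": 0.2, "D": 0.2}), 2))  # 0.82
```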
Step 206: Determine a target item set among the plurality of frequent item sets according to the target probability of each frequent item set.
Optionally, step 206 may include:
Step 51: Determine one or more potential item sets according to the target probability of each frequent item set.
A potential item set is a frequent item set that, among the plurality of frequent item sets, is relatively likely to be a target item set.
In one implementation, one or more frequent item sets in the plurality of frequent item sets having a target probability greater than a specified probability can be determined as the potential item set.
In another implementation, the one or more frequent item sets that rank highest by target probability may be determined as potential item sets. For example, the frequent item sets may be sorted in descending order of target probability, and the first a sorted frequent item sets are determined as potential item sets, where a is a positive integer. Optionally, these frequent item sets may also be determined in other manners; for example, the frequent item sets may be sorted in ascending order of target probability, and the last a sorted frequent item sets are then determined as potential item sets.
In yet another implementation, the frequent item sets corresponding to the highest one or more values among all target probabilities may be determined as potential item sets. For example, the target probabilities of the frequent item sets may be sorted in descending order, and all frequent item sets corresponding to the first b sorted target probabilities are determined as potential item sets, where b is a positive integer. Alternatively, in step 51, the target probabilities may be sorted in ascending order, and all frequent item sets corresponding to the last b sorted target probabilities are determined as potential item sets.
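The three selection rules of step 51 can be sketched in Python as follows. The sketch reads the third rule as ranking distinct probability values, so that item sets sharing a probability are kept together and more than b item sets may be returned; that reading, like the function names, is an assumption.

```python
from typing import Dict, FrozenSet, List

Probs = Dict[FrozenSet[str], float]  # frequent item set -> target probability

def above_threshold(probs: Probs, p_min: float) -> List[FrozenSet[str]]:
    """First implementation: target probability greater than a specified probability."""
    return [s for s, p in probs.items() if p > p_min]

def top_a_itemsets(probs: Probs, a: int) -> List[FrozenSet[str]]:
    """Second implementation: the first a item sets in descending order of probability."""
    return sorted(probs, key=lambda s: probs[s], reverse=True)[:a]

def top_b_probabilities(probs: Probs, b: int) -> List[FrozenSet[str]]:
    """Third implementation: every item set whose probability is among the b highest values."""
    top_values = set(sorted(set(probs.values()), reverse=True)[:b])
    return [s for s, p in probs.items() if p in top_values]

probs = {
    frozenset("ABC"): 0.8,
    frozenset("AB"): 0.8,
    frozenset("BD"): 0.6,
    frozenset("CD"): 0.3,
}
print(len(above_threshold(probs, 0.5)))    # 3
print(len(top_a_itemsets(probs, 1)))       # 1
print(len(top_b_probabilities(probs, 1)))  # 2: both item sets with probability 0.8
```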
Step 52: Determine a target item set among the one or more potential item sets.
Optionally, after the one or more potential item sets are determined, each potential item set may be directly determined as a target item set.
Optionally, a plurality of sets of potential items may be determined in step 51. After determining the plurality of potential item sets, other filtering conditions can be set to filter the plurality of potential item sets so as to determine the target item set more accurately.
In an optional embodiment, a weighted average of the plurality of filtering parameter values of a potential item set may be computed from those values and their weights, and a potential item set whose weighted average is greater than the filtering threshold may be determined as a target item set. That is, the filtering condition in this embodiment is that the weighted average of the filtering parameter values of the item set is greater than the filtering threshold.
Optionally, the filtering parameter values of a potential item set may be: the support of the potential item set, the number of target elements it includes, or the proportion of target elements among the reference elements it includes.
Optionally, a weight for each filter parameter value and a filter threshold for the set of potential terms may be set. In step 52, each filtering parameter value of the potential item set may be determined, and a weight of each filtering parameter value may be obtained, so as to perform weighted average on each filtering parameter value of the potential item set, and compare the obtained weighted average with the filtering threshold.
For example, assume that the target element is a, the weight of the support degree in each filtering parameter value is 0.5, the weight of the number of target elements is 0.1, the weight of the number of target elements in proportion is 0.4, and the filtering threshold value is 0.3. The support of the set of potential terms { a, B, C, D } is 40%, so the number of target elements in the set of potential terms is 1, and the ratio of the number of target elements is 25%, then the weighted average of the individual filter parameter values for the set of potential terms is 40% × 0.5+1 × 0.1+ 25% × 0.4 ═ 0.4. The weighted average is greater than the filter threshold, so the set of potential terms can be determined as a set of target terms.
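The weighted-average filter and the worked example above can be reproduced with the sketch below. It is a hedged illustration under stated assumptions: the parameter keys (`support`, `target_count`, `target_ratio`) and helper names are hypothetical, support is passed in as an already-computed fraction, and, as in the example, the raw count of target elements is mixed with fractions without any normalization.

```python
from typing import Dict, FrozenSet


def filter_parameter_values(item_set: FrozenSet[str], support: float,
                            target_elements: FrozenSet[str]) -> Dict[str, float]:
    """Compute the three filtering parameter values named in the text."""
    if not item_set:
        raise ValueError("item set must not be empty")
    n_targets = len(item_set & target_elements)
    return {
        "support": support,                         # e.g. 0.40 for 40%
        "target_count": float(n_targets),           # number of target elements
        "target_ratio": n_targets / len(item_set),  # proportion of target elements
    }


def weighted_average(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average over the parameters that have a weight."""
    total_weight = sum(weights.values())
    return sum(values[name] * w for name, w in weights.items()) / total_weight


def passes_filter(item_set: FrozenSet[str], support: float, target_elements: FrozenSet[str],
                  weights: Dict[str, float], threshold: float) -> bool:
    """True if the weighted average of the filtering parameter values exceeds the threshold."""
    values = filter_parameter_values(item_set, support, target_elements)
    return weighted_average(values, weights) > threshold
```

With weights {support: 0.5, target_count: 0.1, target_ratio: 0.4}, the set {A, B, C, D} at 40% support and target element A scores about 0.4 and passes a 0.3 threshold; the same set without A scores 0.2 and is filtered out.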
In another optional embodiment, the target item set may be determined according to characteristics of the potential item sets. The characteristics may include one or more of: attributes of the reference elements included in a potential item set, and relationships between those reference elements and the target element.
Optionally, a potential item set that satisfies a specified characteristic may be determined directly as a target item set; alternatively, a potential item set that satisfies a specified characteristic may be excluded directly, that is, determined not to be a target item set. In other words, the filtering condition in this embodiment is either that the characteristics of the item set satisfy a specified characteristic or that they do not.
For example, the reference elements in an item set are persons and the target element is a thief. The attributes of a reference element may include age, occupation, gender, height, and the like, and the relationship between a reference element and the target element may include relatives, co-workers, and the like. Assuming the specified characteristic requires that all reference elements in the item set be male and have a specified relationship with the target element, in step 52 it may be determined whether each potential item set satisfies the specified characteristic, and each potential item set that does is determined as a target item set.
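A characteristic-based filter along the lines of this example might look like the sketch below. Everything here is an assumption for illustration: the `Person` record, the relationship labels such as `"relative"`, and the choice to skip the relationship check for the target element itself (which has no relationship to itself). The exclusion variant described above would simply keep the item sets for which the check returns `False`.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Person:
    """Hypothetical record for the individual behind a reference element."""
    gender: str
    relation_to_target: str  # hypothetical labels, e.g. "relative", "co-worker", "none"


def satisfies_characteristic(item_set: FrozenSet[str], people: Dict[str, Person], target: str,
                             required_gender: str, allowed_relations: Set[str]) -> bool:
    """True if every member has the required gender and every non-target member
    has one of the allowed relationships to the target element."""
    for name in item_set:
        person = people[name]
        if person.gender != required_gender:
            return False
        # The target element has no relationship to itself, so skip it here.
        if name != target and person.relation_to_target not in allowed_relations:
            return False
    return True


def select_target_item_sets(potential: List[FrozenSet[str]], people: Dict[str, Person], target: str,
                            required_gender: str, allowed_relations: Set[str]) -> List[FrozenSet[str]]:
    """Keep the potential item sets that satisfy the specified characteristic."""
    return [s for s in potential
            if satisfies_characteristic(s, people, target, required_gender, allowed_relations)]
```

With a male target A, a male relative B, a male co-worker C, and a female relative D, requiring "male and relative" keeps only {A, B} out of {A, B}, {A, C}, and {A, D}.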
Step 207: determine the target group according to the target item set.
Optionally, after the target item set is determined, the group formed by the individuals indicated by the reference elements in the target item set can be regarded as satisfying the target group attribute, and that group can then be determined as the target group.
For example, assume the data processing requirement in step 203 is to find a theft gang, the target item set determined in step 206 is {C, D, E}, and "C", "D", and "E" each represent a person's name. It can then be determined from the target item set that persons C, D, and E constitute a theft gang.
In summary, in the group identification method provided in the embodiments of the present application, frequent item sets may be mined based on the candidate item sets, and a target item set is identified among the frequent item sets to determine a target group, which makes the item set mining more targeted.
In addition, one or more potential item sets can be determined according to the target probabilities and then further filtered to obtain the target item set, which improves the accuracy of the determined target item set.
Fig. 3 is a schematic structural diagram of a group identification device according to an embodiment of the present application. As shown in fig. 3, the group recognition device 30 may include:
a first determining module 301 for determining a plurality of frequent item sets based on a plurality of candidate item sets, an item set comprising one or more reference elements.
An obtaining module 302, configured to obtain an auxiliary probability, determined by a binary classification model, corresponding to each reference element in the frequent item set, where the auxiliary probability is the probability that the reference element is a target element, and the target element is used to indicate an individual in a target group.
The second determining module 303 is configured to determine a weighted average of the auxiliary probabilities of the reference elements in the frequent item set as a target probability corresponding to the frequent item set, where the target probability is a probability that the frequent item set is a target item set, and the target item set is used to indicate a target group.
A third determining module 304, configured to determine a target item set in the multiple frequent item sets according to the target probability corresponding to each frequent item set.
A fourth determining module 305, configured to determine a target group according to the target item set.
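The chain of modules 301 to 305 can be sketched end to end as below. This is a hedged sketch, not the applicant's implementation: frequent item sets are mined by brute-force enumeration of sub-item sets (a swapped-in stand-in for a real miner such as Apriori or FP-growth), the auxiliary probabilities are passed in as a plain dictionary standing in for the binary classification model's output, element weights default to 1.0, and module 304 is reduced to a simple probability threshold. Module 305 is the identity mapping here, since each returned item set directly names the members of a group.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional

ItemSet = FrozenSet[str]


def mine_frequent_item_sets(candidates: List[ItemSet], min_support: float,
                            min_size: int = 2) -> Dict[ItemSet, float]:
    """Module 301 (sketch): count every sub-item set of each candidate item set
    by brute force and keep those whose support reaches min_support."""
    counts: Dict[ItemSet, int] = {}
    for candidate in candidates:
        for size in range(min_size, len(candidate) + 1):
            for combo in combinations(sorted(candidate), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    n = len(candidates)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def target_probability(item_set: ItemSet, aux_prob: Dict[str, float],
                       weights: Optional[Dict[str, float]] = None) -> float:
    """Module 303 (sketch): weighted average of the auxiliary probabilities.
    Elements without an explicit weight get weight 1.0 (an assumption)."""
    w = weights or {}
    total = sum(w.get(e, 1.0) for e in item_set)
    return sum(w.get(e, 1.0) * aux_prob[e] for e in item_set) / total


def identify_target_item_sets(candidates: List[ItemSet], aux_prob: Dict[str, float],
                              min_support: float, prob_threshold: float) -> List[ItemSet]:
    """Modules 301-304 chained; module 304 here uses a simple probability threshold."""
    frequent = mine_frequent_item_sets(candidates, min_support)
    kept = [s for s in frequent if target_probability(s, aux_prob) > prob_threshold]
    return sorted(kept, key=sorted)  # deterministic order for display
```

Brute-force enumeration is exponential in the size of each candidate item set, so it only suits small examples; the overlapping results it can return (for instance both {C, D} and {C, D, E}) are what the potential-item-set filtering in step 52 is meant to narrow down.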
In summary, in the group identification device provided in the embodiments of the present application, the first determining module may mine frequent item sets based on the candidate item sets, the third determining module may identify a target item set among the frequent item sets, and the fourth determining module may determine the target group according to the target item set, which makes the item set mining more targeted.
Optionally, fig. 4 is a schematic structural diagram of a third determining module provided in the embodiment of the present application. As shown in fig. 4, the third determining module 304 may include:
the first determining submodule 3041 is configured to determine one or more potential item sets according to the target probability corresponding to each frequent item set.
A second determining submodule 3042 for determining a target set of items in the one or more sets of potential items.
Optionally, the second determining submodule 3042 may be further configured to:
determine a weighted average of a plurality of filtering parameter values of each potential item set, the filtering parameter values of a potential item set being: the support of the potential item set, the number of target elements included in the potential item set, or the proportion of target elements among the reference elements included in the potential item set;
and determining the potential item set with the weighted average value of the plurality of filtering parameter values in the one or more potential item sets larger than the filtering threshold value as the target item set.
Optionally, the second determining submodule 3042 may be further configured to:
determining a target set of items from characteristics of the set of potential items, the characteristics of the set of potential items including: one or more of attributes of reference elements comprised by the set of potential items and relationships of the reference elements comprised by the set of potential items to the target element.
Optionally, fig. 5 is a schematic structural diagram of another group identification device provided in the embodiment of the present application. As shown in fig. 5, on the basis of fig. 3, the group identification apparatus 30 may further include:
a construction module 306 for constructing a plurality of candidate item sets based on spatiotemporal data.
In summary, the group identification device provided in the embodiments of the present application may mine frequent item sets based on the candidate item sets, identify a target item set among the frequent item sets, and determine a target group according to the target item set, which makes the item set mining more targeted.
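One plausible way to build candidate item sets from spatiotemporal data is to bucket sightings by location and fixed time window, treating the people seen together in a bucket as one candidate item set. The sketch below assumes exactly that; the `Sighting` record, the fixed-window bucketing, and the `window_seconds` parameter are hypothetical and are not taken from this application.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, NamedTuple, Set, Tuple


class Sighting(NamedTuple):
    """Hypothetical spatiotemporal record: who was seen where, and when."""
    person: str
    location: str
    timestamp: int  # seconds since some epoch


def build_candidate_item_sets(sightings: List[Sighting], window_seconds: int,
                              min_size: int = 2) -> List[FrozenSet[str]]:
    """Group people seen at the same location within the same fixed time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    buckets: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for s in sightings:
        buckets[(s.location, s.timestamp // window_seconds)].add(s.person)
    # Sort by (location, window index) so the output order is deterministic.
    return [frozenset(people) for _, people in sorted(buckets.items(), key=lambda kv: kv[0])
            if len(people) >= min_size]
```

Fixed windows are the simplest choice but can split two sightings that straddle a window boundary; sliding or overlapping windows would avoid that at the cost of duplicate candidates.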
In addition, one or more potential item sets can be determined according to the target probability, and the potential item sets can be further filtered after the potential item sets are determined to obtain the target item set, so that the accuracy of the determined target item set is improved.
Fig. 6 is a schematic structural diagram of a group identification device according to an embodiment of the present application. The device may be a terminal, for example a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.
Generally, a terminal includes: a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 601 may be implemented in at least one hardware form of a DSP (digital signal processor), an FPGA (field-programmable gate array), or a PLA (programmable logic array). The processor 601 may also include a main processor and a coprocessor: the main processor, also called a CPU (central processing unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 601 may integrate a GPU (graphics processing unit), which renders and draws the content to be displayed on the display screen. In some embodiments, the processor 601 may also include an AI (artificial intelligence) processor for computing operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 602 stores at least one instruction that is executed by the processor 601 to implement the group identification method provided by the method embodiments of the present application.
In some embodiments, the terminal may further include a peripheral interface 603 and at least one peripheral. The processor 601, the memory 602, and the peripheral interface 603 may be connected by buses or signal lines, and each peripheral may be connected to the peripheral interface 603 via a bus, a signal line, or a circuit board. Specifically, the peripherals include at least one of: a radio frequency circuit 604, a touch display screen 605, a camera assembly 606, an audio circuit 607, a positioning component 608, and a power supply 609.
The peripheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 601 and the memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 604 receives and transmits RF (radio frequency) signals, also called electromagnetic signals. The radio frequency circuit 604 communicates with communication networks and other communication devices via electromagnetic signals: it converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 604 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 604 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may further include NFC (near field communication) related circuits, which is not limited in this application.
The display screen 605 is used to display a UI (user interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 605 is a touch display screen, it can also capture touch signals on or above its surface, and these touch signals may be input to the processor 601 as control signals for processing. In this case, the display screen 605 may also provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 605, provided on the front panel of the terminal; in other embodiments, there may be at least two display screens 605, disposed on different surfaces of the terminal or in a folding design; in still other embodiments, the display screen 605 may be a flexible display disposed on a curved or folded surface of the terminal. The display screen 605 may even be given a non-rectangular irregular shape, that is, a shaped screen. The display screen 605 may be made of materials such as LCD (liquid crystal display) or OLED (organic light-emitting diode).
The camera assembly 606 is used to capture images or video. Optionally, the camera assembly 606 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to provide background blurring, and the main camera and the wide-angle camera can be fused to provide panoramic shooting, VR (virtual reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 606 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 607 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment, converts them into electrical signals, and inputs the signals to the processor 601 for processing or to the radio frequency circuit 604 for voice communication. For stereo sound collection or noise reduction, multiple microphones may be disposed at different parts of the terminal. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker converts electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves, and may be a conventional film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals into sound waves audible to humans, or into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 607 may also include a headphone jack.
The positioning component 608 is used to locate the current geographic location of the terminal to implement navigation or LBS (location based service). The positioning component 608 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 609 supplies power to the components of the terminal. The power supply 609 may be an alternating current source, a direct current source, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the battery may support wired or wireless charging, and may also support fast-charging technology.
In some embodiments, the terminal also includes one or more sensors 610. The one or more sensors 610 include, but are not limited to: acceleration sensor 611, gyro sensor 612, pressure sensor 613, fingerprint sensor 614, optical sensor 615, and proximity sensor 616.
The acceleration sensor 611 may detect the magnitude of acceleration on three coordinate axes of a coordinate system established with the terminal. For example, the acceleration sensor 611 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 601 may control the touch screen display 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 may also be used for acquisition of motion data of a game or a user.
The gyroscope sensor 612 may detect a body direction and a rotation angle of the terminal, and the gyroscope sensor 612 and the acceleration sensor 611 may cooperate to acquire a 3D motion of the user on the terminal. The processor 601 may implement the following functions according to the data collected by the gyro sensor 612: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 613 may be disposed on a side frame of the terminal and/or beneath the touch display screen 605. When the pressure sensor 613 is disposed on the side frame of the terminal, it can detect the user's grip on the terminal, and the processor 601 performs left-hand/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed beneath the touch display screen 605, the processor 601 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 605. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 614 collects the user's fingerprint, and either the processor 601 or the fingerprint sensor 614 itself identifies the user according to the collected fingerprint. When the user's identity is identified as trusted, the processor 601 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 614 may be provided on the front, back, or side of the terminal. When a physical button or vendor logo is provided on the terminal, the fingerprint sensor 614 may be integrated with the physical button or vendor logo.
The optical sensor 615 is used to collect the ambient light intensity. In one embodiment, processor 601 may control the display brightness of touch display 605 based on the ambient light intensity collected by optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 605 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 605 is turned down. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
The proximity sensor 616, also known as a distance sensor, is typically provided on the front panel of the terminal and collects the distance between the user and the front of the terminal. In one embodiment, when the proximity sensor 616 detects that this distance gradually decreases, the processor 601 controls the touch display screen 605 to switch from the screen-on state to the screen-off state; when the proximity sensor 616 detects that this distance gradually increases, the processor 601 controls the touch display screen 605 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is not intended to be limiting, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a group identification apparatus is also provided, which may include a processor and a memory storing at least one instruction. The at least one instruction is configured to be executed by one or more processors to implement any of the group identification methods described above.
In an exemplary embodiment, a computer-readable storage medium is also provided, storing at least one instruction that, when executed by a processor of a computer device, implements any of the group identification methods described above.
Optionally, the computer-readable storage medium may be a ROM (read-only memory), a RAM (random access memory), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It should be noted that the division of the device in the above embodiment into the above functional modules is merely illustrative. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
In addition, the method embodiments provided in the embodiments of the present application and the corresponding apparatus embodiments may refer to each other, which is not limited in the embodiments of the present application. The order of the steps in the method embodiments may be adjusted as appropriate, and steps may be added or removed as the situation requires. Any method that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application, and is therefore not described again.
It should be understood that "a plurality" herein means two or more. The above descriptions are merely exemplary embodiments of the present application and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the protection scope of the present application.

Claims (12)

1. A method of population identification, the method comprising:
determining a plurality of frequent item sets based on a plurality of candidate item sets, an item set including one or more reference elements;
acquiring an auxiliary probability corresponding to each reference element in the frequent item set determined by a binary classification model, wherein the auxiliary probability is the probability that the reference element is a target element, and the target element is used for indicating individuals in a target group;
determining a weighted average value of auxiliary probabilities corresponding to reference elements in the frequent item set as a target probability corresponding to the frequent item set, wherein the target probability is the probability that the frequent item set is a target item set, and the target item set is used for indicating a target group;
determining a target item set in the multiple frequent item sets according to the target probability corresponding to each frequent item set;
and determining a target group according to the target item set.
2. The method of claim 1, wherein determining a target set of items among the plurality of frequent item sets based on the target probabilities corresponding to the respective frequent item sets comprises:
determining one or more potential item sets according to the target probability corresponding to each frequent item set;
determining the target set of items among the one or more sets of potential items.
3. The method of claim 2, wherein determining the target set of items among the one or more potential sets of items comprises:
determining a weighted average of a plurality of filtering parameter values of the potential item set, the filtering parameter values of the potential item set being: the support of the potential item set, the number of target elements included in the potential item set, or the proportion of target elements among the reference elements included in the potential item set;
and determining the potential item set with the weighted average value of the plurality of filtering parameter values in the one or more potential item sets larger than the filtering threshold value as the target item set.
4. The method of claim 2, wherein determining the target set of items among the one or more potential sets of items comprises:
determining the target set of items from characteristics of the set of potential items, the characteristics of the set of potential items including: one or more of attributes of reference elements comprised by the set of potential items and relationships of the reference elements comprised by the set of potential items to the target element.
5. The method of claim 1, wherein prior to said determining a plurality of frequent item sets based on a plurality of candidate item sets, the method further comprises:
constructing the plurality of candidate item sets based on spatiotemporal data.
6. A group recognition apparatus, characterized in that the group recognition apparatus comprises:
a first determining module, configured to determine a plurality of frequent item sets based on a plurality of candidate item sets, an item set comprising one or more reference elements;
an obtaining module, configured to obtain an auxiliary probability, determined by a binary classification model, corresponding to each reference element in the frequent item set, where the auxiliary probability is the probability that the reference element is a target element, and the target element is used to indicate an individual in a target group;
a second determining module, configured to determine a weighted average of auxiliary probabilities corresponding to reference elements in the frequent item set as a target probability corresponding to the frequent item set, where the target probability is a probability that the frequent item set is a target item set, and the target item set is used to indicate a target group;
a third determining module, configured to determine a target item set in the multiple frequent item sets according to a target probability corresponding to each frequent item set;
and the fourth determining module is used for determining a target group according to the target item set.
7. The group recognition device of claim 6, wherein the third determining module comprises:
the first determining submodule is used for determining one or more potential item sets according to the target probability corresponding to each frequent item set;
a second determination submodule to determine the target set of items in the one or more sets of potential items.
8. The group recognition device of claim 7, wherein the second determining submodule is further configured to:
determine a weighted average of a plurality of filtering parameter values of the potential item set, the filtering parameter values of the potential item set being: the support of the potential item set, the number of target elements included in the potential item set, or the proportion of target elements among the reference elements included in the potential item set;
and determining the potential item set with the weighted average value of the plurality of filtering parameter values in the one or more potential item sets larger than the filtering threshold value as the target item set.
9. The group recognition device of claim 7, wherein the second determining submodule is further configured to:
determining the target set of items from characteristics of the set of potential items, the characteristics of the set of potential items including: one or more of attributes of reference elements comprised by the set of potential items and relationships of the reference elements comprised by the set of potential items to the target element.
10. The group recognition device of claim 6, further comprising:
a construction module for constructing the plurality of candidate item sets based on spatiotemporal data.
11. A group recognition apparatus comprising a processor and a memory, the memory having stored therein at least one instruction which, when executed by the processor, implements a group recognition method as claimed in any one of claims 1 to 5.
12. A computer-readable storage medium having stored therein at least one instruction which, when executed, implements a group recognition method as claimed in any one of claims 1 to 5.
CN201910541204.9A 2019-06-21 2019-06-21 Group identification method apparatus and computer-readable storage medium Active CN112115305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910541204.9A CN112115305B (en) 2019-06-21 2019-06-21 Group identification method apparatus and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN112115305A (en) 2020-12-22
CN112115305B (en) 2024-04-09

Family

ID=73796177

Country Status (1)

Country Link
CN (1) CN112115305B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120078912A1 (en) * 2010-09-23 2012-03-29 Chetan Kumar Gupta Method and system for event correlation
KR20120038575A (en) * 2010-10-14 2012-04-24 재단법인 한국특허정보원 Method of automatic patent document categorization adjusting association rules and frequent itemset
TW201229793A (en) * 2010-10-14 2012-07-16 Ibm System, method, and program product for extracting meaningful frequent itemset
US20130051670A1 (en) * 2011-08-30 2013-02-28 Madirakshi Das Detecting recurring themes in consumer image collections
CN103927398A (en) * 2014-05-07 2014-07-16 中国人民解放军信息工程大学 Microblog hype group discovering method based on maximum frequent item set mining
CN106650273A (en) * 2016-12-28 2017-05-10 东方网力科技股份有限公司 Behavior prediction method and device
CN108091398A (en) * 2016-11-21 2018-05-29 医渡云(北京)技术有限公司 Patient's group technology and device
CN108346085A (en) * 2018-01-30 2018-07-31 南京邮电大学 Electric business platform personalized recommendation method based on weighted frequent items mining algorithm
CN108764197A (en) * 2018-06-06 2018-11-06 ZTE Intelligent Transportation Co., Ltd. Accompanying-vehicle identification method, device, terminal and computer-readable storage medium
US20180322125A1 (en) * 2016-09-23 2018-11-08 Tencent Technology (Shenzhen) Company Limited Itemset determining method and apparatus, processing device, and storage medium
CN109508386A (en) * 2018-11-07 2019-03-22 Fujian University of Technology Method for measuring the correlation between central words of stock news and related stocks
CN109558435A (en) * 2018-10-24 2019-04-02 Nanjing University of Posts and Telecommunications Weighted frequent itemset mining algorithm for precision marketing

Non-Patent Citations (2)

Title
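Dong Yanan et al.: "Detection and Discovery of Click Fraud Groups", Application Research of Computers, vol. 33, no. 6, pages 1771-1774 *
Rao Liang: "Application of an Improved Apriori Algorithm in a Financial Aid System for Impoverished Students", China Master's Theses Full-text Database, Information Science and Technology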

Also Published As

Publication number Publication date
CN112115305B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN110602101B (en) Method, device, equipment and storage medium for determining network abnormal group
CN110839128B (en) Photographing behavior detection method and device and storage medium
CN111104980B (en) Method, device, equipment and storage medium for determining classification result
CN111027490B (en) Face attribute identification method and device and storage medium
CN110503160B (en) Image recognition method and device, electronic equipment and storage medium
CN111127509A (en) Target tracking method, device and computer readable storage medium
CN112084811A (en) Identity information determining method and device and storage medium
CN110705614A (en) Model training method and device, electronic equipment and storage medium
CN111192072A (en) User grouping method and device and storage medium
CN112989198B (en) Push content determination method, device, equipment and computer-readable storage medium
CN111586279A (en) Method, device and equipment for determining shooting state and storage medium
CN110675473B (en) Method, device, electronic equipment and medium for generating an animated GIF
CN110853124B (en) Method, device, electronic equipment and medium for generating an animated GIF
CN111428080B (en) Video file storage method, video file search method and video file storage device
CN111988664B (en) Video processing method, video processing device, computer equipment and computer-readable storage medium
CN110717110B (en) Multimedia resource filtering method and device, electronic equipment and storage medium
CN113936240A (en) Method, device and equipment for determining sample image and storage medium
CN112115305B (en) Group identification method apparatus and computer-readable storage medium
CN114283310A (en) Image recognition model acquisition method, image recognition device and medium
CN110928867B (en) Data fusion method and device
CN110427362B (en) Method and device for acquiring database types
CN111159168A (en) Data processing method and device
CN112749583A (en) Face image grouping method and device, computer equipment and storage medium
CN112135256A (en) Method, device and equipment for determining movement track and readable storage medium
CN111984738A (en) Data association method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant