CN116167624B

CN116167624B - Determination method of target category identification, storage medium and electronic equipment

Info

Publication number: CN116167624B
Application number: CN202310449913.0A
Authority: CN
Inventors: 袁雷锋; 王旭东; 张俊; 孙茂鹏; 司义品; 周麟钗; 李宏图; 单威
Original assignee: Tianxinda Information Technology Co ltd
Current assignee: Tianxinda Information Technology Co ltd
Priority date: 2023-04-25
Filing date: 2023-04-25
Publication date: 2023-07-07
Anticipated expiration: 2043-04-25
Also published as: CN116167624A

Abstract

The present invention relates to the field of data processing, and in particular to a method for determining a target category identifier, a storage medium, and an electronic device. The method includes: acquiring a target object identifier group A, where $a_i$ is the i-th target object identifier corresponding to the event to be executed; acquiring a file content data group B, where $b_j$ is the file content data of the j-th file; acquiring a target feature identifier group T, where $t_j$ is the j-th target feature identifier in T, $t_j = 1$ indicates that any one of the $a_i$ belongs to $b_j$, and $t_j = 0$ indicates that no $a_i$ belongs to $b_j$; obtaining a feature vector $F = (H^1, H^2, t_1, t_2, \ldots, t_j, \ldots, t_m, REL, P)$, where $H^1$ is a first feature identifier, $H^2$ is a second feature identifier, m is the number of preset files, REL is the execution identifier corresponding to the associated event of the event to be executed, and P is an influence coefficient; and obtaining the target category identifier according to F. The accuracy of determining the target category identifier corresponding to the event to be executed can thereby be improved.

Description

Determination method of target category identification, storage medium and electronic equipment

Technical Field

The present invention relates to the field of data processing, and in particular, to a method for determining a target class identifier, a storage medium, and an electronic device.

Background

In the air freight industry, before an event to be executed is executed, a category identifier corresponding to the event to be executed is often determined, so that a cargo detection method corresponding to the category identifier is determined as a cargo detection method corresponding to the event to be executed. The event to be executed is an air freight flight corresponding to a waybill.

At present, the category identifier corresponding to an event to be executed is determined as follows. For each target object corresponding to the event to be executed, the corresponding preset score is determined from a plurality of preset scores; the target objects are cargoes, and each preset score represents the risk level of the corresponding target object during air cargo transportation. The preset scores of all target objects corresponding to the event to be executed are then summed to obtain a total score, and the category identifier corresponding to the event to be executed is determined according to the total score.

However, the risk level of at least some target objects during air cargo transportation is continuously updated as actual conditions change, and the corresponding preset scores need to be adjusted after each update. Because the preset scores cannot be updated fully in real time, the total score corresponding to the event to be executed has low accuracy, and so does the category identifier determined from it.

Disclosure of Invention

Aiming at the technical problems, the invention adopts the following technical scheme:

according to an aspect of the present invention, there is provided a method for determining a target class identifier, including the steps of:

S100, obtaining a target object identifier group $A = (a_1, a_2, \ldots, a_i, \ldots, a_n)$ corresponding to the event to be executed, $i = 1, 2, \ldots, n$; wherein $a_i$ is the i-th target object identifier corresponding to the event to be executed, and n is the number of target object identifiers corresponding to the event to be executed.

S200, acquiring a file content data group $B = (b_1, b_2, \ldots, b_j, \ldots, b_m)$, $j = 1, 2, \ldots, m$; wherein $b_j$ is the file content data of the j-th file, and m is the number of preset files; $b_j$ includes at least one candidate identifier.

S300, according to A and B, obtaining a target feature identifier group $T = (t_1, t_2, \ldots, t_j, \ldots, t_m)$; wherein $t_j$ is the j-th target feature identifier in T, $t_j = 1$ or 0; $t_j = 1$ indicates that any one of $a_1, a_2, \ldots, a_i, \ldots, a_n$ belongs to $b_j$, and $t_j = 0$ indicates that no $a_i$ belongs to $b_j$.

S400, obtaining a feature vector $F = (H^1, H^2, t_1, t_2, \ldots, t_j, \ldots, t_m, REL, P)$ corresponding to the event to be executed; wherein $H^1$ is a first feature identifier, $H^1 = 1$ or 0, $H^1 = 1$ indicates that the event type of the event to be executed is a first target type, and $H^1 = 0$ indicates that the event type of the event to be executed is not the first target type; $H^2$ is a second feature identifier, $H^2 = 1$ or 0, $H^2 = 1$ indicates that the event type of the event to be executed is a second target type, and $H^2 = 0$ indicates that the event type of the event to be executed is not the second target type; REL is the execution identifier corresponding to the associated event of the event to be executed, REL = 1 or 0, REL = 1 indicates that the associated event has been executed at the current time $time_{now}$, and REL = 0 indicates that the associated event has not been executed at $time_{now}$; and P is an influence coefficient representing the degree to which the historical data corresponding to the initiator of the event to be executed influences the determination of the target category identifier.

S500, based on a classification model, determining, among a plurality of candidate category identifiers, the candidate category identifier corresponding to F as the target category identifier corresponding to the event to be executed.

According to another aspect of the present invention, there is also provided a non-transitory computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the method for determining the target class identification described above.

According to another aspect of the present invention, there is also provided an electronic device comprising a processor and the above-described non-transitory computer-readable storage medium.

The invention has at least the following beneficial effects:

In the present invention, $t_j$ is first determined from the target object identifier group A corresponding to the event to be executed and the file content data group B; $H^1$ and $H^2$ are determined according to the event type of the event to be executed; REL is determined according to the execution status of the associated event of the event to be executed; and P is determined according to the historical data corresponding to the initiator of the event to be executed. The target category identifier is then determined from a plurality of candidate category identifiers according to F, which is obtained from $t_j$, $H^1$, $H^2$, REL and P.

In the related art, the total of the preset scores corresponding to the n target object identifiers is first determined according to the target object identifier group A, and the target category identifier is then determined according to the total score. Because the preset scores cannot be updated fully in real time, the preset score corresponding to each target object identifier has low accuracy, and so does the target category identifier determined for the event to be executed. In the present invention, by contrast, $t_j$ is determined according to whether the file content data of each file includes any target object identifier corresponding to the event to be executed. $t_j$ is therefore determined from the latest file content data, and no preset score needs to be adjusted according to the file content data. As a result, $t_j$ is more accurate, the target category identifier determined from the F obtained from $t_j$ is more accurate, and the accuracy of determining the target category identifier corresponding to the event to be executed is improved.

In addition, whereas the related art considers only the preset scores corresponding to the target object identifiers when determining the target category identifier, F in the present invention also includes $H^1$, $H^2$, REL and P. The event type of the event to be executed, the associated event of the event to be executed, and the historical data corresponding to the initiator of the event to be executed are thus also taken into account when determining the target category identifier, so that the event to be executed is characterized more distinctly and the accuracy of determining its target category identifier is further improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a method for determining a target category identifier according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

The embodiment of the invention provides a method for determining target category identification, wherein the method can be completed by any one or any combination of the following: terminals, servers, and other devices with processing capabilities, which are not limited in this embodiment of the present invention.

The method for determining the target class identification will be described with reference to a flowchart of the method for determining the target class identification shown in fig. 1.

The method comprises the following steps:

S100, obtaining a target object identifier group $A = (a_1, a_2, \ldots, a_i, \ldots, a_n)$ corresponding to the event to be executed, $i = 1, 2, \ldots, n$.

Wherein $a_i$ is the i-th target object identifier corresponding to the event to be executed, and n is the number of target object identifiers corresponding to the event to be executed.

Specifically, the event to be executed is an air freight flight corresponding to an air waybill, and the take-off time of the flight corresponding to the event to be executed is after the current time $time_{now}$. The target object is cargo, the target object identifier is the name of the corresponding target object, and the target object identifiers corresponding to the event to be executed are those included in the waybill corresponding to the event to be executed.

S200, acquiring a file content data group $B = (b_1, b_2, \ldots, b_j, \ldots, b_m)$, $j = 1, 2, \ldots, m$.

Wherein $b_j$ is the file content data of the j-th file, and m is the number of preset files; $b_j$ includes at least one candidate identifier.

For example, m = 3: the file corresponding to $b_1$ is the "X-ray machine difficult-to-identify list", the file corresponding to $b_2$ is the "suspected dangerous goods list", and the file corresponding to $b_3$ is the "hidden dangerous goods list". $b_1$ includes the names of several goods that are difficult for X-ray machines to identify, and each such good is a candidate; $b_2$ includes the names of several suspected dangerous goods, and each suspected dangerous good is a candidate; $b_3$ includes the names of several hidden dangerous goods, and each hidden dangerous good is a candidate. The candidate identifier is the name of the corresponding candidate.

S300, according to A and B, obtaining a target feature identifier group $T = (t_1, t_2, \ldots, t_j, \ldots, t_m)$.

Wherein $t_j$ is the j-th target feature identifier in T, $t_j = 1$ or 0; $t_j = 1$ indicates that any one of $a_1, a_2, \ldots, a_i, \ldots, a_n$ belongs to $b_j$, and $t_j = 0$ indicates that no $a_i$ belongs to $b_j$.
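As a minimal sketch of step S300, the following Python snippet derives T by testing, for each file, whether any target object identifier appears among its candidate identifiers. The function name `target_feature_group` and the sample goods names are hypothetical and serve only as illustration; exact string matching is likewise an assumption, since the text does not state how membership is tested.

```python
from typing import List, Sequence, Set


def target_feature_group(a: Sequence[str], b: Sequence[Set[str]]) -> List[int]:
    """Return T = (t_1, ..., t_m): t_j is 1 if any a_i belongs to b_j, else 0."""
    return [1 if any(a_i in b_j for a_i in a) else 0 for b_j in b]


# Hypothetical goods names on a waybill (group A).
A = ["lithium battery", "cotton shirt"]

# Hypothetical contents of m = 3 preset files (group B).
B = [
    {"dense metal block"},               # b_1: hard for X-ray machines to identify
    {"lithium battery", "aerosol can"},  # b_2: suspected dangerous goods
    {"magnet toy"},                      # b_3: hidden dangerous goods
]

T = target_feature_group(A, B)
print(T)  # [0, 1, 0]
```

Because T is recomputed from the current file contents each time, updating a list changes the features without any per-item score having to be maintained.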

S400, obtaining a feature vector $F = (H^1, H^2, t_1, t_2, \ldots, t_j, \ldots, t_m, REL, P)$ corresponding to the event to be executed.

Wherein $H^1$ is a first feature identifier, $H^1 = 1$ or 0; $H^1 = 1$ indicates that the event type of the event to be executed is a first target type, and $H^1 = 0$ indicates that the event type of the event to be executed is not the first target type. $H^2$ is a second feature identifier, $H^2 = 1$ or 0; $H^2 = 1$ indicates that the event type of the event to be executed is a second target type, and $H^2 = 0$ indicates that the event type of the event to be executed is not the second target type. REL is the execution identifier corresponding to the associated event of the event to be executed, REL = 1 or 0; REL = 1 indicates that the associated event has been executed at the current time $time_{now}$, and REL = 0 indicates that the associated event has not been executed at $time_{now}$. P is an influence coefficient representing the degree to which the historical data corresponding to the initiator of the event to be executed influences the determination of the target category identifier.
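The layout of F can be illustrated with a short sketch that concatenates the components in the order given above and checks that the binary identifiers are 0 or 1. The helper name `build_feature_vector` and the sample values are hypothetical; the validation step is an added assumption and is not described in the text.

```python
from typing import List, Sequence


def build_feature_vector(
    h1: int, h2: int, t: Sequence[int], rel: int, p: float
) -> List[float]:
    """Assemble F = (H1, H2, t_1, ..., t_m, REL, P), a vector of length m + 4."""
    # Assumption: reject anything other than 0/1 for the binary identifiers.
    for flag in (h1, h2, rel, *t):
        if flag not in (0, 1):
            raise ValueError(f"binary identifier expected, got {flag!r}")
    return [h1, h2, *t, rel, p]


# Hypothetical values: first target type matched, m = 3 files, associated
# event already executed, influence coefficient 26.5.
F = build_feature_vector(1, 0, [0, 1, 0], 1, 26.5)
print(F)  # [1, 0, 0, 1, 0, 1, 26.5]
```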

Specifically, the first target type is the type corresponding to a guard flight. Accordingly, the event type of the event to be executed being the first target type indicates that the event to be executed is a guard flight, and the event type not being the first target type indicates that the event to be executed is not a guard flight.

In a specific embodiment, the second target type is the type corresponding to an important flight. Accordingly, the event type of the event to be executed being the second target type indicates that the event to be executed is an important flight, and the event type not being the second target type indicates that it is not an important flight. In another specific embodiment, the second target type is the type corresponding to a focused-on route. Accordingly, the event type being the second target type indicates that the route corresponding to the event to be executed is a focused-on route, and the event type not being the second target type indicates that the route is not a focused-on route.

The associated event is a differential record. In a specific embodiment, REL is determined as follows: determine whether, at $time_{now}$, a differential record product serial number exists on the waybill corresponding to the event to be executed; if yes, REL = 1; otherwise, REL = 0.

The initiator of the event to be executed is the agent corresponding to the event to be executed; in a specific embodiment, P is the credit of the agent corresponding to the event to be executed, that is, the credit of the shipper corresponding to the air freight bill.

Optionally, P is determined according to the agent's historical unpacking rate $M_1$, historical return rate $M_2$, historical handover rate $M_3$, number of waybills for air freight over the past year $M_4$, quantity of cargo for air freight over the past year $M_5$, and/or cargo weight for air freight over the past year $M_6$.

For example, $P = (M_1 + M_2 + M_3 + M_4/M_{\max}^4 + M_5/M_{\max}^5 + M_6/M_{\max}^6) \times 100/6$; wherein the maximum number of waybills $M_{\max}^4 = \max(M_1^4, M_2^4, \ldots, M_{poi}^4, \ldots, M_{sum}^4)$, $poi = 1, 2, \ldots, sum$; $\max()$ is a preset maximum value determination function; $M_{poi}^4$ is the number of waybills for air freight over the past year of the poi-th candidate agent; sum is the number of candidate agents, and the agent corresponding to the event to be executed is any one of the sum candidate agents. The maximum cargo quantity $M_{\max}^5 = \max(M_1^5, M_2^5, \ldots, M_{poi}^5, \ldots, M_{sum}^5)$, where $M_{poi}^5$ is the quantity of cargo for air freight over the past year of the poi-th candidate agent. The maximum cargo weight $M_{\max}^6 = \max(M_1^6, M_2^6, \ldots, M_{poi}^6, \ldots, M_{sum}^6)$, where $M_{poi}^6$ is the cargo weight for air freight over the past year of the poi-th candidate agent.
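The example formula for P can be sketched as follows. The snippet reads the trailing factor 100/6 as a multiplier, which is an interpretation of the formula as printed. The agent names, their statistics, and the helper name `influence_coefficient` are hypothetical.

```python
from typing import Dict, Tuple

# (M1, M2, M3, M4, M5, M6) per candidate agent: unpacking rate, return rate,
# handover rate, waybill count, cargo quantity, cargo weight (past year).
AgentStats = Tuple[float, float, float, float, float, float]


def influence_coefficient(agent: str, candidates: Dict[str, AgentStats]) -> float:
    """Compute P for one agent, normalising M4..M6 by the maxima over all candidates."""
    m1, m2, m3, m4, m5, m6 = candidates[agent]
    max4 = max(stats[3] for stats in candidates.values())
    max5 = max(stats[4] for stats in candidates.values())
    max6 = max(stats[5] for stats in candidates.values())
    # Assumption: the maxima are non-zero, i.e. at least one candidate shipped cargo.
    return (m1 + m2 + m3 + m4 / max4 + m5 / max5 + m6 / max6) * 100 / 6


# Hypothetical candidate agents.
candidates = {
    "agent_a": (0.10, 0.05, 0.20, 50, 400, 1000.0),
    "agent_b": (0.30, 0.10, 0.10, 100, 800, 4000.0),
}
p_a = influence_coefficient("agent_a", candidates)
p_b = influence_coefficient("agent_b", candidates)
print(round(p_a, 2), round(p_b, 2))
```

Dividing the three volume terms by their maxima keeps every summand in the range 0 to 1, so P stays between 0 and 100 when the three rates are also expressed as fractions.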

Specifically, the classification model may be a random forest model or a GBDT (Gradient Boosting Decision Tree) model, which is not limited in the embodiment of the present invention.

Optionally, the number of candidate category identifiers is 5, and the 5 candidate category identifiers can be identifiers corresponding to a low risk category, a lower risk category, a common category, a strict control category and a high risk category respectively.

In addition, each candidate category identifier is provided with a corresponding cargo inspection method, after the target category identifier corresponding to the event to be executed is determined, the cargo inspection method corresponding to the event to be executed can be determined, so that a simpler cargo inspection method can be matched with the event to be executed with higher safety, and a more complex cargo inspection method can be matched with the event to be executed with lower safety, and the cargo inspection efficiency can be improved while the safety of the event to be executed is improved.

It can be seen that, in the present invention, $t_j$ is first determined from the target object identifier group A corresponding to the event to be executed and the file content data group B; $H^1$ and $H^2$ are determined according to the event type of the event to be executed; REL is determined according to the execution status of the associated event of the event to be executed; and P is determined according to the historical data corresponding to the initiator of the event to be executed. The target category identifier is then determined from a plurality of candidate category identifiers according to F, which is obtained from $t_j$, $H^1$, $H^2$, REL and P.

In addition, whereas the related art considers only the preset scores corresponding to the target object identifiers when determining the target category identifier, F in the present invention also includes $H^1$, $H^2$, REL and P. The event type of the event to be executed, the associated event of the event to be executed, and the historical data corresponding to the initiator of the event to be executed are thus also taken into account when determining the target category identifier, so that the event to be executed is characterized more distinctly and the accuracy of determining its target category identifier is further improved.

Optionally, the classification model is obtained by the following method:

S501, obtaining a training sample set $D = (d_1, d_2, \ldots, d_x, \ldots, d_y)$, $x = 1, 2, \ldots, y$.

Wherein $d_x$ is the x-th training sample in D; y is the number of training samples in D; $d_x = (d_x^1, d_x^2, \ldots, d_x^r, \ldots, d_x^s, N_x)$, $r = 1, 2, \ldots, s$; $d_x^r$ is the parameter of the x-th first executed event corresponding to the r-th target attribute information; s is the number of pieces of target attribute information, $s = m + 4$; $N_x$ is the candidate category identifier corresponding to the x-th first executed event. For the event to be executed, $H^1$ is the parameter of the 1st target attribute information, $H^2$ is the parameter of the 2nd target attribute information, $t_j$ is the parameter of the (j+2)-th target attribute information, REL is the parameter of the (m+3)-th target attribute information, and P is the parameter of the (m+4)-th target attribute information.

Specifically, the first executed event is an air freight flight corresponding to an air waybill, and the take-off time of the flight corresponding to the first executed event is before or at $time_{now}$. The 1st target attribute information is the event type corresponding to the first target type, the 2nd target attribute information is the event type corresponding to the second target type, the (j+2)-th target attribute information is whether the j-th file contains a target object identifier, the (m+3)-th target attribute information is the execution status of the associated event, and the (m+4)-th target attribute information is the credit of the agent. D can be obtained by performing sample balancing based on data resampling and a cost-sensitive matrix.

S502, randomly selecting L training samples from D q times, and taking each training sample selected from D the k-th time as a target training sample in the k-th target training sample group, to obtain a target training sample list $SAM = (sam_1, sam_2, \ldots, sam_k, \ldots, sam_q)$, $sam_k = (sam_k^1, sam_k^2, \ldots, sam_k^p, \ldots, sam_k^L)$, $k = 1, 2, \ldots, q$, $p = 1, 2, \ldots, L$.

Wherein $sam_k$ is the k-th target training sample group in SAM, and q is the number of target training sample groups in SAM; $sam_k^p$ is the p-th target training sample in $sam_k$, and L is the number of target training samples in $sam_k$, $L < y$.

If $L \times q > y$, the same training sample exists in at least some of the target training sample groups; if $L \times q \le y$, either the intersection of any two target training sample groups is an empty set, or the same training sample exists in at least some of the target training sample groups.

For example, y = 9000 and L = 1000, and 1000 training samples are randomly selected from the 9000 training samples 10 times, thereby obtaining $SAM = (sam_1, sam_2, \ldots, sam_k, \ldots, sam_{10})$, $sam_k = (sam_k^1, sam_k^2, \ldots, sam_k^p, \ldots, sam_k^{1000})$. In this case, since $L \times q = 10000 > y$, the same training sample exists in at least some of the target training sample groups.
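The sampling in step S502 can be sketched as below, using the numbers from the example (y = 9000, L = 1000, q = 10). The text does not state whether a sample may repeat within one group; this sketch assumes it may not and uses `random.sample`. The function name `draw_sample_groups`, the fixed seed, and the use of integers as stand-ins for training samples are all illustrative assumptions.

```python
import random
from typing import List, Sequence, TypeVar

S = TypeVar("S")


def draw_sample_groups(d: Sequence[S], l: int, q: int, seed: int = 0) -> List[List[S]]:
    """Draw q groups of l samples each from d; groups may overlap each other."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return [rng.sample(list(d), l) for _ in range(q)]


# Stand-in training set: indices 0..8999 play the role of d_1..d_y.
D = range(9000)
SAM = draw_sample_groups(D, l=1000, q=10)

total_drawn = sum(len(group) for group in SAM)
distinct = len({sample for group in SAM for sample in group})
# With L * q = 10000 > y = 9000, some sample must appear in more than one group.
print(total_drawn, distinct < total_drawn)
```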

S503, performing decision tree generation processing on $sam_k$ to obtain the decision tree corresponding to $sam_k$.

S504, constructing a classification model based on a plurality of decision trees.

Specifically, the classification model is a random forest model.

Optionally, the decision tree generation process includes the following steps:

S510, taking the target training sample group undergoing decision tree generation processing as the current group.

S520, generating a root node of a decision tree corresponding to the current group, and taking all target training samples in the current group as samples corresponding to the root node.

S530, performing child node generation processing on the root node.

The child node generation process includes:

S531, taking the node undergoing child node generation processing as the current node.

S532, determining whether the current node meets the recursion stop condition of the decision tree corresponding to the current group; if yes, go to step S533; otherwise, step S534 is entered.

S533, determining the current node as a leaf node, marking the leaf node by the candidate category identification with the largest number in all the candidate category identifications in all the samples corresponding to the current node, and proceeding to step S539.

For example, after the current node is determined as a leaf node, the number of samples corresponding to the current node is 5, wherein the candidate category identifiers in the 1st, 3rd and 5th samples are all the identifier corresponding to the low-risk category, the candidate category identifier in the 2nd sample is the identifier corresponding to the strict control category, and the candidate category identifier in the 4th sample is the identifier corresponding to the common category; the leaf node is then marked with the identifier corresponding to the low-risk category.
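The leaf-marking rule of step S533 amounts to a majority vote, which can be sketched with `collections.Counter`. The function name `leaf_label` and the label strings are hypothetical. How ties are broken is not specified in the text; here `most_common` returns the label that was encountered first among equally frequent ones.

```python
from collections import Counter
from typing import Hashable, Sequence


def leaf_label(labels: Sequence[Hashable]) -> Hashable:
    """Return the candidate category identifier that occurs most often."""
    if not labels:
        raise ValueError("a leaf needs at least one sample to be labelled")
    return Counter(labels).most_common(1)[0][0]


# The five samples from the example: 1st, 3rd and 5th are low risk.
labels = ["low risk", "strict control", "low risk", "common", "low risk"]
print(leaf_label(labels))  # low risk
```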

S534, calculating, according to all samples corresponding to the current node, the Gini coefficient corresponding to each piece of target attribute information, to obtain a Gini coefficient group $GINI = (gini_1, gini_2, \ldots, gini_r, \ldots, gini_s)$.

Wherein $gini_r$ is the Gini coefficient corresponding to the r-th target attribute information, calculated based on all samples corresponding to the current node.

$gini_r = 1 - \sum_{var=1}^{f(var)} \left( |c_{var}^r| / |C| \right)^2$; wherein $|C|$ is the number of samples corresponding to the current node; $f(var)$ is the number of remaining parameters obtained after de-duplicating the parameters corresponding to the r-th target attribute information in all samples corresponding to the current node; and $|c_{var}^r|$ is the number of parameters, among the parameters corresponding to the r-th target attribute information in all samples corresponding to the current node, that are equal to the var-th remaining parameter.

For example, $|C| = 5$, and the parameters corresponding to the r-th target attribute information in all samples corresponding to the current node are 5, 10, 10, 11 and 5, respectively. The 3 remaining parameters obtained after de-duplication are 5, 10 and 11, so $f(var) = 3$. Since the parameters contain two 5s, two 10s and one 11, $|c_1^r| = 2$, $|c_2^r| = 2$ and $|c_3^r| = 1$.
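The coefficient defined in step S534 can be computed as in the sketch below, which counts how often each distinct parameter value occurs and subtracts the sum of squared proportions from 1. The example uses five parameter values consistent with the counts 2, 2 and 1 given above. The function names `gini_coefficient` and `gini_group`, and the second attribute column, are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def gini_coefficient(params: Sequence[Hashable]) -> float:
    """gini = 1 - sum over distinct values of (count / total) ** 2."""
    total = len(params)
    if total == 0:
        raise ValueError("at least one sample is required")
    counts = Counter(params)  # one entry per de-duplicated parameter value
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def gini_group(samples: Sequence[Sequence[Hashable]]) -> List[float]:
    """Compute the coefficient for every attribute column of the samples."""
    n_attrs = len(samples[0])
    return [gini_coefficient([s[r] for s in samples]) for r in range(n_attrs)]


# Counts 2, 2, 1 out of 5 give 1 - (4 + 4 + 1) / 25 = 16 / 25.
g = gini_coefficient([5, 10, 10, 11, 5])

# Hypothetical node samples with two attributes each.
samples = [(5, 1), (10, 1), (10, 0), (11, 1), (5, 1)]
GINI = gini_group(samples)
print(round(g, 4), [round(v, 4) for v in GINI])
```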

S535, according to GINI, obtaining the minimum Gini coefficient corresponding to the current node, $gini_{\min} = \min(gini_1, gini_2, \ldots, gini_r, \ldots, gini_s)$; wherein $\min()$ is a preset minimum value determination function.

S536, according to the target attribute information corresponding to $gini_{\min}$, determining one part of all samples corresponding to the current node as first samples and the other part as second samples.

S537, in the decision tree corresponding to the current group, generating a first child node and a second child node of the current node, taking each first sample as a sample corresponding to the first child node, and taking each second sample as a sample corresponding to the second child node.

S538, performing the child node generation processing on the first child node, and performing the child node generation processing on the second child node.

S539, determining whether each node in the decision tree corresponding to the current group is a leaf node or a node which generates a corresponding first child node and a second child node; if yes, outputting a decision tree corresponding to the current group.

Therefore, compared with generating the decision tree using a multi-way tree generation algorithm such as the C4.5 algorithm or the ID3 algorithm, the present invention generates the decision tree by determining the minimum Gini coefficient, and the generated decision tree is a binary tree, which simplifies the tree structure and improves the efficiency of decision tree generation. In addition, the present invention divides the samples using the target attribute information corresponding to the minimum Gini coefficient; compared with decision tree generation based on a multi-way tree generation algorithm such as the C4.5 algorithm or the ID3 algorithm, decision tree generation in the present invention requires no logarithmic calculation, which saves computing resources and further improves the efficiency of decision tree generation.

Optionally, step S536 includes the steps of:

S5361, taking the target attribute information corresponding to $gini_{\min}$ as the segmentation attribute information corresponding to the current node.

S5362, obtaining a threshold group $E = (e_1, e_2, \ldots, e_u, \ldots, e_v)$ corresponding to the segmentation attribute information corresponding to the current node, $u = 1, 2, \ldots, v$.

Wherein $e_u$ is the u-th threshold corresponding to the segmentation attribute information corresponding to the current node, $e_1 > e_2 > \ldots > e_u > \ldots > e_v$; v is the number of preset thresholds corresponding to the segmentation attribute information corresponding to the current node. The number of thresholds corresponding to at least some of the target attribute information is greater than 1.

S5363, determining whether the current node has a corresponding target node; if yes, determining the number num of target nodes, among all target nodes corresponding to the current node, whose segmentation attribute information is the same as the segmentation attribute information corresponding to the current node, and proceeding to step S5364; otherwise, setting num = 0 and proceeding to step S5365.

When the current node is not a root node, each node and the root node between the current node and the root node in the decision tree corresponding to the current group are target nodes corresponding to the current node, and when the current node is the root node, the decision tree corresponding to the current group does not have the target nodes corresponding to the current node.

For example, if PAPO1 is the root node, one child node of PAPO1 is PAPO2, one child node of PAPO2 is PAPO3, one child node of PAPO3 is PAPO4, and PAPO4 is the current node, then the target nodes corresponding to PAPO4 are PAPO1, PAPO2 and PAPO3.

S5364, determining whether (num+1) is less than or equal to v; if yes, go to step S5365; otherwise, step S533 is entered.

S5365, for each sample corresponding to the current node, determining whether the target parameter in the sample is less than or equal to $e_{num+1}$; if yes, taking the sample as a first sample; otherwise, taking the sample as a second sample; the target parameter is the parameter in the sample that corresponds to the segmentation attribute information corresponding to the current node.

S5366, determining each first sample as a sample corresponding to the first sub-node, and determining each second sample as a sample corresponding to the second sub-node.
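Steps S5363 to S5366 can be sketched as a single function: count how many target nodes (ancestors) already split on the same attribute, pick the next unused threshold $e_{num+1}$, and partition the samples around it. The function name `split_samples`, the representation of ancestors as a list of attribute indices, and the `None` return value standing for "go to step S533" are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, ...]


def split_samples(
    samples: Sequence[Sample],
    attr_index: int,
    thresholds: Sequence[float],
    ancestor_attrs: Sequence[int],
) -> Optional[Tuple[List[Sample], List[Sample]]]:
    """Split on e_{num+1}; return None when every threshold is already used."""
    # S5363: num = ancestors that split on the same attribute (0 for the root).
    num = sum(1 for a in ancestor_attrs if a == attr_index)
    # S5364: if num + 1 > v, the node becomes a leaf instead of being split.
    if num + 1 > len(thresholds):
        return None
    e = thresholds[num]  # e_{num+1}; thresholds are listed as e_1 > e_2 > ...
    # S5365: parameter <= threshold goes to the first child, the rest to the second.
    first = [s for s in samples if s[attr_index] <= e]
    second = [s for s in samples if s[attr_index] > e]
    return first, second


samples = [(5.0,), (10.0,), (11.0,)]
E = [10.0, 6.0]  # hypothetical thresholds e_1 > e_2 for attribute 0

at_root = split_samples(samples, 0, E, ancestor_attrs=[])
one_level_down = split_samples(samples, 0, E, ancestor_attrs=[0])
exhausted = split_samples(samples, 0, E, ancestor_attrs=[0, 0])
print(at_root, one_level_down, exhausted)
```

Consuming thresholds in order along a root-to-node path lets the same attribute be reused at successive depths, each time with a smaller threshold.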

Therefore, the number of the thresholds corresponding to at least part of the target attribute information in the invention can be multiple, and at least part of the nodes in the decision tree can generate the sub-nodes based on any one of the target attribute information and one of the thresholds corresponding to the target attribute information, and the sub-nodes of the nodes can still generate the sub-nodes based on the other one of the thresholds corresponding to the target attribute information. Compared with the method that the threshold value corresponding to each target attribute information is one, the method can generate the decision tree with larger depth based on the limited number of target attribute information, so that the classification effect of the classification model constructed based on the decision tree is better, and the purpose of improving the accuracy of determining the target category identification corresponding to the event to be executed is achieved.

Optionally, the recursive stopping condition is that candidate category identifiers in each sample corresponding to the current node are the same, or the depth of the current node in the decision tree corresponding to the current group reaches the sum of the numbers of thresholds corresponding to all the target attribute information, or the depth of the current node in the decision tree corresponding to the current group reaches the preset depth, or the number of samples corresponding to the current node is 0.

Specifically, the sum of the numbers of thresholds corresponding to all the target attribute information is obtained by summing the number of thresholds corresponding to each piece of target attribute information. The preset depth is greater than or equal to 10 and less than or equal to 100.
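The recursive stopping condition can be sketched as a single predicate. This is an illustrative sketch only; the function name `should_stop`, its arguments, and the default preset depth of 10 (the lower bound of the stated range) are assumptions.

```python
from typing import Hashable, Sequence


def should_stop(
    labels: Sequence[Hashable],
    depth: int,
    threshold_counts: Sequence[int],
    preset_depth: int = 10,
) -> bool:
    """Check the four alternative stopping conditions for the current node.

    labels: candidate category identifiers of the samples at the current node.
    depth: depth of the current node in the decision tree.
    threshold_counts: number of thresholds of each piece of target attribute information.
    """
    # No samples correspond to the current node.
    if len(labels) == 0:
        return True
    # All samples carry the same candidate category identifier.
    if len(set(labels)) == 1:
        return True
    # Depth reaches the sum of the numbers of thresholds.
    if depth >= sum(threshold_counts):
        return True
    # Depth reaches the preset depth.
    return depth >= preset_depth
```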

In a specific embodiment, the target attribute information corresponding to H¹, H², REL and P is filtered attribute information.

Based on this, the filtered attribute information is determined by the following method:

S610, obtaining a parameter list W = (w_1, w_2, ..., w_{h1}, ..., w_{Q1}), w_{h1} = (w_{h1}^1, w_{h1}^2, ..., w_{h1}^{h2}, ..., w_{h1}^{Q2}), h1 = 1, 2, ..., Q1, h2 = 1, 2, ..., Q2.

Wherein w_{h1} is the parameter group corresponding to the h1-th candidate attribute information, and Q1 is the number of pieces of candidate attribute information; w_{h1}^{h2} is the parameter in w_{h1} corresponding to the h2-th second executed event, Q2 is the number of second executed events, and Q1 ≥ 4; each piece of filtered attribute information is any one of the Q1 pieces of candidate attribute information, and the target attribute information corresponding to t_j is different from each piece of candidate attribute information.

Specifically, the second executed event is an air freight flight corresponding to an air waybill, and the take-off time of the flight corresponding to the second executed event is before or at the current time time_now. In addition to the filtered attribute information, the candidate attribute information may be a waybill number, a waybill source, an agent name, an agent code, a flight number, a flight date, or the like.

S620, if all the parameters in w_{h1} are the same, or any two parameters in w_{h1} are different, deleting w_{h1} from W to obtain a post-deletion parameter list W' = (w'_1, w'_2, w'_3, w'_4).

Wherein w'_z is the z-th parameter group in W', z = 1, 2, 3, 4; s ≤ Q1.

S630, taking the candidate attribute information corresponding to w'_z as filtered attribute information to obtain a filtered attribute information set ATT = (att_1, att_2, att_3, att_4).

Wherein att_z is the z-th filtered attribute information in ATT.

Therefore, the present invention reduces the possibility that candidate attribute information whose corresponding parameters are all the same, or whose corresponding parameters are all different from one another, is determined as filtered attribute information. This increases the relevance between the parameters corresponding to the filtered attribute information in F and the target category identifier, thereby improving the accuracy of determining the target category identifier corresponding to the event to be executed.
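Steps S610 to S630 can be sketched as follows. The sketch is illustrative only: representing W as a dictionary from a candidate attribute name to its parameter group, the function name `screen_candidates`, and the attribute names in the usage note are assumptions, and the sketch does not enforce that exactly four parameter groups remain.

```python
from typing import Dict, Hashable, List, Sequence


def screen_candidates(W: Dict[str, Sequence[Hashable]]) -> List[str]:
    """Return the names of the candidate attributes kept as filtered attributes."""
    kept = []
    for name, params in W.items():
        distinct = len(set(params))
        all_same = distinct == 1                 # all parameters in w_{h1} are the same
        all_different = distinct == len(params)  # any two parameters in w_{h1} differ
        # S620: delete the group in either case; S630: keep the rest.
        if not (all_same or all_different):
            kept.append(name)
    return kept
```

For instance, with hypothetical data in which a waybill number is unique per event and a waybill source is identical for every event, both would be deleted, while an agent code that repeats across some events would be kept.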

In another specific embodiment, the target attribute information corresponding to H¹, H², REL and P is filtered attribute information.

S640, obtaining a parameter list W = (w_1, w_2, ..., w_{h1}, ..., w_{Q1}), w_{h1} = (w_{h1}^1, w_{h1}^2, ..., w_{h1}^{h2}, ..., w_{h1}^{Q2}), h1 = 1, 2, ..., Q1, h2 = 1, 2, ..., Q2.

Specifically, at least some of the first executed events are second executed events, which is not limited in the embodiments of the present invention.

S650, performing parameter deduplication on w_{h1} to obtain the deduplicated parameter group w''_{h1} corresponding to w_{h1}, so as to obtain a deduplicated parameter list W'' = (w''_1, w''_2, ..., w''_{h1}, ..., w''_{Q1}), w''_{h1} = (w''_{h1}^1, w''_{h1}^2, ..., w''_{h1}^{h3}, ..., w''_{h1}^{Q3(h1)}), h3 = 1, 2, ..., Q3(h1).

Wherein w''_{h1}^{h3} is the h3-th deduplicated parameter in w''_{h1}, Q3(h1) is the number of deduplicated parameters in w''_{h1}, and Q3(h1) ≤ Q2; any two deduplicated parameters in w''_{h1} are different.

S670, if Q3(h1)/Q2 is smaller than a first preset value pre_1 or greater than a second preset value pre_2, deleting w_{h1} from W to obtain a post-deletion parameter list W' = (w'_1, w'_2, w'_3, w'_4).

Wherein pre_2 > pre_1, and w'_z is the z-th parameter group in W', z = 1, 2, 3, 4.

S680, taking the candidate attribute information corresponding to w'_z as filtered attribute information to obtain a filtered attribute information set ATT = (att_1, att_2, att_3, att_4); wherein att_z is the z-th filtered attribute information in ATT.

Therefore, the present invention reduces the possibility that candidate attribute information whose corresponding parameters are substantially the same, or substantially all different from one another, is determined as filtered attribute information. This further increases the relevance between the parameters corresponding to the filtered attribute information in F and the target category identifier, thereby improving the accuracy of determining the target category identifier corresponding to the event to be executed.
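Steps S650 to S680 can be sketched as a filter on the ratio of distinct parameters. The sketch is illustrative only: the dictionary representation of W, the function name `screen_by_distinct_ratio`, and the example values of pre_1 and pre_2 in the usage note are assumptions, since no concrete preset values are given here.

```python
from typing import Dict, Hashable, List, Sequence


def screen_by_distinct_ratio(
    W: Dict[str, Sequence[Hashable]], pre1: float, pre2: float
) -> List[str]:
    """Keep candidate attributes whose ratio Q3(h1)/Q2 lies within [pre1, pre2]."""
    if not pre2 > pre1:
        raise ValueError("pre2 must be greater than pre1")
    kept = []
    for name, params in W.items():
        q3 = len(set(params))   # S650: number of deduplicated parameters, Q3(h1)
        q2 = len(params)        # number of second executed events, Q2
        ratio = q3 / q2
        # S670: delete when the ratio is smaller than pre1 or greater than pre2.
        if pre1 <= ratio <= pre2:
            kept.append(name)
    return kept
```

With hypothetical values pre_1 = 0.25 and pre_2 = 0.9 and ten events, an attribute with two distinct values (ratio 0.2) or ten distinct values (ratio 1.0) would be deleted, while one with three distinct values (ratio 0.3) would be kept.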

In another specific embodiment, the target attribute information corresponding to H¹, H², REL and P is filtered attribute information.

Based on this, the target attribute information is determined by the following method:

S710, obtaining a parameter list W = (w_1, w_2, ..., w_{h1}, ..., w_{Q1}), w_{h1} = (w_{h1}^1, w_{h1}^2, ..., w_{h1}^{h2}, ..., w_{h1}^{Q2}), h1 = 1, 2, ..., Q1, h2 = 1, 2, ..., Q2.

Wherein w_{h1} is the parameter group corresponding to the h1-th candidate attribute information, and Q1 is the number of pieces of candidate attribute information; w_{h1}^{h2} is the parameter in w_{h1} corresponding to the h2-th second executed event, Q2 is the number of second executed events, and Q1 ≥ s; each piece of filtered attribute information is any one of the Q1 pieces of candidate attribute information, and the target attribute information corresponding to t_j is different from each piece of candidate attribute information.

S720, obtaining the hash value ash_{h1}^{h2} of w_{h1}^{h2} to obtain a hash value list ASH = (ash_1, ash_2, ..., ash_{h1}, ..., ash_{Q1}), ash_{h1} = (ash_{h1}^1, ash_{h1}^2, ..., ash_{h1}^{h2}, ..., ash_{h1}^{Q2}).

Wherein ash_{h1} is the h1-th hash value group in ASH.

S730, obtaining a priority group PRI = (pri_1, pri_2, ..., pri_{h1}, ..., pri_{Q1}).

Wherein pri_{h1} is the h1-th priority in PRI, pri_{h1} = [∑_{h2=1}^{Q2} (ash_{h1}^{h2} − ash_{ave}^{h1})²]/Q2, where ash_{ave}^{h1} is a priority reference factor, ash_{ave}^{h1} = [∑_{h2=1}^{Q2} ash_{h1}^{h2}]/Q2.

S740, if pri_{h1} is greater than a preset target priority pri_0, deleting the corresponding w_{h1} from W to obtain a post-deletion parameter list W' = (w'_1, w'_2, w'_3, w'_4).

Wherein w'_z is the z-th parameter group in W', z = 1, 2, 3, 4.

S750, taking the candidate attribute information corresponding to w'_z as filtered attribute information to obtain a filtered attribute information set ATT = (att_1, att_2, att_3, att_4).

Wherein att_z is the z-th filtered attribute information in ATT.

Therefore, compared with the preceding two embodiments, this embodiment reduces the possibility that the parameters of the filtered attribute information fluctuate widely across different second executed events, and hence across different first executed events. The parameters corresponding to the filtered attribute information in the training samples therefore fluctuate less, which reduces the possibility that training samples sharing the same candidate category identifier differ greatly in those parameters. This in turn reduces the possibility that the target category identifier determined by the classification model constructed from the training samples is inaccurate, thereby improving the accuracy of determining the target category identifier corresponding to the event to be executed.
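Steps S720 to S750 amount to computing the population variance of the hash values of each parameter group and deleting the groups whose variance exceeds pri_0. The following sketch is illustrative only. The hash function is not specified in this embodiment, so the use of CRC-32 from Python's `zlib` module is an assumption chosen because it is deterministic across runs; the function names and the dictionary representation of W are likewise assumptions.

```python
import zlib
from typing import Dict, List, Sequence


def priority(params: Sequence[object]) -> float:
    """Compute pri_{h1}: the mean squared deviation of the hash values from their mean."""
    # S720: hash each parameter (CRC-32 of its string form; an assumed choice).
    hashes = [zlib.crc32(str(p).encode("utf-8")) for p in params]
    q2 = len(hashes)
    # Priority reference factor ash_ave^{h1}: the mean of the hash values.
    ash_ave = sum(hashes) / q2
    # S730: pri_{h1} = [sum of (ash - ash_ave)^2] / Q2.
    return sum((h - ash_ave) ** 2 for h in hashes) / q2


def screen_by_priority(W: Dict[str, Sequence[object]], pri0: float) -> List[str]:
    """S740/S750: keep the candidate attributes whose priority does not exceed pri0."""
    return [name for name, params in W.items() if priority(params) <= pri0]
```

A suitable value of pri_0 depends on the range of the chosen hash function, so any threshold used with this sketch would need to be calibrated to it.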

Optionally, W is a list of parameters subjected to data preprocessing, where the data preprocessing includes removing unique features, removing irrelevant features, converting a feature format, analyzing a missing value, analyzing an outlier, and/or normalizing data, which is not limited by the embodiment of the present invention.

Optionally, the voting mechanism corresponding to the classification model constructed based on the decision trees may be simple voting, majority voting (the minority obeys the majority), threshold voting, or Bayesian voting. Preferably, the voting mechanism is majority voting in a soft voting mode.
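One common reading of soft voting is that each decision tree outputs a probability for every candidate category identifier, the probabilities are summed across trees, and the identifier with the largest total is selected. The sketch below follows that reading; the per-tree probability dictionaries, the function name `soft_vote`, and the labels "A" and "B" are assumptions introduced for illustration.

```python
from collections import defaultdict
from typing import Dict, Sequence


def soft_vote(tree_probs: Sequence[Dict[str, float]]) -> str:
    """Return the category identifier with the largest summed probability."""
    totals: Dict[str, float] = defaultdict(float)
    for probs in tree_probs:
        for label, p in probs.items():
            totals[label] += p
    # The identifier with the highest accumulated probability wins.
    return max(totals, key=totals.get)


# Two of the three trees lean towards "B", but the first tree is far more
# confident in "A", so soft voting selects "A".
result = soft_vote([
    {"A": 0.9, "B": 0.1},
    {"A": 0.4, "B": 0.6},
    {"A": 0.45, "B": 0.55},
])
```

The example shows how soft voting can differ from counting one vote per tree: the summed probability of "A" is larger even though most trees individually favour "B".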

Optionally, after step S504, the effect of the classification model may be evaluated, specifically by means of the model accuracy, a confusion matrix, a heat map, the F1 score, an error-rate iteration curve, and/or an index weight calculation.
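Three of the listed evaluation measures (accuracy, confusion matrix and F1 score) can be computed as in the following sketch, which uses their standard definitions. The function names and the representation of the confusion matrix as a counter keyed by (true identifier, predicted identifier) are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Counter:
    """Count occurrences of each (true identifier, predicted identifier) pair."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted identifier equals the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable) -> float:
    """F1 score of one identifier, treated as the positive class."""
    cm = confusion_matrix(y_true, y_pred)
    tp = cm[(positive, positive)]
    fp = sum(c for (t, p), c in cm.items() if p == positive and t != positive)
    fn = sum(c for (t, p), c in cm.items() if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```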

Embodiments of the present invention also provide a non-transitory computer-readable storage medium that may be disposed in an electronic device to store at least one instruction or at least one program for implementing one of the method embodiments, the at least one instruction or the at least one program being loaded and executed by a processor to implement the methods provided by the embodiments described above.

Embodiments of the present invention also provide an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.

Embodiments of the present invention also provide a computer program product comprising program code for causing an electronic device to carry out the steps of the method according to the various exemplary embodiments of the invention described in the present specification when the program product is run on the electronic device.

While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the present disclosure is defined by the appended claims.

Claims

1. A method for determining a target class identifier, the method comprising the steps of:

S100, obtaining a target object identifier group A = (a_1, a_2, ..., a_i, ..., a_n) corresponding to the event to be executed, i = 1, 2, ..., n; wherein a_i is the i-th object identifier corresponding to the event to be executed, and n is the number of object identifiers corresponding to the event to be executed;

S200, acquiring a file content data set B = (b_1, b_2, ..., b_j, ..., b_m), j = 1, 2, ..., m; wherein b_j is the file content data of the j-th file, and m is the number of preset files; b_j includes at least one candidate identifier;

S300, according to A or B, obtaining a target feature identifier group T = (t_1, t_2, ..., t_j, ..., t_m); wherein t_j is the j-th target feature in T, t_j = 1 or 0, t_j = 1 indicates that any one of a_1, a_2, ..., a_i, ..., a_n belongs to b_j, and t_j = 0 indicates that a_i does not belong to b_j for every i;

S400, obtaining a feature vector F = (H¹, H², t_1, t_2, ..., t_j, ..., t_m, REL, P) corresponding to the event to be executed; wherein H¹ is a first feature identifier, H¹ = 1 or 0, H¹ = 1 indicates that the event type of the event to be executed is a first target type, and H¹ = 0 indicates that the event type of the event to be executed is not the first target type; H² is a second feature identifier, H² = 1 or 0, H² = 1 indicates that the event type of the event to be executed is a second target type, and H² = 0 indicates that the event type of the event to be executed is not the second target type; REL is an execution identifier corresponding to an associated event of the event to be executed, REL = 1 or 0, REL = 1 indicates that the associated event has been executed at the current time time_now, and REL = 0 indicates that the associated event has not been executed at time_now; P is an influence coefficient representing the degree to which historical data corresponding to the initiator of the event to be executed influences the determination of the target class identifier;

2. The determination method according to claim 1, wherein the classification model is obtained by:

S501, obtaining a training sample set D = (d_1, d_2, ..., d_x, ..., d_y), x = 1, 2, ..., y; wherein d_x is the x-th training sample in D; y is the number of training samples in D; d_x = (d_x^1, d_x^2, ..., d_x^r, ..., d_x^s, N_x), r = 1, 2, ..., s; d_x^r is the parameter of the x-th first executed event corresponding to the r-th target attribute information; s is the number of pieces of target attribute information, s = m + 4; N_x is the candidate category identifier corresponding to the x-th first executed event; H¹ is the parameter of the 1st target attribute information corresponding to the event to be executed, H² is the parameter of the 2nd target attribute information corresponding to the event to be executed, t_j is a parameter of target attribute information corresponding to the event to be executed, REL is the parameter of the (m+1)-th target attribute information corresponding to the event to be executed, and P is the parameter of the (m+2)-th target attribute information corresponding to the event to be executed;

S502, randomly selecting L training samples from D q times, and taking each training sample selected from D in the k-th random selection as a target training sample in the k-th target training sample group, to obtain a target training sample list SAM = (sam_1, sam_2, ..., sam_k, ..., sam_q), sam_k = (sam_k^1, sam_k^2, ..., sam_k^p, ..., sam_k^L), k = 1, 2, ..., q, p = 1, 2, ..., L; wherein sam_k is the k-th target training sample group in SAM, and q is the number of target training sample groups in SAM; sam_k^p is the p-th target training sample in sam_k, L is the number of target training samples in sam_k, and L < y;

S503, performing decision tree generation processing on sam_k to obtain the decision tree corresponding to sam_k;

S504, constructing the classification model based on a plurality of decision trees.
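Step S502 can be sketched as repeated random sampling. The sketch is illustrative only: the function name `draw_sample_groups` and the `seed` argument are assumptions, and because the step does not state whether a sample may be selected more than once within a group, the sketch assumes selection without replacement within each group.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def draw_sample_groups(
    D: Sequence[T], q: int, L: int, seed: Optional[int] = None
) -> List[List[T]]:
    """Build SAM: q target training sample groups of L samples each, drawn from D."""
    if not L < len(D):
        raise ValueError("L must be smaller than y, the number of samples in D")
    rng = random.Random(seed)
    # Each of the q selections draws L samples without replacement (assumed);
    # different groups are drawn independently and may overlap.
    return [rng.sample(list(D), L) for _ in range(q)]
```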

3. The determination method according to claim 2, wherein the decision tree generation process includes the steps of:

S510, taking the target training sample group on which the decision tree generation processing is performed as a current group;

S520, generating a root node of the decision tree corresponding to the current group, and taking all target training samples in the current group as samples corresponding to the root node;

S530, performing child node generation processing on the root node;

the child node generation processing includes:

S531, taking the node on which the child node generation processing is performed as a current node;

S532, determining whether the current node meets the recursive stopping condition of the decision tree corresponding to the current group; if yes, proceeding to step S533; otherwise, proceeding to step S534;

S533, determining the current node as a leaf node, marking the leaf node by the candidate category identification with the largest number in the candidate category identifications in all samples corresponding to the current node, and entering step S539;

S534, calculating, according to all samples corresponding to the current node, the Gini coefficient corresponding to each piece of target attribute information to obtain a Gini coefficient group GINI = (gini_1, gini_2, ..., gini_r, ..., gini_s); wherein gini_r is the Gini coefficient corresponding to the r-th target attribute information, calculated based on all samples corresponding to the current node;

S535, obtaining, according to GINI, the minimum Gini coefficient corresponding to the current node, gini_min = min(gini_1, gini_2, ..., gini_r, ..., gini_s); wherein min() is a preset minimum value determining function;

S536, according to the target attribute information corresponding to gini_min, determining one part of all samples corresponding to the current node as first samples and the other part as second samples;

S537, in the decision tree corresponding to the current group, generating a first child node and a second child node of the current node, taking each first sample as a sample corresponding to the first child node, and taking each second sample as a sample corresponding to the second child node;

S538, performing the child node generation processing on the first child node, and performing the child node generation processing on the second child node;
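The Gini coefficient used in steps S534 and S535 can be illustrated with the standard Gini impurity and the sample-weighted impurity of a two-way split. How gini_r is derived for each piece of target attribute information is not spelled out in this claim, so the weighted-split form below is an assumption based on the usual decision tree formulation; the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 minus the sum of squared identifier proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(first_labels: Sequence[Hashable], second_labels: Sequence[Hashable]) -> float:
    """Sample-weighted Gini impurity of a split into first and second samples."""
    n = len(first_labels) + len(second_labels)
    if n == 0:
        return 0.0
    return (
        len(first_labels) / n * gini_impurity(first_labels)
        + len(second_labels) / n * gini_impurity(second_labels)
    )
```

Under this formulation, the attribute whose split yields the smallest value would play the role of the target attribute information corresponding to gini_min.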

4. A determination method according to claim 3, wherein said step S536 comprises the steps of:

S5361, taking the target attribute information corresponding to gini_min as the segmentation attribute information corresponding to the current node;

S5362, obtaining a threshold set E = (e_1, e_2, ..., e_u, ..., e_v) corresponding to the segmentation attribute information corresponding to the current node, u = 1, 2, ..., v; wherein e_u is the u-th threshold corresponding to the segmentation attribute information corresponding to the current node, e_1 > e_2 > ... > e_u > ... > e_v; v is the number of preset thresholds corresponding to the segmentation attribute information corresponding to the current node; the number of thresholds corresponding to at least part of the target attribute information is greater than 1;

S5363, determining whether the current node has a corresponding target node; if yes, determining the number num of target nodes, among all the target nodes corresponding to the current node, whose corresponding segmentation attribute information is the same as the segmentation attribute information corresponding to the current node, and proceeding to step S5364; otherwise, setting num = 0 and proceeding to step S5365; when the current node is not the root node, the root node and each node between the current node and the root node in the decision tree corresponding to the current group are the target nodes corresponding to the current node, and when the current node is the root node, no target node corresponding to the current node exists in the decision tree corresponding to the current group;

S5364, determining whether (num+1) is less than or equal to v; if yes, go to step S5365; otherwise, step S533 is entered;

S5365, for each sample corresponding to the current node, determining whether the target parameter in the sample is less than or equal to e_{num+1}; if yes, taking the sample as a first sample; otherwise, taking the sample as a second sample; the target parameter is the parameter, in the corresponding sample, that corresponds to the segmentation attribute information corresponding to the current node;

5. The method according to claim 4, wherein the recursive stopping condition is that candidate class identifiers in each sample corresponding to the current node are the same, or the depth of the current node in the decision tree corresponding to the current group reaches the sum of the numbers of thresholds corresponding to all the target attribute information, or the depth of the current node in the decision tree corresponding to the current group reaches a preset depth, or the number of samples corresponding to the current node is 0.

6. The determination method according to claim 2, wherein the target attribute information corresponding to H¹, H², REL and P is filtered attribute information;

the filtered attribute information is determined by the following method:

S610, obtaining a parameter list W = (w_1, w_2, ..., w_{h1}, ..., w_{Q1}), w_{h1} = (w_{h1}^1, w_{h1}^2, ..., w_{h1}^{h2}, ..., w_{h1}^{Q2}), h1 = 1, 2, ..., Q1, h2 = 1, 2, ..., Q2; wherein w_{h1} is the parameter group corresponding to the h1-th candidate attribute information, and Q1 is the number of pieces of candidate attribute information; w_{h1}^{h2} is the parameter in w_{h1} corresponding to the h2-th second executed event, Q2 is the number of second executed events, and Q1 ≥ 4; each piece of filtered attribute information is any one of the Q1 pieces of candidate attribute information, and the target attribute information corresponding to t_j is different from each piece of candidate attribute information;

S620, if all the parameters in w_{h1} are the same, or any two parameters in w_{h1} are different, deleting w_{h1} from W to obtain a post-deletion parameter list W' = (w'_1, w'_2, w'_3, w'_4); wherein w'_z is the z-th parameter group in W', z = 1, 2, 3, 4; s ≤ Q1;

S630, taking the candidate attribute information corresponding to w'_z as filtered attribute information to obtain a filtered attribute information set ATT = (att_1, att_2, att_3, att_4); wherein att_z is the z-th filtered attribute information in ATT.

7. The determination method according to claim 2, wherein the target attribute information corresponding to H¹, H², REL and P is filtered attribute information;

S640, obtaining a parameter list W = (w_1, w_2, ..., w_{h1}, ..., w_{Q1}), w_{h1} = (w_{h1}^1, w_{h1}^2, ..., w_{h1}^{h2}, ..., w_{h1}^{Q2}), h1 = 1, 2, ..., Q1, h2 = 1, 2, ..., Q2; wherein w_{h1} is the parameter group corresponding to the h1-th candidate attribute information, and Q1 is the number of pieces of candidate attribute information; w_{h1}^{h2} is the parameter in w_{h1} corresponding to the h2-th second executed event, Q2 is the number of second executed events, and Q1 ≥ 4; each piece of filtered attribute information is any one of the Q1 pieces of candidate attribute information, and the target attribute information corresponding to t_j is different from each piece of candidate attribute information;

S650, performing parameter deduplication on w_{h1} to obtain the deduplicated parameter group w''_{h1} corresponding to w_{h1}, so as to obtain a deduplicated parameter list W'' = (w''_1, w''_2, ..., w''_{h1}, ..., w''_{Q1}), w''_{h1} = (w''_{h1}^1, w''_{h1}^2, ..., w''_{h1}^{h3}, ..., w''_{h1}^{Q3(h1)}), h3 = 1, 2, ..., Q3(h1); wherein w''_{h1}^{h3} is the h3-th deduplicated parameter in w''_{h1}, Q3(h1) is the number of deduplicated parameters in w''_{h1}, and Q3(h1) ≤ Q2; any two deduplicated parameters in w''_{h1} are different;

S670, if Q3(h1)/Q2 is smaller than a first preset value pre_1 or greater than a second preset value pre_2, deleting w_{h1} from W to obtain a post-deletion parameter list W' = (w'_1, w'_2, w'_3, w'_4); wherein pre_2 > pre_1, and w'_z is the z-th parameter group in W', z = 1, 2, 3, 4;

8. The determination method according to claim 2, wherein the target attribute information corresponding to H¹, H², REL and P is filtered attribute information;

the target attribute information is determined by the following method:

S710, obtaining a parameter list W = (w_1, w_2, ..., w_{h1}, ..., w_{Q1}), w_{h1} = (w_{h1}^1, w_{h1}^2, ..., w_{h1}^{h2}, ..., w_{h1}^{Q2}), h1 = 1, 2, ..., Q1, h2 = 1, 2, ..., Q2; wherein w_{h1} is the parameter group corresponding to the h1-th candidate attribute information, and Q1 is the number of pieces of candidate attribute information; w_{h1}^{h2} is the parameter in w_{h1} corresponding to the h2-th second executed event, Q2 is the number of second executed events, and Q1 ≥ s; each piece of filtered attribute information is any one of the Q1 pieces of candidate attribute information, and the target attribute information corresponding to t_j is different from each piece of candidate attribute information;

S720, obtaining the hash value ash_{h1}^{h2} of w_{h1}^{h2} to obtain a hash value list ASH = (ash_1, ash_2, ..., ash_{h1}, ..., ash_{Q1}), ash_{h1} = (ash_{h1}^1, ash_{h1}^2, ..., ash_{h1}^{h2}, ..., ash_{h1}^{Q2}); wherein ash_{h1} is the h1-th hash value group in ASH;

S730, obtaining a priority group PRI = (pri_1, pri_2, ..., pri_{h1}, ..., pri_{Q1}); wherein pri_{h1} is the h1-th priority in PRI, pri_{h1} = [∑_{h2=1}^{Q2} (ash_{h1}^{h2} − ash_{ave}^{h1})²]/Q2, where ash_{ave}^{h1} is a priority reference factor, ash_{ave}^{h1} = [∑_{h2=1}^{Q2} ash_{h1}^{h2}]/Q2;

S740, if pri_{h1} is greater than a preset target priority pri_0, deleting the corresponding w_{h1} from W to obtain a post-deletion parameter list W' = (w'_1, w'_2, w'_3, w'_4); wherein w'_z is the z-th parameter group in W', z = 1, 2, 3, 4;

S750, taking the candidate attribute information corresponding to w'_z as filtered attribute information to obtain a filtered attribute information set ATT = (att_1, att_2, att_3, att_4); wherein att_z is the z-th filtered attribute information in ATT.

9. A non-transitory computer readable storage medium having stored therein at least one instruction or at least one program, wherein the at least one instruction or the at least one program is loaded and executed by a processor to implement the determination method of any one of claims 1-8.

10. An electronic device comprising a processor and the non-transitory computer-readable storage medium of claim 9.