CN110059771B

CN110059771B - Interactive vehicle data classification method under ordering support

Info

Publication number: CN110059771B
Application number: CN201910386811.2A
Authority: CN
Inventors: 罗月童; 吴帅; 汪涛; 闵海; 周波
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2019-05-10
Filing date: 2019-05-10
Publication date: 2021-01-15
Anticipated expiration: 2039-05-10
Also published as: CN110059771A

Abstract

The invention discloses an interactive vehicle data classification method supported by ranking, comprising the following steps: 1. obtaining the vehicle data training models; 2. acquiring vehicle recommendation data; 3. optimizing the vehicle classification model and the vehicle ranking model; 4. evaluating the quality of the vehicle classification model; 5. observing, through the related parameters shown in the interface, whether the vehicle classification result is satisfactory; if so, accepting the result, and otherwise returning to step 1 to continue iterating. The invention addresses the problems that vehicle classification lacks high-quality training samples and that boundary points between vehicle classes are difficult to distinguish, thereby optimizing the vehicle classification model and achieving accurate classification of the vehicle data set to be classified.

Description

Interactive vehicle data classification method under ordering support

Technical Field

The invention relates to the field of interactive classification, and in particular to interactive vehicle classification tasks in which the vehicle categories have an ordinal relationship.

Background

In the big data era, classification is one of the most fundamental data analysis techniques and tasks. Although many automatic vehicle data classification methods have been proposed, none suits every application scenario, and the "black box" nature of automatic vehicle classification methods undermines their interpretability and credibility, especially for high-dimensional complex data such as images and videos. One reason is that users tend to interpret information and judge similarity in terms of high-level features, whereas automatic classification algorithms rely on low-level features, so that a semantic gap arises. Many scholars hold that allowing users to participate in the classification process integrates their domain knowledge into the classification algorithm and improves the interpretability and reliability of the classification.

Machine-learning-based vehicle classification algorithms, particularly those based on deep learning, have performed excellently in many fields in recent years and have become the mainstream. Such algorithms learn and construct vehicle classification rules from training samples. Because the training samples are generally produced by users according to their own domain knowledge, the samples can be regarded as carrying that knowledge, which is thus indirectly merged into the classification algorithm through training, partially overcoming the semantic gap. However, machine-learning-based vehicle classification needs a sufficient number of high-quality training samples, which are not easy to obtain, and many existing exploratory vehicle data analyses have no training samples at all; existing methods therefore cannot help the user obtain high-quality training samples, and the vehicle classification results are poor. In addition, boundary points between vehicle categories are difficult to distinguish and classify.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides an interactive vehicle data classification method supported by ranking. It addresses the lack of high-quality training samples for vehicle classification and the difficulty of distinguishing boundary points between vehicle classes, so that the vehicle classification model is optimized and the vehicle data set to be classified is classified accurately.

In order to achieve the purpose, the invention adopts the following technical scheme:

The invention provides an interactive vehicle data classification method supported by ranking, characterized by comprising the following steps:

step 1, obtaining a vehicle data training model:

step 1.1, acquiring n pieces of vehicle data from a vehicle data set S to be classified, and setting the vehicle safety category of the i-th piece of vehicle data as p_i, with p_i ∈ {L₁, L₂, …, L_h, …, L_H}, where {L₁, L₂, …, L_h, …, L_H} denotes the set of vehicle safety categories, L_h denotes the h-th vehicle safety category, and the h-th vehicle safety category L_h is higher than the (h+1)-th vehicle safety category L_(h+1); n pieces of vehicle data with vehicle safety categories are thus obtained and denoted {g₁, g₂, …, g_i, …, g_n}, where g_i denotes the i-th vehicle data with a vehicle safety category and H denotes the total number of vehicle safety categories; i ≤ n and H ≤ n;

step 1.2, inputting the n vehicle data with vehicle safety categories into a support vector machine for training to obtain an SVM vehicle classification model M_c; inputting the vehicle data set S to be classified into the SVM vehicle classification model M_c, and denoting the resulting classification result of the vehicle data set S as C = {c₁, c₂, …, c_j, …, c_N}, where c_j denotes the classification result of the j-th vehicle data g_j in the vehicle data set S, N is the total number of vehicle data in the vehicle data set S, and 1 ≤ j ≤ N; a vehicle data set S′ with classification results is thereby obtained;

step 1.3, obtaining M pairs of vehicle data from the vehicle data set S′ with classification results, comparing the priorities of the vehicle safety categories of the m-th pair of vehicle data, and denoting the comparison result as Q_m, thereby obtaining a priority set of the M pairs of vehicle data consisting of the M comparison results; 1 ≤ m ≤ M;

step 1.4, inputting the priority set of the M pairs of vehicle data into a Ranking-SVM for training to obtain a Ranking-SVM vehicle ranking model M_r; inputting the vehicle data set S′ with classification results into the Ranking-SVM vehicle ranking model M_r, and denoting the resulting ranking result of the vehicle data set S′ as D = {d₁, d₂, …, d_j, …, d_N}, where d_j denotes the ranking result of the j-th vehicle data g_j in the vehicle data set S;

sorting the ranking result D in descending order to obtain the descending ranking result D′ = {d′₁, d′₂, …, d′_j, …, d′_N}, where d′_j denotes the ranking result of the j-th vehicle data g′_j in the descending ranking result D′;
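
As an illustrative sketch only, the pairwise idea behind the Ranking-SVM of step 1.4 and the descending sort can be approximated with a linear scorer trained on feature differences. The function names `train_pairwise_ranker` and `rank_descending`, the sub-gradient training loop, and all hyperparameters are assumptions made for illustration; this is a stand-in for a Ranking-SVM solver, not the solver used by the invention.

```python
import random


def train_pairwise_ranker(features, preferences, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit a linear weight vector w so that score(a) > score(b) for each preference.

    features:    list of feature vectors (lists of floats), one per vehicle datum
    preferences: list of (a, b) index pairs meaning "datum a has higher priority than b"
    Assumption: a plain sub-gradient descent on the pairwise hinge loss is used here
    in place of a real Ranking-SVM solver.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    w = [0.0] * dim
    pairs = list(preferences)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for a, b in pairs:
            # Each preference becomes one "positive" difference vector.
            diff = [fa - fb for fa, fb in zip(features[a], features[b])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            violated = margin < 1.0
            for k in range(dim):
                # Sub-gradient of reg/2 * |w|^2 + max(0, 1 - w . diff)
                grad = reg * w[k] - (diff[k] if violated else 0.0)
                w[k] -= lr * grad
    return w


def rank_descending(features, w):
    """Return (scores, order): the ranking scores d_j and indices sorted by descending score."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
    order = sorted(range(len(features)), key=lambda j: scores[j], reverse=True)
    return scores, order
```

Here `scores` plays the role of the ranking result D and `order` the role of the descending result D′; the (a, b) pairs correspond to the comparison results Q_m.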

step 2, obtaining vehicle recommendation data:

step 2.1, judging, for two adjacent ranking results d′_j and d′_(j+1) in the descending ranking result D′, the priority order of the corresponding vehicle data in the classification result C: if the former is higher than the latter, the classification results of the two corresponding vehicle data are normal; otherwise, the classification results of the two corresponding vehicle data conflict; all vehicle data pairs with conflicting classification results thus form a vehicle recommendation data set;
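
The conflict check of step 2.1 can be sketched as follows. This is a hedged illustration: `find_conflicts`, `category_level` and the `strict` flag are hypothetical names, and the default treatment of two adjacent data in the same category as "normal" is an assumption (the text above only says "higher than"; pass `strict=True` for that literal reading).

```python
def find_conflicts(order, category_level, strict=False):
    """Return adjacent pairs whose predicted categories contradict the ranking.

    order:          datum indices sorted by descending ranking score (D')
    category_level: category_level[j] = h when datum j is classified as L_h;
                    a smaller h means a higher-priority category
    strict:         if True, equal categories also count as a conflict
    Each conflict is (position, first_index, second_index), position being 0-based.
    """
    conflicts = []
    for pos in range(len(order) - 1):
        first, second = order[pos], order[pos + 1]
        if strict:
            normal = category_level[first] < category_level[second]
        else:
            normal = category_level[first] <= category_level[second]
        if not normal:
            conflicts.append((pos, first, second))
    return conflicts
```

The returned list corresponds to the vehicle recommendation data set: every pair in it is a candidate for the user to inspect.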

step 2.2, taking the rank of the higher-ranked vehicle data in each pair of vehicle data in the vehicle recommendation data set as the rank of that vehicle data pair, thereby obtaining a vehicle recommendation data rank set, denoted T = {t₁, t₂, …, t_x, …, t_X}, where t_x denotes the rank of the x-th vehicle data pair, 1 ≤ x ≤ X, and X denotes the total number of vehicle recommendation data pairs;
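
A minimal sketch of step 2.2, assuming each recommended pair is given as the two 1-based ranks of its members in D′ (the function name `pair_rank_set` is hypothetical): the rank of a pair is simply the better, that is numerically smaller, of the two ranks.

```python
def pair_rank_set(recommended_pairs):
    """Build T = {t_1, ..., t_X} from pairs of 1-based ranks.

    recommended_pairs: list of (rank_a, rank_b) tuples, one per conflicting pair.
    The higher-ranked member has the smaller rank number, so t_x is the minimum.
    """
    return [min(rank_a, rank_b) for rank_a, rank_b in recommended_pairs]
```

For adjacent pairs at ranks (2, 3) and (7, 8), the resulting set T would be [2, 7].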

step 2.3, based on the boundary point aggregation effect hypothesis, obtaining the ranking priority Priority(t_x) of the x-th vehicle data pair using formula (3):

In formula (3), t_y denotes the rank of the y-th vehicle data pair, 1 ≤ y ≤ X, and y ≠ x;
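
Formula (3) itself is not reproduced in this text, so the following is only an assumed stand-in that follows the stated idea: a pair whose rank lies far from the ranks of the other recommended pairs (low aggregation) receives a higher priority. The mean absolute rank distance used here, and the names `rank_priorities` and `order_by_priority`, are assumptions and not the patent's formula.

```python
def rank_priorities(pair_ranks):
    """Assumed priority: mean absolute distance from t_x to every other t_y (y != x)."""
    total = len(pair_ranks)
    if total < 2:
        return [0.0] * total
    return [
        sum(abs(t_x - t_y) for y, t_y in enumerate(pair_ranks) if y != x) / (total - 1)
        for x, t_x in enumerate(pair_ranks)
    ]


def order_by_priority(pair_ranks):
    """Indices of the recommended pairs, highest assumed priority first."""
    priorities = rank_priorities(pair_ranks)
    return sorted(range(len(pair_ranks)), key=lambda x: priorities[x], reverse=True)
```

With T = [2, 3, 4, 20], the isolated pair at rank 20 is presented first, while the three clustered pairs, which are presumably near a class boundary, come later.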

step 3, optimizing a vehicle classification model and a vehicle sequencing model:

step 3.1, defining the iteration count as b and initializing b = 1;

step 3.2, obtaining, from the vehicle recommendation data set, Z_b pieces of vehicle data other than the n pieces of vehicle data, and setting the vehicle safety category of the z-th vehicle data as p_z, with p_z ∈ {L₁, L₂, …, L_h, …, L_H} and 1 ≤ z ≤ Z_b; merging them with the n vehicle data with vehicle safety categories to form the data category set S_b of the b-th iteration, whose k-th element denotes the k-th vehicle data with a vehicle safety category in S_b, 1 ≤ k ≤ n + Z_b;

step 3.3, inputting the data category set S_b of the b-th iteration into a support vector machine for training to obtain the SVM vehicle classification model of the b-th iteration; inputting the vehicle data set S to be classified into the SVM vehicle classification model of the b-th iteration to obtain the classification result of the vehicle data set S at the b-th iteration, whose j-th element denotes the classification result of the j-th vehicle data g_j in the vehicle data set S at the b-th iteration; a vehicle data set S′_b with the classification results of the b-th iteration is thereby obtained;

step 3.4, obtaining Y_b pairs of vehicle data of the b-th iteration from the vehicle recommendation data set, and comparing the priorities of the vehicle safety categories of the y-th pair of vehicle data to obtain a comparison result Q_y, thereby obtaining the priority set of the b-th iteration consisting of the Y_b comparison results, 1 ≤ y ≤ Y_b; and merging it with the priority set of the M pairs of vehicle data to form the total priority set of the b-th iteration;

step 3.5, inputting the total priority set of the b-th iteration into a Ranking-SVM for training to obtain the Ranking-SVM vehicle ranking model of the b-th iteration; inputting the vehicle data set S to be classified into the Ranking-SVM vehicle ranking model of the b-th iteration to obtain the ranking result D_b of the vehicle data set S at the b-th iteration, whose j-th element denotes the ranking result of the j-th vehicle data g_j in the vehicle data set S at the b-th iteration;

step 3.6, sorting the ranking result D_b of the b-th iteration in descending order to obtain the descending ranking result D′_b of the b-th iteration, whose j-th element denotes the ranking result of the j-th vehicle data g′_j in D′_b;

step 4, evaluating the quality of the vehicle classification model:

step 4.1, judging, for two adjacent ranking results in the descending ranking result D′_b of the b-th iteration, the priority order of the corresponding vehicle data in the vehicle data set S′_b with the classification results of the b-th iteration: if the former is higher than the latter, the classification results of the two corresponding vehicle data of the b-th iteration are normal; otherwise, the classification results of the two corresponding vehicle data of the b-th iteration conflict; all vehicle data pairs with conflicting classification results of the b-th iteration form the vehicle recommendation data set of the b-th iteration;

step 4.2, taking the rank of the higher-ranked vehicle data in each pair of vehicle data in the vehicle recommendation data set of the b-th iteration as the rank of that vehicle data pair of the b-th iteration, thereby obtaining the vehicle recommendation data rank set of the b-th iteration, whose x′-th element denotes the rank of the x′-th vehicle data pair of the b-th iteration, 1 ≤ x′ ≤ X_b, where X_b denotes the total number of vehicle recommendation data pairs of the b-th iteration;

step 4.3, obtaining the consistency P_b of the b-th iteration using formula (2), so as to evaluate the SVM vehicle classification model of the b-th iteration:

In formula (2), P_b ∈ [0,1];
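
Formula (2) is likewise not reproduced in this text. Purely as an assumption that respects the stated range P_b ∈ [0,1], the consistency can be sketched as the fraction of adjacent pairs in the descending ranking that do not conflict with the classification result; the function name `consistency` and this particular definition are illustrative, not the patent's formula.

```python
def consistency(num_items, num_conflicts):
    """Assumed consistency: share of adjacent ranked pairs without a conflict.

    num_items:     N, the number of vehicle data in the ranked list
    num_conflicts: X_b, the number of conflicting adjacent pairs
    Returns a value in [0, 1]; 1.0 means classification and ranking fully agree.
    """
    adjacent_pairs = num_items - 1
    if adjacent_pairs <= 0:
        return 1.0
    return 1.0 - num_conflicts / adjacent_pairs
```

Under this assumption, a data set of 5 items with a single conflicting adjacent pair would have a consistency of 0.75.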

step 5, if P_b < Δ, assigning b + 1 to b, n + Z_b to n, and M + Y_b to M, and then returning to step 3.2 to execute the steps in sequence; otherwise, the optimization of the SVM vehicle classification model is complete and the optimal classification of the vehicle data set S to be classified is achieved.
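
The control flow of steps 3.2 to 5 can be sketched as a loop. This is a skeleton under assumptions: `run_round` is a hypothetical callback standing for one pass through steps 3.2 to 4.3 (user labeling, retraining, evaluation), `delta` stands for Δ, and `max_rounds` is an added safety limit that the text does not mention.

```python
def run_iterations(run_round, delta, n, m, max_rounds=50):
    """Repeat steps 3.2-4.3 until the consistency P_b reaches delta.

    run_round(b, n, m) must return (z_b, y_b, p_b): the number of newly labeled
    data, the number of newly compared pairs, and the consistency of round b.
    Returns the history as a list of (b, n, m, p_b) tuples.
    """
    b = 1
    history = []
    while b <= max_rounds:
        z_b, y_b, p_b = run_round(b, n, m)
        history.append((b, n, m, p_b))
        if p_b >= delta:
            break  # the classification model is accepted
        # Step 5: b <- b + 1, n <- n + Z_b, M <- M + Y_b, then back to step 3.2
        n, m, b = n + z_b, m + y_b, b + 1
    return history
```

Keeping the per-round work behind a callback separates the interactive part (the user's labels and comparisons) from the stopping rule, which is the only logic step 5 specifies.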

Compared with the prior art, the invention has the beneficial effects that:

1. In some vehicle classification applications, the vehicle categories have an ordinal relationship, and the relative order between vehicle data is easy to perceive. For such application scenarios, the method improves interactive vehicle classification by drawing on the user's perception of the relative order among vehicle data, thereby providing an interactive vehicle classification method supported by ranking. With this method, the user labels the categories of as few vehicle data as possible, and the order information among the data provided by the ranking model is exploited, so that ineffective vehicle data labels are reduced and labeling efficiency is greatly improved.

2. The invention further provides a recommendation method based on vehicle data pairs for recommending candidate vehicle data to label. This method guarantees that problematic vehicle data are recommended and keeps the amount of recommended vehicle data controllable.

3. The invention further optimizes the order of the vehicle recommendation data. In the initial stage of the recommendation method, the number of recommended vehicle data pairs is relatively large, and judging every recommended vehicle datum would place a heavy burden on the user. Each vehicle data pair is therefore evaluated by the aggregation degree of its candidate points, and candidate points with a lower aggregation degree are given higher priority, which assists the user in making the corresponding decisions.

4. The invention also provides a new scheme for evaluating the quality of the vehicle classification result, which reduces the user's burden and helps the user decide whether to continue optimizing the model. By adopting a model-consistency evaluation strategy, the method gives the user a basis for judging the quality of the current vehicle classification result, makes the model's results easier to understand, greatly improves the interpretability and reliability of the model, and makes the trained model more readily accepted by the user.

Drawings

FIG. 1 is a flow chart of the inventive method;

FIG. 2 is a schematic diagram of a vehicle classification result quality metric view layout strategy according to the present invention.

Detailed Description

In this embodiment, an interactive vehicle data classification method supported by ranking is applied to vehicle classification problems whose categories have a priority order. The method emphasizes and supports order relations between vehicle data entered by the user and uses them to optimize the interactive classification process: the user can set not only the category of vehicle data but also the relative order of vehicle data, and the vehicle data classification model and the vehicle data ranking model are compared, so that adjusting the data order relations improves the interactive classification result and enables the vehicle classification model to be displayed and evaluated. Specifically, as shown in FIG. 1, the method comprises the following steps:

step 1, obtaining a vehicle data training model:

step 1.2, for the classification model: because the support vector machine (SVM) is a small-sample learning method whose training and prediction are fast, it is often adopted by interactive classification methods; on the one hand, interactive classification can hardly require the user to label a large number of training samples, and on the other hand, interactive systems have high speed requirements. The invention likewise adopts an SVM-based classification model: the n vehicle data with vehicle safety categories are input into a support vector machine for training to obtain an SVM vehicle classification model M_c; the vehicle data set S to be classified is input into the SVM vehicle classification model M_c, and the resulting classification result of the vehicle data set S is denoted C = {c₁, c₂, …, c_j, …, c_N}, where c_j denotes the classification result of the j-th vehicle data g_j in the vehicle data set S, N is the total number of vehicle data in the vehicle data set S, and 1 ≤ j ≤ N; a vehicle data set S′ with classification results is thereby obtained;

step 1.3, obtaining M pairs of vehicle data from the vehicle data set S′ with classification results, comparing the priorities of the vehicle safety categories of the m-th pair of vehicle data, and denoting the comparison result as Q_m, thereby obtaining a priority set of the M pairs of vehicle data consisting of the M comparison results; 1 ≤ m ≤ M;

step 1.4, for the ranking model: the Ranking-SVM applies the classic binary-classification SVM model directly to the ranking problem. The priority set of the M pairs of vehicle data is input into a Ranking-SVM for training to obtain a Ranking-SVM vehicle ranking model M_r; the vehicle data set S′ with classification results is input into the Ranking-SVM vehicle ranking model M_r, and the resulting ranking result of the vehicle data set S′ is denoted D = {d₁, d₂, …, d_j, …, d_N}, where d_j denotes the ranking result of the j-th vehicle data g_j in the vehicle data set S;

step 2, obtaining vehicle recommendation data:

step 2.1, if the order relation between the data categories obtained by the classification model is inconsistent with the order relation of adjacent data obtained by the ranking model, the current models can be considered not to handle those data well, and those data can be regarded as vehicle recommendation data. Accordingly, for two adjacent ranking results d′_j and d′_(j+1) in the descending ranking result D′, the priority order of the corresponding vehicle data in the classification result C is judged: if the former is higher than the latter, the classification results of the two corresponding vehicle data are normal; otherwise, the classification results of the two corresponding vehicle data conflict; all vehicle data pairs with conflicting classification results thus form a vehicle recommendation data set;

step 2.3, based on the "boundary point aggregation effect hypothesis", data far from the boundary can be processed first, so that data are handled quickly and the model is iterated. However, the true classification boundary is unknown, so how can the distance from a data point to the boundary be measured? According to the "boundary point aggregation effect hypothesis", if a candidate annotation datum lies near a boundary, there should be many other candidate annotation data nearby. Based on this observation, the invention uses the aggregation degree of the candidate annotation data to measure their distance to the boundary, and obtains the ranking priority Priority(t_x) of the x-th vehicle data pair using formula (3):

In formula (3), t_y denotes the rank of the y-th vehicle data pair, 1 ≤ y ≤ X, and y ≠ x. Priority(t_x) expresses that the farther any datum of a data pair is from the classification boundary, the higher the ranking priority of that data pair. The candidate data set displays all candidate data point pairs in order of priority from high to low. Consider, for example, the three-class problem of "high-risk vehicles, medium-risk vehicles and low-risk vehicles". According to formula (4), a low-risk/medium-risk pair and a low-risk/high-risk pair have the same effect on P, although in practice their effects differ. Candidate data points that are confused between different classes are therefore presented in a triangular array of (m−1) × (m−1) cells, as shown in FIG. 2: the abscissa of the triangular array runs from left to right over (c₁, c₂, …, c_(m−1)), the ordinate runs from top to bottom over (c₂, c₃, …, c_m), and the (i, j)-th cell displays the candidate points that are confused between c_i and c_j.
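
The triangular layout described above can be sketched as a grouping of the conflicting pairs by the two categories involved. This is a minimal illustration: `triangular_cells` and `category_level` are hypothetical names, and pairs whose two members fall in the same category are skipped on the assumption that they have no cell in the array.

```python
from collections import defaultdict


def triangular_cells(conflict_pairs, category_level):
    """Group conflicting pairs into cells (i, j) with i < j.

    conflict_pairs: list of (first_index, second_index) datum pairs
    category_level: category_level[k] = index of the class c_i assigned to datum k
    Returns {(i, j): [pairs confused between c_i and c_j]}.
    """
    cells = defaultdict(list)
    for first, second in conflict_pairs:
        i, j = sorted((category_level[first], category_level[second]))
        if i != j:
            cells[(i, j)].append((first, second))
    return dict(cells)
```

Because only cells with i < j can be populated, at most m(m−1)/2 cells are used for m classes, which is why the view is triangular rather than a full square matrix.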

Step 3, optimizing the classification model and the sequencing model:

step 3.1, the model is optimized by further processing the recommended data in the recommendation data set. The iteration count is defined as b and initialized as b = 1;

step 3.2, obtaining, from the vehicle recommendation data set, Z_b pieces of vehicle data other than the n pieces of vehicle data, and setting the vehicle safety category of the z-th vehicle data as p_z, with p_z ∈ {L₁, L₂, …, L_h, …, L_H} and 1 ≤ z ≤ Z_b; and merging them with the n vehicle data with vehicle safety categories to form the data category set S_b of the b-th iteration;

step 3.3, inputting the data category set S_b of the b-th iteration into a support vector machine for training to obtain the SVM vehicle classification model of the b-th iteration; inputting the vehicle data set S to be classified into the SVM vehicle classification model of the b-th iteration to obtain the classification result of the vehicle data set S at the b-th iteration;

step 3.4, obtaining Y_b pairs of vehicle data of the b-th iteration from the vehicle recommendation data set, and comparing the priorities of the vehicle safety categories of the y-th pair of vehicle data to obtain a comparison result Q_y, thereby obtaining the priority set of the b-th iteration consisting of the Y_b comparison results, 1 ≤ y ≤ Y_b; and merging it with the priority set of the M pairs of vehicle data to form the total priority set of the b-th iteration;

inputting the vehicle data set S to be classified into the Ranking-SVM vehicle ranking model of the b-th iteration;

step 4, evaluating the quality of the classification model:

Interactive data classification is an interaction-driven iterative process, so the current classification result must be evaluated from the observed information in order to decide whether further iteration is needed. Although the classification result can be observed, it is difficult to judge its quality intuitively.

step 4.1, judging, for two adjacent ranking results in the descending ranking result D′_b of the b-th iteration, the priority order of the corresponding vehicle data in the vehicle data set S′_b with the classification results of the b-th iteration: if the former is higher than the latter, the classification results of the two corresponding vehicle data of the b-th iteration are normal; otherwise, the classification results of the two corresponding vehicle data of the b-th iteration conflict; all vehicle data pairs with conflicting classification results of the b-th iteration form the vehicle recommendation data set of the b-th iteration;

In formula (2), P_b ∈ [0,1]. The model consistency P_b indicates the degree of agreement between the classification result and the ranking result: the larger P_b is, the higher the quality of the classification result and the better the classification model.

Claims

1. A method for interactive vehicle data classification with sequencing support, comprising the steps of:

step 1, obtaining a vehicle data training model:

step 2, obtaining vehicle recommendation data:

step 2.1, judging, for two adjacent ranking results d′_j and d′_(j+1) in the descending ranking result D′, the priority order of the corresponding vehicle data in the classification result C: if the former is higher than the latter, the classification results of the two corresponding vehicle data are normal; otherwise, the classification results of the two corresponding vehicle data conflict; all vehicle data pairs with conflicting classification results thus form a vehicle recommendation data set;

step 3.1, defining the iteration times as b; and initializing b as 1;

step 3.4, obtaining Y_b pairs of vehicle data of the b-th iteration from the vehicle recommendation data set, and comparing the priorities of the vehicle safety categories of the y-th pair of vehicle data to obtain a comparison result Q_y, thereby obtaining the priority set of the b-th iteration consisting of the Y_b comparison results, 1 ≤ y ≤ Y_b; and merging it with the priority set of the M pairs of vehicle data to form the total priority set of the b-th iteration;

step 4, evaluating the quality of the vehicle classification model:

step 4.1, judging, for two adjacent ranking results in the descending ranking result D′_b of the b-th iteration, the priority order of the corresponding vehicle data in the vehicle data set S′_b with the classification results of the b-th iteration: if the former is higher than the latter, the classification results of the two corresponding vehicle data of the b-th iteration are normal; otherwise, the classification results of the two corresponding vehicle data of the b-th iteration conflict; all vehicle data pairs with conflicting classification results of the b-th iteration form the vehicle recommendation data set of the b-th iteration;

In the formula (2), P_b∈[0,1]；