CN109189876B

CN109189876B - Data processing method and device

Info

Publication number: CN109189876B
Application number: CN201811010287.0A
Authority: CN
Inventors: 刘均; 张小琼
Original assignee: Shenzhen Launch Technology Co Ltd
Current assignee: Shenzhen Launch Technology Co Ltd
Priority date: 2018-08-31
Filing date: 2018-08-31
Publication date: 2021-09-10
Anticipated expiration: 2038-08-31
Also published as: CN109189876A

Abstract

The embodiments of the invention disclose a data processing method and a related device. The method includes: acquiring m pieces of position information of a vehicle diagnosis device, wherein the m pieces of position information are the device's own geographical position information collected by the vehicle diagnosis device during vehicle diagnosis, and m is an integer greater than 2; classifying the m pieces of position information according to a clustering algorithm to obtain K position information sets; determining, according to a preset reliability condition, a first set with the highest reliability among the K position information sets, wherein K is an integer greater than 1 and less than N; and determining, according to the position information in the first set, the regional position of the vehicle service shop where the vehicle diagnosis device is located. The technical solution of the embodiments helps to improve the credibility of the positioning result, reduce computational complexity, and process large amounts of data easily.

Description

Data processing method and device

Technical Field

The invention relates to the field of data analysis, in particular to a data processing method and device of vehicle diagnosis equipment.

Background

As an Internet-of-Vehicles terminal, the vehicle diagnosis device is an important source of Internet-of-Vehicles data in the big data era. Although the vehicle diagnosis device is a standalone large device, it is used flexibly, so its location varies. The acquired position information reflects this: the position of the vehicle diagnosis device is often not fixed. In practice, vehicle diagnosis devices are mostly used in vehicle maintenance factories, so the geographic location of a maintenance factory can be determined from the position information of the vehicle diagnosis device. To do so, however, the relatively fixed position information of the vehicle diagnosis device must be determined first.

Regarding strategies for extracting the position information of the vehicle diagnosis device, the most direct method at present is to extract one or two pieces of position information by random sampling from the multiple pieces of acquired device position information, and to obtain the position of the vehicle diagnosis device through analysis. However, the result of such processing has low reliability.

Disclosure of Invention

The embodiment of the invention provides a data processing method, which can enhance the credibility of the obtained position information of the vehicle diagnosis equipment, reduce the calculation complexity and facilitate the processing of a large amount of data.

In a first aspect, an embodiment of the present invention provides a data processing method, including:

acquiring m pieces of position information of a vehicle diagnosis device, wherein the m pieces of position information are the device's own geographical position information collected by the vehicle diagnosis device during vehicle diagnosis, and m is an integer greater than 2; classifying the m pieces of position information according to a clustering algorithm to obtain K position information sets; determining, according to a preset reliability condition, a first set with the highest reliability among the K position information sets, wherein K is an integer greater than 1 and less than N; and determining, according to the position information in the first set, the regional position of the vehicle service shop where the vehicle diagnosis device is located.

By implementing the embodiment of the invention, the acquired position information of a large number of vehicle diagnosis devices is classified according to a clustering algorithm, which improves the positioning precision; the first set with the highest reliability is then screened out according to the preset reliability condition, which reduces the influence of individual erroneous pieces of position information; and finally the regional position of the vehicle maintenance plant where the vehicle diagnosis device is located is determined. The method can enhance the credibility of the located regional position, reduce computational complexity, and facilitate the processing of large amounts of data.

In a possible implementation manner, the classifying the m position information according to a clustering algorithm to obtain K position information sets includes:

selecting K cluster center points from a sample set formed by the m pieces of position information, wherein the sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$, and $x^{(i)}$ is the position information of the i-th sample in the sample set, $i = 1, 2, \ldots, m$; the set of center points formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in \mathbb{R}^n$, and $\mu^{(j)}$ is the position information of the j-th cluster center point in the set of center points, $j = 1, 2, \ldots, K$; $\mathbb{R}^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1;

calculating the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs according to the formula

$$c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2,$$

wherein $\|x^{(i)} - \mu^{(j)}\|^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$ is, with $\mu^{(j)}$ as the cluster center point, the set of all arguments $x^{(i)}$ for which the sum of squares of the Euclidean distances to $\mu^{(j)}$ takes its minimum value;

updating each cluster center point $\mu^{(j)}$ according to the formula

$$\mu^{(j)} = \frac{\sum_{x^{(i)} \in c^{(j)}} x^{(i)}}{|c^{(j)}|}$$

until the distortion function

$$\sum_{j=1}^{K} \sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$$

converges, wherein $c^{(j)}$ is class cluster j, $x^{(i)} \in c^{(j)}$ is the position information $x^{(i)}$ of class cluster j, $\sum_{x^{(i)} \in c^{(j)}} x^{(i)}$ is the sum of the features of all position information in class cluster j, $|c^{(j)}|$ is the number of pieces of position information in class cluster j, and $\sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ is the sum of the squared Euclidean distances from each piece of position information of class cluster j to the cluster center point of class cluster j; when the distortion function converges, the K class clusters obtained correspond to the K position information sets.
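The clustering procedure described above (assign each sample to its nearest center point, recompute each center point as the mean of its class cluster, stop when the distortion function no longer decreases) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `sq_dist` and `k_means`, the tolerance `tol`, the iteration cap `max_iter`, and the seeded random initialization are all assumptions introduced here.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two n-dimensional points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def k_means(points: List[Point], k: int, tol: float = 1e-9,
            max_iter: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Return (centers, labels); labels[i] is the cluster index of points[i].

    tol, max_iter and seed are illustrative parameters, not from the source.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # K initial center points drawn from the samples
    labels = [0] * len(points)
    previous = float("inf")
    for _ in range(max_iter):
        # Assignment: each sample joins the cluster with the nearest center point.
        labels = [min(range(k), key=lambda j: sq_dist(x, centers[j])) for x in points]
        # Update: each center point becomes the mean of the samples in its cluster.
        for j in range(k):
            members = [x for x, c in zip(points, labels) if c == j]
            if members:  # an empty cluster keeps its previous center point
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
        # Distortion: sum of squared distances from samples to their center points.
        distortion = sum(sq_dist(x, centers[c]) for x, c in zip(points, labels))
        if previous - distortion <= tol:  # no further decrease: treat as converged
            break
        previous = distortion
    return centers, labels
```

Calling `k_means(samples, k)` on a list of coordinate tuples returns the K center points together with one cluster index per sample, from which the K position information sets can be read off.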

In a possible implementation manner, the first set with the highest reliability among the K position information sets is $c^{(j)}$, and the cluster center point of the first set is $\mu^{(j)}$, where $j = k$, $0 < k \le K$, and k is an integer; the determining the regional position of the vehicle service shop where the vehicle diagnosis device is located according to the position information in the first set includes: after determining the first set $c^{(j)}$ and the cluster center point $\mu^{(j)}$ of the first set, determining the position information in the first set that lies within a circular area, wherein the circular area is centered on the cluster center point $\mu^{(j)}$ and has a radius of a preset value; and determining the regional position of the vehicle maintenance factory according to the position information within the circular area.
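A minimal sketch of the circular-area step follows. The source does not say how the regional position is derived from the points inside the circle, so taking their mean is an assumption made here for illustration, as are the function names and the treatment of coordinates as planar values in the same unit as the radius.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def points_in_circle(cluster: List[Point], center: Point, radius: float) -> List[Point]:
    """Keep the points of the first set that lie within `radius` of the center point."""
    r2 = radius ** 2
    return [p for p in cluster
            if (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2 <= r2]


def region_position(cluster: List[Point], center: Point, radius: float) -> Point:
    """Assumed rule: report the mean of the points inside the circular area.

    Falls back to the cluster center point if no point lies inside the circle.
    """
    inside = points_in_circle(cluster, center, radius)
    if not inside:
        return center
    return (sum(p[0] for p in inside) / len(inside),
            sum(p[1] for p in inside) / len(inside))
```

For real longitude and latitude values, a geodesic distance would be a more faithful choice than the planar distance used in this sketch.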

In a possible implementation manner, the determining, according to a preset reliability condition, a first set with the highest reliability in the K sets of location information includes: calculating the ratio of the quantity of the position information in each position information set to m; and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility.
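The ratio-based reliability condition can be sketched as below, assuming the clustering step has produced one cluster index per piece of position information. The names `cluster_ratios` and `most_reliable_set` are hypothetical, and the tie-breaking behavior (the first cluster encountered wins) is a property of this sketch, not something the source specifies.

```python
from collections import Counter
from typing import Dict, List, Tuple


def cluster_ratios(labels: List[int]) -> Dict[int, float]:
    """Ratio of the number of position records in each set to the total m."""
    m = len(labels)
    return {label: count / m for label, count in Counter(labels).items()}


def most_reliable_set(labels: List[int]) -> Tuple[int, float]:
    """Return (cluster index, ratio) of the set with the largest share of m."""
    ratios = cluster_ratios(labels)
    best = max(ratios, key=lambda label: ratios[label])
    return best, ratios[best]
```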

In one possible implementation manner, after determining the regional location of the vehicle maintenance factory to which the vehicle diagnosis device belongs, the method further includes: and acquiring vehicle related data acquired by the vehicle-mounted diagnosis equipment at a position corresponding to the position information in the first set.

Step 1: select K cluster center points from a sample set formed by the m pieces of position information. Some class clusters are selected from the acquired position data and their respective cluster center points are randomly initialized, so that K is preset as the number of cluster center points and also represents the number of class clusters. The sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$, and $x^{(i)}$ is the position information of the i-th sample in the sample set, $i = 1, 2, \ldots, m$; the set of center points formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in \mathbb{R}^n$, and $\mu^{(j)}$ is the position information of the j-th cluster center point in the set of center points, $j = 1, 2, \ldots, K$; $\mathbb{R}^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1;

Step 2: calculate, according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs, wherein $\|x^{(i)} - \mu^{(j)}\|^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$ is, with $\mu^{(j)}$ as the cluster center point, the set of all arguments $x^{(i)}$ for which the sum of squares of the Euclidean distances to $\mu^{(j)}$ takes its minimum value;

Step 3: calculate each cluster center point $\mu^{(j)}$ according to the formula

$$\mu^{(j)} = \frac{\sum_{x^{(i)} \in c^{(j)}} x^{(i)}}{|c^{(j)}|}$$

wherein $c^{(j)}$ is class cluster j, $x^{(i)} \in c^{(j)}$ is the position information $x^{(i)}$ of class cluster j, and $|c^{(j)}|$ is the number of pieces of position information in class cluster j; the cluster center points obtained in this step are the result of the first calculation of the cluster center points, and the cluster center points referred to in the following steps are named by analogy and have the same meaning.

Step 4: determine whether the distortion function

$$\sum_{j=1}^{K} \sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$$

converges;

Step 5: calculate, according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;

Step 6: update each cluster center point $\mu^{(j)}$ according to the formula in step 3;

Step 7: determine whether the distortion function converges;

Step 8: calculate, according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;

Step 9: update each cluster center point $\mu^{(j)}$ according to the formula in step 3;

Step 10: determine whether the distortion function converges;

Step 11: calculate, according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;

Step 12: update each cluster center point $\mu^{(j)}$ according to the formula in step 3;

Step 13: determine whether the distortion function converges;

……

Step W: determine whether the distortion function converges. When the distortion function converges, K cluster center points $\mu^{(j)}$ and the class clusters $c^{(j)}$ respectively corresponding to the K cluster center points $\mu^{(j)}$ are obtained; that is, the K class clusters obtained correspond to the K position information sets. Here $\sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ is the sum of the squared Euclidean distances from each piece of position information of class cluster j to the cluster center point of class cluster j, and W is the last step executed by the clustering algorithm when the distortion function converges.

In one possible implementation manner, after determining the regional location of the vehicle maintenance factory to which the vehicle diagnosis device belongs, the method further includes: and acquiring vehicle related data acquired by the vehicle diagnosis equipment at a position corresponding to the position information in the first set.

In a second aspect, an embodiment of the present invention provides a data processing apparatus, including a first acquisition unit, a classification unit, a screening unit, a determination unit, and a second acquisition unit. The first acquisition unit is used by the background server to acquire m pieces of position information of the vehicle diagnosis device, wherein the m pieces of position information are the device's own geographical position information collected by the vehicle diagnosis device during vehicle diagnosis, and m is an integer greater than 2. The classification unit is used for classifying the m pieces of position information according to a clustering algorithm to obtain K position information sets. The screening unit is used for determining, according to a preset reliability condition, a first set with the highest reliability among the K position information sets, wherein K is an integer greater than 1 and less than N. The determination unit is used for determining, according to the position information in the first set, the regional position of the vehicle maintenance factory where the vehicle diagnosis device is located. The second acquisition unit is used for acquiring, after the determination unit determines the regional position of the vehicle maintenance plant to which the vehicle diagnosis device belongs, the vehicle-related data collected by the vehicle-mounted diagnostic device at the positions corresponding to the position information in the first set.

By implementing the embodiment of the invention, the acquired position information of a large number of vehicle diagnosis devices is classified according to a clustering algorithm, the positioning precision is improved, then the first set with the highest reliability is screened out according to the preset reliability condition, the influence of individual wrong position information is reduced, and finally the region position of the vehicle maintenance plant where the vehicle diagnosis devices are located is determined. The invention can enhance the credibility of the regional position result, reduce the calculation complexity and is easy to process a large amount of data.

In a possible implementation manner, the classification unit is specifically configured to: select K cluster center points from a sample set formed by the m pieces of position information, wherein the sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$, and $x^{(i)}$ is the position information of the i-th sample in the sample set, $i = 1, 2, \ldots, m$; the set of center points formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in \mathbb{R}^n$, and $\mu^{(j)}$ is the position information of the j-th cluster center point in the set of center points, $j = 1, 2, \ldots, K$; $\mathbb{R}^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1;

calculate the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs according to the formula

$$c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2,$$

wherein $\|x^{(i)} - \mu^{(j)}\|^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$ is, with $\mu^{(j)}$ as the cluster center point, the set of all arguments $x^{(i)}$ for which the sum of squares of the Euclidean distances to $\mu^{(j)}$ takes its minimum value;

update each cluster center point $\mu^{(j)}$ according to the formula

$$\mu^{(j)} = \frac{\sum_{x^{(i)} \in c^{(j)}} x^{(i)}}{|c^{(j)}|}$$

until the distortion function

$$\sum_{j=1}^{K} \sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$$

converges, wherein $c^{(j)}$ is class cluster j, $x^{(i)} \in c^{(j)}$ is the position information $x^{(i)}$ of class cluster j, $|c^{(j)}|$ is the number of pieces of position information in class cluster j, and $\sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ is the sum of the squared Euclidean distances from each piece of position information of class cluster j to the cluster center point of class cluster j; when the distortion function converges, the K class clusters obtained correspond to the K position information sets.

In a possible implementation manner, the first set with the highest reliability among the K position information sets, as determined by the screening unit, is $c^{(j)}$, and the cluster center point of the first set is $\mu^{(j)}$, where $j = k$, $0 < k \le K$, and k is an integer; the determining unit is specifically configured to: after determining the first set $c^{(j)}$ and the cluster center point $\mu^{(j)}$ of the first set, determine the position information in the first set that lies within a circular area, wherein the circular area is centered on the cluster center point $\mu^{(j)}$ and has a radius of a preset value; and determine the regional position of the vehicle maintenance factory according to the position information within the circular area.

In a possible implementation manner, the screening unit is specifically configured to: calculating the ratio of the quantity of the position information in each position information set to m; and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility.

In one possible implementation manner, the apparatus further includes: and the second acquisition unit is used for acquiring vehicle related data acquired by the vehicle-mounted diagnosis equipment at a position corresponding to the position information in the first set after the determination unit determines the region position of the vehicle maintenance plant to which the vehicle diagnosis equipment belongs.

In a possible implementation manner, the classification unit is specifically configured to:

Step 1: select K cluster center points from a sample set formed by the m pieces of position information. Some class clusters are selected from the acquired position data and their respective cluster center points are randomly initialized. Each cluster center point has the same length as each position data vector, so that K is preset as the number of cluster center points and also represents the number of class clusters. The sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$, and $x^{(i)}$ is the position information of the i-th sample in the sample set, $i = 1, 2, \ldots, m$; the set of center points formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in \mathbb{R}^n$, and $\mu^{(j)}$ is the position information of the j-th cluster center point in the set of center points, $j = 1, 2, \ldots, K$; $\mathbb{R}^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1;

Step 2: calculate, according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;

Step 3: calculate each cluster center point $\mu^{(j)}$ according to the formula

$$\mu^{(j)} = \frac{\sum_{x^{(i)} \in c^{(j)}} x^{(i)}}{|c^{(j)}|}$$

wherein $c^{(j)}$ is class cluster j, $x^{(i)} \in c^{(j)}$ is the position information $x^{(i)}$ of class cluster j, and $|c^{(j)}|$ is the number of pieces of position information in class cluster j; the cluster center points referred to in the following steps have the same meaning.

Step 4: determine whether the distortion function

$$\sum_{j=1}^{K} \sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$$

converges;

Step 5: calculate, according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;

Step 6: update each cluster center point $\mu^{(j)}$ according to the formula in step 3;

Step 7: determine whether the distortion function converges;

Step 8: calculate, according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;

Step 9: update each cluster center point $\mu^{(j)}$ according to the formula in step 3;

Step 10: determine whether the distortion function converges;

Step 11: calculate, according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;

Step 12: update each cluster center point $\mu^{(j)}$ according to the formula in step 3;

Step 13: determine whether the distortion function converges;

……

Step W: determine whether the distortion function converges, wherein $\sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ is the sum of the squared Euclidean distances from each piece of position information of class cluster j to the cluster center point of class cluster j, and W is the last step executed by the clustering algorithm when the distortion function converges.

In a possible implementation manner, the first set with the highest reliability among the K position information sets, as determined by the screening unit, is $c^{(j)}$, and the cluster center point of the first set is $\mu^{(j)}$, where $j = k$, $0 < k \le K$, and k is an integer; the determining unit is specifically configured to: after determining the first set $c^{(j)}$ and the cluster center point $\mu^{(j)}$ of the first set, determine the position information in the first set that lies within a circular area, wherein the circular area is centered on the cluster center point $\mu^{(j)}$ and has a radius of a preset value; and determine the regional position of the vehicle maintenance factory according to the position information within the circular area.

In one possible implementation manner, the apparatus further includes: and the second acquisition unit is used for acquiring the vehicle related data acquired by the vehicle diagnosis equipment at the position corresponding to the position information in the first set after the determination unit determines the region position of the vehicle maintenance factory to which the vehicle diagnosis equipment belongs.

In the embodiment of the invention, the acquired position information of a large number of vehicle diagnosis devices is classified according to a clustering algorithm, which improves the positioning precision; a first set with the highest reliability is then screened out according to a preset reliability condition, which reduces the influence of individual erroneous pieces of position information; and finally the regional position of the vehicle maintenance factory where the vehicle diagnosis device is located is determined by making effective use, according to a certain rule, of the target position information in the first set. The method and the device can enhance the credibility of the regional position result, reduce computational complexity, and facilitate the processing of large amounts of data.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments of the present invention or the prior art will be briefly introduced below, it is obvious that the drawings and the attached tables in the following description are only some embodiments of the present invention, and other drawings and attached tables can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a system architecture diagram of data processing for a vehicle diagnostic device provided by an embodiment of the present invention;

FIG. 2 is an interactive schematic diagram of a vehicle diagnostic device and a server provided by an embodiment of the invention;

FIG. 3 is a table for arranging and presenting m pieces of location information provided by an embodiment of the present invention;

FIGS. 4-15 are schematic diagrams of the classification results of the m position information data points under the K-means algorithm according to an embodiment of the present invention; FIG. 4 is a distribution diagram of the data points of the m pieces of position information according to an embodiment of the present invention; FIG. 5 is a classification result diagram obtained from the data of FIG. 4 by calculating the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs and each cluster center point $\mu^{(j)}$; FIG. 15 is the final classification result diagram of the m pieces of position information under the K-means clustering algorithm according to an embodiment of the present invention;

FIG. 16 is a diagram illustrating the number of position information in the position information set and the ratio of the number to m, which is counted according to FIG. 15, according to an embodiment of the present invention;

FIG. 17 is a schematic diagram of determining the regional position of the vehicle service plant from the first set screened out according to FIG. 16, in accordance with an embodiment of the present invention;

FIG. 18 is a schematic structural diagram of a data processing device of a vehicle diagnostic apparatus according to an embodiment of the present invention;

FIG. 19 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first," "second," "third," and "fourth," etc. in the description and claims of the invention and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments. The technical scheme of the embodiment of the invention can be applied to the fields of data processing, cluster analysis and the like. When the fields and the scenes of the application of the method and the device are different, the names of specific equipment and places in the embodiment of the invention are also different.

First, some terms in the present invention are explained to facilitate understanding by those skilled in the art.

(1) Vehicle-related data is data collected in the vehicle maintenance factory once that factory has been determined by the clustering algorithm. The collected data is processed subsequently in order to analyze the scale, operating conditions, and other aspects of the vehicle maintenance factory to which the vehicle diagnosis device belongs, and the analysis results are fed back to the vehicle maintenance factory so that it can adjust its own operations on a sound basis.

(2) The vehicle diagnostic device is a generic term for various devices commonly used in a vehicle service facility, such as a diagnostic device and a service device.

(3) Euclidean distance, also known as the Euclidean metric, is a commonly used definition of distance: the true distance between two points in N-dimensional space, or the natural length of a vector (that is, the distance from the point to the origin). In the invention, each position data point is assigned to the class cluster whose cluster center point has the shortest Euclidean distance to that data point. For ease of calculation, the square of the Euclidean distance from each position information data point to the cluster center point is used as the assignment criterion.

(4) The K-Medians algorithm, a variation of K-Means, uses the median of the data set to calculate the cluster center of the class cluster.

(5) The mean shift clustering algorithm uses a sliding window to find dense regions of data points. It is a centroid-based algorithm that locates the center point of each class cluster by updating candidate center points to the mean of the points within the sliding window. Similar windows among the candidate windows are then removed, finally forming the set of center points and the corresponding class clusters.

(6) The expectation-maximization (EM) clustering algorithm using a Gaussian mixture model (GMM) assumes that the data points follow Gaussian distributions, whereas K-Means assumes that the distribution of data points is circular. K-Means is a special case of GMM in which the variance is close to 0 in all dimensions and the class clusters appear circular. Gaussian distributions (ellipses) allow more possibilities: the shape of a class cluster is described by two parameters, the mean and the standard deviation, so a class cluster can be an ellipse of any shape with a standard deviation in both the x and y directions. Each Gaussian distribution is thus assigned to a single class cluster. Before clustering, the EM optimization algorithm is first used to find the mean and standard deviation of the data set.

(7) Spectral clustering is a graph-theory-based clustering method that clusters the eigenvectors of the Laplacian matrix of the sample data in order to cluster the sample data. The meaning of "spectrum" is as follows: for a matrix A, the set of all its eigenvalues is called the spectrum of A. Most spectrum-related algorithms are therefore algorithms related to eigenvalues. The spectral radius is the largest absolute value among all the eigenvalues.

(8) The DBSCAN clustering algorithm (Density-Based Spatial Clustering of Applications with Noise) is a representative density-based clustering algorithm. It defines a class cluster as a maximal set of density-connected points, can divide regions of sufficiently high density into class clusters, and can find clusters of arbitrary shape in noisy spatial databases.

(9) Hierarchical clustering algorithms recursively merge or split data objects until some termination condition is satisfied. Depending on how the hierarchy is decomposed, they are divided into top-down divisive hierarchical clustering and bottom-up agglomerative hierarchical clustering.
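Two of the alternative center-update rules named in the terms above, the K-Medians median update and the mean shift window update, can be illustrated with short sketches. Both functions are assumptions made for illustration: `median_center` takes the per-coordinate median of a class cluster, and `shift_to_mode` moves a single candidate point with a flat (uniform) window of radius `bandwidth`. The merging of similar windows that mean shift performs afterwards is omitted.

```python
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two n-dimensional points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def median_center(members: List[Point]) -> Point:
    """K-Medians style update: per-coordinate median of the cluster members."""
    return tuple(median(col) for col in zip(*members))


def shift_to_mode(points: List[Point], start: Point, bandwidth: float,
                  tol: float = 1e-6, max_iter: int = 100) -> Point:
    """Mean shift for one candidate: move to the mean of the points in the window."""
    center = start
    for _ in range(max_iter):
        window = [p for p in points if sq_dist(p, center) <= bandwidth ** 2]
        if not window:  # nothing inside the window: stay where we are
            break
        new_center = tuple(sum(col) / len(window) for col in zip(*window))
        moved = sq_dist(new_center, center)
        center = new_center
        if moved <= tol ** 2:  # the window has stopped moving
            break
    return center
```

Compared with the mean used by K-Means, the median update is less sensitive to a single distant outlier within a class cluster.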

Referring to FIG. 1, FIG. 1 is a schematic diagram of a system architecture for data processing of a vehicle diagnostic device, to which the data processing method of the present invention can be applied. The system architecture includes vehicles, vehicle diagnostic devices, a background server, and the like. The background server icon represents the background server, which may consist of a plurality of servers. Vehicle 1, vehicle 2, and vehicle 12 each represent a vehicle that is diagnosed at a certain position, and are numbered only to distinguish them. The vehicle diagnostic device 1-A position, the vehicle diagnostic device 1-B position, the vehicle diagnostic device 1-C position, and the vehicle diagnostic device 1-D position respectively represent the position information of vehicle diagnostic device 1, which belongs to service shop 1, at the four positions A, B, C, and D. The names "vehicle diagnostic device 2-A position" and "vehicle diagnostic device 3-D position" can be understood by analogy. Service shop 1 and its circular area, service shop 2 and its circular area, and service shop 3 and its circular area respectively represent the regional positions of the vehicle service shops to which vehicle diagnostic device 1, vehicle diagnostic device 2, and vehicle diagnostic device 3 belong. The position information of a vehicle diagnostic device can be uploaded to the background server by a transmission mode such as a network. The vehicles, vehicle diagnostic devices, service shops, background servers, and so on cannot be listed exhaustively, so for ease of illustration the embodiment of the invention lists only a certain number of them, which does not represent the number used in practical applications. Since vehicle diagnostic devices are mostly used in a service shop, most of the vehicle diagnostic device icons are concentrated within the circular area of a service shop; a device icon outside the circular area indicates that the vehicle diagnostic device was not within the service shop to which it belongs when it was used.

It is to be understood that the system architecture of fig. 1 is only an exemplary implementation of the embodiments of the present invention. The system architecture in the embodiments of the present invention may include, but is not limited to, the above system architecture.

The technical problem proposed in the present invention is specifically analyzed and solved by combining the above system architecture and the data processing embodiments provided in the present invention.

Referring to fig. 2, fig. 2 is an interaction schematic diagram of the vehicle diagnostic device and the background server. The following description is given with reference to fig. 2 from the interaction side of the vehicle diagnostic device and the background server. This method embodiment mainly takes the K-Means algorithm as an example and may specifically include steps S201 to S204; optionally, step S205 may be further included. Possible implementations using other clustering algorithms are provided under step S202.

Step S201: acquiring m pieces of position information of vehicle diagnosis equipment, wherein the m pieces of position information are self geographical position information acquired by the vehicle diagnosis equipment during vehicle diagnosis, and m is an integer greater than 2;

specifically, any one of the m pieces of location information contains at least longitude and latitude information. Any one of the position information may be stored in the form of a spatial n-dimensional vector or in the form of an ordered pair of real numbers. The form of the acquired location information may include: acquiring the position information of the single device transmitted back by the vehicle diagnosis device; alternatively, a set of location information of the plurality of devices themselves transmitted back by the vehicle diagnosis device is acquired. In the application and calculation process of the algorithm, different storage forms can have different details of data processing methods, and the core application of the algorithm is not influenced.

Step S202: classifying the m position information according to a clustering algorithm to obtain a K position information set;

specifically, before classifying the m position information according to a clustering algorithm, in order to facilitate displaying classification results classified according to different clustering algorithms, the m position information may be preprocessed, where the preprocessing manner may include: arranging and presenting the m pieces of location information in a table form, please refer to fig. 3, where fig. 3 is a table for arranging and presenting the m pieces of location information provided by the embodiment of the present invention; alternatively, the m pieces of location information are arranged and presented in the form of an image, please refer to fig. 4, where fig. 4 is a data point distribution diagram of the m pieces of location information provided by the embodiment of the present invention. In the embodiment of the present invention, a method of sorting m pieces of position information in the form of an image is taken as an example. And then classifying the m position information according to a clustering algorithm to obtain a K position information set. The category of the clustering algorithm is selected according to the type of data and the purpose of clustering.

The main clustering algorithms can be divided into the following: partitional clustering, hierarchical clustering, density-based clustering, fuzzy clustering. In each class of methods, there are widely used algorithms, such as: a K-means clustering algorithm in the division clustering, an agglomeration type hierarchical clustering algorithm in the hierarchical clustering, and the like. It should be noted that the research on the clustering problem is not limited to hard clustering such as the K-means algorithm, i.e. each data can only be classified into one type. Fuzzy clustering is also a branch of cluster analysis which is widely studied. Fuzzy clustering determines the degree of membership of each data to each cluster through a membership function, rather than rigidly classifying a data object into a cluster, such as the FCM algorithm. Therefore, the implementation mode of the invention is not limited to the analysis of data by using a certain algorithm, but one or more algorithms or modes can be adopted according to the specific characteristics of the data, and the calculation simplicity and the reliability of the conclusion are sought. For example, it is possible to empirically sample the samples based on their distribution, then perform hierarchical clustering over a small sample range, and then perform K-means clustering using the K values obtained from the hierarchical clustering applied to the entire sample. In the embodiment of the present invention, in combination with the characteristics of the location information in the embodiment of the present invention, the following 7 algorithms are taken as examples, and the location information is classified respectively.

In a possible implementation manner, classifying the m position information according to a K-Means algorithm to obtain K position information sets (please refer to fig. 5-15) may include: selecting K cluster center points from a sample set formed by the m position information; wherein the sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in R^n$, $x^{(i)}$ is the position information of the ith sample in the sample set, $i = 1, 2, \ldots, m$; the center point set formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in R^n$, $\mu^{(j)}$ is the position information of the jth cluster center point of the center point set, $j = 1, 2, \ldots, K$; $R^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1; according to the formula

$$c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$$

calculating the class cluster $c^{(j)}$ to which the ith sample $x^{(i)}$ belongs; wherein $\|x^{(i)} - \mu^{(j)}\|^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_j$ selects the index j of the cluster center point for which this squared distance is smallest, so that $x^{(i)}$ is assigned to the class cluster of its nearest cluster center point; according to the formula

$$\mu^{(j)} = \frac{\sum_{x^{(i)} \in c^{(j)}} x^{(i)}}{|c^{(j)}|}$$

updating each of the cluster center points $\mu^{(j)}$ until the distortion function

$$J = \sum_{j=1}^{K} \sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$$

converges; wherein $c^{(j)}$ is the class cluster j, $\sum_{x^{(i)} \in c^{(j)}} x^{(i)}$ is the sum of the position information $x^{(i)}$ of the class cluster j, and $|c^{(j)}|$ is the number of position information in the class cluster j.

The K-Means clustering algorithm provided by the embodiment of the invention has the advantages of simple and convenient calculation, quick and simple algorithm, higher efficiency on large data sets, and suitability for mining large-scale data sets and processing data-intensive clusters.
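As an illustration only, the K-Means steps described above could be sketched in Python as follows. The function names `sq_dist`, `k_means` and `distortion`, the choice of drawing the initial center points at random from the sample set, and the rule of leaving the center of an empty class cluster unchanged are assumptions made for this sketch and are not taken from the embodiment.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(samples, k, max_iter=100, seed=0):
    """Return (centers, labels) for a list of n-dimensional tuples."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)  # assumed: initial centers drawn from the samples
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Assignment step: each sample goes to the nearest cluster center point.
        labels = [min(range(k), key=lambda j: sq_dist(x, centers[j]))
                  for x in samples]
        # Update step: each center becomes the mean of the samples assigned to it.
        new_centers = []
        for j in range(k):
            members = [x for x, c in zip(samples, labels) if c == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[j])  # assumed rule for an empty cluster
        if new_centers == centers:  # centers are stable, so the distortion has converged
            break
        centers = new_centers
    return centers, labels


def distortion(samples, centers, labels):
    """Sum of squared distances from each sample to its cluster center point."""
    return sum(sq_dist(x, centers[c]) for x, c in zip(samples, labels))


# Hypothetical two-dimensional position samples forming two separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = k_means(points, k=2)
print(labels, distortion(points, centers, labels))
```

The loop stops when an update leaves every center point unchanged, which is one simple way of detecting that the distortion function no longer decreases.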

In a possible implementation manner, classifying the m position information according to a K-Medians algorithm to obtain K position information sets may include the following 4 steps: 1) The sample set of position information samples is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in R^n$; K cluster center points are selected from the sample set. 2) According to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the class cluster $c^{(j)}$ to which each sample $x^{(i)}$ belongs is calculated, namely the class cluster whose cluster center point has the smallest Euclidean distance to the position information sample. 3) The median of each class cluster is calculated and taken as the center $\mu^{(j)}$ of that class cluster. 4) Steps 2) and 3) are repeated until K stable class clusters and the cluster center points corresponding to the K class clusters are determined.

The K-Medians clustering algorithm provided by the embodiment of the invention has the advantages that the central point is calculated by using the median of the data, and the calculation result is prevented from being influenced by abnormal data.
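The effect of using the median in step 3) can be illustrated with a short Python sketch. It covers only the center update of one class cluster; the helper names `median_center` and `mean_center`, the use of a coordinate-wise median, and the sample values are assumptions made for illustration.

```python
from statistics import median


def median_center(cluster):
    """Coordinate-wise median of a list of equal-length tuples."""
    return tuple(median(axis) for axis in zip(*cluster))


def mean_center(cluster):
    """Coordinate-wise mean, shown only for comparison with K-Means."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


# Hypothetical class cluster containing one abnormal position (50.0, 50.0).
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (50.0, 50.0)]
print(median_center(cluster))  # stays among the four normal positions
print(mean_center(cluster))    # is pulled toward the abnormal position
```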

In a possible implementation manner, classifying the m position information according to a mean shift clustering algorithm to obtain K position information sets may include the following 4 steps: 1) The radius r of the sliding window is determined, and a circular sliding window of radius r centered on a randomly selected center point C starts sliding; in each iteration the window moves toward a denser region until convergence. 2) Each time the window slides to a new region, the mean of the points within the sliding window is calculated as the new center point, and the number of points within the sliding window is the density within the window; in each movement, the window moves toward a denser region. 3) The window keeps moving, with the center point and the density within the window recalculated, until no direction can accommodate more points within the window, i.e., until the density within the circle no longer increases. 4) Steps 1) to 3) generate a plurality of sliding windows; when several sliding windows overlap, the window containing the most points is retained, the data points are clustered according to the sliding window in which they lie, and the stable windows finally formed give the center point set and the corresponding class clusters.

The mean shift clustering algorithm provided by the embodiment of the invention has the advantages that the number of the clusters does not need to be known, and the number of the clusters can be automatically classified through the calculation of the algorithm; during the calculation process, the clustering centers are gathered towards the maximum point density and are less influenced by the data mean.
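A simplified Python sketch of the sliding-window procedure is given below for illustration. It deviates from the steps above in two labeled ways: every sample is used as a starting window instead of randomly selected points, and converged windows whose centers lie closer than r are merged into the first one found rather than keeping the window with the most points. The names `shift_window` and `mean_shift` are hypothetical.

```python
import math


def shift_window(center, points, r, tol=1e-6, max_iter=100):
    """Move a circular window of radius r until its center stops moving."""
    for _ in range(max_iter):
        inside = [p for p in points if math.dist(p, center) <= r]
        # New center = mean of the points currently inside the window.
        new_center = tuple(sum(axis) / len(inside) for axis in zip(*inside))
        if math.dist(new_center, center) < tol:
            break
        center = new_center
    return center


def mean_shift(points, r):
    """Return (modes, labels); the number of class clusters is not given in advance."""
    modes, labels = [], []
    for p in points:
        c = shift_window(p, points, r)  # assumed: one starting window per sample
        for idx, mode in enumerate(modes):
            if math.dist(c, mode) < r:  # assumed merge rule for overlapping windows
                labels.append(idx)
                break
        else:
            modes.append(c)
            labels.append(len(modes) - 1)
    return modes, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
modes, labels = mean_shift(points, r=3.0)
print(len(modes), labels)
```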

In a possible implementation, classifying the m position information according to an Expectation-Maximization (EM) clustering algorithm using a Gaussian mixture model (GMM) to obtain K position information sets may include the following 4 steps: 1) The number of class clusters is selected and the Gaussian distribution parameters (mean and variance) of each class cluster are initialized randomly; alternatively, a relatively accurate mean and variance may first be given according to the data. 2) Given the Gaussian distribution of each class cluster, the probability that each data point belongs to each class cluster is calculated; the closer a point is to the center of a Gaussian distribution, the more likely it belongs to that class cluster. 3) Based on these probabilities, a new set of Gaussian distribution parameters is calculated so that the probability of the data points within each class cluster is maximized; the new parameters can be calculated as a weighted sum of the data points, where the weight is the probability that the data point belongs to the class cluster. 4) Steps 2) and 3) are repeated until the parameters change only slightly between iterations.

The maximum Expectation (EM) clustering algorithm using a Gaussian mixture (GMM) model provided in the embodiment of the present invention has an advantage in that, since GMMs use mean and standard deviation, identifiable cluster-like shapes may be elliptical, not limited to circular; because GMMs use probabilities, a data point can belong to multiple clusters, improving the accuracy of the calculation.
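For illustration, the E step and M step could be sketched in Python as follows. The sketch assumes axis-aligned (diagonal-covariance) Gaussians, caller-supplied initial means, and a small variance floor `min_var`; the names `log_gauss` and `em_gmm` are hypothetical and none of these choices is prescribed by the embodiment.

```python
import math


def log_gauss(x, mean, var):
    """Log density of an axis-aligned Gaussian with per-axis variances."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (a - mu) ** 2 / (2 * v)
               for a, mu, v in zip(x, mean, var))


def em_gmm(samples, init_means, n_iter=50, min_var=1e-6):
    """Return (weights, means, variances, resp) after n_iter EM iterations."""
    k, m, dim = len(init_means), len(samples), len(samples[0])
    weights = [1.0 / k] * k
    means = [tuple(mu) for mu in init_means]
    # Assumed initialization: every class cluster starts with the overall variance.
    overall = []
    for axis in zip(*samples):
        mu = sum(axis) / m
        overall.append(max(sum((a - mu) ** 2 for a in axis) / m, min_var))
    variances = [list(overall) for _ in range(k)]
    resp = []
    for _ in range(n_iter):
        # E step: probability that each data point belongs to each class cluster.
        resp = []
        for x in samples:
            logs = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)  # subtract the maximum to avoid underflow
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M step: probability-weighted re-estimation of the parameters.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / m
            means[j] = tuple(
                sum(r[j] * x[d] for r, x in zip(resp, samples)) / nj
                for d in range(dim))
            variances[j] = [
                max(sum(r[j] * (x[d] - means[j][d]) ** 2
                        for r, x in zip(resp, samples)) / nj, min_var)
                for d in range(dim)]
    return weights, means, variances, resp


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
weights, means, variances, resp = em_gmm(points, [(0.0, 0.0), (10.0, 10.0)])
print(weights, means)
```

Each row of `resp` holds the membership probabilities of one data point, so a point can belong to several class clusters with different weights; a hard assignment, if needed, is the index of the largest entry in the row.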

In a possible implementation manner, classifying the m position information according to a spectral clustering algorithm to obtain K position information sets may include the following 4 steps: 1) The similarity between the a-th sample and the b-th sample is measured by the Gaussian similarity

$$S_{ab} = \exp\left(-\frac{\|x^{(a)} - x^{(b)}\|^2}{2\sigma^2}\right)$$

wherein $\sigma$ is a hyper-parameter, $1 \le a \le m$, $1 \le b \le m$, $a \ne b$, and a and b are integers. 2) A similarity matrix $W = (S_{ab})$ is formed, which is an $m \times m$ symmetric matrix; $S_{aa}$ should be equal to 1, but for convenience of calculation it is written as 0, so the similarity matrix becomes a symmetric matrix with 0 on the main diagonal. 3) The sum of the similarities of the a-th sample to all other samples, $d_a = \sum_{b \ne a} S_{ab}$, is calculated (in some variants only the first K values of $S_{ab}$ are added, or a threshold is set and the values of $S_{ab}$ smaller than the threshold are truncated); in graph theory $d_a$ is called the degree and can be understood as the weight of the connecting edges. The degrees $d_a$ of all points form a degree matrix D (a diagonal matrix). 4) A Laplacian matrix $L = D - W$ is formed; L is a symmetric positive semi-definite matrix whose minimum eigenvalue is 0, and the corresponding eigenvector is the all-ones vector. The eigenvalues of L are arranged from small to large, $\lambda_1, \ldots, \lambda_m$, with corresponding eigenvectors $u_1, \ldots, u_m$. If clustering into K classes is required, the eigenvectors corresponding to the first K eigenvalues are taken to form an $m \times K$ matrix U, so that the feature of the first sample is $(u_{11}, u_{12}, \ldots, u_{1K})$, the feature of the second sample is $(u_{21}, u_{22}, \ldots, u_{2K})$, and the feature of the m-th sample is $(u_{m1}, u_{m2}, \ldots, u_{mK})$. K-means is then performed on these m samples, and the clustering result obtained is the clustering result of the original position information.

The spectral clustering algorithm provided by the embodiment of the invention has the advantages that the characteristic vectors of the Laplace matrix of the sample data are clustered based on the graph theory, so that the clustering of the sample data is realized; when the data sample distribution exhibits non-spherical shapes, it can be identified and processed.
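The construction of the similarity matrix, the degrees and the Laplacian matrix can be sketched in Python as follows. The sketch stops before the eigen-decomposition and the final K-means step; it assumes the commonly used Gaussian similarity exp(-||x_a - x_b||^2 / (2 sigma^2)), and the name `laplacian` is hypothetical.

```python
import math


def laplacian(samples, sigma):
    """Return (W, degrees, L) for a list of equal-length tuples."""
    m = len(samples)
    # Similarity matrix W with zeros on the main diagonal.
    W = [[0.0 if a == b else
          math.exp(-math.dist(samples[a], samples[b]) ** 2 / (2 * sigma ** 2))
          for b in range(m)] for a in range(m)]
    # Degree of each sample: sum of its similarities to all other samples.
    degrees = [sum(row) for row in W]
    # Laplacian L = D - W, where D is the diagonal degree matrix.
    L = [[(degrees[a] if a == b else 0.0) - W[a][b] for b in range(m)]
         for a in range(m)]
    return W, degrees, L


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
W, degrees, L = laplacian(points, sigma=1.0)
# Every row of L sums to zero, i.e. L times the all-ones vector is the zero
# vector, which is why 0 is an eigenvalue with the all-ones eigenvector.
print([sum(row) for row in L])
```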

In a possible implementation manner, classifying the m position information according to a DBSCAN clustering algorithm to obtain K position information sets may include the following 2 steps: 1) First, the radius r and minPoints are determined; starting from an arbitrary data point that has not been visited and taking this point as the center, it is judged whether the number of points contained in the circle of radius r is greater than or equal to minPoints; if so, the point is marked as a core point, otherwise it is marked as a noise point. 2) Step 1) is repeated; if a point marked as a noise point lies within the circle of radius r of some core point, the point is marked as an edge point, otherwise it remains a noise point. Step 1) is repeated until all points have been visited.

The Dbscan clustering algorithm provided by the embodiment of the invention has the advantages that the density-based algorithm does not need to know the number of the class clusters, and the number of the class clusters of the data samples to be processed can be automatically obtained through calculation.
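For illustration, a Python sketch of the usual DBSCAN procedure is given below; it expands each class cluster outward from its core points, which is somewhat more detailed than the two steps above. The parameters `r` and `min_points` correspond to the radius r and minPoints, and the names `dbscan` and `NOISE` are hypothetical.

```python
import math

NOISE = -1


def dbscan(points, r, min_points):
    """Return one label per point: a class cluster index (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points inside the circle of radius r around point i.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= r]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may later be relabeled as an edge point
            continue
        cluster += 1  # point i is a core point and starts a new class cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise point near a core point: edge point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_points:  # j is itself a core point
                queue.extend(nearby)
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]
print(dbscan(points, r=1.5, min_points=3))
```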

In a possible implementation manner, classifying the m position information according to agglomerative hierarchical clustering in the hierarchical clustering algorithm to obtain K position information sets may include the following 3 steps: 1) Each data point is treated as a single class cluster; a metric measuring the distance between two class clusters is then selected, the distances between all individuals are calculated, and the two samples that are closest to each other are found and merged into a small group. 2) The small group is taken as a new individual, the distances between it and the remaining individuals are calculated, the two individuals closest to each other are merged into a group, and so on. 3) The merging ends when the desired number of class clusters is obtained.

The hierarchical clustering algorithm provided by the embodiment of the invention has the advantages that hierarchical clustering structures at different granularities can be obtained by setting different values of the relevant parameters; in terms of cluster shape, hierarchical clustering is applicable to clusters of arbitrary shape and is insensitive to the input order of the samples. It is also insensitive to the choice of the distance metric, and the number of class clusters does not need to be known in advance; the number of class clusters can be chosen subjectively.
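The merging procedure can be sketched in Python as follows. The sketch assumes single linkage (the distance between two class clusters is the distance between their closest members) as the metric, which is only one of several possible choices; the name `agglomerative` is hypothetical.

```python
import math


def agglomerative(points, n_clusters):
    """Merge the two closest class clusters until n_clusters remain.

    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]  # every point starts alone

    def linkage(c1, c2):
        # Assumed metric: single linkage, i.e. the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        a, b = min(pairs, key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(agglomerative(points, n_clusters=2))
```

Stopping the loop at different values of `n_clusters` yields the clustering structure at different granularities.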

Step S203: determining a first set with the highest reliability in the K position information sets according to a preset reliability condition, wherein K is an integer which is greater than 1 and less than N;

specifically, the preset credibility condition may include using the number of position information in a position information set as the condition for judging whether the position information set is credible. This may specifically include the following three cases: 1. calculating the ratio of the number of position information in each position information set to m, and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility; 2. calculating the ratio of the number of position information in each position information set to m, and determining the position information set whose ratio is above a preset value as the first set with the highest credibility; 3. comparing the number of position information in each position information set, sorting the position information sets in descending order of that number, and taking the position information set with the largest number as the first set with the highest credibility. The preset value may be set by the user according to practical experience, or the server may set a default value according to the data characteristics of the acquired position information. The credibility conditions in this step are applicable to the clustering results of the 7 clustering algorithms exemplified in step S202.

In one possible implementation manner, the ratio of the number of position information in each position information set to m is calculated, and the position information set corresponding to the maximum ratio is determined as the first set with the highest credibility. Referring to fig. 16, fig. 16 shows, for the position information sets counted according to fig. 15, the number of position information in each set and the ratio of that number to m. As shown in fig. 16, the ratios of the position information sets are compared, and the position information set with the largest ratio is determined as the first set with the highest credibility; wherein the value of m is the sum of the number of position information in all the position information sets.
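The ratio-based screening can be illustrated with a short Python sketch. It takes the class cluster label of each of the m position information as input; the name `most_credible_set`, the optional `preset_ratio` argument and the label values are assumptions for illustration.

```python
from collections import Counter


def most_credible_set(labels, preset_ratio=None):
    """Return (best_label, ratios) from the class cluster label of each sample.

    ratios maps each label to (number of samples in that set) / m.
    If preset_ratio is given and no set reaches it, best_label is None.
    """
    m = len(labels)
    ratios = {label: count / m for label, count in Counter(labels).items()}
    best = max(ratios, key=ratios.get)  # set with the largest share of samples
    if preset_ratio is not None and ratios[best] < preset_ratio:
        return None, ratios
    return best, ratios


# Hypothetical clustering result for m = 10 position information.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
best, ratios = most_credible_set(labels)
print(best, ratios)
```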

Step S204: and determining the regional position of the vehicle service shop where the vehicle diagnosis equipment is located according to the position information in the first set.

Specifically, the first set with the highest credibility among the K position information sets is denoted $c^{(j)}$, and the cluster center point of the first set is $\mu^{(j)}$, where j = k, 0 < k ≤ K, and k is an integer.

In one possible implementation, after the first set $c^{(j)}$ and the cluster center point $\mu^{(j)}$ of the first set are determined, the position information within a circular area in the first set is determined, wherein the circular area takes the cluster center point $\mu^{(j)}$ as its center and a preset value as its radius; the area position of the vehicle maintenance factory is then determined according to the position information in the circular area. Referring to fig. 17, fig. 17 is a schematic diagram, provided by the embodiment of the present invention, of determining the area position of a vehicle service shop within the first set screened out in fig. 16. As shown in fig. 17, the solid-line circular area inside the first set (the dashed oval area) is determined according to the operations of the possible implementation manner described above; the area position of the vehicle service shop is then determined from the position information within the solid-line circular area. The preset value of the radius can be set according to the experience of the user or in another reasonable manner.
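As an illustrative sketch, the selection of the position information inside the circular area could be written in Python as follows. The sketch assumes that each position is a (longitude, latitude) pair in degrees, that the preset radius is given in metres, and that distances are measured with the haversine great-circle formula on a sphere of mean Earth radius; the names `haversine_m` and `within_circle` and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (assumed spherical model)


def haversine_m(p, q):
    """Great-circle distance in metres between two (longitude, latitude) pairs."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_circle(first_set, center, radius_m):
    """Keep the positions of the first set that lie inside the circular area."""
    return [p for p in first_set if haversine_m(p, center) <= radius_m]


# Hypothetical cluster center point and positions of the first set.
center = (114.0, 22.5)
first_set = [(114.0, 22.5), (114.001, 22.5), (114.1, 22.5)]
print(within_circle(first_set, center, radius_m=500.0))
```

How the area position of the vehicle maintenance factory is then derived from the retained positions is left open here, as in the text above.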

Step S205: and acquiring vehicle related data acquired by the vehicle-mounted diagnosis equipment at a position corresponding to the position information in the first set.

Specifically, the vehicle-related data may include the following 3 cases: 1. the number of vehicles diagnosed by the vehicle diagnostic device over a period of time; 2. the frequency of use of the vehicle diagnostic device over a period of time; 3. statistics of the vehicle fault problems detected by the vehicle diagnosis equipment. Step S205 processes the vehicle-related data of the vehicle service shop after step S204 has been executed. Step S205 is an optional step in the embodiment of the present invention.

As mentioned in step S202, fig. 3 is a table, provided by the embodiment of the present invention, for arranging and presenting the m pieces of position information. As shown in fig. 3, each position information $x^{(i)}$ is expressed as an ordered pair of real numbers formed by its longitude and its latitude; wherein 0 < i ≤ m, and i is an integer.

Fig. 4 is a data point distribution diagram of the m pieces of position information provided by the embodiment of the present invention. As shown in fig. 4, the scales marked on the horizontal axis and the vertical axis are used only for explaining the embodiment of the present invention; in specific use the scales are selected according to the characteristics of the data obtained. In the figure, each position information $x^{(i)}$ is a data point expressed as a two-dimensional vector, K = 3, the dots represent position information data points, and the crosses represent the cluster center points of the respective class clusters. Because the position information $x^{(i)}$ can be stored and arranged in various ways, the processing of the data is not limited to the combination described here.

Fig. 5 is the classification result diagram obtained from the data of fig. 4 after one calculation of the class cluster to which each sample $x^{(i)}$ belongs and of each cluster center point $\mu^{(j)}$. By analogy, fig. 6-14 are classification result diagrams of the position information data points after each subsequent cycle of the K-means algorithm; fig. 15 shows the final classification result of the m position information obtained by the K-means clustering algorithm according to the embodiment of the present invention. As shown in fig. 15, the positions of the cluster center points differ from their initial positions and are the result determined by the calculation.

According to the embodiment of the invention, the acquired position information of a large number of vehicle diagnosis devices is classified according to a clustering algorithm, the positioning precision is improved, then a first set with the highest reliability is screened out according to a preset reliability condition, the influence of individual wrong position information is reduced, and finally, the target position information in the set is effectively utilized in the first set according to a certain rule to determine the region position of a vehicle maintenance factory where the vehicle diagnosis devices are located. The method and the device can enhance the credibility of the regional position result, reduce the calculation complexity and facilitate the processing of a large amount of data.

The method of the embodiments of the present invention is explained in detail above, and the related apparatus of the embodiments of the present invention is provided below. The embodiment of the apparatus is also mainly described by taking a K-Means algorithm as an example, wherein the application of the classification unit refers to possible implementation manners of other clustering algorithms.

Referring to fig. 18, fig. 18 is a schematic structural diagram of a vehicle diagnostic device data processing apparatus according to an embodiment of the present invention, where the vehicle diagnostic device data processing apparatus 18 may include: a first obtaining unit 1801, a classification unit 1802, a screening unit 1803, a determining unit 1804, and a second obtaining unit 1805. Wherein:

a first acquisition unit 1801 for acquiring m pieces of positional information of the vehicular diagnostic apparatus;

a classification unit 1802, configured to classify the m position information according to a clustering algorithm, so as to obtain a K position information set;

in a possible implementation, the classification unit 1802 is configured to classify the m position information according to the K-Means algorithm to obtain K position information sets, as shown in fig. 5-15, which may include: selecting K cluster center points from a sample set formed by the m position information; wherein the sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in R^n$, $x^{(i)}$ is the position information of the ith sample in the sample set, $i = 1, 2, \ldots, m$; the center point set formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in R^n$, $\mu^{(j)}$ is the position information of the jth cluster center point of the center point set, $j = 1, 2, \ldots, K$; $R^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1; according to the formula

$$c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$$

calculating the class cluster $c^{(j)}$ to which the ith sample $x^{(i)}$ belongs; wherein $\|x^{(i)} - \mu^{(j)}\|^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_j$ selects the index j of the cluster center point for which this squared distance is smallest, so that $x^{(i)}$ is assigned to the class cluster of its nearest cluster center point; according to the formula

$$\mu^{(j)} = \frac{\sum_{x^{(i)} \in c^{(j)}} x^{(i)}}{|c^{(j)}|}$$

updating each of the cluster center points $\mu^{(j)}$ until the distortion function

$$J = \sum_{j=1}^{K} \sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$$

converges; wherein $c^{(j)}$ is the class cluster j, $\sum_{x^{(i)} \in c^{(j)}} x^{(i)}$ is the sum of the position information $x^{(i)}$ of the class cluster j, and $|c^{(j)}|$ is the number of position information in the class cluster j.

In a possible implementation manner, the classifying unit 1802 is configured to classify the m position information according to a K-Medians algorithm to obtain K position information sets, which may include: 1) The sample set of position information samples is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in R^n$; K cluster center points are selected from the sample set. 2) According to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the class cluster $c^{(j)}$ to which each sample $x^{(i)}$ belongs is calculated, namely the class cluster whose cluster center point has the smallest Euclidean distance to the position information sample. 3) The median of each class cluster is calculated and taken as the center $\mu^{(j)}$ of that class cluster. 4) Steps 2) and 3) are repeated until K stable class clusters and the cluster center points corresponding to the K class clusters are determined.

In a possible implementation manner, the classifying unit 1802 is configured to classify the m position information according to a mean shift clustering algorithm to obtain K position information sets, which may include: 1) The radius r of the sliding window is determined, and a circular sliding window of radius r centered on a randomly selected center point C starts sliding; in each iteration the window moves toward a denser region until convergence. 2) Each time the window slides to a new region, the mean of the points within the sliding window is calculated as the new center point, and the number of points within the sliding window is the density within the window; in each movement, the window moves toward a denser region. 3) The window keeps moving, with the center point and the density within the window recalculated, until no direction can accommodate more points within the window, i.e., until the density within the circle no longer increases. 4) Steps 1) to 3) generate a plurality of sliding windows; when several sliding windows overlap, the window containing the most points is retained, the data points are clustered according to the sliding window in which they lie, and the stable windows finally formed give the center point set and the corresponding class clusters.

In a possible implementation, the classifying unit 1802 is configured to classify the m position information according to an Expectation-Maximization (EM) clustering algorithm using a Gaussian mixture model (GMM) to obtain K position information sets, which may include: 1) The number of class clusters is selected and the Gaussian distribution parameters (mean and variance) of each class cluster are initialized randomly; alternatively, a relatively accurate mean and variance may first be given according to the data. 2) Given the Gaussian distribution of each class cluster, the probability that each data point belongs to each class cluster is calculated; the closer a point is to the center of a Gaussian distribution, the more likely it belongs to that class cluster. 3) Based on these probabilities, a new set of Gaussian distribution parameters is calculated so that the probability of the data points within each class cluster is maximized; the new parameters can be calculated as a weighted sum of the data points, where the weight is the probability that the data point belongs to the class cluster. 4) Steps 2) and 3) are repeated until the parameters change only slightly between iterations.

In a possible implementation manner, the classifying unit 1802 is configured to classify the m position information according to a spectral clustering algorithm to obtain K position information sets, which may include the following 4 steps: 1) The similarity between the a-th sample and the b-th sample is measured by the Gaussian similarity

$$S_{ab} = \exp\left(-\frac{\|x^{(a)} - x^{(b)}\|^2}{2\sigma^2}\right)$$

wherein $\sigma$ is a hyper-parameter, $1 \le a \le m$, $1 \le b \le m$, $a \ne b$, and a and b are integers. 2) A similarity matrix $W = (S_{ab})$ is formed, which is an $m \times m$ symmetric matrix; $S_{aa}$ should be equal to 1, but for convenience of calculation it is written as 0, so the similarity matrix becomes a symmetric matrix with 0 on the main diagonal. 3) The sum of the similarities of the a-th sample to all other samples, $d_a = \sum_{b \ne a} S_{ab}$, is calculated (in some variants only the first K values of $S_{ab}$ are added, or a threshold is set and the values of $S_{ab}$ smaller than the threshold are truncated); in graph theory $d_a$ is called the degree and can be understood as the weight of the connecting edges. The degrees $d_a$ of all points form a degree matrix D (a diagonal matrix). 4) A Laplacian matrix $L = D - W$ is formed; L is a symmetric positive semi-definite matrix whose minimum eigenvalue is 0, and the corresponding eigenvector is the all-ones vector. The eigenvalues of L are arranged from small to large, $\lambda_1, \ldots, \lambda_m$, with corresponding eigenvectors $u_1, \ldots, u_m$. If clustering into K classes is required, the eigenvectors corresponding to the first K eigenvalues are taken to form an $m \times K$ matrix U, so that the feature of the first sample is $(u_{11}, u_{12}, \ldots, u_{1K})$, the feature of the second sample is $(u_{21}, u_{22}, \ldots, u_{2K})$, and the feature of the m-th sample is $(u_{m1}, u_{m2}, \ldots, u_{mK})$. K-means is then performed on these m samples, and the clustering result obtained is the clustering result of the original position information.

The spectral clustering algorithm provided by the embodiment of the invention has the advantages that the clustering of the sample data is realized by clustering the characteristic vector of the Laplacian matrix of the sample data based on the graph theory; when the data sample distribution exhibits non-spherical shapes, it can be identified and processed.

In a possible implementation manner, the classifying unit 1802 is configured to classify the m position information according to a DBSCAN clustering algorithm to obtain K position information sets, which may include: 1) First, the radius r and minPoints are determined; starting from an arbitrary data point that has not been visited and taking this point as the center, it is judged whether the number of points contained in the circle of radius r is greater than or equal to minPoints; if so, the point is marked as a core point, otherwise it is marked as a noise point. 2) Step 1) is repeated; if a point marked as a noise point lies within the circle of radius r of some core point, the point is marked as an edge point, otherwise it remains a noise point. Step 1) is repeated until all points have been visited.

In a possible implementation manner, the classifying unit 1802 is configured to classify the m pieces of position information according to agglomerative hierarchical clustering, one of the hierarchical clustering algorithms, to obtain K position information sets, which may include: 1) treating each data point as a single cluster, then selecting a metric that measures the distance between two clusters, calculating the distances between all individuals, and finding the two samples that are closest together, which are merged into a small group. 2) Taking the small group as a new individual, calculating the distances between it and all remaining individuals, finding the two individuals closest to each other and merging them into a group, and repeating this step. 3) Stopping when the desired number of clusters is obtained.
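A minimal sketch of this merge loop is given below. The passage leaves the cluster-to-cluster metric open, so single linkage (the smallest pairwise distance between two clusters) is assumed here purely for illustration; the function names and sample coordinates are likewise hypothetical.

```python
import math
from itertools import combinations

def single_linkage(c1, c2):
    """Assumed cluster distance: the smallest pairwise point distance."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerative(points, k, linkage=single_linkage):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    clusters = [[p] for p in points]  # step 1: every point starts as its own cluster
    while len(clusters) > k:          # step 3: stop at the desired number of clusters
        # Step 2: find the closest pair (a < b) and merge b into a.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters

# Hypothetical planar coordinates.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters = agglomerative(points, k=3)
```

This naive loop recomputes all pairwise cluster distances on every merge, so it is only suitable for small m; a production implementation would cache the distance matrix.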

A screening unit 1803, configured to determine, according to a preset reliability condition, a first set with the highest reliability in the K location information sets;

In a possible implementation manner, the screening unit 1803 is configured to calculate, for each position information set, the ratio of the number of pieces of position information in that set to m, and to determine the position information set corresponding to the maximum ratio as the first set with the highest reliability.
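The ratio-based screening can be illustrated with a short sketch. It assumes the clustering step has already produced one cluster label per piece of position information; the function name and the sample labels are hypothetical.

```python
from collections import Counter

def most_credible_cluster(labels):
    """Return the label of the largest cluster and the ratio of every cluster to m."""
    m = len(labels)
    counts = Counter(labels)
    ratios = {label: n / m for label, n in counts.items()}
    best = max(ratios, key=ratios.get)  # the set with the maximum ratio
    return best, ratios

# Hypothetical cluster labels for m = 6 pieces of position information.
labels = [0, 0, 1, 0, 2, 1]
best, ratios = most_credible_cluster(labels)
```

If two sets tie for the maximum ratio, this sketch returns whichever label was encountered first; the source does not say how ties are to be resolved.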

A determining unit 1804, configured to determine, according to the position information in the first set, a region position of a vehicle maintenance factory where the vehicle diagnostic apparatus is located.

In a possible implementation manner, after the screening unit 1803 has executed the corresponding content, the first set with the highest reliability among the determined K position information sets is denoted c^(j) and the cluster center point of the first set is denoted μ^(j), where j = k, 0 < k ≤ K, and k is an integer. The determining unit 1804 is configured to: after the first set c^(j) and the cluster center point μ^(j) of the first set are determined, determine the position information in the first set that lies within a circular area, wherein the circular area is centered on the cluster center point μ^(j) and has a radius of a preset value; and determine the area position of the vehicle maintenance factory according to the position information in the circular area.
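The circular-area filtering can be sketched as follows. The sketch assumes planar coordinates and Euclidean distance; for latitude and longitude a great-circle distance function could be passed in through the `distance` parameter instead. The function names, the sample points and the radius are hypothetical, and the center is taken here as the mean of the first set.

```python
import math

def centroid(points):
    """Mean of the points, used here as the cluster center point."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def points_within_radius(points, center, radius, distance=math.dist):
    """Keep only the points inside the circle of the given radius around center."""
    return [p for p in points if distance(p, center) <= radius]

# Hypothetical first set: four nearby readings and one outlier.
first_set = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (11.0, 11.0)]
center = centroid(first_set)
inside = points_within_radius(first_set, center, radius=5.0)
```

The outlier pulls the center away from the dense group but is itself dropped by the radius filter, which is why the area position is derived from the points inside the circle rather than from the whole first set.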

A second obtaining unit 1805, configured to obtain vehicle-related data collected by the vehicle diagnostic device at the positions corresponding to the position information in the first set. The second obtaining unit 1805 is an optional unit in this embodiment of the present invention.

As shown in fig. 19, fig. 19 is a schematic structural diagram of an apparatus according to an embodiment of the present invention. The vehicle diagnostic device data processing apparatus 18 may be implemented in the structure shown in fig. 19, and the device 19 may include at least one storage component 1901, at least one communication component 1902, and at least one processing component 1903. In addition, the device may also include general components such as an antenna and a power supply, which will not be described in detail herein.

The storage component 1901 may be, but is not limited to, a Read-Only Memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or another type of dynamic storage device that can store information and instructions, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage (which may include compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor via a bus. The memory may also be integral to the processor.

The communication component 1902 may be a device for communicating with other devices or communication networks, such as an upgrade server, a key server, a device inside a vehicle, etc.

The processing component 1903 may be a general purpose Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the above schemes.

When the device shown in fig. 19 is the vehicle diagnostic device data processing apparatus 18, the processing component 1903 acquires m pieces of position information of the vehicle diagnostic device; classifies the m pieces of position information according to a clustering algorithm to obtain K position information sets; determines a first set with the highest reliability among the K position information sets according to a preset reliability condition; and determines the regional position of the vehicle service shop where the vehicle diagnostic device is located according to the position information in the first set.

It should be noted that, for the functions of each functional unit of the vehicle diagnostic device data processing apparatus 18 described in the embodiment of the apparatus of the present invention, reference may be made to the description of the vehicle diagnostic device data processing method in the method embodiment described in fig. 2 to 17, and details are not repeated herein.

An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a program, and the program, when executed, performs some or all of the steps described in the above method embodiments.

Embodiments of the present invention also provide a computer program, which may include instructions that, when executed by a computer, cause the computer to perform some or all of the steps of any one of the method embodiments described above.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus can be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only one type of division of logical functions, and other divisions may be realized in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. The units of the above apparatus embodiments may or may not be physically separated, and some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present invention.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product.

Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and may include several instructions to enable a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute all or part of the steps of the above methods according to the embodiments of the present invention. The aforementioned storage medium may include: a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a Read-Only Memory (ROM) or a Random Access Memory (RAM), and the like.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A data processing method, comprising:

acquiring m pieces of position information of vehicle diagnosis equipment, wherein the m pieces of position information are self geographical position information acquired by the vehicle diagnosis equipment during vehicle diagnosis, and m is an integer greater than 2;

selecting K cluster center points from a sample set formed by the m pieces of position information; wherein the sample set is {x⁽¹⁾, x⁽²⁾, …, x^(m)}, x⁽ⁱ⁾ ∈ Rⁿ, x⁽ⁱ⁾ is the position information of the i-th sample in the sample set, i = 1, 2, …, m; the center point set formed by the K cluster center points is {μ⁽¹⁾, μ⁽²⁾, …, μ^(K)}, μ^(j) ∈ Rⁿ, μ^(j) is the position information of the j-th cluster center point in the center point set, j = 1, 2, …, K; Rⁿ is the n-dimensional vector space, and n is an integer greater than or equal to 1;

according to the formula c^(j) = arg min_j ||x⁽ⁱ⁾ - μ^(j)||², calculating the class cluster c^(j) to which the i-th sample x⁽ⁱ⁾ belongs; wherein ||x⁽ⁱ⁾ - μ^(j)||² is the square of the Euclidean distance from x⁽ⁱ⁾ to μ^(j), and arg min_j ||x⁽ⁱ⁾ - μ^(j)||² is, when μ^(j) is the cluster center point, the set of all arguments x⁽ⁱ⁾ for which the sum of squares of the Euclidean distances to μ^(j) takes its minimum;

according to the formula

μ^(j) = ( Σ_{x⁽ⁱ⁾ ∈ c^(j)} x⁽ⁱ⁾ ) / |c^(j)|

updating each of the cluster center points μ^(j) until the distortion function

J = Σ_{j=1}^{K} Σ_{x⁽ⁱ⁾ ∈ c^(j)} ||x⁽ⁱ⁾ - μ^(j)||²

converges; wherein c^(j) is the class cluster j,

x⁽ⁱ⁾ ∈ c^(j) is the position information x⁽ⁱ⁾ of the class cluster j,

Σ_{x⁽ⁱ⁾ ∈ c^(j)} x⁽ⁱ⁾ is the sum of the features of all position information in the class cluster j,

|c^(j)| is the number of pieces of position information in the class cluster j,

and Σ_{x⁽ⁱ⁾ ∈ c^(j)} ||x⁽ⁱ⁾ - μ^(j)||² is the sum of the squared Euclidean distances from each piece of position information of the class cluster j to the cluster center point of the class cluster j;

when the distortion function is converged, the obtained K clusters correspond to the K position information sets;

determining a first set with the highest reliability in the K position information sets according to a preset reliability condition, wherein K is an integer which is greater than 1 and less than N;

and determining the regional position of the vehicle service shop where the vehicle diagnosis equipment is located according to the position information in the first set.

2. The method according to claim 1, wherein the first set with the highest reliability in the K position information sets is c^(j), and the cluster center point of the first set is μ^(j), j = k, 0 < k ≤ K, and k is an integer; the determining the regional position of the vehicle service shop where the vehicle diagnosis device is located according to the position information in the first set includes:

after determining the first set c^(j) and the cluster center point μ^(j) of the first set, determining the position information in the first set that lies within a circular area, wherein the circular area is centered on the cluster center point μ^(j) and has a radius of a preset value;

and determining the area position of the vehicle maintenance factory according to the position information in the circular area.

3. The method according to any one of claims 1 or 2, wherein the determining a first set with the highest reliability from the K sets of position information according to a preset reliability condition comprises:

calculating the ratio of the quantity of the position information in each position information set to m;

and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility.

4. The method according to any one of claims 1 or 2, wherein after determining the regional location of the vehicle service shop to which the vehicle diagnostic device belongs, further comprising:

and acquiring vehicle related data acquired by the vehicle diagnosis equipment at a position corresponding to the position information in the first set.

5. A data processing apparatus, comprising:

a first acquisition unit, configured to acquire m pieces of position information of a vehicle diagnosis device, wherein the m pieces of position information are self geographical position information acquired by the vehicle diagnosis device when the vehicle diagnosis device carries out vehicle diagnosis, and m is an integer greater than 2;

a classification unit to:

according to the formula c^(j) = arg min_j ||x⁽ⁱ⁾ - μ^(j)||², calculate the class cluster c^(j) to which the i-th sample x⁽ⁱ⁾ belongs; wherein ||x⁽ⁱ⁾ - μ^(j)||² is the square of the Euclidean distance from x⁽ⁱ⁾ to μ^(j), and arg min_j ||x⁽ⁱ⁾ - μ^(j)||² is, when μ^(j) is the cluster center point, the set of all arguments x⁽ⁱ⁾ for which the sum of squares of the Euclidean distances to μ^(j) takes its minimum;

according to the formula

μ^(j) = ( Σ_{x⁽ⁱ⁾ ∈ c^(j)} x⁽ⁱ⁾ ) / |c^(j)|

update each of the cluster center points μ^(j) until the distortion function

J = Σ_{j=1}^{K} Σ_{x⁽ⁱ⁾ ∈ c^(j)} ||x⁽ⁱ⁾ - μ^(j)||²

converges; wherein c^(j) is the class cluster j,

x⁽ⁱ⁾ ∈ c^(j) is the position information x⁽ⁱ⁾ of the class cluster j,

and |c^(j)| is the number of pieces of position information in the class cluster j;

the screening unit is used for determining a first set with the highest reliability in the K position information sets according to a preset reliability condition, wherein K is an integer which is greater than 1 and smaller than N;

and the determining unit is used for determining the regional position of the vehicle maintenance factory where the vehicle diagnostic equipment is located according to the position information in the first set.

6. The apparatus according to claim 5, wherein the screening unit is specifically configured to:

determine that the first set with the highest reliability in the K position information sets is c^(j), and that the cluster center point of the first set is μ^(j), j = k, 0 < k ≤ K, and k is an integer;

the determining unit is specifically configured to:

after the first set c^(j) and the cluster center point μ^(j) of the first set are determined, determine the position information in the first set that lies within a circular area, wherein the circular area is centered on the cluster center point μ^(j) and has a radius of a preset value; and determine the area position of the vehicle maintenance factory according to the position information in the circular area.

7. The apparatus according to any one of claims 5 or 6, wherein the screening unit is specifically configured to:

calculating the ratio of the quantity of the position information in each position information set to m; and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility.

8. The apparatus of any one of claims 5 or 6, further comprising:

and the second acquisition unit is used for acquiring the vehicle related data acquired by the vehicle diagnosis equipment at the position corresponding to the position information in the first set after the determination unit determines the region position of the vehicle maintenance factory to which the vehicle diagnosis equipment belongs.