CN109189876B - Data processing method and device - Google Patents


Info

Publication number
CN109189876B
Authority
CN
China
Prior art keywords
position information
cluster
vehicle
determining
center point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811010287.0A
Other languages
Chinese (zh)
Other versions
CN109189876A (en
Inventor
刘均
张小琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Launch Technology Co Ltd
Original Assignee
Shenzhen Launch Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Launch Technology Co Ltd
Priority to CN201811010287.0A
Publication of CN109189876A
Application granted
Publication of CN109189876B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a data processing method and a related device. The method comprises: acquiring m pieces of position information of vehicle diagnosis equipment, wherein the m pieces of position information are the equipment's own geographical position information acquired by the vehicle diagnosis equipment during vehicle diagnosis, and m is an integer greater than 2; classifying the m pieces of position information according to a clustering algorithm to obtain K position information sets; determining a first set with the highest reliability among the K position information sets according to a preset reliability condition, wherein K is an integer which is greater than 1 and less than N; and determining the regional position of the vehicle service shop where the vehicle diagnosis equipment is located according to the position information in the first set. The technical scheme of the embodiment of the invention helps to enhance the credibility of the positioning result, reduces the calculation complexity and makes it easy to process a large amount of data.

Description

Data processing method and device
Technical Field
The invention relates to the field of data analysis, in particular to a data processing method and device of vehicle diagnosis equipment.
Background
As an Internet of Vehicles terminal, the vehicle diagnosis equipment is an important source of Internet of Vehicles data in the big data era. Although the vehicle diagnosis equipment is a standalone, large piece of equipment, it is used flexibly, so its location varies. The position information acquired from the equipment reflects this: the position of the vehicle diagnosis equipment is often not fixed. In practical applications, the vehicle diagnosis equipment is mostly used in vehicle maintenance factories, so the geographic location of a maintenance factory can be determined from the position information of the vehicle diagnosis equipment. To do so, however, the relatively fixed position information of the vehicle diagnosis equipment needs to be determined first.
Regarding the extraction strategy of the position information of the vehicle diagnosis device, the most direct method at present is to extract one or two pieces of position information from the acquired position information of a plurality of devices by random sampling, and obtain the position information of the vehicle diagnosis device through analysis. However, the result of such processing has low reliability.
Disclosure of Invention
The embodiment of the invention provides a data processing method, which can enhance the credibility of the obtained position information of the vehicle diagnosis equipment, reduce the calculation complexity and facilitate the processing of a large amount of data.
In a first aspect, an embodiment of the present invention provides a data processing method, including:
acquiring m pieces of position information of vehicle diagnosis equipment, wherein the m pieces of position information are the equipment's own geographical position information acquired by the vehicle diagnosis equipment during vehicle diagnosis, and m is an integer greater than 2; classifying the m pieces of position information according to a clustering algorithm to obtain K position information sets; determining a first set with the highest reliability among the K position information sets according to a preset reliability condition, wherein K is an integer which is greater than 1 and less than N; and determining the regional position of the vehicle service shop where the vehicle diagnosis equipment is located according to the position information in the first set.
By implementing the embodiment of the invention, the acquired position information of a large number of vehicle diagnosis devices is classified according to a clustering algorithm, the positioning precision is improved, then the first set with the highest reliability is screened out according to the preset reliability condition, the influence of individual wrong position information is reduced, and finally the region position of the vehicle maintenance plant where the vehicle diagnosis devices are located is determined. The method can enhance the credibility of the result of the position of the positioned area, reduce the calculation complexity and facilitate the processing of a large amount of data.
In a possible implementation manner, the classifying the m position information according to a clustering algorithm to obtain K position information sets includes:
selecting K cluster center points from a sample set formed by the m pieces of position information; wherein the sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$, $x^{(i)}$ is the position information of the i-th sample in the sample set, $i = 1, 2, \ldots, m$; the center point set formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in \mathbb{R}^n$, $\mu^{(j)}$ is the position information of the j-th cluster center point in the center point set, $j = 1, 2, \ldots, K$; $\mathbb{R}^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1;
according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, calculating the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs; wherein $\|x^{(i)} - \mu^{(j)}\|^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$ is, when $\mu^{(j)}$ is the cluster center point, the set of all arguments $x^{(i)}$ for which the sum of squared Euclidean distances to $\mu^{(j)}$ takes its minimum value;
according to the formula
$$\mu^{(j)} = \frac{\sum_{x^{(i)} \in c^{(j)}} x^{(i)}}{\sum_{x^{(i)} \in c^{(j)}} 1}$$
updating each cluster center point $\mu^{(j)}$ until the distortion function
$$J = \sum_{j=1}^{K} \sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$$
converges; wherein $c^{(j)}$ is the class cluster j, $x^{(i)} \in c^{(j)}$ denotes the position information $x^{(i)}$ belonging to the class cluster j, $\sum_{x^{(i)} \in c^{(j)}} x^{(i)}$ is the sum of the features of all position information in the class cluster j, $\sum_{x^{(i)} \in c^{(j)}} 1$ is the count of the position information in the class cluster j, and $\sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ is the sum of squared Euclidean distances from each piece of position information in the class cluster j to the cluster center point of the class cluster j; when the distortion function converges, the K class clusters obtained correspond to the K position information sets.
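The assignment formula, the center-point update formula and the distortion function described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`sq_dist`, `assign`, `update`, `distortion`), the use of plain tuples for position information, and the choice to keep the old center point when a class cluster is empty are all assumptions made for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def assign(samples: List[Point], centers: List[Point]) -> List[int]:
    """For every sample, return the index j of the nearest cluster center point."""
    return [
        min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
        for x in samples
    ]


def update(samples: List[Point], labels: List[int],
           centers: List[Point]) -> List[List[float]]:
    """Each center point becomes the feature sum of its cluster divided by its count."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [x for x, c in zip(samples, labels) if c == j]
        if not members:
            # Assumption: an empty cluster keeps its previous center point.
            new_centers.append(list(old))
            continue
        new_centers.append([sum(col) / len(members) for col in zip(*members)])
    return new_centers


def distortion(samples: List[Point], labels: List[int],
               centers: List[Point]) -> float:
    """Sum of squared distances from every sample to its own cluster center point."""
    return sum(sq_dist(x, centers[c]) for x, c in zip(samples, labels))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    mu = [(0.0, 0.0), (10.0, 10.0)]
    c = assign(pts, mu)
    mu = update(pts, c, mu)
    print(c, mu, distortion(pts, c, mu))
```

Repeating `assign` and `update` until the value returned by `distortion` stops decreasing corresponds to updating the center points until the distortion function converges.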
In a possible implementation manner, the first set with the highest reliability among the K position information sets is $c^{(j)}$, and the cluster center point of the first set is $\mu^{(j)}$, where $j = k$, $0 < k \le K$, and k is an integer; the determining the regional position of the vehicle service shop where the vehicle diagnosis device is located according to the position information in the first set includes: after determining the first set $c^{(j)}$ and the cluster center point $\mu^{(j)}$ of the first set, determining the position information in the first set that lies within a circular area, wherein the circular area takes the cluster center point $\mu^{(j)}$ as its center and a preset value as its radius; and determining the area position of the vehicle maintenance factory according to the position information in the circular area.
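The circular-area step can be sketched as follows. This is a hedged illustration only: the source does not say how the area position is derived from the points inside the circle, so averaging them in `region_position` is an assumption, as are the function names and the use of plain Euclidean distance in the same units as the position coordinates.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def points_in_circle(points: List[Point], center: Point,
                     radius: float) -> List[Point]:
    """Keep the points whose Euclidean distance to the center is at most radius."""
    return [p for p in points if math.dist(p, center) <= radius]


def region_position(points: List[Point], center: Point,
                    radius: float) -> Optional[Tuple[float, ...]]:
    """Assumed rule: the area position is the mean of the points inside the circle."""
    inside = points_in_circle(points, center, radius)
    if not inside:
        return None  # no position information falls within the preset radius
    return tuple(sum(col) / len(inside) for col in zip(*inside))


if __name__ == "__main__":
    first_set = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    print(region_position(first_set, center=(0.0, 0.0), radius=2.0))
```

Points of the first set that lie far from the cluster center point, such as `(5.0, 5.0)` here, are excluded before the area position is computed.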
In a possible implementation manner, the determining, according to a preset reliability condition, a first set with the highest reliability in the K sets of location information includes: calculating the ratio of the quantity of the position information in each position information set to m; and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility.
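The ratio-based reliability condition can be illustrated with a short Python sketch. The function name `most_credible_set` and the representation of each position information set as a list are assumptions for illustration; m is taken to be the total number of pieces of position information across the K sets.

```python
from typing import List, Sequence, Tuple


def most_credible_set(sets: List[Sequence]) -> Tuple[int, List[float]]:
    """Return the index of the set with the largest share of all m items, plus all ratios."""
    m = sum(len(s) for s in sets)  # total number of pieces of position information
    if m == 0:
        raise ValueError("no position information to evaluate")
    ratios = [len(s) / m for s in sets]
    best = max(range(len(sets)), key=lambda j: ratios[j])
    return best, ratios


if __name__ == "__main__":
    clusters = [["a", "b"], ["c", "d", "e", "f", "g"], ["h", "i", "j"]]
    print(most_credible_set(clusters))
```

With sets of sizes 2, 5 and 3, the ratios are 0.2, 0.5 and 0.3, so the second set is selected as the first set with the highest credibility.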
In one possible implementation manner, after determining the regional location of the vehicle maintenance factory to which the vehicle diagnosis device belongs, the method further includes: acquiring vehicle related data collected by the vehicle diagnosis equipment at a position corresponding to the position information in the first set.
In a possible implementation manner, the classifying the m position information according to a clustering algorithm to obtain K position information sets includes:
Step 1: selecting K cluster center points from a sample set formed by the m pieces of position information; that is, selecting a number of class clusters from the acquired position data and randomly initializing the cluster center point of each class cluster, so that K is preset as the number of cluster center points and also represents the number of class clusters. Wherein the sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$, $x^{(i)}$ is the position information of the i-th sample in the sample set, $i = 1, 2, \ldots, m$; the center point set formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in \mathbb{R}^n$, $\mu^{(j)}$ is the position information of the j-th cluster center point in the center point set, $j = 1, 2, \ldots, K$; $\mathbb{R}^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1;
Step 2: according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, calculating the class cluster $c_1^{(j)}$ to which the i-th sample $x^{(i)}$ belongs; wherein $\|x^{(i)} - \mu^{(j)}\|^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$ is, when $\mu^{(j)}$ is the cluster center point, the set of all arguments $x^{(i)}$ for which the sum of squared Euclidean distances to $\mu^{(j)}$ takes its minimum value;
Step 3: according to the formula
$$\mu_1^{(j)} = \frac{\sum_{x^{(i)} \in c_1^{(j)}} x^{(i)}}{\sum_{x^{(i)} \in c_1^{(j)}} 1}$$
calculating each cluster center point $\mu_1^{(j)}$; wherein $c^{(j)}$ is the class cluster j, $x^{(i)} \in c^{(j)}$ denotes the position information $x^{(i)}$ belonging to the class cluster j, the numerator is the sum of the features of all position information in the class cluster j, and the denominator is the count of the position information in the class cluster j; $\mu_1^{(j)}$ is the general name for the result of the cluster center points after they are calculated for the first time; by analogy, $\mu_2^{(j)}$ and $\mu_3^{(j)}$ referred to below have the corresponding meaning.
Step 4: determining whether the distortion function $J_1 = \sum_{j=1}^{K} \sum_{x^{(i)} \in c_1^{(j)}} \|x^{(i)} - \mu_1^{(j)}\|^2$ converges;
Step 5: according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, calculating the class cluster $c_2^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;
Step 6: according to the formula $\mu_2^{(j)} = \sum_{x^{(i)} \in c_2^{(j)}} x^{(i)} \big/ \sum_{x^{(i)} \in c_2^{(j)}} 1$, updating each cluster center point $\mu_2^{(j)}$;
Step 7: determining whether the distortion function $J_2 = \sum_{j=1}^{K} \sum_{x^{(i)} \in c_2^{(j)}} \|x^{(i)} - \mu_2^{(j)}\|^2$ converges;
Step 8: according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, calculating the class cluster $c_3^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;
Step 9: according to the formula $\mu_3^{(j)} = \sum_{x^{(i)} \in c_3^{(j)}} x^{(i)} \big/ \sum_{x^{(i)} \in c_3^{(j)}} 1$, updating each cluster center point $\mu_3^{(j)}$;
Step 10: determining whether the distortion function $J_3 = \sum_{j=1}^{K} \sum_{x^{(i)} \in c_3^{(j)}} \|x^{(i)} - \mu_3^{(j)}\|^2$ converges;
Step 11: according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, calculating the class cluster $c_4^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;
Step 12: according to the formula $\mu_4^{(j)} = \sum_{x^{(i)} \in c_4^{(j)}} x^{(i)} \big/ \sum_{x^{(i)} \in c_4^{(j)}} 1$, updating each cluster center point $\mu_4^{(j)}$;
Step 13: determining whether the distortion function $J_4 = \sum_{j=1}^{K} \sum_{x^{(i)} \in c_4^{(j)}} \|x^{(i)} - \mu_4^{(j)}\|^2$ converges;
……
Step W: determining whether the distortion function $J_T = \sum_{j=1}^{K} \sum_{x^{(i)} \in c_T^{(j)}} \|x^{(i)} - \mu_T^{(j)}\|^2$ converges, T being the number of the last calculation round; when the distortion function $J_T$ converges, the K cluster center points $\mu^{(j)}$ and the class clusters $c^{(j)}$ respectively corresponding to the K cluster center points $\mu^{(j)}$ are obtained, that is, the K class clusters obtained correspond to the K position information sets; wherein $\sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ is the sum of squared Euclidean distances from each piece of position information in the class cluster j to the cluster center point of the class cluster j, and W is the last step executed by the clustering algorithm when the distortion function converges.
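The step sequence above (random initialization, then repeated assignment, center-point update and convergence check until Step W) can be sketched as a loop that records the distortion value of every round. This is an illustration under stated assumptions, not the patented procedure: the function name `run_rounds`, the `seed`, `tol` and `max_rounds` parameters, drawing the initial center points from the samples themselves, and treating "converges" as "the decrease between two consecutive rounds is at most `tol`" are all hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def run_rounds(samples: List[Point], k: int, seed: int = 0, tol: float = 1e-9,
               max_rounds: int = 100) -> Tuple[List[int], List[List[float]], List[float]]:
    """Run assignment/update rounds and return labels, centers and the distortion history."""
    rng = random.Random(seed)
    # Step 1 (assumed variant): pick K distinct samples as initial center points.
    centers = [list(p) for p in rng.sample(samples, k)]
    history: List[float] = []
    labels: List[int] = []
    for _ in range(max_rounds):
        # Assignment step: nearest center point by squared Euclidean distance.
        labels = [min(range(k), key=lambda j: sq_dist(x, centers[j]))
                  for x in samples]
        # Update step: mean of each non-empty cluster (empty clusters keep their center).
        for j in range(k):
            members = [x for x, c in zip(samples, labels) if c == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        # Convergence check on the distortion value of this round.
        history.append(sum(sq_dist(x, centers[c]) for x, c in zip(samples, labels)))
        if len(history) > 1 and history[-2] - history[-1] <= tol:
            break
    return labels, centers, history


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    labels, centers, history = run_rounds(pts, k=2)
    print(labels, centers, history)
```

The recorded history never increases from one round to the next, which is why checking the distortion function is a usable stopping criterion.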
In a possible implementation manner, the first set with the highest reliability among the K position information sets is $c^{(j)}$, and the cluster center point of the first set is $\mu^{(j)}$, where $j = k$, $0 < k \le K$, and k is an integer; the determining the regional position of the vehicle service shop where the vehicle diagnosis device is located according to the position information in the first set includes: after determining the first set $c^{(j)}$ and the cluster center point $\mu^{(j)}$ of the first set, determining the position information in the first set that lies within a circular area, wherein the circular area takes the cluster center point $\mu^{(j)}$ as its center and a preset value as its radius; and determining the area position of the vehicle maintenance factory according to the position information in the circular area.
In a possible implementation manner, the determining, according to a preset reliability condition, a first set with the highest reliability in the K sets of location information includes: calculating the ratio of the quantity of the position information in each position information set to m; and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility.
In one possible implementation manner, after determining the regional location of the vehicle maintenance factory to which the vehicle diagnosis device belongs, the method further includes: and acquiring vehicle related data acquired by the vehicle diagnosis equipment at a position corresponding to the position information in the first set.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, including: a first obtaining unit, a classification unit, a screening unit, a determining unit and a second obtaining unit; the first obtaining unit is used for obtaining, by the background server, m pieces of position information of the vehicle diagnosis equipment, wherein the m pieces of position information are the equipment's own geographical position information acquired by the vehicle diagnosis equipment during vehicle diagnosis, and m is an integer greater than 2; the classification unit is used for classifying the m pieces of position information according to a clustering algorithm to obtain K position information sets; the screening unit is used for determining a first set with the highest reliability among the K position information sets according to a preset reliability condition, wherein K is an integer which is greater than 1 and smaller than N; the determining unit is used for determining the region position of the vehicle maintenance factory where the vehicle diagnostic equipment is located according to the position information in the first set. The second obtaining unit is configured to obtain, after the determining unit determines the area position of the vehicle maintenance plant to which the vehicle diagnostic device belongs, vehicle related data collected by the vehicle diagnostic device at a position corresponding to the position information in the first set.
By implementing the embodiment of the invention, the acquired position information of a large number of vehicle diagnosis devices is classified according to a clustering algorithm, the positioning precision is improved, then the first set with the highest reliability is screened out according to the preset reliability condition, the influence of individual wrong position information is reduced, and finally the region position of the vehicle maintenance plant where the vehicle diagnosis devices are located is determined. The invention can enhance the credibility of the regional position result, reduce the calculation complexity and is easy to process a large amount of data.
In a possible implementation manner, the classification unit is specifically configured to: select K cluster center points from a sample set formed by the m pieces of position information; wherein the sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$, $x^{(i)}$ is the position information of the i-th sample in the sample set, $i = 1, 2, \ldots, m$; the center point set formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in \mathbb{R}^n$, $\mu^{(j)}$ is the position information of the j-th cluster center point in the center point set, $j = 1, 2, \ldots, K$; $\mathbb{R}^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1;
according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, calculate the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs; wherein $\|x^{(i)} - \mu^{(j)}\|^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$ is, when $\mu^{(j)}$ is the cluster center point, the set of all arguments $x^{(i)}$ for which the sum of squared Euclidean distances to $\mu^{(j)}$ takes its minimum value;
according to the formula
$$\mu^{(j)} = \frac{\sum_{x^{(i)} \in c^{(j)}} x^{(i)}}{\sum_{x^{(i)} \in c^{(j)}} 1}$$
update each cluster center point $\mu^{(j)}$ until the distortion function
$$J = \sum_{j=1}^{K} \sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$$
converges; wherein $c^{(j)}$ is the class cluster j, $x^{(i)} \in c^{(j)}$ denotes the position information $x^{(i)}$ belonging to the class cluster j, $\sum_{x^{(i)} \in c^{(j)}} x^{(i)}$ is the sum of the features of all position information in the class cluster j, $\sum_{x^{(i)} \in c^{(j)}} 1$ is the count of the position information in the class cluster j, and $\sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ is the sum of squared Euclidean distances from each piece of position information in the class cluster j to the cluster center point of the class cluster j; when the distortion function converges, the K class clusters obtained correspond to the K position information sets.
In a possible implementation manner, the first set with the highest credibility determined by the screening unit among the K position information sets is $c^{(j)}$, and the cluster center point of the first set is $\mu^{(j)}$, where $j = k$, $0 < k \le K$, and k is an integer; the determining unit is specifically configured to: after determining the first set $c^{(j)}$ and the cluster center point $\mu^{(j)}$ of the first set, determine the position information in the first set that lies within a circular area, wherein the circular area takes the cluster center point $\mu^{(j)}$ as its center and a preset value as its radius; and determine the area position of the vehicle maintenance factory according to the position information in the circular area.
In a possible implementation manner, the screening unit is specifically configured to: calculating the ratio of the quantity of the position information in each position information set to m; and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility.
In one possible implementation manner, the apparatus further includes: a second obtaining unit, used for obtaining, after the determining unit determines the region position of the vehicle maintenance plant to which the vehicle diagnosis equipment belongs, vehicle related data collected by the vehicle diagnosis equipment at a position corresponding to the position information in the first set.
In a possible implementation manner, the classification unit is specifically configured to:
Step 1: selecting K cluster center points from a sample set formed by the m pieces of position information; that is, selecting a number of class clusters from the acquired position data and randomly initializing the cluster center point of each class cluster. Each cluster center point has the same length as each position data vector. K is thus preset as the number of cluster center points and also represents the number of class clusters. Wherein the sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$, $x^{(i)}$ is the position information of the i-th sample in the sample set, $i = 1, 2, \ldots, m$; the center point set formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in \mathbb{R}^n$, $\mu^{(j)}$ is the position information of the j-th cluster center point in the center point set, $j = 1, 2, \ldots, K$; $\mathbb{R}^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1;
Step 2: according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, calculating the class cluster $c_1^{(j)}$ to which the i-th sample $x^{(i)}$ belongs; wherein $\|x^{(i)} - \mu^{(j)}\|^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$ is, when $\mu^{(j)}$ is the cluster center point, the set of all arguments $x^{(i)}$ for which the sum of squared Euclidean distances to $\mu^{(j)}$ takes its minimum value;
Step 3: according to the formula
$$\mu_1^{(j)} = \frac{\sum_{x^{(i)} \in c_1^{(j)}} x^{(i)}}{\sum_{x^{(i)} \in c_1^{(j)}} 1}$$
calculating each cluster center point $\mu_1^{(j)}$; wherein $c^{(j)}$ is the class cluster j, $x^{(i)} \in c^{(j)}$ denotes the position information $x^{(i)}$ belonging to the class cluster j, the numerator is the sum of the features of all position information in the class cluster j, and the denominator is the count of the position information in the class cluster j; $\mu_1^{(j)}$ is the general name for the result of the cluster center points after they are calculated for the first time; by analogy, $\mu_2^{(j)}$ and $\mu_3^{(j)}$ referred to below have the corresponding meaning.
Step 4: determining whether the distortion function $J_1 = \sum_{j=1}^{K} \sum_{x^{(i)} \in c_1^{(j)}} \|x^{(i)} - \mu_1^{(j)}\|^2$ converges;
Step 5: according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, calculating the class cluster $c_2^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;
Step 6: according to the formula $\mu_2^{(j)} = \sum_{x^{(i)} \in c_2^{(j)}} x^{(i)} \big/ \sum_{x^{(i)} \in c_2^{(j)}} 1$, updating each cluster center point $\mu_2^{(j)}$;
Step 7: determining whether the distortion function $J_2 = \sum_{j=1}^{K} \sum_{x^{(i)} \in c_2^{(j)}} \|x^{(i)} - \mu_2^{(j)}\|^2$ converges;
Step 8: according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, calculating the class cluster $c_3^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;
Step 9: according to the formula $\mu_3^{(j)} = \sum_{x^{(i)} \in c_3^{(j)}} x^{(i)} \big/ \sum_{x^{(i)} \in c_3^{(j)}} 1$, updating each cluster center point $\mu_3^{(j)}$;
Step 10: determining whether the distortion function $J_3 = \sum_{j=1}^{K} \sum_{x^{(i)} \in c_3^{(j)}} \|x^{(i)} - \mu_3^{(j)}\|^2$ converges;
Step 11: according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, calculating the class cluster $c_4^{(j)}$ to which the i-th sample $x^{(i)}$ belongs;
Step 12: according to the formula $\mu_4^{(j)} = \sum_{x^{(i)} \in c_4^{(j)}} x^{(i)} \big/ \sum_{x^{(i)} \in c_4^{(j)}} 1$, updating each cluster center point $\mu_4^{(j)}$;
Step 13: determining whether the distortion function $J_4 = \sum_{j=1}^{K} \sum_{x^{(i)} \in c_4^{(j)}} \|x^{(i)} - \mu_4^{(j)}\|^2$ converges;
……
Step W: determining whether the distortion function $J_T = \sum_{j=1}^{K} \sum_{x^{(i)} \in c_T^{(j)}} \|x^{(i)} - \mu_T^{(j)}\|^2$ converges, T being the number of the last calculation round; when the distortion function $J_T$ converges, the K cluster center points $\mu^{(j)}$ and the class clusters $c^{(j)}$ respectively corresponding to the K cluster center points $\mu^{(j)}$ are obtained, that is, the K class clusters obtained correspond to the K position information sets; wherein $\sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ is the sum of squared Euclidean distances from each piece of position information in the class cluster j to the cluster center point of the class cluster j, and W is the last step executed by the clustering algorithm when the distortion function converges.
In a possible implementation manner, the first set with the highest credibility determined by the screening unit among the K position information sets is $c^{(j)}$, and the cluster center point of the first set is $\mu^{(j)}$, where $j = k$, $0 < k \le K$, and k is an integer; the determining unit is specifically configured to: after determining the first set $c^{(j)}$ and the cluster center point $\mu^{(j)}$ of the first set, determine the position information in the first set that lies within a circular area, wherein the circular area takes the cluster center point $\mu^{(j)}$ as its center and a preset value as its radius; and determine the area position of the vehicle maintenance factory according to the position information in the circular area.
In a possible implementation manner, the screening unit is specifically configured to: calculating the ratio of the quantity of the position information in each position information set to m; and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility.
In one possible implementation manner, the apparatus further includes: and the second acquisition unit is used for acquiring the vehicle related data acquired by the vehicle diagnosis equipment at the position corresponding to the position information in the first set after the determination unit determines the region position of the vehicle maintenance factory to which the vehicle diagnosis equipment belongs.
In the embodiment of the invention, the acquired position information of a large number of vehicle diagnosis devices is classified according to a clustering algorithm, which improves the positioning precision; a first set with the highest reliability is then screened out according to a preset reliability condition, which reduces the influence of individual wrong position information; and finally the regional position of the vehicle maintenance factory where the vehicle diagnosis devices are located is determined from the target position information in the first set according to a certain rule. The method and the device can enhance the credibility of the regional position result, reduce the calculation complexity and facilitate the processing of a large amount of data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments of the present invention or the prior art will be briefly introduced below, it is obvious that the drawings and the attached tables in the following description are only some embodiments of the present invention, and other drawings and attached tables can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a system architecture diagram of data processing for a vehicle diagnostic device provided by an embodiment of the present invention;
FIG. 2 is an interactive schematic diagram of a vehicle diagnostic device and a server provided by an embodiment of the invention;
FIG. 3 is a table for arranging and presenting m pieces of location information provided by an embodiment of the present invention;
FIGS. 4-15 are schematic diagrams illustrating classification results of m location information data points according to a K-means algorithm according to an embodiment of the present invention; FIG. 4 is a data point distribution diagram of m pieces of position information according to an embodiment of the present invention; FIG. 5 is a classification result diagram obtained from the data of FIG. 4 by calculating the class cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs and each cluster center point $\mu^{(j)}$; FIG. 15 is a final classification result diagram of m pieces of position information through a K-means clustering algorithm according to the embodiment of the present invention;
FIG. 16 is a diagram illustrating the number of position information in the position information set and the ratio of the number to m, which is counted according to FIG. 15, according to an embodiment of the present invention;
FIG. 17 is a schematic illustration of the determination of the vehicle service plant area location in the first set screened out in FIG. 16 in accordance with an embodiment of the present invention;
FIG. 18 is a schematic structural diagram of a data processing device of a vehicle diagnostic apparatus according to an embodiment of the present invention;
FIG. 19 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of the invention and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments. The technical scheme of the embodiment of the invention can be applied to the fields of data processing, cluster analysis and the like. When the fields and the scenes of the application of the method and the device are different, the names of specific equipment and places in the embodiment of the invention are also different.
First, some terms in the present invention are explained to facilitate understanding by those skilled in the art.
(1) Vehicle-related data is data collected at a vehicle service shop whose location has been determined by the clustering algorithm. The collected data is subsequently processed in order to analyze the scale, operating condition, and other aspects of the vehicle service shop to which the vehicle diagnostic device belongs, and the analysis result is fed back to the vehicle service shop so that it can adjust its own operations on a sound basis.
(2) The vehicle diagnostic device is a generic term for various devices commonly used in a vehicle service facility, such as a diagnostic device and a service device.
(3) Euclidean distance, also known as the Euclidean metric, is a commonly used definition of distance: the true distance between two points in an N-dimensional space, or the natural length of a vector (i.e., the distance of the point from the origin). In the invention, each position data point is assigned to the cluster whose cluster center point has the shortest Euclidean distance to that data point. For convenience of calculation, the square of the Euclidean distance from each position information data point to the cluster center point is used as the division criterion.
(4) The K-Medians algorithm, a variation of K-Means, uses the median of the data set to calculate the cluster center of the class cluster.
(5) The mean shift clustering algorithm is based on a sliding window and looks for dense regions of data points. It is a centroid-based algorithm that locates the center point of each cluster by updating the candidate center point to the mean of the points within the sliding window. Candidate windows that are similar (overlapping) are then removed, finally forming a set of center points and the corresponding clusters.
(6) The Expectation-Maximization (EM) clustering algorithm using a Gaussian mixture model (GMM) assumes that the data points follow Gaussian distributions, whereas K-Means in effect assumes that each cluster is circular. K-Means is a special case of GMM in which the variance of every cluster approaches 0 in all dimensions, so that the clusters appear circular. A Gaussian distribution (an ellipse) allows more possibilities: the shape of a cluster is described by two parameters, the mean and the standard deviation. With a standard deviation in both the x and y directions, a cluster can be an ellipse of any shape. Each Gaussian distribution is thus assigned to a single cluster. To find the mean and standard deviation of each Gaussian distribution, the Expectation-Maximization (EM) optimization algorithm is applied.
(7) Spectral clustering is a graph-theory-based clustering method that clusters the eigenvectors of the Laplacian matrix of the sample data, thereby clustering the sample data. The meaning of "spectrum" is as follows: for a matrix A, the set of all its eigenvalues is called the spectrum of A. Most spectrum-related algorithms are algorithms that work with eigenvalues. The spectral radius is the largest absolute value among all eigenvalues.
(8) The DBSCAN clustering algorithm (Density-Based Spatial Clustering of Applications with Noise) is a representative density-based clustering algorithm. It defines a cluster as the maximal set of density-connected points, can divide regions of sufficiently high density into clusters, and can find clusters of arbitrary shape in noisy spatial databases.
(9) Hierarchical clustering algorithms (hierarchical methods) recursively merge or split data objects until some termination condition is satisfied. According to the direction of the hierarchical decomposition, they can be divided into top-down divisive hierarchical clustering and bottom-up agglomerative hierarchical clustering.
Referring to fig. 1, fig. 1 is a schematic diagram of a system architecture for processing data of a vehicle diagnostic device. As shown in fig. 1, the data processing method of a vehicle diagnostic device according to the present invention can be applied to this system architecture. The system architecture comprises vehicles, vehicle diagnostic devices, a background server, and the like. The background server represented by the background server icon may be composed of a plurality of servers. Vehicle 1, vehicle 2, and vehicle 12 each represent a vehicle that is diagnosed at a certain position, and are numbered only for distinction. The vehicle diagnostic device 1-A position, the vehicle diagnostic device 1-B position, the vehicle diagnostic device 1-C position, and the vehicle diagnostic device 1-D position respectively represent the position information of the vehicle diagnostic device 1, which belongs to service shop 1, at the four positions A, B, C, and D. The meaning of names such as the vehicle diagnostic device 2-A position and the vehicle diagnostic device 3-D position follows by analogy. Service shop 1, service shop 2, and service shop 3, each with its corresponding circular area, respectively represent the area locations of the vehicle service shops to which the vehicle diagnostic device 1, the vehicle diagnostic device 2, and the vehicle diagnostic device 3 belong. The position information of a vehicle diagnostic device can be uploaded to the background server through a transmission mode such as a network. Because the vehicles, vehicle diagnostic devices, service shops, background servers, and so on cannot be listed exhaustively, the embodiment of the invention shows only a certain number of each for convenience of illustration; this does not represent the numbers used in practical application.
Since the use positions of the vehicle diagnostic equipment are mostly in a service shop, most of the vehicle diagnostic equipment icons are concentrated in a circular area of the service shop; the device icon that is not within the circular area indicates that the vehicle diagnostic device is not within the service shop to which it belongs when the device is used.
It is to be understood that the system architecture of fig. 1 is only an exemplary implementation of the embodiments of the present invention. The system architecture in the embodiments of the present invention may include, but is not limited to, the above system architecture.
The technical problem proposed in the present invention is specifically analyzed and solved by combining the above system architecture and the data processing embodiments provided in the present invention.
Referring to fig. 2, fig. 2 is an interaction schematic diagram of the vehicle diagnostic device and the backend server, and the following description will be given with reference to fig. 2 from an interaction side of the vehicle diagnostic device and the backend server, where an embodiment of the method mainly takes a K-Means algorithm as an example for description, and may specifically include steps S201 to S204. Optionally, step S205 may be further included. Wherein step S202 provides possible implementations of other clustering algorithms.
Step S201: acquiring m pieces of position information of vehicle diagnosis equipment, wherein the m pieces of position information are self geographical position information acquired by the vehicle diagnosis equipment during vehicle diagnosis, and m is an integer greater than 2;
specifically, each of the m pieces of position information contains at least longitude and latitude information. Each piece of position information may be stored in the form of an n-dimensional vector or in the form of an ordered pair of real numbers. The position information may be acquired in either of two forms: acquiring the position information of a single device, transmitted back by that vehicle diagnostic device; or acquiring a set of position information of a plurality of devices, transmitted back by those vehicle diagnostic devices. In the application and calculation of the algorithm, different storage forms may lead to different details in the data processing, but they do not affect the core application of the algorithm.
Step S202: classifying the m position information according to a clustering algorithm to obtain K position information sets;
specifically, before the m position information is classified according to a clustering algorithm, it may be preprocessed so that the classification results of different clustering algorithms are easier to display. The preprocessing may include: arranging and presenting the m pieces of position information in the form of a table (please refer to fig. 3, which is a table for arranging and presenting the m pieces of position information provided by the embodiment of the present invention); or arranging and presenting the m pieces of position information in the form of an image (please refer to fig. 4, which is a data point distribution diagram of the m pieces of position information provided by the embodiment of the present invention). The embodiment of the present invention takes the arrangement of the m pieces of position information in the form of an image as an example. The m position information is then classified according to a clustering algorithm to obtain K position information sets. The type of clustering algorithm is selected according to the type of data and the purpose of the clustering.
The main clustering algorithms can be divided into the following classes: partitional clustering, hierarchical clustering, density-based clustering, and fuzzy clustering. Each class contains widely used algorithms, such as the K-means clustering algorithm in partitional clustering and the agglomerative hierarchical clustering algorithm in hierarchical clustering. It should be noted that research on the clustering problem is not limited to hard clustering such as the K-means algorithm, in which each data item can be assigned to only one cluster. Fuzzy clustering is also a widely studied branch of cluster analysis. Instead of rigidly assigning a data object to one cluster, fuzzy clustering determines the degree to which each data item belongs to each cluster through a membership function; the FCM algorithm is one example. Therefore, the implementation of the invention is not limited to analyzing the data with one particular algorithm; one or more algorithms may be adopted according to the specific characteristics of the data, seeking both simplicity of calculation and reliability of the conclusion. For example, the samples may be sampled empirically according to their distribution, hierarchical clustering may then be performed on the small sample, and K-means clustering may then be applied to the entire sample using the K value obtained from the hierarchical clustering. In combination with the characteristics of the position information in the embodiment of the present invention, the following 7 algorithms are taken as examples, and the position information is classified with each of them.
In a possible implementation manner, classifying the m position information according to a K-Means algorithm to obtain K position information sets (please refer to fig. 5-15) may include: selecting K cluster center points from a sample set formed by the m position information; wherein the sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$, $x^{(i)}$ is the position information of the i-th sample in the sample set, $i = 1, 2, \ldots, m$; the center point set formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in \mathbb{R}^n$, $\mu^{(j)}$ is the position information of the j-th cluster center point of the center point set, $j = 1, 2, \ldots, K$; $\mathbb{R}^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1; according to the formula
$$c^{(j)} = \arg\min_{j} \lVert x^{(i)} - \mu^{(j)} \rVert^2$$
calculating the cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs; wherein $\lVert x^{(i)} - \mu^{(j)} \rVert^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_{j} \lVert x^{(i)} - \mu^{(j)} \rVert^2$ is, with $\mu^{(j)}$ as the cluster center point, the set of all arguments $x^{(i)}$ for which the squared Euclidean distance to $\mu^{(j)}$ takes the minimum value; according to the formula
$$\mu^{(j)} = \frac{\sum_{x^{(i)} \in c^{(j)}} x^{(i)}}{\lvert c^{(j)} \rvert}$$
updating each of the cluster center points $\mu^{(j)}$ until the distortion function
$$J = \sum_{j=1}^{K} \sum_{x^{(i)} \in c^{(j)}} \lVert x^{(i)} - \mu^{(j)} \rVert^2$$
converges; wherein $c^{(j)}$ is the cluster j, $x^{(i)} \in c^{(j)}$ is position information $x^{(i)}$ of the cluster j, $\sum_{x^{(i)} \in c^{(j)}} x^{(i)}$ is the sum of the features of all position information in the cluster j, $\lvert c^{(j)} \rvert$ is the number of position information in the cluster j, and $\sum_{x^{(i)} \in c^{(j)}} \lVert x^{(i)} - \mu^{(j)} \rVert^2$ is the sum of the squared Euclidean distances from each position information of the cluster j to the cluster center point of the cluster j; and when the distortion function converges, the K clusters obtained correspond to the K position information sets.
The K-Means clustering algorithm provided by the embodiment of the invention has the advantages that it is simple and fast to compute, is efficient on large data sets, and is suitable for mining large-scale data sets and processing dense clusters of data.
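The K-Means steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the embodiment's implementation: the function names `sq_dist` and `kmeans`, the random choice of initial centers, the tolerance `tol`, and the sample coordinates are all hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(samples, k, max_iter=100, tol=1e-9, seed=0):
    """Cluster a list of coordinate tuples; returns (centers, labels, distortion)."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)  # assumption: initial centers drawn from the samples
    previous = float("inf")
    for _ in range(max_iter):
        # Assignment: each sample joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(x, centers[j]))
                  for x in samples]
        # Update: each center becomes the mean of the samples assigned to it.
        for j in range(k):
            members = [x for x, c in zip(samples, labels) if c == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
        # Distortion: sum of squared distances to the assigned centers.
        distortion = sum(sq_dist(x, centers[c])
                         for x, c in zip(samples, labels))
        if previous - distortion <= tol:  # the distortion has stopped decreasing
            break
        previous = distortion
    return centers, labels, distortion

# Hypothetical position data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels, distortion = kmeans(points, k=2)
```

Because the distortion never increases from one iteration to the next, stopping when it no longer decreases is one simple way to detect the convergence mentioned in the text.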
In a possible implementation manner, classifying the m position information according to a K-Medians algorithm to obtain K position information sets may include the following 4 steps: 1) The sample set of position information samples is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$; K cluster center points are selected from the sample set. 2) According to the formula $c^{(j)} = \arg\min_{j} \lVert x^{(i)} - \mu^{(j)} \rVert^2$, the cluster $c^{(j)}$ to which each sample i belongs is calculated, namely the cluster whose cluster center point is at the minimum Euclidean distance from the position information sample. 3) The median of each cluster is calculated, and the center $\mu^{(j)}$ of each cluster is determined from it. 4) Steps 2) and 3) are repeated until stable K clusters and the cluster center points corresponding to the K clusters are determined.
The K-Medians clustering algorithm provided by the embodiment of the invention has the advantages that the central point is calculated by using the median of the data, and the calculation result is prevented from being influenced by abnormal data.
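A minimal sketch of the K-Medians steps follows, assuming that the median of a cluster is taken coordinate by coordinate and that iteration stops when the assignments no longer change. The names `coordinate_median` and `kmedians` and the sample data are hypothetical.

```python
import random
from statistics import median

def coordinate_median(members):
    """Per-coordinate median of a list of points (assumed center definition)."""
    return tuple(median(col) for col in zip(*members))

def kmedians(samples, k, max_iter=100, seed=0):
    """Cluster coordinate tuples with median-based centers; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)  # step 1: pick K initial centers
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each sample to the center with the smallest squared distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((p - q) ** 2 for p, q in zip(x, centers[j])))
            for x in samples
        ]
        if new_labels == labels:  # step 4: assignments are stable
            break
        labels = new_labels
        # Step 3: move each center to the median of its cluster.
        for j in range(k):
            members = [x for x, c in zip(samples, labels) if c == j]
            if members:
                centers[j] = coordinate_median(members)
    return centers, labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmedians(points, k=2)

# A single abnormal value barely moves the median, unlike the mean.
robust_center = coordinate_median([(0.0, 0.0), (1.0, 1.0), (100.0, 2.0)])
```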
In a possible implementation manner, classifying the m position information according to a mean shift clustering algorithm to obtain K position information sets may include the following 4 steps: 1) Determine the radius r of the sliding window, and start sliding a circular window of radius r centered at a randomly selected point C. In each iteration the window moves toward a denser region, until convergence. 2) Each time the window slides to a new region, the mean of the points within the sliding window is calculated as the new center point, and the number of points within the sliding window is the density of the window. In each movement, the window moves toward a denser region. 3) Continue moving the window, calculating the center point and the density within the window, until no direction of movement can bring more points into the window, i.e., until the density within the circle no longer increases. 4) Steps 1) to 3) generate a plurality of sliding windows. When several sliding windows overlap, the window containing the most points is retained, and the data points are clustered according to the sliding window in which they lie, finally forming stable windows, namely a set of center points and the corresponding clusters.
The mean shift clustering algorithm provided by the embodiment of the invention has the advantages that the number of clusters does not need to be known in advance and is determined automatically by the algorithm; during the calculation, the cluster centers converge toward the points of maximum density and are less influenced by the data mean.
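The sliding-window procedure can be sketched as follows. This is a simplified flat-window variant written for illustration only: instead of random starting points it starts one window at every sample, and each data point is finally assigned to the nearest retained window center. Both choices, and the names `mean_shift` and `sq_dist`, are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def mean_shift(samples, radius, max_iter=100, tol=1e-6):
    """Flat-window mean shift; returns (centers, labels)."""
    r2 = radius ** 2

    def window(center):
        # All samples that lie inside the circular window around `center`.
        return [x for x in samples if sq_dist(x, center) <= r2]

    candidates = []
    for start in samples:  # assumption: one window is started at every sample
        center = start
        for _ in range(max_iter):
            inside = window(center)
            shifted = tuple(sum(col) / len(inside) for col in zip(*inside))
            moved = sq_dist(shifted, center)
            center = shifted
            if moved <= tol ** 2:  # the window has stopped moving
                break
        candidates.append((len(window(center)), center))

    # Overlapping windows: keep the densest one, drop centers that fall inside it.
    candidates.sort(key=lambda item: -item[0])
    centers = []
    for _, center in candidates:
        if all(sq_dist(center, kept) > r2 for kept in centers):
            centers.append(center)

    # Assumption: each point is labelled with its nearest retained center.
    labels = [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
              for x in samples]
    return centers, labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = mean_shift(points, radius=3.0)
```

Note that the number of clusters is not passed in; it falls out of how many windows survive the overlap removal.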
In a possible implementation, classifying the m position information according to an Expectation-Maximization (EM) clustering algorithm using a Gaussian mixture model (GMM) to obtain K position information sets may include the following 4 steps: 1) Select the number of clusters and randomly initialize the Gaussian distribution parameters (mean and variance) of each cluster. Alternatively, a relatively accurate mean and variance may first be estimated from the data. 2) Given the Gaussian distribution of each cluster, calculate the probability that each data point belongs to each cluster. The closer a point is to the center of a Gaussian distribution, the more likely it belongs to that cluster. 3) Based on these probabilities, calculate new Gaussian distribution parameters that maximize the probability of the data points. The new parameters can be calculated as a weighted sum of the data points, where the weight is the probability that the data point belongs to the cluster. 4) Repeat steps 2) and 3) until the change between iterations is small.
The Expectation-Maximization (EM) clustering algorithm using a Gaussian mixture model (GMM) provided in the embodiment of the present invention has the advantages that, because GMMs use the mean and standard deviation, the cluster shapes that can be identified may be elliptical rather than only circular; and because GMMs use probabilities, a data point can belong to multiple clusters, which improves the accuracy of the calculation.
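The four EM steps can be illustrated with the sketch below. It assumes axis-aligned Gaussians (one variance per coordinate, matching the "standard deviation in both x and y directions" description), takes the initial means as an argument instead of drawing them at random, and runs a fixed number of iterations. The names `diag_gaussian` and `gmm_em`, the variance floor `min_var`, and the sample data are hypothetical.

```python
import math

def diag_gaussian(x, mean, var):
    """Density of an axis-aligned Gaussian (one variance per dimension) at x."""
    density = 1.0
    for xi, mi, vi in zip(x, mean, var):
        density *= math.exp(-(xi - mi) ** 2 / (2 * vi)) / math.sqrt(2 * math.pi * vi)
    return density

def gmm_em(samples, init_means, n_iter=50, min_var=1e-6):
    """EM for a diagonal-covariance GMM; returns (means, variances, weights, labels)."""
    k, m, dim = len(init_means), len(samples), len(samples[0])
    means = [tuple(mu) for mu in init_means]        # step 1: initial parameters
    variances = [(1.0,) * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):                         # step 4: repeat E and M steps
        # Step 2 (E-step): probability that each point belongs to each cluster.
        resp = []
        for x in samples:
            joint = [weights[j] * diag_gaussian(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint) or 1e-300            # guard against underflow to zero
            resp.append([p / total for p in joint])
        # Step 3 (M-step): probability-weighted means, variances, and mixing weights.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            means[j] = tuple(
                sum(r[j] * x[d] for r, x in zip(resp, samples)) / nj
                for d in range(dim))
            variances[j] = tuple(
                max(min_var,
                    sum(r[j] * (x[d] - means[j][d]) ** 2
                        for r, x in zip(resp, samples)) / nj)
                for d in range(dim))
            weights[j] = nj / m
    # Hard assignment: the most probable cluster for each point.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, variances, weights, labels = gmm_em(points, [(0.0, 0.0), (10.0, 10.0)])
```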
In a possible implementation manner, classifying the m position information according to a spectral clustering algorithm to obtain K position information sets may include the following 4 steps: 1) The similarity between the a-th sample and the b-th sample is measured by the Gaussian similarity
$$S_{ab} = \exp\left(-\frac{\lVert x^{(a)} - x^{(b)} \rVert^2}{2\sigma^2}\right)$$
where $\sigma$ is a hyper-parameter, $1 \le a \le m$, $1 \le b \le m$, $a \ne b$, and a and b are integers. 2) Form the similarity matrix $W = (S_{ab})_{m \times m}$, which is a symmetric matrix. $S_{aa}$ should equal 1, but for convenience of calculation it is written as 0, so the similarity matrix becomes a symmetric matrix with 0 on the main diagonal. 3) Calculate the sum of the similarities of the a-th sample to all other samples, $d_a = \sum_{b \ne a} S_{ab}$ (some variants add up only the first K values of $S_{ab}$, or set a threshold and discard the values of $S_{ab}$ below the threshold). In graph theory $d_a$ is called the degree and can be understood as the total weight of the connecting edges. The degrees $d_a$ of all points form the degree matrix D (a diagonal matrix). 4) Form the Laplacian matrix $L = D - W$. L is a symmetric positive semi-definite matrix whose minimum eigenvalue is 0, and the corresponding eigenvector is the all-ones vector. Arrange the eigenvalues of L from small to large, $\lambda_1, \ldots, \lambda_m$, with corresponding eigenvectors $u_1, \ldots, u_m$. If clustering into K classes is required, the eigenvectors corresponding to the first K eigenvalues are taken to form the matrix $U_{m \times K}$, so that the features of the first sample are $u_{11}, u_{12}, \ldots, u_{1K}$, the features of the second sample are $u_{21}, u_{22}, \ldots, u_{2K}$, and the features of the m-th sample are $u_{m1}, u_{m2}, \ldots, u_{mK}$. K-means is then performed on these m samples, and the resulting clustering of the m samples is the clustering result of the original position information.
The spectral clustering algorithm provided by the embodiment of the invention has the advantages that, based on graph theory, the eigenvectors of the Laplacian matrix of the sample data are clustered, thereby clustering the sample data; when the distribution of the data samples is non-spherical, it can still be identified and processed.
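The construction of the similarity matrix W, the degrees, and the Laplacian matrix L = D - W can be sketched as below. The sketch assumes the common Gaussian similarity exp(-||x_a - x_b||^2 / (2 sigma^2)) and stops before the eigen-decomposition and the final K-means step, which would normally rely on a linear algebra library. The name `graph_laplacian` and the sample data are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def graph_laplacian(samples, sigma):
    """Return (W, degrees, L) for the fully connected Gaussian similarity graph."""
    m = len(samples)
    # Similarity matrix with zeros on the main diagonal, as in step 2.
    W = [[0.0 if a == b
          else math.exp(-sq_dist(samples[a], samples[b]) / (2 * sigma ** 2))
          for b in range(m)]
         for a in range(m)]
    # Degree of each sample: sum of its similarities to all other samples.
    degrees = [sum(row) for row in W]
    # Laplacian L = D - W, where D is the diagonal matrix of degrees.
    L = [[(degrees[a] if a == b else 0.0) - W[a][b] for b in range(m)]
         for a in range(m)]
    return W, degrees, L

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
W, degrees, L = graph_laplacian(points, sigma=1.0)

# Each row of L sums to zero, so the all-ones vector is an eigenvector
# with eigenvalue 0.
row_sums = [sum(row) for row in L]

# The quadratic form v^T L v is non-negative for any vector v.
v = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5]
quadratic = sum(v[a] * L[a][b] * v[b] for a in range(6) for b in range(6))
```

The zero row sums are what make 0 the smallest eigenvalue of L, as stated in step 4.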
In a possible implementation manner, classifying the m position information according to a DBSCAN clustering algorithm to obtain K position information sets may include the following 2 steps: 1) First determine the radius r and minPoints. Starting from an arbitrary data point that has not been visited, take the point as the center and judge whether the number of points contained in the circle of radius r is greater than or equal to minPoints; if so, mark the point as a core point, otherwise mark it as a noise point. 2) Repeat step 1). If a noise point lies within the circle of radius r of some core point, re-mark it as an edge point; otherwise it remains a noise point. Repeat step 1) until all points have been visited.
The DBSCAN clustering algorithm provided by the embodiment of the invention has the advantage that, being density-based, it does not need the number of clusters to be known in advance; the number of clusters of the data samples to be processed is obtained automatically through calculation.
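A compact sketch of the density-based procedure follows. It uses the usual DBSCAN formulation, in which a cluster is grown outward from each core point and noise points reached from a core point become edge points of that cluster; the expansion queue is a standard detail not spelled out in the text. The names `dbscan`, `NOISE`, and `UNVISITED`, and the sample data, are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

NOISE = -1
UNVISITED = None

def dbscan(samples, radius, min_points):
    """Return one label per sample: a cluster index, or NOISE (-1)."""
    r2 = radius ** 2
    labels = [UNVISITED] * len(samples)

    def neighbors(i):
        # Indices of all samples inside the circle of radius r around sample i
        # (the sample itself is counted).
        return [j for j, y in enumerate(samples) if sq_dist(samples[i], y) <= r2]

    cluster = 0
    for i in range(len(samples)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = NOISE          # may later be re-marked as an edge point
            continue
        labels[i] = cluster            # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise inside a core circle becomes an edge point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_points:   # j is also a core point: keep growing
                queue.extend(reachable)
        cluster += 1
    return labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]                # the last point is an isolated outlier
labels = dbscan(points, radius=1.5, min_points=3)
```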
In a possible implementation manner, classifying the m position information according to agglomerative hierarchical clustering in the hierarchical clustering algorithm to obtain K position information sets may include the following 3 steps: 1) Treat each data point as a single cluster, select a metric that measures the distance between two clusters, calculate the distances between all individuals, and find the two individuals that are closest to each other, merging them into a small group. 2) Take the small group as a new individual, calculate the distances between it and all remaining individuals, find the two individuals closest to each other to form a group, and repeat. 3) Continue until the desired number of clusters is obtained.
The hierarchical clustering algorithm provided by the embodiment of the invention has the advantages that hierarchical clustering structures at different granularities can be obtained by setting different values of the relevant parameters; in terms of cluster shape, hierarchical clustering is applicable to clusters of arbitrary shape and is insensitive to the input order of the samples. It is also insensitive to the choice of distance metric, does not require the number of clusters to be known in advance, and allows the number of clusters to be chosen subjectively.
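The bottom-up merging can be sketched as follows. The text leaves the cluster-to-cluster metric open; this illustration assumes single linkage (the smallest squared distance between any two members), and the names `agglomerative` and `linkage` and the sample data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def agglomerative(samples, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of sample indices.
    """
    clusters = [[i] for i in range(len(samples))]   # every point starts alone

    def linkage(a, b):
        # Assumed metric: single linkage, the closest pair between two clusters.
        return min(sq_dist(samples[i], samples[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        _, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]      # merge q into p
        del clusters[q]
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters = agglomerative(points, n_clusters=2)
```

Stopping at different values of `n_clusters` yields the coarser or finer groupings that the text refers to as different granularities.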
Step S203: determining a first set with the highest reliability in the K position information sets according to a preset reliability condition, wherein K is an integer which is greater than 1 and less than N;
specifically, the preset credibility condition may use the number of position information in a position information set as the criterion for judging whether that position information set is credible. This may specifically include the following three cases: 1. calculating the ratio of the number of position information in each position information set to m, and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility. 2. calculating the ratio of the number of position information in each position information set to m, and determining a position information set whose ratio is above a preset value as the first set with the highest credibility. 3. comparing the numbers of position information in the position information sets, sorting the position information sets in descending order of number, and taking the position information set with the largest number as the first set with the highest credibility. The preset value may be set by the user according to practical experience; or the server may set a default value corresponding to the data characteristics of the acquired position information. The credibility conditions in this step are applicable to the clustering results of the 7 clustering algorithms exemplified in step S202.
In one possible implementation manner, the ratio of the number of position information in each position information set to m is calculated, and the position information set corresponding to the maximum ratio is determined as the first set with the highest credibility. Referring to fig. 16, fig. 16 shows the number of position information in each position information set, counted according to fig. 15, and the ratio of that number to m. As shown in fig. 16, the ratios of the position information sets are compared, and the position information set whose ratio is the largest (the value shown in formula image BDA0001784885620000141) is determined to be the first set with the highest credibility; wherein the value of m is the sum of the numbers of position information in all position information sets.
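The ratio-based credibility check can be sketched as follows, assuming the clustering result is available as one cluster label per position information. The function names `most_credible_set` and `credible_sets` and the example labels are hypothetical; the second function illustrates the preset-value variant.

```python
from collections import Counter

def most_credible_set(labels):
    """Return (label of the largest set, ratio of each set's size to m)."""
    m = len(labels)
    ratios = {label: count / m for label, count in Counter(labels).items()}
    best = max(ratios, key=ratios.get)   # set with the maximum ratio
    return best, ratios

def credible_sets(labels, preset_value):
    """Return the labels of all sets whose ratio reaches the preset value."""
    _, ratios = most_credible_set(labels)
    return sorted(label for label, ratio in ratios.items() if ratio >= preset_value)

# Hypothetical clustering result for m = 7 pieces of position information.
labels = [0, 0, 0, 1, 2, 0, 1]
first_set, ratios = most_credible_set(labels)
above_threshold = credible_sets(labels, preset_value=0.25)
```

Since every ratio shares the same denominator m, picking the largest ratio and picking the largest count select the same set.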
Step S204: determining the area location of the vehicle service shop where the vehicle diagnostic device is located according to the position information in the first set.
Specifically, the first set is the set $c^{(j)}$ with the highest credibility among the K position information sets, the cluster center point of the first set is $\mu^{(j)}$, $0 < j \le K$, and j is an integer.
In one possible implementation, after the first set $c^{(j)}$ and the cluster center point $\mu^{(j)}$ of the first set are determined, the position information in the first set that lies within a circular area is determined, wherein the circular area takes the cluster center point $\mu^{(j)}$ as its center and a preset value as its radius; the area location of the vehicle service shop is then determined according to the position information within the circular area. Referring to fig. 17, fig. 17 is a schematic diagram of determining the area location of the vehicle service shop within the first set screened out in fig. 16 according to the embodiment of the present invention. As shown in fig. 17, the solid-line circular area within the first set (the dashed elliptical area) is determined according to the operations in the possible implementation described above; the area location of the vehicle service shop is then determined from the position information within the solid-line circular area. The preset value of the radius may be set according to the user's experience or in another reasonable manner.
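Selecting the position information inside the circular area can be sketched as below. The text does not fix the unit of the radius or the distance measure; this illustration assumes positions are (longitude, latitude) pairs in degrees and measures the radius in meters with the haversine great-circle formula. The names `haversine_m` and `positions_in_circle`, the Earth radius constant, and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (an approximation)

def haversine_m(p, q):
    """Great-circle distance in meters between two (longitude, latitude) pairs in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def positions_in_circle(first_set, center, radius_m):
    """Keep the positions of the first set that lie within radius_m of the center."""
    return [p for p in first_set if haversine_m(p, center) <= radius_m]

# Hypothetical cluster center and first set.
center = (114.0, 22.5)
first_set = [(114.0, 22.5), (114.001, 22.5), (114.1, 22.5)]
inside = positions_in_circle(first_set, center, radius_m=500.0)

# One degree of latitude is roughly 111 km.
one_degree = haversine_m((114.0, 22.5), (114.0, 23.5))
```

How the area location is then derived from the retained positions is left open here, as it is in the text.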
Step S205: acquiring vehicle-related data collected by the vehicle diagnostic device at the positions corresponding to the position information in the first set.
Specifically, the vehicle-related data may include the following 3 cases: 1. the number of vehicles diagnosed by the vehicle diagnostic device over a period of time; 2. the frequency of use of the vehicle diagnostic device over a period of time; 3. statistics on the vehicle fault problems detected by the vehicle diagnostic device. Step S205 processes the vehicle-related data of the vehicle service shop after step S204 is executed. Step S205 is an optional step in the embodiment of the present invention.
As mentioned in step S202, fig. 3 is a table, provided by the embodiment of the present invention, for arranging and presenting the m pieces of position information. As shown in fig. 3, each position information $x^{(i)}$ is expressed as an ordered pair of real numbers consisting of its longitude and its latitude (the symbols are given in formula images BDA0001784885620000142 to BDA0001784885620000144); wherein $0 < i \le m$ and i is an integer.
Fig. 4 is a data point distribution diagram of the m pieces of position information provided by the embodiment of the present invention. As shown in fig. 4, the scales marked on the horizontal axis and the vertical axis are used only to explain the embodiment of the present invention; in specific use, the scales are chosen according to the characteristics of the data actually obtained. In the figure, the position information $x^{(i)}$ is expressed as data points of two-dimensional vectors, K = 3, the dots represent position information data points, and the crosses represent the cluster center points of the respective clusters. Since the position information $x^{(i)}$ can be stored and arranged in many forms, the processing of the data is not limited to this combination.
FIG. 5 is the classification result map obtained from the data of FIG. 4 after calculating the cluster to which the i-th sample $x^{(i)}$ belongs (formula image BDA0001784885620000145) and each of the cluster center points (formula image BDA0001784885620000146). By analogy, fig. 6-14 are classification result maps of the position information data points after each cycle step of the K-means algorithm; fig. 15 shows the final classification result of the m position information obtained by the K-means clustering algorithm according to the embodiment of the present invention. As shown in fig. 15, the positions of the cluster centers differ from their initial positions and are the result determined by the calculation.
According to the embodiment of the invention, the acquired position information of a large number of vehicle diagnosis devices is classified according to a clustering algorithm, the positioning precision is improved, then a first set with the highest reliability is screened out according to a preset reliability condition, the influence of individual wrong position information is reduced, and finally, the target position information in the set is effectively utilized in the first set according to a certain rule to determine the region position of a vehicle maintenance factory where the vehicle diagnosis devices are located. The method and the device can enhance the credibility of the regional position result, reduce the calculation complexity and facilitate the processing of a large amount of data.
The method of the embodiments of the present invention is explained in detail above, and the related apparatus of the embodiments of the present invention is provided below. The embodiment of the apparatus is also mainly described by taking a K-Means algorithm as an example, wherein the application of the classification unit refers to possible implementation manners of other clustering algorithms.
Referring to FIG. 18, FIG. 18 is a schematic structural diagram of a vehicle diagnostic device data processing apparatus according to an embodiment of the present invention. The vehicle diagnostic device data processing apparatus 18 may include: a first obtaining unit 1801, a classification unit 1802, a screening unit 1803, a determining unit 1804, and a second obtaining unit 1805. Wherein:
a first obtaining unit 1801, configured to obtain m pieces of position information of the vehicle diagnostic device;
a classification unit 1802, configured to classify the m pieces of position information according to a clustering algorithm to obtain K position information sets;
In a possible implementation, the classification unit 1802 is configured to classify the m pieces of position information according to the K-Means algorithm to obtain K position information sets, as shown in FIGS. 5-15, which may include: selecting K cluster center points from a sample set formed by the m pieces of position information; wherein the sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$, $x^{(i)}$ is the position information of the i-th sample in the sample set, $i = 1, 2, \ldots, m$; the center point set formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in \mathbb{R}^n$, $\mu^{(j)}$ is the position information of the j-th cluster center point in the center point set, $j = 1, 2, \ldots, K$; $\mathbb{R}^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1; calculating, according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs; wherein $\|x^{(i)} - \mu^{(j)}\|^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$ denotes, with $\mu^{(j)}$ as the cluster center point, the set of all arguments $x^{(i)}$ for which the squared Euclidean distance to $\mu^{(j)}$ takes its minimum; updating each of the cluster center points $\mu^{(j)}$ according to the formula $\mu^{(j)} = \frac{\sum_{x^{(i)} \in c^{(j)}} x^{(i)}}{|c^{(j)}|}$ until the distortion function $\sum_{j=1}^{K} \sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ converges; wherein $c^{(j)}$ is the class cluster j, $x^{(i)} \in c^{(j)}$ is the position information $x^{(i)}$ of the class cluster j, $\sum_{x^{(i)} \in c^{(j)}} x^{(i)}$ is the sum of the features of all position information in the class cluster j, $|c^{(j)}|$ is the number of pieces of position information in the class cluster j, and $\sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ is the sum of the squared Euclidean distances from each piece of position information of the class cluster j to the cluster center point of the class cluster j; and when the distortion function converges, the obtained K clusters correspond to the K position information sets.
The K-Means clustering algorithm provided by the embodiment of the present invention has the advantages of being simple and fast to compute and efficient on large data sets, which makes it suitable for mining large-scale data sets and for processing densely clustered data.
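The K-Means steps described above (select K center points, assign each sample to its nearest center, recompute each center as the mean of its cluster, stop when the distortion converges) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `sq_dist` and `k_means`, the parameters `max_iter`, `tol` and `seed`, and the choice to draw the initial centers at random from the sample set are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(samples: List[Point], k: int, max_iter: int = 100,
            tol: float = 1e-9, seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Return (labels, centers); labels[i] is the cluster index of samples[i]."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct samples chosen at random.
    centers = list(rng.sample(samples, k))
    labels: List[int] = []
    prev_distortion = float("inf")
    for _ in range(max_iter):
        # Assignment step: each sample joins the cluster of its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(x, centers[j]))
                  for x in samples]
        # Update step: each center becomes the mean of its cluster members.
        for j in range(k):
            members = [x for x, c in zip(samples, labels) if c == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
        # Distortion: sum of squared distances of samples to their centers.
        distortion = sum(sq_dist(x, centers[c])
                         for x, c in zip(samples, labels))
        if prev_distortion - distortion <= tol:
            break  # the distortion has stopped decreasing
        prev_distortion = distortion
    return labels, centers
```

For example, `k_means([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2)` groups the first two and the last two points together, whichever initial centers are drawn.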
In a possible implementation, the classification unit 1802 is configured to classify the m pieces of position information according to the K-Medians algorithm to obtain K position information sets, which may include: 1) The sample set of position information samples is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$; select K cluster center points from the sample set. 2) According to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, calculate the class cluster $c^{(j)}$ to which each sample $x^{(i)}$ belongs, namely the cluster whose cluster center point has the smallest Euclidean distance to the position information sample. 3) Calculate the median of each cluster and take it as the center $\mu^{(j)}$ of that cluster. 4) Repeat steps 2) and 3) until K stable clusters and their corresponding cluster center points are determined.
The K-Medians clustering algorithm provided by the embodiment of the present invention has the advantage that the center point is calculated using the median of the data, which prevents abnormal data from distorting the calculation result.
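The four K-Medians steps above can be sketched as follows. This is a hedged illustration only: the function name `k_medians`, the parameters `max_iter` and `seed`, the random choice of initial centers, and the use of a per-coordinate median are assumptions made for the example and are not specified by the source.

```python
import random
import statistics
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def k_medians(samples: List[Point], k: int, max_iter: int = 100,
              seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Return (labels, centers) using per-coordinate medians as centers."""
    rng = random.Random(seed)
    # Step 1: choose K initial centers from the sample set (assumed random).
    centers = list(rng.sample(samples, k))
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Step 2: assign each sample to the center at the smallest
        # squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2
                                  for a, b in zip(x, centers[j])))
            for x in samples
        ]
        # Step 4: stop once the assignment is stable.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the median of its cluster.
        for j in range(k):
            members = [x for x, c in zip(samples, labels) if c == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(statistics.median(col)
                                   for col in zip(*members))
    return labels or [], centers
```

With a single cluster, the samples `(1, 1), (2, 2), (3, 3), (4, 4), (1000, 1000)` yield the center `(3.0, 3.0)`, whereas the mean would be pulled to `(202.0, 202.0)` by the outlier; this is the robustness the paragraph above refers to.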
In a possible implementation, the classification unit 1802 is configured to classify the m pieces of position information according to a mean shift clustering algorithm to obtain K position information sets, which may include: 1) Determine the radius r of the sliding window, and start sliding a circular window of radius r from a randomly selected center point C; in each iteration the window moves toward a denser region, until convergence. 2) Each time the window slides to a new region, calculate the mean of the points inside the window as the new center point; the number of points inside the window is the density of the window. In each move, the window shifts toward a denser region. 3) Keep moving the window and recalculating its center point and density until no direction allows the window to contain more points, that is, until the density inside the circle no longer increases. 4) Steps 1) to 3) generate multiple sliding windows; when windows overlap, the window containing the most points is retained, and the data points are then clustered according to the sliding window in which they lie. The stable windows finally formed give the center point set and the corresponding clusters.
The mean shift clustering algorithm provided by the embodiment of the present invention has the advantage that the number of clusters does not need to be known in advance and is determined automatically by the algorithm; during the calculation, the cluster centers converge toward the points of maximum density and are less influenced by the data mean.
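The sliding-window procedure above can be sketched with a flat circular window as follows. This is an illustrative sketch under assumptions, not the source's implementation: the function name `mean_shift` and the parameters `max_iter` and `tol` are hypothetical, every sample is used as a starting window (instead of one random start) so that the result is deterministic, and two converged windows are treated as overlapping when their centers are within one radius of each other.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def mean_shift(samples: List[Point], radius: float, max_iter: int = 100,
               tol: float = 1e-6) -> Tuple[List[int], List[Point]]:
    """Return (labels, centers) found by flat-window mean shift."""
    modes = []  # (converged window center, number of points in the window)
    for start in samples:
        center = start
        for _ in range(max_iter):
            inside = [x for x in samples if math.dist(x, center) <= radius]
            if not inside:  # defensive guard; not expected in practice
                break
            # Move the window to the mean of the points it contains.
            new_center = tuple(sum(col) / len(inside) for col in zip(*inside))
            shift = math.dist(new_center, center)
            center = new_center
            if shift < tol:
                break
        density = sum(1 for x in samples if math.dist(x, center) <= radius)
        modes.append((center, density))
    # When windows overlap, keep the one that contains the most points.
    modes.sort(key=lambda m: -m[1])
    centers: List[Point] = []
    for center, _ in modes:
        if all(math.dist(center, kept) > radius for kept in centers):
            centers.append(center)
    # Assign each sample to the nearest retained window center.
    labels = [min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
              for x in samples]
    return labels, centers
```

Note that the number of clusters is an output here: it equals `len(centers)` and depends only on the data and the chosen radius.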
In a possible implementation, the classification unit 1802 is configured to classify the m pieces of position information according to an Expectation-Maximization (EM) clustering algorithm using a Gaussian mixture model (GMM) to obtain K position information sets, which may include: 1) Select the number of clusters and randomly initialize the Gaussian distribution parameters (mean and variance) of each cluster; alternatively, a relatively accurate mean and variance may first be estimated from the data. 2) Given the Gaussian distribution of each cluster, calculate the probability that each data point belongs to each cluster; the closer a point is to the center of a Gaussian distribution, the more likely it is to belong to that cluster. 3) Based on these probabilities, calculate new Gaussian distribution parameters that maximize the probability of the data points; the new parameters can be calculated as weighted sums over the data points, where each weight is the probability that the data point belongs to the cluster. 4) Repeat steps 2) and 3) until the change between iterations is small.
The Expectation-Maximization (EM) clustering algorithm using a Gaussian mixture model (GMM) provided by the embodiment of the present invention has the advantage that, because a GMM uses both the mean and the standard deviation, the clusters it can identify may be elliptical rather than only circular; and because a GMM uses probabilities, a data point can belong to multiple clusters, which improves the accuracy of the calculation.
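The EM steps above can be illustrated with a one-dimensional Gaussian mixture. The sketch below is a simplification under stated assumptions: it handles scalar data only, takes the initial means and variances as arguments instead of drawing them at random, adds per-cluster mixing weights (standard in GMMs but not mentioned in the text), and stops when the log-likelihood changes by less than `tol`. The names `gaussian_pdf` and `em_gmm_1d` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data: Sequence[float], means: Sequence[float],
              variances: Sequence[float], max_iter: int = 200,
              tol: float = 1e-8
              ) -> Tuple[List[float], List[float], List[float], List[List[float]]]:
    """Return (means, variances, weights, responsibilities)."""
    k = len(means)
    means, variances = list(means), list(variances)
    weights = [1.0 / k] * k  # assumption: equal initial mixing weights
    resp: List[List[float]] = []
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: probability that each data point belongs to each cluster.
        resp, ll = [], 0.0
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                 for j in range(k)]
            total = sum(p)
            ll += math.log(total)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate parameters as probability-weighted averages.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed cluster
            weights[j] = nj / len(data)
        if abs(ll - prev_ll) < tol:
            break  # the change between iterations is small
        prev_ll = ll
    return means, variances, weights, resp
```

Each row of the returned responsibilities sums to 1, which is the sense in which a data point can belong to several clusters at once; a hard assignment can be obtained by taking the cluster with the largest responsibility.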
In a possible implementation, the classification unit 1802 is configured to classify the m pieces of position information according to a spectral clustering algorithm to obtain K position information sets, which may include the following 4 steps: 1) Measure the similarity between the a-th sample and the b-th sample, namely the Gaussian similarity $S_{ab} = \exp\left(-\frac{\|x^{(a)} - x^{(b)}\|^2}{2\sigma^2}\right)$, where $\sigma$ is a hyper-parameter, $1 \le a \le m$, $1 \le b \le m$, $a \ne b$, and a and b are integers. 2) Form the similarity matrix $W = (S_{ab})$, an m × m symmetric matrix; $S_{aa}$ should equal 1 but is written as 0 for convenience of calculation, so the similarity matrix becomes a symmetric matrix with 0 on the main diagonal. 3) Calculate the sum of the similarities of the a-th sample to all other samples, $d_a = \sum_{b \ne a} S_{ab}$ (in some variants only the first K similarities $S_{ab}$ are added, or a threshold is set and similarities below the threshold are truncated); in graph theory $d_a$ is called the degree and can be understood as the weight of the connecting edges. The degrees $d_a$ of all points form the degree matrix D (a diagonal matrix). 4) Form the Laplacian matrix L = D − W; L is a symmetric positive semi-definite matrix whose minimum eigenvalue is 0, with the all-ones vector as the corresponding eigenvector. Arrange the eigenvalues of L from small to large, $\lambda_1, \ldots, \lambda_m$, with corresponding eigenvectors $u_1, \ldots, u_m$. If clustering into K classes is required, take the eigenvectors corresponding to the first K eigenvalues to form an m × K matrix U, so that the features of the first sample are $u_{11}, u_{12}, \ldots, u_{1K}$, the features of the second sample are $u_{21}, u_{22}, \ldots, u_{2K}$, and the features of the m-th sample are $u_{m1}, u_{m2}, \ldots, u_{mK}$. Perform K-Means on these m samples; the resulting clustering of the m samples is the clustering result of the original position information.
The spectral clustering algorithm provided by the embodiment of the present invention has the advantage that, based on graph theory, it clusters the sample data by clustering the eigenvectors of the Laplacian matrix of the sample data; it can therefore identify and handle data samples whose distribution is non-spherical.
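The matrix construction in the spectral clustering steps above (similarity matrix W, degrees, and Laplacian L = D − W) can be sketched as follows. The sketch stops before the eigen-decomposition, which in practice would be delegated to a numerical library, and it assumes the common Gaussian similarity form exp(−‖x_a − x_b‖² / (2σ²)); the function names `gaussian_similarity` and `graph_laplacian` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def gaussian_similarity(xa: Sequence[float], xb: Sequence[float],
                        sigma: float) -> float:
    """Assumed Gaussian similarity exp(-||xa - xb||^2 / (2 * sigma^2))."""
    d2 = sum((p - q) ** 2 for p, q in zip(xa, xb))
    return math.exp(-d2 / (2 * sigma ** 2))


def graph_laplacian(samples: Sequence[Sequence[float]],
                    sigma: float) -> Tuple[Matrix, List[float], Matrix]:
    """Return (W, degrees, L) with L = D - W and a zero diagonal in W."""
    m = len(samples)
    # Similarity matrix; the diagonal is written as 0 as described in step 2.
    w = [[0.0 if a == b else gaussian_similarity(samples[a], samples[b], sigma)
          for b in range(m)] for a in range(m)]
    # Degree of each sample: sum of its similarities to all other samples.
    degrees = [sum(row) for row in w]
    # Laplacian: degree on the diagonal, negated similarity elsewhere.
    lap = [[(degrees[a] if a == b else 0.0) - w[a][b] for b in range(m)]
           for a in range(m)]
    return w, degrees, lap
```

Every row of the returned Laplacian sums to zero, which is the statement in step 4 that the all-ones vector is an eigenvector with eigenvalue 0.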
In a possible implementation, the classification unit 1802 is configured to classify the m pieces of position information according to the DBSCAN clustering algorithm to obtain K position information sets, which may include: 1) First determine the radius r and minPoints. Starting from an arbitrary data point that has not been visited, take that point as the center and check whether the number of points contained in the circle of radius r is greater than or equal to minPoints; if so, mark the point as a core point, otherwise mark it as a noise point. 2) Repeat step 1); if a noise point lies within the circle of radius r around some core point, mark it as an edge point, otherwise it remains a noise point. Repeat step 1) until all points have been visited.
The DBSCAN clustering algorithm provided by the embodiment of the present invention has the advantage that, being density-based, it does not need to know the number of clusters in advance; the number of clusters in the data samples to be processed is obtained automatically by the calculation.
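The density-based procedure above can be sketched as a conventional DBSCAN. This is a hedged illustration: it follows the textbook formulation (core points expand a cluster through their neighbors, and noise points reachable from a core point are relabeled as edge points), which may differ in detail from the source's wording. The names `dbscan` and `NOISE` are hypothetical, and a point counts itself when its neighborhood is compared with `min_points`.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label used for points that belong to no cluster


def dbscan(samples: Sequence[Sequence[float]], r: float,
           min_points: int) -> List[int]:
    """Return one label per sample: a cluster index (0, 1, ...) or NOISE."""
    n = len(samples)
    # Neighborhood of each point: all points within distance r (itself included).
    neighbors = [[j for j in range(n) if math.dist(samples[i], samples[j]) <= r]
                 for i in range(n)]
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_points:
            labels[i] = NOISE  # may later be relabeled as an edge point
            continue
        # Point i is a core point: start a new cluster and expand it.
        labels[i] = cluster
        pending = list(neighbors[i])
        while pending:
            j = pending.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise inside a core circle: edge point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_points:
                pending.extend(neighbors[j])  # j is also a core point
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]
```

The number of clusters is not an input; it is the number of distinct non-negative labels in the result.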
In a possible implementation, the classification unit 1802 is configured to classify the m pieces of position information according to agglomerative hierarchical clustering, one of the hierarchical clustering algorithms, to obtain K position information sets, which may include: 1) Treat each data point as a single cluster, select a metric for measuring the distance between two clusters, calculate the distances between all individuals, and find the two samples that are closest together; these form a group. 2) Treat this group as a new individual, calculate the distances between it and all remaining individuals, find the two closest individuals and merge them into a group, and repeat. 3) Continue until the desired number of clusters is obtained.
The hierarchical clustering algorithm provided by the embodiment of the present invention has the advantage that hierarchical cluster structures at different granularities can be obtained by setting different values of the relevant parameters. In terms of cluster shape, hierarchical clustering is applicable to clusters of arbitrary shape and is insensitive to the input order of the samples. It is also not sensitive to the choice of distance metric, does not require the number of clusters to be known in advance, and allows the number of clusters to be chosen subjectively.
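The bottom-up merging described above can be sketched as follows. This is a minimal illustration assuming single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance; the source leaves the metric open, and the function name `agglomerative` is hypothetical. The brute-force search is meant for clarity, not for large data sets.

```python
import math
from typing import List, Sequence


def agglomerative(samples: Sequence[Sequence[float]],
                  n_clusters: int) -> List[List[int]]:
    """Merge the two closest clusters until n_clusters remain.

    Returns clusters as lists of sample indices.
    """
    # Step 1: every data point starts as its own cluster.
    clusters: List[List[int]] = [[i] for i in range(len(samples))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Assumed metric: single linkage with Euclidean distance.
        return min(math.dist(samples[i], samples[j]) for i in a for j in b)

    # Steps 2 and 3: repeatedly merge the closest pair of clusters.
    while len(clusters) > n_clusters:
        _, p, q = min((linkage(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters
```

Stopping the loop at different values of `n_clusters` yields the cluster structures at different granularities mentioned above.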
A screening unit 1803, configured to determine, according to a preset reliability condition, a first set with the highest reliability in the K location information sets;
in a possible implementation manner, the screening unit 1803 is configured to calculate ratios of the number of location information in each location information set to m, respectively; and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility.
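The reliability condition described above (the set whose share of the m pieces of position information is largest) can be sketched as follows. The function name `most_credible_set` and the representation of the clustering result as one integer label per sample are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def most_credible_set(labels: List[int]) -> Tuple[int, Dict[int, float]]:
    """Return (index of the largest set, ratio of each set's size to m)."""
    m = len(labels)
    counts = Counter(labels)
    # Ratio of the number of pieces of position information in each set to m.
    ratios = {j: n / m for j, n in counts.items()}
    # The set with the maximum ratio is taken as the first set.
    best = max(ratios, key=ratios.get)
    return best, ratios
```

For instance, with labels `[0, 0, 1, 0, 2, 1, 0, 0]` the ratios are 0.625, 0.25 and 0.125, so set 0 is selected as the first set.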
A determining unit 1804, configured to determine, according to the position information in the first set, a region position of a vehicle maintenance factory where the vehicle diagnostic apparatus is located.
In a possible implementation, the first set with the highest reliability among the K position information sets, as determined by the screening unit 1803, is $c^{(j)}$, and the cluster center point of the first set is $\mu^{(j)}$, where j = k, 0 < k ≤ K, and k is an integer. The determining unit 1804 is specifically configured to: after the first set $c^{(j)}$ and the cluster center point $\mu^{(j)}$ of the first set are determined, determine the position information of the first set that lies within a circular area, wherein the circular area is centered on the cluster center point $\mu^{(j)}$ and has a radius of a preset value; and determine the region position of the vehicle maintenance factory according to the position information within the circular area.
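The circular-area filtering described above can be sketched as follows. The sketch assumes planar coordinates with Euclidean distance (real latitude and longitude data would need a geodesic distance), and the final step of summarizing the retained points by their centroid is only one hypothetical way to derive a region position; this passage does not state the exact rule. The function names `positions_in_circle` and `centroid` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def positions_in_circle(first_set: Sequence[Point], center: Point,
                        radius: float) -> List[Point]:
    """Keep the positions of the first set within `radius` of the center."""
    return [p for p in first_set if math.dist(p, center) <= radius]


def centroid(points: Sequence[Point]) -> Point:
    """Hypothetical region estimate: the mean of the retained positions."""
    return tuple(sum(col) / len(points) for col in zip(*points))
```

For example, with the first set `(0, 0), (2, 0), (0, 2), (2, 2), (9, 9)`, cluster center `(2.6, 2.6)` and a preset radius of 4, the stray point `(9, 9)` is discarded and the centroid of the remaining points is `(1.0, 1.0)`.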
A second obtaining unit 1805, configured to obtain the vehicle-related data collected by the vehicle diagnostic device at the positions corresponding to the position information in the first set. The second obtaining unit 1805 is an optional unit in this embodiment of the present invention.
FIG. 19 is a schematic structural diagram of a device according to an embodiment of the present invention. The vehicle diagnostic device data processing apparatus 18 may be implemented with the structure shown in FIG. 19, and the device 19 may include at least one storage component 1901, at least one communication component 1902, and at least one processing component 1903. In addition, the device may also include general components such as an antenna and a power supply, which are not described in detail here.
The storage component 1901 may be a Read-Only Memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or another type of dynamic storage device that can store information and instructions, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage (which may include compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may be self-contained and coupled to the processor via a bus, or may be integrated with the processor.
The communication component 1902 may be a device for communicating with other devices or communication networks, such as an upgrade server, a key server, a device inside a vehicle, etc.
The processing component 1903 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the above schemes.
When the device shown in FIG. 19 is the vehicle diagnostic device data processing apparatus 18, the processing component 1903 acquires m pieces of position information of the vehicle diagnostic device; classifies the m pieces of position information according to a clustering algorithm to obtain K position information sets; determines a first set with the highest reliability among the K position information sets according to a preset reliability condition; and determines the region position of the vehicle service shop where the vehicle diagnostic device is located according to the position information in the first set.
It should be noted that, for the functions of each functional unit of the vehicle diagnostic device data processing apparatus 18 described in the embodiment of the apparatus of the present invention, reference may be made to the description of the vehicle diagnostic device data processing method in the method embodiment described in fig. 2 to 17, and details are not repeated herein.
An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a program, and the program may include some or all of the steps described in the above method embodiments when executed.
Embodiments of the present invention also provide a computer program, which may include instructions that, when executed by a computer, cause the computer to perform some or all of the steps including any one of the method embodiments described above.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in practice; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. The units of the above apparatus embodiments may or may not be physically separate, and some or all of them may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product.
Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and may include several instructions to enable a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute all or part of the steps of the above methods according to the embodiments of the present invention. Among them, the aforementioned storage medium may include: a U-disk, a removable hard disk, a magnetic disk, an optical disk, a Read-Only Memory (ROM) or a Random Access Memory (RAM), and the like. The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A data processing method, comprising:
acquiring m pieces of position information of vehicle diagnosis equipment, wherein the m pieces of position information are self geographical position information acquired by the vehicle diagnosis equipment during vehicle diagnosis, and m is an integer greater than 2;
selecting K cluster center points from a sample set formed by the m pieces of position information; wherein the sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$, $x^{(i)}$ is the position information of the i-th sample in the sample set, $i = 1, 2, \ldots, m$; the center point set formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in \mathbb{R}^n$, $\mu^{(j)}$ is the position information of the j-th cluster center point in the center point set, $j = 1, 2, \ldots, K$; $\mathbb{R}^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1;
calculating, according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs; wherein $\|x^{(i)} - \mu^{(j)}\|^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$ denotes, with $\mu^{(j)}$ as the cluster center point, the set of all arguments $x^{(i)}$ for which the squared Euclidean distance to $\mu^{(j)}$ takes its minimum;
updating each of the cluster center points $\mu^{(j)}$ according to the formula $\mu^{(j)} = \frac{\sum_{x^{(i)} \in c^{(j)}} x^{(i)}}{|c^{(j)}|}$ until the distortion function $\sum_{j=1}^{K} \sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ converges; wherein $c^{(j)}$ is the class cluster j, $x^{(i)} \in c^{(j)}$ is the position information $x^{(i)}$ of the class cluster j, $\sum_{x^{(i)} \in c^{(j)}} x^{(i)}$ is the sum of the features of all position information in the class cluster j, $|c^{(j)}|$ is the number of pieces of position information in the class cluster j, and $\sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ is the sum of the squared Euclidean distances from each piece of position information of the class cluster j to the cluster center point of the class cluster j;
when the distortion function converges, the obtained K clusters correspond to the K position information sets;
determining a first set with the highest reliability in the K position information sets according to a preset reliability condition, wherein K is an integer which is greater than 1 and less than N;
and determining the regional position of the vehicle service shop where the vehicle diagnosis equipment is located according to the position information in the first set.
2. The method according to claim 1, wherein the first set with the highest reliability among the K position information sets is $c^{(j)}$, and the cluster center point of the first set is $\mu^{(j)}$, where j = k, 0 < k ≤ K, and k is an integer; the determining the regional position of the vehicle service shop where the vehicle diagnosis device is located according to the position information in the first set includes:
determining the first set $c^{(j)}$ and the cluster center point $\mu^{(j)}$ of the first set, and then determining the position information of the first set that lies within a circular area, wherein the circular area is centered on the cluster center point $\mu^{(j)}$ and has a radius of a preset value;
and determining the area position of the vehicle maintenance factory according to the position information in the circular area.
3. The method according to any one of claims 1 or 2, wherein the determining a first set with the highest reliability from the K sets of position information according to a preset reliability condition comprises:
calculating the ratio of the quantity of the position information in each position information set to m;
and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility.
4. The method according to any one of claims 1 or 2, wherein after determining the regional location of the vehicle service shop to which the vehicle diagnostic device belongs, further comprising:
and acquiring vehicle related data acquired by the vehicle diagnosis equipment at a position corresponding to the position information in the first set.
5. A data processing apparatus, comprising:
a first acquisition unit, configured to acquire m pieces of position information of a vehicle diagnosis device, wherein the m pieces of position information are the device's own geographical position information acquired by the vehicle diagnosis device during vehicle diagnosis, and m is an integer greater than 2;
a classification unit to:
selecting K cluster center points from a sample set formed by the m pieces of position information; wherein the sample set is $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$, $x^{(i)}$ is the position information of the i-th sample in the sample set, $i = 1, 2, \ldots, m$; the center point set formed by the K cluster center points is $\{\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(K)}\}$, $\mu^{(j)} \in \mathbb{R}^n$, $\mu^{(j)}$ is the position information of the j-th cluster center point in the center point set, $j = 1, 2, \ldots, K$; $\mathbb{R}^n$ is the n-dimensional vector space, and n is an integer greater than or equal to 1;
calculating, according to the formula $c^{(j)} = \arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$, the cluster $c^{(j)}$ to which the i-th sample $x^{(i)}$ belongs; wherein $\|x^{(i)} - \mu^{(j)}\|^2$ is the square of the Euclidean distance from $x^{(i)}$ to $\mu^{(j)}$, and $\arg\min_j \|x^{(i)} - \mu^{(j)}\|^2$ denotes, with $\mu^{(j)}$ as the cluster center point, the set of all arguments $x^{(i)}$ for which the squared Euclidean distance to $\mu^{(j)}$ takes its minimum;
updating each of the cluster center points $\mu^{(j)}$ according to the formula $\mu^{(j)} = \frac{\sum_{x^{(i)} \in c^{(j)}} x^{(i)}}{|c^{(j)}|}$ until the distortion function $\sum_{j=1}^{K} \sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ converges; wherein $c^{(j)}$ is the class cluster j, $x^{(i)} \in c^{(j)}$ is the position information $x^{(i)}$ of the class cluster j, $\sum_{x^{(i)} \in c^{(j)}} x^{(i)}$ is the sum of the features of all position information in the class cluster j, $|c^{(j)}|$ is the number of pieces of position information in the class cluster j, and $\sum_{x^{(i)} \in c^{(j)}} \|x^{(i)} - \mu^{(j)}\|^2$ is the sum of the squared Euclidean distances from each piece of position information of the class cluster j to the cluster center point of the class cluster j;
when the distortion function converges, the obtained K clusters correspond to the K position information sets;
the screening unit is used for determining a first set with the highest reliability in the K position information sets according to a preset reliability condition, wherein K is an integer which is greater than 1 and smaller than N;
and the determining unit is used for determining the regional position of the vehicle maintenance factory where the vehicle diagnostic equipment is located according to the position information in the first set.
6. The apparatus according to claim 5, wherein the screening unit is specifically configured to:
determine that the first set with the highest reliability among the K position information sets is $c^{(j)}$ and that the cluster center point of the first set is $\mu^{(j)}$, where j = k, 0 < k ≤ K, and k is an integer;
the determining unit is specifically configured to:
determine the first set $c^{(j)}$ and the cluster center point $\mu^{(j)}$ of the first set, and then determine the position information of the first set that lies within a circular area, wherein the circular area is centered on the cluster center point $\mu^{(j)}$ and has a radius of a preset value; and determine the area position of the vehicle maintenance factory according to the position information within the circular area.
7. The apparatus according to any one of claims 5 or 6, wherein the screening unit is specifically configured to:
calculating the ratio of the quantity of the position information in each position information set to m; and determining the position information set corresponding to the maximum ratio as the first set with the highest credibility.
8. The apparatus of any one of claims 5 or 6, further comprising:
and the second acquisition unit is used for acquiring the vehicle related data acquired by the vehicle diagnosis equipment at the position corresponding to the position information in the first set after the determination unit determines the region position of the vehicle maintenance factory to which the vehicle diagnosis equipment belongs.
CN201811010287.0A 2018-08-31 2018-08-31 Data processing method and device Active CN109189876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811010287.0A CN109189876B (en) 2018-08-31 2018-08-31 Data processing method and device

Publications (2)

Publication Number Publication Date
CN109189876A CN109189876A (en) 2019-01-11
CN109189876B true CN109189876B (en) 2021-09-10

Family

ID=64917632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811010287.0A Active CN109189876B (en) 2018-08-31 2018-08-31 Data processing method and device

Country Status (1)

Country Link
CN (1) CN109189876B (en)

Similar Documents

Publication Publication Date Title
CN109189876B (en) Data processing method and device
US10176246B2 (en) Fast grouping of time series
US20190166024A1 (en) Network anomaly analysis apparatus, method, and non-transitory computer readable storage medium thereof
Monteiro et al. Fitting isochrones to open cluster photometric data-A new global optimization tool
CN108763420A (en) Sorting technique, device, terminal and the computer readable storage medium of data object
CN108764726B (en) Method and device for making decision on request according to rules
CN105824855B (en) Method and device for screening and classifying data objects and electronic equipment
CN112257801B (en) Incremental clustering method and device for images, electronic equipment and storage medium
CN106033425A (en) A data processing device and a data processing method
CN107203772B (en) User type identification method and device
CN110796159A (en) Power data classification method and system based on k-means algorithm
CN111062431A (en) Image clustering method, image clustering device, electronic device, and storage medium
CN106610977B (en) Data clustering method and device
CN115688760A (en) Intelligent diagnosis guiding method, device, equipment and storage medium
CN112437053A (en) Intrusion detection method and device
CN115309906A (en) Intelligent data classification technology based on knowledge graph technology
CN110348516B (en) Data processing method, data processing device, storage medium and electronic equipment
Diao et al. Clustering by Detecting Density Peaks and Assigning Points by Similarity‐First Search Based on Weighted K‐Nearest Neighbors Graph
CN114430530B (en) Space division method, apparatus, device, medium, and program product
CN113850346B (en) Edge service secondary clustering method and system for multi-dimensional attribute perception in MEC environment
CN112800138B (en) Big data classification method and system
Saxena et al. Evolving efficient clustering patterns in liver patient data through data mining techniques
CN114610825A (en) Method and device for confirming associated grid set, electronic equipment and storage medium
CN115878989A (en) Model training method, device and storage medium
CN108154162A (en) A kind of clustering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant