CN113379454A - Data processing method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113379454A (application CN202110644205.3A)
- Authority
- CN
- China
- Prior art keywords
- broker
- travel
- character string
- cluster
- collaborated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0204—Market segmentation
- G06Q30/0205—Location or geographical consideration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/103—Workflow collaboration or project management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/16—Real estate
Abstract
An embodiment of the present disclosure provides a data processing method, a data processing device, an electronic device, and a storage medium. The method comprises: determining a corresponding travel character string based on the job travel data of any broker among the brokers to be collaborated, wherein the travel character string represents the job events of that broker in any time block within a set time period; and performing cluster analysis on the travel character strings of the brokers to be collaborated based on an improved DBSCAN algorithm to obtain a cluster division result, wherein the cluster division result indicates a recommendation result of collaborators. The embodiment can identify brokers with similar itineraries, so that a broker initiating a cooperation alliance can find the most suitable collaborating brokers, the collaborating brokers can communicate efficiently, and the effect of collaborative work among brokers is improved.
Description
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
In the real estate industry, a broker cooperation alliance working mode has been introduced to better maintain customer resources and maximize their transaction rate. In this mode, multiple brokers work cooperatively to jointly maintain a portion of the customer resources so as to improve the transaction rate of those resources.
In the related art, a broker who initiates a cooperation alliance does not know the working habits of other brokers and therefore tends to select acquaintances or brokers from the same store. However, the selected brokers may differ greatly in how they carry out maintenance work, so the collaborating brokers cannot communicate efficiently and the collaborative work cannot achieve its best effect.
Disclosure of Invention
One technical problem to be solved by the embodiments of the present disclosure is: a data processing method, an apparatus, an electronic device and a storage medium are provided.
According to an aspect of the embodiments of the present disclosure, there is provided a data processing method, including:
determining a corresponding travel character string based on the work travel data of any broker in brokers to be collaborated, wherein the travel character string is used for representing work events of any broker in any time block within a set time period;
and carrying out cluster analysis on the travel character strings of the broker to be collaborated based on an improved DBSCAN algorithm to obtain a cluster division result, wherein the cluster division result is used for indicating a recommendation result of a collaborator.
In an embodiment of the present disclosure, the performing cluster analysis on the travel character string of the broker to be collaborated based on the improved DBSCAN algorithm to obtain a cluster division result includes:
determining the travel character string of the broker to be collaborated as a clustering object of the improved DBSCAN algorithm;
and clustering the travel character strings of the brokers to be collaborated based on a set neighborhood distance threshold and a threshold on the number of samples in the neighborhood, wherein the distance between different travel character strings is determined based on the edit distance between them.
In yet another embodiment of the present disclosure, the method further comprises:
and calculating the edit distance between any two travel character strings among the travel character strings of the brokers to be collaborated, wherein the edit distance is the minimum number of edit operations required to convert one travel character string into the other.
In another embodiment of the present disclosure, the determining a corresponding travel character string based on the job travel data of any broker among brokers to be collaborated includes:
normalizing the job travel data of any broker according to the division of time blocks within the set time period to obtain normalized travel data;
and determining a corresponding travel character string based on the normalized travel data.
In another embodiment of the present disclosure, after performing cluster analysis on the travel character string of the broker to be collaborated based on an improved DBSCAN algorithm to obtain a cluster division result, the method further includes:
receiving an operation instruction for triggering the cooperative alliance;
determining a corresponding cluster from the cluster division result based on the identification information of the broker triggering the operation instruction;
outputting information of brokers in the cluster.
According to still another aspect of the embodiments of the present disclosure, there is provided a data processing apparatus, the apparatus including:
the character string determining module is used for determining a corresponding travel character string based on the job travel data of any broker among the brokers to be collaborated, wherein the travel character string is used for representing the job events of that broker in any time block within a set time period;
and the clustering module is used for carrying out clustering analysis on the travel character string of the broker to be collaborated based on the improved DBSCAN algorithm to obtain a cluster division result, and the cluster division result is used for indicating a recommendation result of the collaborator.
In one embodiment of the present disclosure, the clustering module includes:
the first determining submodule is used for determining the travel character strings of the brokers to be collaborated as the clustering objects of the improved DBSCAN algorithm;
and the clustering submodule is used for clustering the travel character strings of the brokers to be collaborated based on a set neighborhood distance threshold and a threshold on the number of samples in the neighborhood, wherein the distance between different travel character strings is determined based on the edit distance between them.
In yet another embodiment of the present disclosure, the apparatus further comprises:
and the distance calculation module is used for calculating the edit distance between any two travel character strings among the travel character strings of the brokers to be collaborated, wherein the edit distance is the minimum number of edit operations required to convert one travel character string into the other.
In still another embodiment of the present disclosure, the character string determination module includes:
the normalization submodule is used for normalizing the job travel data of any broker according to the division of time blocks within the set time period to obtain normalized travel data;
and the second determining submodule is used for determining a corresponding travel character string based on the normalized travel data.
In yet another embodiment of the present disclosure, the apparatus further comprises:
the receiving module is used for receiving an operation instruction for triggering the cooperative alliance;
the cluster determining module is used for determining a corresponding cluster from the cluster dividing result based on the identification information of the broker triggering the operation instruction;
and the information output module is used for outputting the information of the brokers in the cluster.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory, and when the computer program is executed, the data processing method is realized.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described data processing method.
According to yet another aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the above-described data processing method.
Based on the data processing method and device, the electronic device, and the storage medium provided by the embodiments of the present disclosure, a corresponding travel character string is determined from a broker's job travel data, with different characters in the travel character string representing different job events, so that the broker's job itinerary is expressed as a character string. The travel character strings of the brokers to be collaborated can then be cluster-analyzed based on an improved Density-Based Spatial Clustering of Applications with Noise algorithm (DBSCAN for short) to obtain a cluster division result in which brokers are grouped by itinerary similarity. Therefore, the embodiments of the present disclosure can identify brokers with similar itineraries, so that a broker initiating a cooperation alliance can find the most suitable collaborating brokers, the collaborating brokers can communicate efficiently, and the effect of collaborative work among brokers is improved.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram of one embodiment of a data processing method of the present disclosure;
FIG. 2A is a flow chart of yet another embodiment of a data processing method of the present disclosure;
FIG. 2B is a schematic diagram of a travel string of the data processing method of the present disclosure;
FIG. 3 is a flow chart of yet another embodiment of a data processing method of the present disclosure;
FIG. 4 is a schematic block diagram of one embodiment of a data processing apparatus according to the present disclosure;
FIG. 5 is a schematic block diagram of a data processing apparatus according to yet another embodiment of the present disclosure;
FIG. 6 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to electronic devices such as computer systems/servers, which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as computer systems/servers, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
The electronic device, such as a computer system/server, may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the disclosure
The technical solution provided by the embodiments of the present disclosure is intended to improve the effect of collaborative work in cooperation alliances for real estate transactions.
At present, a broker who initiates a cooperation alliance generally tends to select acquaintances or brokers from the same store, but these brokers are not necessarily the best choice for collaboration. For the large number of brokers the initiator does not know, their working habits and ways of carrying out maintenance work are unknown, so it is difficult to select brokers with similar working habits. As a result, the collaborative work of the cooperation alliance cannot achieve its best effect.
Exemplary embodiments
FIG. 1 is a flow diagram of one embodiment of a data processing method of the present disclosure; the data processing method can be applied to a real estate transaction platform, as shown in fig. 1, and comprises the following steps:
In step 101, a corresponding travel character string is determined based on the job travel data of any one of the brokers to be collaborated.
In one embodiment, the travel character string is used to characterize the job events of that broker in any time block within a set time period. The travel character string is a time-series feature that presents the job travel data of one broker block by block, and the job event in each time block can be represented by one character. For example, if the set time period is 24 hours and the length of a time block is 1 hour, each time block corresponds to one job event, and different job events are represented by different characters.
In one embodiment, the set time period may be one day (24 hours), or two days, and the length of this time period may be set by the developer according to the operating status of the brokers, for example, if the operating cycle of each broker is one day and the operating status of each day is similar, the set time period may be set to one day.
In an embodiment, the length of the time block may also be preset; each time block may be, for example, one hour, two hours, or half an hour long. The length of the time block is determined based on the working habits of the brokers. For example, if each job event, such as customer follow-up, customer showing, customer negotiation, or customer maintenance, takes about one hour, the length of the time block may be set to one hour.
In one embodiment, the broker to be collaborated refers to a broker that can collaborate, that is, a broker that supports a broker collaboration alliance and has a collaborative work idea.
In one embodiment, the job travel data refers to information about the job events that occur within the set time period, for example, a customer negotiation job event that occurs from 11:03 to 12:05.
In one embodiment, the job travel data of a broker may not be regular, that is, a job event may not fall within a single time block but may span several time blocks (for example, a customer negotiation that occurs from 11:03 to 12:05). The job travel data therefore needs to be normalized before being converted into a travel character string. For example, the job travel data "a customer negotiation job event occurs from 11:03 to 12:05" is normalized to "a customer negotiation job event occurs from 11:00 to 12:00".
In one embodiment, different job events are represented by different numbers. FIG. 2B illustrates a travel character string that presents job travel data; the same job event is always represented by the same character in the travel character string, for example, the customer negotiation job event corresponds to the character 5.
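The normalization and string conversion described for step 101 can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the rounding rule (snap each endpoint to the nearest hour boundary), the `EVENT_CODES` table, and the use of "1" for blocks without a recorded event are assumptions; only the mapping of customer negotiation to the character 5 and the 11:03 to 12:05 example come from the text.

```python
from datetime import datetime

# Hypothetical event codes; only "customer negotiation" -> "5" appears in the text.
EVENT_CODES = {"idle": "1", "customer negotiation": "5"}

def round_to_block(t: str, block_minutes: int = 60) -> int:
    """Round an HH:MM time to the index of the nearest time-block boundary."""
    parsed = datetime.strptime(t, "%H:%M")
    minutes = parsed.hour * 60 + parsed.minute
    return (minutes + block_minutes // 2) // block_minutes

def build_travel_string(events, blocks: int = 24, block_minutes: int = 60) -> str:
    """Build a travel character string from (start, end, event name) tuples."""
    chars = [EVENT_CODES["idle"]] * blocks
    for start, end, name in events:
        first = round_to_block(start, block_minutes)
        # Assumption: every event occupies at least one time block.
        last = max(round_to_block(end, block_minutes), first + 1)
        for i in range(first, min(last, blocks)):
            chars[i] = EVENT_CODES[name]
    return "".join(chars)

# 11:03-12:05 is normalized to 11:00-12:00, i.e. the block with index 11.
travel = build_travel_string([("11:03", "12:05", "customer negotiation")])
print(travel)
```

With hourly blocks, the resulting string has 24 characters and carries a 5 only in the block that starts at 11:00.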
In step 102, a travel character string of the broker to be collaborated is subjected to cluster analysis based on the improved DBSCAN algorithm, so as to obtain a cluster division result, wherein the cluster division result is used for indicating a recommendation result of the collaborator.
In one embodiment, to measure the difference between the job itineraries (travel character strings) of different brokers, the edit distance between any two travel character strings of the brokers to be collaborated may be calculated. The edit distance is the minimum number of edit operations required to convert one travel character string into the other. Generally, the edit operations include insertion, deletion, and replacement, each with a cost of 1. For example, the travel character strings 111111112233144552213311 and 111111112233144552212211 differ in two characters, namely the 4th and 3rd characters from the end, so the edit distance between them is 2.
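The edit distance with unit costs for insertion, deletion, and replacement is the classic Levenshtein distance. The sketch below uses the usual dynamic-programming formulation; the function name `edit_distance` is chosen here for illustration and does not appear in the disclosure.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion, and replacement."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # replace, free when characters match
            ))
        prev = curr
    return prev[-1]

# The two travel character strings from the example above.
d = edit_distance("111111112233144552213311", "111111112233144552212211")
print(d)  # 2
```

Only one row of the table is kept at a time, which is enough for 24-character strings compared pairwise across many brokers.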
In an embodiment, the improved DBSCAN algorithm may be used to group similar travel character strings into one class. In this embodiment, the DBSCAN algorithm uses the edit distance to measure the difference between samples, so that multiple clusters of similar itineraries, that is, the cluster division result, can be obtained.
In steps 101 to 102 above, a corresponding travel character string is determined based on a broker's job travel data, with different characters in the travel character string representing different job events, so that the broker's job itinerary is expressed as a character string. Cluster analysis can then be performed on the travel character strings of the brokers to be collaborated based on the improved DBSCAN algorithm to obtain a cluster division result, in which the samples in each cluster are brokers with similar itineraries. When a broker initiates a cooperation alliance, brokers with similar itineraries can be recommended to the requesting broker according to the cluster division result, thereby recommending the most suitable collaboration partners.
To better illustrate the data processing scheme of the present disclosure, another embodiment is described below.
FIG. 2A is a flow chart of yet another embodiment of a data processing method of the present disclosure, and FIG. 2B is a schematic diagram of a trip string of the data processing method of the present disclosure; the embodiment exemplifies how to perform clustering by the DBSCAN algorithm to determine the broker of similar trips, as shown in fig. 2A, including the following steps:
In step 201, a corresponding travel character string is determined based on the job travel data of any one of the brokers to be collaborated.
In an embodiment, the specific implementation manner of step 201 may refer to the description of step 101 in the embodiment shown in fig. 1, and is not described in detail here.
In step 202, the travel character string of the broker to be collaborated is determined as the clustering object of the improved DBSCAN algorithm.
In an embodiment, with the travel character strings as clustering objects, whether brokers' itineraries are similar can be determined from the similarity of their travel character strings (measured by edit distance), so that brokers with similar itineraries are grouped into one class and brokers with dissimilar itineraries are not.
In one embodiment, when the travel character strings are taken as clustering objects, the edit distance may be used to measure the similarity between travel character strings.
In step 203, the travel character strings of the brokers to be collaborated are clustered based on the set neighborhood distance threshold and the threshold on the number of samples in the neighborhood.
In an embodiment, the neighborhood distance threshold ε of the DBSCAN algorithm and the threshold MinPts on the number of samples in the neighborhood may be preset. For example, a guiding principle of the DBSCAN algorithm is that MinPts should be no less than dim + 1, where dim is the dimension of the data to be clustered. Because the DBSCAN algorithm is expected to find only the first few most similar clusters, MinPts may be preset to 10 or thereabouts.
In one embodiment, ε is the neighborhood distance threshold of a sample. To ensure that brokers with similar itineraries are found after clustering, ε should not be too large and can be set according to the length of the travel character string; for example, if the length of the travel character string is 24, ε may be preset to 2.
In one embodiment, the distance between different travel character strings is determined by the edit distance between them. In a specific implementation, the edit distance between two sequences Tr1 = <p1, p2, p3, …, pn> and Tr2 = <q1, q2, q3, …, qm> can be determined by formula (1):
In formula (1), Tr1(n-1) denotes the sequence formed by the first n-1 points of Tr1; similarly, Tr2(m-1) denotes the sequence formed by the first m-1 points of Tr2.
The two sequences Tr1 = <p1, p2, p3, …, pn> and Tr2 = <q1, q2, q3, …, qm> may correspond to two travel character strings, so the edit distance between two travel character strings can be calculated by formula (1).
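The body of formula (1) is not reproduced in this text. A standard recursive definition of edit distance that matches the notation above and the unit operation costs stated earlier could be written as follows; the symbols ED and c are labels introduced here for illustration, and this is a plausible form rather than a transcription of the original formula:

```latex
ED(Tr_1, Tr_2) =
\begin{cases}
n, & m = 0 \\
m, & n = 0 \\
\min \left\{
  \begin{aligned}
  & ED\big(Tr_{1(n-1)}, Tr_{2(m-1)}\big) + c(p_n, q_m), \\
  & ED\big(Tr_{1(n-1)}, Tr_2\big) + 1, \\
  & ED\big(Tr_1, Tr_{2(m-1)}\big) + 1
  \end{aligned}
\right\}, & \text{otherwise}
\end{cases}
\qquad
c(p_n, q_m) =
\begin{cases}
0, & p_n = q_m \\
1, & p_n \neq q_m
\end{cases}
```

The three terms inside the minimum correspond to replacement, deletion, and insertion respectively.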
Through the steps 201 to 203, the embodiment can realize clustering of the travel character strings through the set neighborhood distance threshold, the sample number threshold in the neighborhood, and the edit distance between the travel character strings, so as to obtain the classification of brokers with similar travel.
FIG. 3 is a flowchart of yet another embodiment of the data processing method of the present disclosure. This embodiment exemplifies how to recommend a collaborating broker according to job travel data and, as shown in FIG. 3, includes the following steps:
In step 301, the job travel data of any broker is normalized according to the division of time blocks within the set time period to obtain normalized travel data.
In one embodiment, the job travel data refers to information about the job events of a broker that occur within the set time period, where a job event mainly refers to the type of job, regardless of the area in which the job actually takes place.
In one embodiment, the job travel data of a broker may not be regular, that is, a job event may not fall within a single time block but may span several time blocks (for example, a customer negotiation that occurs from 11:03 to 12:05). The job travel data therefore needs to be normalized before being converted into a travel character string. For example, the job travel data "a customer negotiation job event occurs from 11:03 to 12:05" is normalized to "a customer negotiation job event occurs from 11:00 to 12:00".
In one embodiment, the normalized travel data obtained after the normalization process is job data that can be accurately mapped to each time block.
In step 302, a corresponding travel character string is determined based on the normalized travel data.
In an embodiment, the specific implementation manner of step 302 may refer to the description of step 101 in the embodiment shown in fig. 1, and is not described in detail here.
In step 303, cluster analysis is performed on the travel character strings of the broker to be collaborated based on the improved DBSCAN algorithm, so as to obtain a cluster division result, where the cluster division result is used to indicate a recommendation result of the broker.
In one embodiment, the input to the DBSCAN algorithm is a sample set $D = (x_1, x_2, \ldots, x_m)$ and neighborhood parameters $(\epsilon, MinPts)$, with the edit distance used as the distance metric between travel character strings.
The output is the cluster division result $C$.
The process of cluster analysis using the DBSCAN algorithm is as follows:
(1) Initialize the core object set $\Omega = \emptyset$, the cluster number $k = 0$, the unvisited sample set $\Gamma = D$, and the cluster division $C = \emptyset$.
(2) For $j = 1, 2, \ldots, m$, find the $\epsilon$-neighborhood subsample set $N_\epsilon(x_j)$ of the sample $x_j$ using the distance metric; if the number of samples in the subsample set satisfies $|N_\epsilon(x_j)| \geq MinPts$, add the sample $x_j$ to the core object set: $\Omega = \Omega \cup \{x_j\}$.
(3) If the core object set $\Omega = \emptyset$, the algorithm ends; otherwise, go to step (4).
(4) In the core object set $\Omega$, randomly select a core object $o$, initialize the current cluster core object queue $\Omega_{cur} = \{o\}$, set the class sequence number $k = k + 1$, initialize the current cluster sample set $C_k = \{o\}$, and update the unvisited sample set $\Gamma = \Gamma - \{o\}$.
(5) If the current cluster core object queue $\Omega_{cur} = \emptyset$, the current cluster $C_k$ is complete: update the cluster division $C = \{C_1, C_2, \ldots, C_k\}$, update the core object set $\Omega = \Omega - C_k$, and go to step (3); otherwise, update the core object set $\Omega = \Omega - C_k$.
(6) Take a core object $o'$ out of the current cluster core object queue $\Omega_{cur}$, find its $\epsilon$-neighborhood subsample set $N_\epsilon(o')$ using the neighborhood distance threshold $\epsilon$, let $\Delta = N_\epsilon(o') \cap \Gamma$, update the current cluster sample set $C_k = C_k \cup \Delta$, update the unvisited sample set $\Gamma = \Gamma - \Delta$, update $\Omega_{cur} = \Omega_{cur} \cup (\Delta \cap \Omega) - \{o'\}$, and go to step (5).
The output result is: cluster division result $C = \{C_1, C_2, \ldots, C_k\}$.
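The clustering procedure above can be sketched in a few lines. This is a simplified illustration, not the disclosed implementation: it assumes the edit distance counts insertions, deletions and substitutions, it picks the smallest-index core object instead of a random one so that the result is reproducible, and it ends once no core objects remain, which is assumed to be the check performed at the step (3) that step (5) returns to. The names `edit_distance` and `dbscan_strings` are hypothetical.

```python
from collections import deque
from typing import List, Set


def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def dbscan_strings(samples: List[str], eps: int, min_pts: int) -> List[Set[int]]:
    """Cluster strings by edit distance; returns clusters as sets of sample indices."""
    m = len(samples)
    # Step (2): eps-neighborhoods (each sample counts as its own neighbor) and core objects.
    neighbors = [
        {j for j in range(m) if edit_distance(samples[i], samples[j]) <= eps}
        for i in range(m)
    ]
    core = {i for i in range(m) if len(neighbors[i]) >= min_pts}
    unvisited = set(range(m))          # step (1): unvisited sample set
    clusters: List[Set[int]] = []      # step (1): cluster division
    while core:                        # stop when no core objects remain
        o = min(core)                  # step (4); deterministic stand-in for a random pick
        queue = deque([o])
        cluster = {o}
        unvisited.discard(o)
        while queue:                   # steps (5) and (6)
            o2 = queue.popleft()
            delta = neighbors[o2] & unvisited
            cluster |= delta
            unvisited -= delta
            queue.extend(sorted(delta & core))
        clusters.append(cluster)
        core -= cluster
    return clusters
```

Samples that never join a cluster, such as a broker whose travel character string is far from all others, are simply absent from the returned clusters and can be treated as noise.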
In an embodiment, identification information of brokers with similar itineraries, such as job numbers or names of brokers, is recorded in each cluster in the cluster division result.
In step 304, an operation instruction triggering the cooperative alliance is received.
In step 305, a corresponding cluster is determined from the cluster division result based on the identification information of the broker that triggered the operation instruction.
In an embodiment, when a broker triggers a cooperative alliance operation, that is, an operation of searching for a collaborating broker, the corresponding cluster may be determined from the cluster division result according to the identification information of the broker that triggered the operation instruction.
In step 306, information of brokers in the cluster is output.
In one embodiment, information of the brokers in the determined cluster is output to the broker seeking a collaborating broker, who may then select a broker to collaborate with according to the recommendation result.
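Steps 304 to 306 amount to a lookup in the cluster division result. The sketch below assumes each cluster is stored as a set of broker identifiers (for example job numbers) and that the requesting broker is left out of the returned list; both choices, and the name `recommend_collaborators`, are assumptions made for illustration.

```python
from typing import Iterable, List, Set


def recommend_collaborators(broker_id: str,
                            clusters: Iterable[Set[str]]) -> List[str]:
    """Return the other brokers in the cluster that contains broker_id."""
    for cluster in clusters:
        if broker_id in cluster:
            # Excluding the requester is an assumption, not stated in the text.
            return sorted(b for b in cluster if b != broker_id)
    return []  # the broker was not assigned to any cluster
```

A broker that was treated as noise during clustering receives an empty recommendation list in this sketch.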
Through steps 301 to 306, this embodiment determines a corresponding travel character string based on the job schedule data of each broker, where different characters in the travel character string represent different job events, so that the job schedule of the broker is expressed in character string form. The travel character strings of the brokers to be collaborated are then subjected to cluster analysis based on the improved DBSCAN algorithm to obtain a cluster division result, in which the samples in each cluster are brokers with similar schedules. When a broker initiates a cooperative alliance, brokers with similar schedules can be recommended to the requesting broker according to the cluster division result, so as to achieve optimal recommendation of collaboration partners.
Corresponding to the embodiment of the data processing method, the disclosure also provides a corresponding embodiment of the data processing device.
Fig. 4 is a schematic structural diagram of an embodiment of a data processing apparatus according to the present disclosure, which is applied to a real estate transaction platform. As shown in fig. 4, the apparatus includes:
a character string determination module 41, configured to determine, based on the job schedule data of any broker among the brokers to be collaborated, a corresponding schedule character string, where the schedule character string is used to characterize a job event that occurs to the any broker at any time block within a set time period;
and the clustering module 42 is configured to perform clustering analysis on the travel character strings of the broker to be collaborated based on the improved DBSCAN algorithm to obtain a cluster division result, where the cluster division result is used to indicate a recommendation result of the broker.
Fig. 5 is a schematic structural diagram of a data processing apparatus according to another embodiment of the present disclosure. As shown in fig. 5, on the basis of the embodiment shown in fig. 4, in an embodiment, the clustering module 42 includes:
the first determining submodule 421 is configured to determine a travel character string of a broker to be collaborated as a clustering object of the improved DBSCAN algorithm;
and the clustering submodule 422 is configured to cluster the travel character strings of the broker to be collaborated based on the set neighborhood distance threshold and the number threshold of samples in the neighborhood, where the distance between different travel character strings is determined based on the edit distance between the travel character strings.
In an embodiment, the apparatus further comprises:
and the distance calculation module 43 is used for calculating the editing distance of any two journey character strings in the journey character strings of the brokers to be collaborated, wherein the editing distance is the minimum number of editing operations required for converting one journey character string into another journey character string.
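As a worked illustration of the edit distance, and assuming the editing operations are single-character insertion, deletion and substitution (the disclosure does not enumerate them), the distance between the first $i$ characters of a string $a$ and the first $j$ characters of a string $b$ can be written as the standard recurrence:

$$
d(i, j) =
\begin{cases}
j, & i = 0 \\
i, & j = 0 \\
\min\bigl\{\, d(i-1, j) + 1,\; d(i, j-1) + 1,\; d(i-1, j-1) + [a_i \neq b_j] \,\bigr\}, & \text{otherwise}
\end{cases}
$$

where $[a_i \neq b_j]$ is 1 if the characters differ and 0 otherwise, and the edit distance of the two strings is $d(|a|, |b|)$. For example, with hypothetical travel character strings "AAB0" and "AAC0", a single substitution of B by C converts one into the other, so their edit distance is 1.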
In one embodiment, the string determining module 41 includes:
the normalization sub-module 411 is configured to perform normalization processing on the operation trip data of any broker according to a division manner of time blocks in a set time period, so as to obtain normalized trip data;
a second determining sub-module 412 for determining a corresponding trip string based on the normalized trip data.
In an embodiment, the apparatus further comprises:
a receiving module 44, configured to receive an operation instruction for triggering a cooperative alliance;
a cluster determining module 45, configured to determine, based on the identification information of the broker that triggers the operation instruction, a corresponding cluster from the cluster division result;
and an information output module 46, configured to output information of the brokers in the cluster.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
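The module structure can also be pictured as a thin wrapper that wires the modules together. The sketch below is one possible arrangement under stated assumptions: the string determination and clustering modules are passed in as plain functions, the class and method names are hypothetical, and `group_identical` is only a stand-in for the clustering module so that the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class DataProcessingApparatus:
    to_trip_string: Callable[[object], str]                 # role of module 41
    cluster_strings: Callable[[List[str]], List[Set[int]]]  # role of module 42
    clusters: List[Set[str]] = field(default_factory=list)

    def fit(self, schedules: Dict[str, object]) -> None:
        """Build travel strings for all brokers and store clusters of broker IDs."""
        broker_ids = list(schedules)
        strings = [self.to_trip_string(schedules[b]) for b in broker_ids]
        self.clusters = [
            {broker_ids[i] for i in indices}
            for indices in self.cluster_strings(strings)
        ]

    def on_collaboration_request(self, broker_id: str) -> List[str]:
        """Roles of modules 44 to 46: receive, find the cluster, output its brokers."""
        for cluster in self.clusters:
            if broker_id in cluster:
                return sorted(cluster)
        return []


def group_identical(strings: List[str]) -> List[Set[int]]:
    """Placeholder clustering: groups exactly equal strings (not DBSCAN)."""
    groups: Dict[str, Set[int]] = {}
    for i, s in enumerate(strings):
        groups.setdefault(s, set()).add(i)
    return list(groups.values())
```

Passing the modules in as callables keeps the sketch independent of any particular normalization or clustering implementation.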
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the disclosed solution. One of ordinary skill in the art can understand and implement it without inventive effort.
In the following, an electronic device according to an embodiment of the present disclosure is described with reference to fig. 6, in which an apparatus implementing a method according to an embodiment of the present disclosure may be integrated. Fig. 6 is a block diagram of an electronic device according to an exemplary embodiment of the disclosure, and as shown in fig. 6, the electronic device 6 includes one or more processors 61, one or more memories 62 of a computer-readable storage medium, and a computer program stored on the memories and executable on the processors. The above-described data processing method can be implemented when the program of the memory 62 is executed.
In particular, in practical applications, the electronic device may further include an input device 63, an output device 64, and the like, which are interconnected via a bus system and/or other types of connection mechanisms (not shown). Those skilled in the art will appreciate that the configuration of the electronic device shown in fig. 6 is not intended to be limiting of the electronic device and may include more or fewer components than shown, or certain components, or a different arrangement of components. Wherein:
the processor 61 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities that performs various functions and processes data by running or executing software programs and/or modules stored in the memory 62 and invoking data stored in the memory 62 to thereby monitor the electronic device as a whole.
The memory 62 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM), cache memory, and the like. The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer readable storage medium and executed by the processor 61 to implement the data processing methods of the various embodiments of the present disclosure above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
The input device 63 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
The output device 64 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 64 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.
The electronic device may further include a power supply for supplying power to the various components, and may be logically connected to the processor 61 via a power management system, so as to implement functions of managing charging, discharging, and power consumption via the power management system. The power supply may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
Of course, for simplicity, only some of the components of the electronic device 6 relevant to the present disclosure are shown in fig. 6, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 6 may include any other suitable components, depending on the particular application.
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising a computer program/instructions that, when executed by a processor, cause the processor to perform the steps in the data processing method according to various embodiments of the present disclosure described in the above-mentioned "exemplary methods" section of this specification.
The program code of the computer program product for performing the operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in a data processing method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
A computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (10)
1. A method of data processing, the method comprising:
determining a corresponding travel character string based on the work travel data of any broker in brokers to be collaborated, wherein the travel character string is used for representing work events of any broker in any time block within a set time period;
and carrying out cluster analysis on the travel character strings of the broker to be collaborated based on an improved DBSCAN algorithm to obtain a cluster division result, wherein the cluster division result is used for indicating a recommendation result of a collaborator.
2. The method of claim 1, wherein the performing cluster analysis on the travel character string of the broker to be collaborated based on the improved DBSCAN algorithm to obtain a cluster division result comprises:
determining the travel character string of the broker to be collaborated as a clustering object of the improved DBSCAN algorithm;
and clustering the travel character strings of the broker to be cooperated based on a set neighborhood distance threshold and a sample number threshold in the neighborhood, wherein the distance between different travel character strings is determined based on the editing distance between the travel character strings.
3. The method of claim 2, further comprising:
and calculating the editing distance of any two journey character strings in the journey character strings of the broker to be cooperated, wherein the editing distance is the minimum number of editing operations required for converting one journey character string into another journey character string.
4. The method of claim 1, wherein determining a corresponding travel string based on job travel data of any one of the brokers to be collaborated comprises:
performing normalization processing on the job travel data of the any broker according to the division manner of time blocks in a set time period to obtain normalized trip data;
based on the normalized trip data, a corresponding trip string is determined.
5. The method according to claim 1, wherein after the performing cluster analysis on the travel character strings of the broker to be collaborated based on the improved DBSCAN algorithm to obtain a cluster division result, the method further comprises:
receiving an operation instruction for triggering the cooperative alliance;
determining a corresponding cluster from the cluster division result based on the identification information of the broker triggering the operation instruction;
outputting information of brokers in the cluster.
6. A data processing apparatus, characterized in that the apparatus comprises:
a character string determining module, configured to determine a corresponding travel character string based on the job travel data of any broker in brokers to be collaborated, wherein the travel character string is used for representing job events of the any broker in any time block within a set time period;
and the clustering module is used for carrying out clustering analysis on the travel character string of the broker to be collaborated based on the improved DBSCAN algorithm to obtain a cluster division result, and the cluster division result is used for indicating a recommendation result of the collaborator.
7. The apparatus of claim 6, wherein the clustering module comprises:
the first determining submodule is used for determining the travel character string of the broker to be cooperated as a clustering object of the improved DBSCAN algorithm;
and the clustering submodule is used for clustering the travel character strings of the broker to be cooperated based on a set neighborhood distance threshold and a sample number threshold in the neighborhood, wherein the distance between different travel character strings is determined based on the editing distance between the travel character strings.
8. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing a computer program stored in the memory, and when executed, implementing the method of any of the preceding claims 1-5.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of the preceding claims 1 to 5.
10. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110644205.3A CN113379454A (en) | 2021-06-09 | 2021-06-09 | Data processing method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113379454A (en) | 2021-09-10 |
Family
ID=77573329
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110644205.3A Pending CN113379454A (en) | 2021-06-09 | 2021-06-09 | Data processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113379454A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||