CN112231392A - Civil aviation customer source data analysis method, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN112231392A
Authority
CN
China
Prior art keywords
identification number
data
mobile phone
phone identification
place
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011178439.5A
Other languages
Chinese (zh)
Inventor
曾林华
冯景亮
江敏婷
林勖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Airport Baiyun Information Technology Co ltd
Original Assignee
Guangdong Airport Baiyun Information Technology Co ltd
Application filed by Guangdong Airport Baiyun Information Technology Co ltd
Priority to CN202011178439.5A
Publication of CN112231392A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/244 Grouping and aggregation
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/26 Visual data mining; Browsing structured data
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23211 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with adaptive number of clusters
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Fuzzy Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a civil aviation passenger source data analysis method, electronic equipment and a computer readable storage medium. The method comprises the following steps: S1, acquiring LBS data corresponding to the mobile phone identification number of each historical passenger; S2, taking the mobile phone identification number as the statistical dimension, analyzing the place of daily residence of each mobile phone identification number; S3, aggregating the flight data and the LBS data using the start and stop places and the start and stop times as aggregation conditions; S4, ranking the association degrees between the mobile phone identification number and the flight data, selecting the top-ranked flight data, and determining the correspondence between the flight data and the mobile phone identification number; and S5, performing data processing and model adaptation on the correspondence for a specific service scenario to form a user portrait. By aggregating LBS data and flight data, the invention forms a user portrait and provides it to the airport, which facilitates passenger travel, improves the efficiency of resource scheduling and decision execution across airport operations, and increases economic benefits.

Description

Civil aviation customer source data analysis method, electronic equipment and computer readable storage medium
Technical Field
The invention relates to analysis and processing of passenger source data, in particular to a passenger source data analysis method for civil aviation, electronic equipment and a computer readable storage medium.
Background
Current airport visualization systems perform only a single type of analysis on flight data. As civil aviation demand grows, the volume of flight data generated by airports has increased substantially. Beyond basic statistical analysis, this data can be aggregated and analyzed together with other data to produce multi-dimensional correlation results and extract more value from it.
Disclosure of Invention
The invention provides a civil aviation customer source data analysis method, electronic equipment and a computer readable storage medium to solve or partially solve the defects of the prior art.
To achieve the above object, the technical solution of the invention is as follows.
According to one aspect of the invention, a civil aviation passenger source data analysis method is provided, comprising the following steps:
S1: acquiring LBS data corresponding to the mobile phone identification number of each historical passenger;
S2: taking the mobile phone identification number as the statistical dimension, analyzing the place of daily residence of each mobile phone identification number;
S3: aggregating the flight data and the LBS data using the start and stop places and the start and stop times as aggregation conditions;
S4: ranking the association degrees between the mobile phone identification number and the flight data, selecting the top-ranked flight data, and determining the correspondence between the flight data and the mobile phone identification number; and
S5: performing data processing and model adaptation on the correspondence for a specific service scenario to form a user portrait.
Further, in step S1, the LBS data and the historical flight data are cleaned, and records with abnormal or missing values, as well as redundant fields, are filtered out.
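The cleaning step can be sketched as below. This is a minimal illustration, not the patented implementation: the record schema (`phone_id`, `lat`, `lon`, `timestamp`) and the coordinate-range test for "abnormal values" are hypothetical assumptions, since the patent names neither the fields nor the validity rules.

```python
from typing import Iterable

# Hypothetical schema: the patent does not specify LBS field names.
REQUIRED_FIELDS = ("phone_id", "lat", "lon", "timestamp")
KEEP_FIELDS = REQUIRED_FIELDS  # every other field is treated as redundant


def is_valid(record: dict) -> bool:
    """Reject records with missing values or physically impossible coordinates."""
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            return False  # missing value
    # Assumed abnormal-value rule: latitude/longitude must be in range.
    return -90.0 <= record["lat"] <= 90.0 and -180.0 <= record["lon"] <= 180.0


def clean(records: Iterable[dict]) -> list[dict]:
    """Drop invalid records, then strip redundant fields from the survivors."""
    return [{k: r[k] for k in KEEP_FIELDS} for r in records if is_valid(r)]
```

The same filter would apply to flight records with a different field list; real pipelines would likely add domain-specific outlier checks.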
Further, in step S2, the place-of-daily-residence information is analyzed by extracting the place of daily residence from the LBS data of the mobile phone identification number.
Further, extracting the place of daily residence from the LBS data of the mobile phone identification number includes: using the ISODATA algorithm to compute the area in which the mobile phone identification number appears most often, and taking that area as the place of daily residence.
Further, in step S2, the ISODATA algorithm is executed to obtain the LBS data most strongly associated with the mobile phone identification number; the corresponding area is taken as the place of daily residence, yielding the association between the mobile phone identification number and the place of daily residence.
Further, in step S3, the flight data and the LBS data are aggregated as follows: the (mobile phone identification number, place of daily residence) associations obtained from LBS data processing are aggregated with the flight data, using the start and stop places and the time period as aggregation conditions, to obtain mobile phone identification number-flight data item sets that form the transaction set T.
Further, in step S4, ranking the association degrees between the mobile phone identification number and the flight data includes: calculating the association degree between each flight record and the mobile phone identification number using the Apriori algorithm, and sorting by association degree.
Further, the Apriori algorithm includes one or more of the following A to C:
A. in the transaction set T, frequent itemsets are traversed layer by layer and discovered with a generate-and-test strategy: after each iteration, new candidate itemsets are generated from the frequent itemsets found in the previous iteration, the support of each candidate is counted and compared with the minimum support threshold, and candidates whose support is below the threshold are removed;
B. the I(A, B) metric is interpreted as follows: when I(A, B) = 1, A and B are independent; I(A, B) > 1 indicates positive correlation; I(A, B) < 1 indicates negative correlation;
C. the correlation in the Apriori algorithm is defined by the Pearson correlation coefficient.
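The generate-and-test traversal in item A can be sketched as follows. This is a textbook-style Apriori written for illustration, assuming each transaction is a `frozenset` of string items (for example a phone-ID item plus flight items, a hypothetical encoding) and that `min_support` is a fraction of all transactions.

```python
from itertools import combinations


def apriori(T: list[frozenset], min_support: float) -> dict[frozenset, float]:
    """Level-wise generate-and-test search; returns {itemset: support}."""
    n = len(T)

    def sup(candidate: frozenset) -> float:
        return sum(1 for t in T if candidate <= t) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in T for item in t}
    level = {c for c in singles if sup(c) >= min_support}
    frequent = {c: sup(c) for c in level}
    k = 2
    while level:
        # Generate: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Test: count support and drop candidates below the threshold.
        level = {c for c in candidates if sup(c) >= min_support}
        frequent.update({c: sup(c) for c in level})
        k += 1
    return frequent
```

Each pass rescans T, which is why the number of transactions and their average width drive the cost.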
In accordance with another aspect of the present invention, there is provided an electronic apparatus, wherein the electronic apparatus includes:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method.
According to another aspect of the present invention, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method.
The invention aggregates LBS data and flight data, finds the correspondence between the mobile phone identification number and the passenger's place of daily residence and flight data, and stores the mobile phone identification number in association with other valuable data to form a preliminary user portrait. From the user's positioning information, the distribution of passenger sources and the user's travel, dining, and entertainment habits can be analyzed, progressively refining the user portrait. The data can be provided to airports and airlines as a reference for traffic scheduling, advertisement placement, the introduction of surrounding merchants, and other decisions, which facilitates passenger travel, improves the efficiency of resource scheduling and decision execution, and increases economic benefits.
The above description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the description and other objects, features, and advantages of the present invention more comprehensible.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like elements throughout the drawings.
In the drawings:
FIG. 1 illustrates a flow chart of an embodiment of the present invention for performing aggregation of flight data with LBS data;
FIG. 2 shows a flow chart of an implementation of steps S3-S5 in the method of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to the present invention;
FIG. 4 is a schematic structural diagram of a computer-readable storage medium according to the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment is implemented on an electronic device, such as a computer device, that includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the following steps for analyzing and processing civil aviation passenger source data are implemented, as shown in FIGS. 1 and 2:
S1: acquiring LBS data corresponding to the mobile phone identification number of each historical passenger, cleaning the LBS data and the historical flight data, and filtering out records with abnormal or missing values as well as redundant fields;
S2: taking the mobile phone identification number as the statistical dimension, analyzing the place of daily residence of each mobile phone identification number and storing it;
S3: aggregating the flight data and the LBS data using the start and stop places and the start and stop times as aggregation conditions;
S4: ranking the association degrees between the mobile phone identification numbers and the flight data, selecting the top 5 flight records with the highest association degree, and storing the correspondence between the flight data and the mobile phone identification numbers in a database;
S5: storing the processed data in a data warehouse, performing data processing and model adaptation for a specific service scenario to form a user portrait, and displaying the user portrait as visual charts.
The LBS data and the flight data are historical data.
In the above steps for analyzing and processing civil aviation passenger source data, the LBS data and the flight data are aggregated, the correspondence between the mobile phone identification number and the passenger's place of daily residence and flight data is mined, and the mobile phone identification number is stored in association with other valuable data to form a preliminary user portrait. From the user's positioning information, the distribution of passenger sources and the user's travel, dining, and entertainment habits can be analyzed, progressively refining the user portrait. The data can be provided to airports and airlines as a reference for traffic scheduling, advertisement placement, the introduction of surrounding merchants, and other decisions, which facilitates passenger travel, improves the efficiency of resource scheduling and decision execution, and increases economic benefits.
As an embodiment, in step S2, the place-of-daily-residence information is analyzed by extracting the place of daily residence from the LBS data of the mobile phone identification number. In outline, the ISODATA algorithm is used to compute the area in which the mobile phone identification number appears most often, and that area is taken as the place of daily residence. The specific method is as follows:
Taking the mobile phone identification number as the dimension, select N LBS data pattern samples:
$$\{x_i,\ i = 1, 2, \dots, N\}$$
Preselect $N_c$ initial cluster centers:
$$\{z_1, z_2, \dots, z_{N_c}\}$$
$N_c$ need not equal the required number of cluster centers, and the initial positions may be chosen arbitrarily from the samples.
Define:
$K$ is the expected (preselected) number of cluster centers;
$\theta_N$ is the minimum number of samples in each cluster domain; a cluster with fewer samples is not treated as an independent cluster;
$\theta_S$ is the threshold on the standard deviation of the sample distance distribution within a cluster domain;
$\theta_c$ is the minimum distance between two cluster centers; if two centers are closer than this, the two clusters are merged;
$L$ is the maximum number of pairs of cluster centers that can be merged in one iteration;
$I$ is the number of iterations allowed.
Assign the N LBS data pattern samples to the nearest cluster $S_j$: if $D_j = \min\{\|x - z_i\|,\ i = 1, 2, \dots, N_c\}$, i.e., $\|x - z_j\|$ is the smallest distance, then $x \in S_j$.
If the number of samples in $S_j$ is less than $\theta_N$, the subset is cancelled and $N_c$ is decreased by 1. The cluster centers are then corrected:
$$z_j = \frac{1}{N_j} \sum_{x \in S_j} x, \quad j = 1, 2, \dots, N_c$$
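The assignment, cancellation, and center-correction steps above can be sketched as follows. This is an illustrative sketch assuming samples are numeric tuples (for example projected coordinates) and Euclidean distance; in this sketch, samples of a cancelled cluster are simply re-assigned on the next pass.

```python
import math
from typing import Sequence

Point = tuple[float, ...]


def assign(samples: Sequence[Point], centers: Sequence[Point]) -> list[list[Point]]:
    """Place each sample x in S_j, where z_j is the nearest cluster center."""
    clusters: list[list[Point]] = [[] for _ in centers]
    for x in samples:
        j = min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))
        clusters[j].append(x)
    return clusters


def drop_small_and_recenter(clusters: list[list[Point]], theta_n: int):
    """Cancel clusters with fewer than theta_N samples, then reset each
    remaining center z_j to the mean of the samples in S_j."""
    kept = [s for s in clusters if s and len(s) >= theta_n]
    centers = [tuple(sum(col) / len(s) for col in zip(*s)) for s in kept]
    return kept, centers
```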
Compute the average distance between the LBS data pattern samples in each cluster domain $S_j$ and its cluster center:
$$\bar{D}_j = \frac{1}{N_j} \sum_{x \in S_j} \|x - z_j\|, \quad j = 1, 2, \dots, N_c$$
Compute the overall average distance between all LBS data pattern samples and their corresponding cluster centers:
$$\bar{D} = \frac{1}{N} \sum_{j=1}^{N_c} N_j \bar{D}_j$$
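Compute the standard deviation vector of the sample distances in each cluster, $\sigma_j = (\sigma_{1j}, \sigma_{2j}, \dots, \sigma_{nj})^T$, whose components are:
$$\sigma_{ij} = \sqrt{\frac{1}{N_j} \sum_{k=1}^{N_j} (x_{ik} - z_{ij})^2}$$
where $i = 1, 2, \dots, n$ and $n$ is the dimension of the sample feature vector; $j = 1, 2, \dots, N_c$ and $N_c$ is the number of clusters; $N_j$ is the number of samples in $S_j$; $x_{ik}$ is the $i$-th component of the $k$-th sample in $S_j$, and $z_{ij}$ is the $i$-th component of $z_j$.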
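The per-cluster statistics $\bar{D}_j$, $\bar{D}$, and $\sigma_j$ defined above can be computed as in this sketch, assuming non-empty clusters (as left by the cancellation step) given as lists of numeric tuples with their centers.

```python
import math
from typing import Sequence

Point = tuple[float, ...]


def cluster_stats(clusters: Sequence[Sequence[Point]], centers: Sequence[Point]):
    """Return (D_j per cluster, overall D, sigma_j vectors per cluster)."""
    n_total = sum(len(s) for s in clusters)
    # Average distance of the samples in S_j to their center z_j.
    d_j = [sum(math.dist(x, z) for x in s) / len(s) for s, z in zip(clusters, centers)]
    # Overall average distance, weighting each D_j by the cluster size N_j.
    d_bar = sum(len(s) * d for s, d in zip(clusters, d_j)) / n_total
    # Per-component standard deviation sigma_ij of each cluster.
    sigmas = [
        tuple(
            math.sqrt(sum((x[i] - z[i]) ** 2 for x in s) / len(s))
            for i in range(len(z))
        )
        for s, z in zip(clusters, centers)
    ]
    return d_j, d_bar, sigmas
```

For example, the cluster {(0, 0), (2, 0)} with center (1, 0) has $\bar{D}_j = 1$ and $\sigma_j = (1, 0)$.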
Find the maximum component of each standard deviation vector $\{\sigma_j,\ j = 1, 2, \dots, N_c\}$, denoted $\{\sigma_{j\max},\ j = 1, 2, \dots, N_c\}$.
In the set of maximum components $\{\sigma_{j\max},\ j = 1, 2, \dots, N_c\}$, if some $\sigma_{j\max} > \theta_S$ and one of the following two conditions is also satisfied:
1. $\bar{D}_j > \bar{D}$ and $N_j > 2(\theta_N + 1)$, i.e., $S_j$ contains more than twice the specified minimum number of samples;
2. $N_c \le K/2$;
then $z_j$ is split into two new cluster centers, and $N_c$ is increased by 1.
Calculate the distances between all pairs of cluster centers:
$$D_{ij} = \|z_i - z_j\|, \quad i = 1, 2, \dots, N_c - 1, \quad j = i + 1, \dots, N_c$$
Compare each $D_{ij}$ with $\theta_c$, and arrange the (at most $L$) values with $D_{ij} < \theta_c$ in increasing order of distance:
$$\{D_{i_1 j_1}, D_{i_2 j_2}, \dots, D_{i_L j_L}\}$$
where $D_{i_1 j_1} < D_{i_2 j_2} < \dots < D_{i_L j_L}$.
Merge the two cluster centers $z_{i_l}$ and $z_{j_l}$ at distance $D_{i_l j_l}$ to obtain the new center:
$$z_l^* = \frac{1}{N_{i_l} + N_{j_l}} \left[ N_{i_l} z_{i_l} + N_{j_l} z_{j_l} \right], \quad l = 1, 2, \dots, L$$
In this formula, the two cluster center vectors being merged are weighted by the number of samples in their respective cluster domains, so that $z_l^*$ is the true mean vector.
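The merge step can be sketched as below, assuming centers are numeric tuples with matching sample counts. One detail is an assumption not stated in the patent: following the usual ISODATA convention, a center takes part in at most one merge per iteration.

```python
import math
from itertools import combinations


def merge_close_centers(centers, sizes, theta_c, max_pairs):
    """Merge up to max_pairs (L) center pairs whose distance D_ij < theta_c,
    closest pair first; the merged center is the size-weighted mean."""
    pairs = []
    for i, j in combinations(range(len(centers)), 2):
        d = math.dist(centers[i], centers[j])
        if d < theta_c:
            pairs.append((d, i, j))
    pairs.sort()  # increasing D_ij

    used: set[int] = set()
    merged = []
    for _, i, j in pairs:
        if len(merged) == max_pairs:
            break
        if i in used or j in used:  # assumption: at most one merge per center
            continue
        n = sizes[i] + sizes[j]
        z = tuple((sizes[i] * a + sizes[j] * b) / n
                  for a, b in zip(centers[i], centers[j]))
        merged.append((z, n))
        used.update((i, j))

    kept = [(c, s) for idx, (c, s) in enumerate(zip(centers, sizes)) if idx not in used]
    result = kept + merged
    return [c for c, _ in result], [s for _, s in result]
```

Merging (0, 0) with 3 samples and (1, 0) with 1 sample gives (0.25, 0), the true mean of the four samples.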
This procedure is repeated to obtain, for each mobile phone identification number, the LBS data most strongly associated with it; the specific area corresponding to that data is taken as the place of daily residence. This yields the association between each mobile phone identification number and its place of daily residence, and the result is stored in the database.
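Once a phone's LBS samples are clustered, one reading of "the area in which the mobile phone identification number appears most often" is the cluster holding the most samples. The sketch below assumes cluster labels and (lat, lon) centers as inputs; the grid-cell area identifier and its `cell_size` are hypothetical stand-ins for whatever area coding the airport actually uses.

```python
from collections import Counter


def daily_residence(labels: list[int], centers: list[tuple[float, float]],
                    cell_size: float = 0.01) -> tuple[int, int]:
    """Take the cluster holding most of one phone's LBS samples and map its
    (lat, lon) center to a hypothetical grid-cell area identifier."""
    j, _ = Counter(labels).most_common(1)[0]
    lat, lon = centers[j]
    return int(lat // cell_size), int(lon // cell_size)
```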
As an embodiment, in step S3, the data item set to be analyzed is generated by aggregating the flight data and the LBS data as follows:
The (mobile phone identification number, place of daily residence) associations obtained from LBS data processing are aggregated with the cleaned flight data, using the start and stop places and the time period as aggregation conditions, to obtain a large number of mobile phone identification number-flight data item sets, which form the transaction set T.
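The aggregation can be sketched as a join on route plus a time window. Everything about the record shapes here is an assumption for illustration: the `Flight` and `Movement` schemas, the idea that LBS data has already been reduced to per-phone movements between places, the `window` of 120 minutes, and the `flight:`/`home:` item encoding are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:            # hypothetical schema
    flight_id: str
    origin: str
    dest: str
    dep: int             # departure time, epoch minutes
    arr: int             # arrival time, epoch minutes


@dataclass(frozen=True)
class Movement:          # hypothetical LBS-derived movement of one phone
    phone_id: str
    start_place: str
    end_place: str
    start: int           # last seen at start_place, epoch minutes
    end: int             # first seen at end_place, epoch minutes


def build_transactions(moves, flights, residence, window=120):
    """Join movements to flights on (start place, stop place) and a time window.
    Returns the transaction set T as {phone_id: set of items}."""
    by_route = defaultdict(list)
    for f in flights:
        by_route[(f.origin, f.dest)].append(f)
    T = defaultdict(set)
    for m in moves:
        for f in by_route.get((m.start_place, m.end_place), []):
            if abs(m.start - f.dep) <= window and abs(m.end - f.arr) <= window:
                T[m.phone_id].add(f"flight:{f.flight_id}")
                T[m.phone_id].add(f"home:{residence.get(m.phone_id, 'unknown')}")
    return dict(T)
```

Indexing flights by route first keeps the join close to linear in the number of movements instead of comparing every movement with every flight.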
Then, for an implication of the form X → Y (where X and Y are disjoint itemsets), the strength of the association rule can be measured by its support and confidence:
Support: $s(X \to Y) = \delta(X \cup Y)/N$, which determines how frequently the rule applies to a given dataset;
Confidence: $c(X \to Y) = \delta(X \cup Y)/\delta(X)$, which determines how frequently the items in Y appear in transactions that contain X;
where $\delta(\cdot)$ is the support count, i.e., the number of transactions in T that contain the itemset, and N is the number of transactions in T.
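Support and confidence follow directly from the support count $\delta$. A minimal sketch, assuming transactions and itemsets are `frozenset`s:

```python
def support_count(itemset: frozenset, T: list[frozenset]) -> int:
    """delta(X): number of transactions that contain every item of X."""
    return sum(1 for t in T if itemset <= t)


def support(X: frozenset, Y: frozenset, T: list[frozenset]) -> float:
    """s(X -> Y) = delta(X u Y) / N."""
    return support_count(X | Y, T) / len(T)


def confidence(X: frozenset, Y: frozenset, T: list[frozenset]) -> float:
    """c(X -> Y) = delta(X u Y) / delta(X)."""
    return support_count(X | Y, T) / support_count(X, T)
```

For T = {a, b}, {a, c}, {a, b, c}, {b}: {a, b} occurs in 2 of 4 transactions and {a} in 3, so s(a → b) = 0.5 and c(a → b) = 2/3.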
Based on this, in step S4, the association degrees between the mobile phone identification number and the flight data are ranked by using the Apriori algorithm to calculate the association degree between each flight record and the mobile phone identification number and sorting by that degree. Specifically, the traversal and the association rules in the algorithm are set as follows:
(1) The mobile phone identification number-flight data set is traversed as follows:
in the transaction set T, frequent itemsets are traversed layer by layer and discovered with a generate-and-test strategy. After each iteration, new candidate itemsets are generated from the frequent itemsets found in the previous iteration; the support of each candidate is then counted and compared with the minimum support threshold, and candidates whose support is below the threshold are removed.
The factors affecting the complexity of the traversal are:
computational complexity: the support threshold, the number of items, the number of transactions, and the average transaction width;
time complexity: generation of frequent 1-itemsets, candidate generation, and support counting.
(2) The lift (i.e., the ratio of the rule's confidence to the support of the itemset appearing in the rule consequent) is evaluated against a threshold of 1, i.e., the rule's confidence is compared with the support of its consequent;
(3) The I(A, B) metric is interpreted as follows: when I(A, B) = 1, A and B are independent; when I(A, B) > 1, they are positively correlated; when I(A, B) < 1, they are negatively correlated;
(4) The correlation is defined by the Pearson correlation coefficient.
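Items (2) to (4) can be illustrated together. The sketch assumes `frozenset` transactions; `interest` computes $I(A, B) = s(A \cup B) / (s(A)\,s(B))$, which equals the lift $c(A \to B)/s(B)$, and `pearson` applies the Pearson coefficient to the 0/1 indicators "transaction contains A" and "transaction contains B" (the phi coefficient). Returning 0.0 when an indicator is constant is a choice of this sketch, since the coefficient is undefined there.

```python
import math


def _indicator(itemset: frozenset, T: list[frozenset]) -> list[float]:
    """1.0 where the transaction contains every item of itemset, else 0.0."""
    return [1.0 if itemset <= t else 0.0 for t in T]


def interest(A: frozenset, B: frozenset, T: list[frozenset]) -> float:
    """I(A, B) = s(A u B) / (s(A) * s(B)), i.e. the lift c(A -> B) / s(B)."""
    n = len(T)
    s_ab = sum(_indicator(A | B, T)) / n
    s_a = sum(_indicator(A, T)) / n
    s_b = sum(_indicator(B, T)) / n
    return s_ab / (s_a * s_b)


def interpret(i: float, tol: float = 1e-9) -> str:
    """Map I(A, B) to the reading given in item (3)."""
    if abs(i - 1.0) <= tol:
        return "independent"
    return "positive" if i > 1.0 else "negative"


def pearson(A: frozenset, B: frozenset, T: list[frozenset]) -> float:
    """Pearson correlation of the two 0/1 indicator vectors (phi coefficient)."""
    xs, ys = _indicator(A, T), _indicator(B, T)
    n = len(T)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # undefined for a constant indicator; 0.0 by choice here
    return cov / math.sqrt(vx * vy)
```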
Based on these association rules, the Apriori algorithm is run to calculate the correlation between each mobile phone identification number and the flight data set. The result set is grouped by mobile phone identification number and sorted; the 5 flight records with the highest correlation are combined with the mobile phone identification number to obtain the final correspondence, which is written back to the result set for storage.
In step S5, model adaptation means associating airport fields such as the monitoring area number, departure date, arrival date, first arrival city, and transfer mode with the mobile phone identification number.
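The grouping and top-5 selection can be sketched with `heapq.nlargest`. The input shape, rows of `(phone_id, flight_id, correlation)`, is an assumption about how the correlation results are stored.

```python
import heapq
from collections import defaultdict


def top_flights(scores, k: int = 5) -> dict[str, list[str]]:
    """scores: iterable of (phone_id, flight_id, correlation) rows.
    Group rows by phone ID and keep the k flights with the highest correlation."""
    groups: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for phone_id, flight_id, corr in scores:
        groups[phone_id].append((corr, flight_id))
    return {
        phone_id: [flight_id for _, flight_id in heapq.nlargest(k, rows)]
        for phone_id, rows in groups.items()
    }
```

`heapq.nlargest` avoids fully sorting each group, which helps when a phone has many candidate flights but only 5 are kept.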
It should be noted that:
The method used in this embodiment can be converted into program steps and apparatuses that can be stored in a computer storage medium and are implemented by being called and executed by a controller, where the apparatuses should be understood as functional modules implemented by a computer program.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the invention. The electronic device conventionally comprises a processor 31 and a memory 32 arranged to store computer-executable instructions (program code). The memory 32 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 32 has a storage space 33 storing program code 34 for performing any of the method steps in the embodiments. For example, the storage space 33 for the program code may comprise respective program codes 34 for implementing respective steps in the above method. The program code can be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium such as described in fig. 4. The computer readable storage medium may have memory segments, memory spaces, etc. arranged similarly to the memory 32 in the electronic device of fig. 3. The program code may be compressed, for example, in a suitable form. In general, the memory unit stores program code 41 for performing the steps of the method according to the invention, i.e. program code readable by a processor such as 31, which when run by an electronic device causes the electronic device to perform the individual steps of the method described above.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (10)

1. A civil aviation customer source data analysis method, characterized by comprising the following steps:
S1, acquiring, according to the mobile phone identification numbers of historical passengers, the LBS (location-based service) data corresponding to each mobile phone identification number;
S2, analyzing, with the mobile phone identification number as the statistical dimension, the usual place of residence of each mobile phone identification number;
S3, aggregating the flight data and the LBS data, using the start and end places and the start and end times as aggregation conditions;
S4, ranking the association degrees between the mobile phone identification numbers and the flight data, selecting the flight data ranked highest by association degree, and determining the correspondence between the flight data and the mobile phone identification numbers; and
S5, forming, through data processing and model adaptation of the correspondence, a user portrait suited to a specific service scenario.
2. The method of claim 1, wherein in step S1 the LBS data and the historical flight data are cleaned: records with abnormal or missing values are filtered out and redundant fields are removed.
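Claim 2 does not say which values count as abnormal or which fields are redundant. The following Python sketch shows one way the cleaning could be done. It assumes hypothetical field names (`phone_id`, `lat`, `lon`, `timestamp`) and treats out-of-range coordinates or negative timestamps as abnormal. Both choices are illustrative and are not taken from the patent.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical schema: the patent does not name any LBS fields.
REQUIRED_FIELDS = ("phone_id", "lat", "lon", "timestamp")


def is_abnormal(rec: Dict[str, Any]) -> bool:
    """Assumed rule: out-of-range coordinates or a negative timestamp are abnormal."""
    in_range = -90.0 <= rec["lat"] <= 90.0 and -180.0 <= rec["lon"] <= 180.0
    return not in_range or rec["timestamp"] < 0


def clean_lbs(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    cleaned = []
    for rec in records:
        # Missing values: a required field is absent or None.
        if any(rec.get(f) is None for f in REQUIRED_FIELDS):
            continue
        # Abnormal values.
        if is_abnormal(rec):
            continue
        # Redundant fields: keep only the fields later steps need.
        cleaned.append({f: rec[f] for f in REQUIRED_FIELDS})
    return cleaned
```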
3. The method of claim 1, wherein in step S2 the usual place of residence is analyzed by extracting it from the LBS data of the mobile phone identification number.
4. The method of claim 3, wherein extracting the usual place of residence from the LBS data of the mobile phone identification number further comprises: calculating, based on the ISODATA algorithm, the area in which the mobile phone identification number appears most often as its usual place of residence.
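Claim 4 names ISODATA but gives no parameters. The minimal sketch below implements a simplified ISODATA loop over one phone's LBS points. The loop assigns points k-means style, discards undersized clusters, splits widely spread clusters and merges nearby centres. The centroid of the most populated cluster is then read as the usual place of residence. Several parts are assumptions: the thresholds (`k_init`, `min_size`, `max_std`, `merge_dist`, `iters`), the farthest-first seeding, and the treatment of latitude/longitude as planar coordinates. The planar treatment is reasonable only over a small area.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (lat, lon) of one LBS record


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])  # planar approximation


def _mean(pts: List[Point]) -> Point:
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


def _assign(points: List[Point], centers: List[Point]) -> List[List[Point]]:
    clusters: List[List[Point]] = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: _dist(p, centers[c]))
        clusters[j].append(p)
    return clusters


def isodata(points: List[Point], k_init: int = 2, min_size: int = 2,
            max_std: float = 0.05, merge_dist: float = 0.02,
            iters: int = 10) -> List[List[Point]]:
    # Farthest-first seeding keeps the sketch deterministic.
    centers = [points[0]]
    while len(centers) < min(k_init, len(points)):
        centers.append(max(points, key=lambda p: min(_dist(p, c) for c in centers)))
    for _ in range(iters):
        # Discard undersized clusters; their points are reassigned on the next pass.
        clusters = [c for c in _assign(points, centers) if len(c) >= min_size]
        if not clusters:
            return [list(points)]
        centers = []
        for c in clusters:
            m = _mean(c)
            sx = math.sqrt(sum((p[0] - m[0]) ** 2 for p in c) / len(c))
            sy = math.sqrt(sum((p[1] - m[1]) ** 2 for p in c) / len(c))
            if max(sx, sy) > max_std and len(c) >= 2 * min_size:
                # Split along the axis with the larger spread.
                d = (sx, 0.0) if sx >= sy else (0.0, sy)
                centers += [(m[0] - d[0], m[1] - d[1]), (m[0] + d[0], m[1] + d[1])]
            else:
                centers.append(m)
        # Merge centres that lie closer together than merge_dist.
        merged: List[Point] = []
        for c in centers:
            for i, mc in enumerate(merged):
                if _dist(c, mc) < merge_dist:
                    merged[i] = ((c[0] + mc[0]) / 2, (c[1] + mc[1]) / 2)
                    break
            else:
                merged.append(c)
        centers = merged
    return [c for c in _assign(points, centers) if c]


def usual_place(points: List[Point]) -> Point:
    """Centroid of the most populated cluster, read here as the usual place of residence."""
    return _mean(max(isodata(points), key=len))
```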
5. The method according to claim 4, wherein in step S2 the specific region whose LBS data is most strongly associated with the mobile phone identification number is taken as the usual place of residence, thereby obtaining the association between the mobile phone identification number and its usual place of residence.
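Claim 5 can be read as follows. Once each LBS record has been mapped to a region (for example an ISODATA cluster or a grid cell), the region that holds the largest share of a phone's records is the phone's most strongly associated region. The sketch below uses that share as the association strength. This is an assumption: the patent does not define the measure, and the `(phone_id, region_id)` input format is hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def residence_by_phone(records: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[str, float]]:
    """Map each phone ID to (region, share of its LBS records falling in that region)."""
    per_phone: DefaultDict[str, Counter] = defaultdict(Counter)
    for phone_id, region_id in records:
        per_phone[phone_id][region_id] += 1
    result: Dict[str, Tuple[str, float]] = {}
    for phone_id, counts in per_phone.items():
        # Assumed association strength: the region's share of this phone's records.
        region_id, hits = counts.most_common(1)[0]
        result[phone_id] = (region_id, hits / sum(counts.values()))
    return result
```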
6. The method of claim 5, wherein aggregating the flight data and the LBS data in step S3 comprises: aggregating the (mobile phone identification number, usual place of residence) associations obtained from processing the LBS data with the flight data, using the start and end places and the time period as aggregation conditions, to obtain a set of (mobile phone identification number, flight data) items as a transaction set T.
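Claim 6 aggregates on start and end places and a time period but does not fix the tolerance. The sketch below assumes a hypothetical `Trip` record and a one-hour matching window. A `Trip` represents a phone's movement from its usual place of residence to a destination, as inferred from LBS data. Each trip becomes one transaction in T, containing the phone ID and every flight consistent with the trip.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List


@dataclass(frozen=True)
class Flight:
    flight_id: str
    origin: str
    dest: str
    dep: datetime
    arr: datetime


@dataclass(frozen=True)
class Trip:
    """Hypothetical: one origin-to-destination movement of a phone, derived from LBS data."""
    phone_id: str
    origin: str
    dest: str
    last_seen_origin: datetime
    first_seen_dest: datetime


def build_transactions(trips: List[Trip], flights: List[Flight],
                       window: timedelta = timedelta(hours=1)) -> List[FrozenSet[str]]:
    """Each trip becomes one transaction: its phone ID plus every flight consistent with it."""
    T: List[FrozenSet[str]] = []
    for t in trips:
        matched = {
            f.flight_id
            for f in flights
            # Aggregation condition 1: same start and end places.
            if (f.origin, f.dest) == (t.origin, t.dest)
            # Aggregation condition 2: departure and arrival fall inside the time window.
            and abs(f.dep - t.last_seen_origin) <= window
            and abs(f.arr - t.first_seen_dest) <= window
        }
        T.append(frozenset({t.phone_id} | matched))
    return T
```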
7. The method of claim 6, wherein ranking the association between the mobile phone identification number and the flight data in step S4 further comprises: calculating the association degree between each item of flight data and the mobile phone identification number using the Apriori algorithm, and ranking by association degree.
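Claim 7 does not state which association measure is ranked. This sketch assumes it is the I(A, B) metric of claim 8B, read as the standard interest (lift) factor I(A, B) = s(A, B) / (s(A) s(B)). Here s is the fraction of transactions in T that contain the item or items. Under that assumption, the per-phone ranking could look as follows; the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple


def interest(n_ab: int, n_a: int, n_b: int, n: int) -> float:
    """I(A, B) = s(A, B) / (s(A) * s(B)); 1 = independent, >1 positive, <1 negative."""
    return (n_ab / n) / ((n_a / n) * (n_b / n))


def rank_flights(T: List[FrozenSet[str]], phones: Set[str]) -> Dict[str, List[Tuple[str, float]]]:
    """For each phone ID, its co-occurring flights sorted by decreasing I(phone, flight)."""
    n = len(T)
    single = Counter(item for t in T for item in t)
    pair: Counter = Counter()
    for t in T:
        # Items that are phone IDs versus items that are flights.
        for p, f in product(t & phones, t - phones):
            pair[(p, f)] += 1
    ranking: Dict[str, List[Tuple[str, float]]] = {}
    for (p, f), n_pf in pair.items():
        ranking.setdefault(p, []).append((f, interest(n_pf, single[p], single[f], n)))
    for flights in ranking.values():
        flights.sort(key=lambda x: x[1], reverse=True)
    return ranking
```

Selecting the top-ranked flight for each phone, as in step S4, then reduces to `{p: fl[0][0] for p, fl in rank_flights(T, phones).items()}`.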
8. The method of claim 7, wherein the Apriori algorithm comprises one or more of the following items A to C:
A. in the transaction set T, the frequent item sets are searched level by level using a generate-and-test strategy: after each iteration, new candidate item sets are generated from the frequent item sets found in the previous iteration, the support of each candidate is counted and compared with a minimum support threshold, and candidates whose support is below the threshold are removed;
B. the I(A, B) metric is interpreted as follows: when I(A, B) = 1, A and B are independent; when I(A, B) > 1, they are positively correlated; when I(A, B) < 1, they are negatively correlated;
C. the correlation in the Apriori algorithm is defined by the Pearson correlation coefficient.
9. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method according to any one of claims 1 to 8.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-8.
CN202011178439.5A 2020-10-29 2020-10-29 Civil aviation customer source data analysis method, electronic equipment and computer readable storage medium Pending CN112231392A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011178439.5A CN112231392A (en) 2020-10-29 2020-10-29 Civil aviation customer source data analysis method, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011178439.5A CN112231392A (en) 2020-10-29 2020-10-29 Civil aviation customer source data analysis method, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112231392A true CN112231392A (en) 2021-01-15

Family

ID=74110417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011178439.5A Pending CN112231392A (en) 2020-10-29 2020-10-29 Civil aviation customer source data analysis method, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112231392A (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150007090A (en) * 2013-07-10 2015-01-20 삼성전자주식회사 Method, electronic device and computer readable recording medium for providing location based services
US20160350770A1 (en) * 2015-06-01 2016-12-01 Xerox Corporation Method, system and processor-readable media for estimating airport usage demand
CN107729916A (en) * 2017-09-11 2018-02-23 湖南中森通信科技有限公司 Interference source classification and identification algorithm and device based on ISODATA
CN107798557A (en) * 2017-09-30 2018-03-13 平安科技(深圳)有限公司 Electronic device, LBS-data-based service location recommendation method and storage medium
CN108595667A (en) * 2018-04-28 2018-09-28 广东电网有限责任公司 Correlation analysis method for abnormal network data
CN108830655A (en) * 2018-06-19 2018-11-16 郑州云海信息技术有限公司 User operation relationship acquisition method and related apparatus
CN108876406A (en) * 2018-06-28 2018-11-23 中国建设银行股份有限公司 Customer service behavior analysis method, device, server and readable storage medium
CN108961134A (en) * 2018-09-05 2018-12-07 北京工业大学 Airport passenger travel OD recognition method based on mobile phone signaling data
CN109634998A (en) * 2018-11-19 2019-04-16 北京通途永久科技有限公司 Traffic trip characteristic analysis platform based on mobile phone signaling big data
CN109949004A (en) * 2019-03-01 2019-06-28 长沙理工大学 New electricity customer portrait method based on fast customer fault location and a clustering algorithm
CN109978224A (en) * 2019-01-14 2019-07-05 南京大学 Method for analyzing and obtaining trip generation rates of buildings of different types
CN110968618A (en) * 2019-11-07 2020-04-07 华中科技大学 Method for mining quantitative association rule of welding parameters and application
CN111178932A (en) * 2019-11-26 2020-05-19 深圳壹账通智能科技有限公司 User geographic portrait generation method and device, computer equipment and storage medium
CN111353051A (en) * 2019-12-04 2020-06-30 江苏蓝河智能科技有限公司 Maritime big data association analysis method based on K-means and Apriori algorithms
CN111459997A (en) * 2020-03-16 2020-07-28 中国科学院计算技术研究所 Incremental frequent pattern mining method for spatio-temporal trajectory data, and electronic equipment

Similar Documents

Publication Publication Date Title
Khalili-Damghani et al. Hybrid soft computing approach based on clustering, rule mining, and decision tree analysis for customer segmentation problem: Real case of customer-centric industries
CN109033101B (en) Label recommendation method and device
CN109739844B (en) Data classification method based on attenuation weight
CN110995459B (en) Abnormal object identification method, device, medium and electronic equipment
CN112307860A (en) Image recognition model training method and device and image recognition method and device
CN113901236A (en) Target identification method and device based on artificial intelligence, electronic equipment and medium
JP5391637B2 (en) Data similarity calculation system, data similarity calculation method, and data similarity calculation program
CN110348516B (en) Data processing method, data processing device, storage medium and electronic equipment
CN111639077A (en) Data management method and device, electronic equipment and storage medium
CN115034315A (en) Business processing method and device based on artificial intelligence, computer equipment and medium
CN111159481A (en) Edge prediction method and device of graph data and terminal equipment
Struski et al. ProMIL: Probabilistic multiple instance learning for medical imaging
CN109409908A (en) Customer value classification method and device, computer-readable medium
EP4227855A1 (en) Graph explainable artificial intelligence correlation
CN112231392A (en) Civil aviation customer source data analysis method, electronic equipment and computer readable storage medium
CN112560474A (en) Express industry portrait generation method, device, equipment and storage medium
CN108830302B (en) Image classification method, training method, classification prediction method and related device
CN115034762A (en) Post recommendation method and device, storage medium, electronic equipment and product
CN115719244A (en) User behavior prediction method and device
CN112561569B (en) Dual-model-based store arrival prediction method, system, electronic equipment and storage medium
US20220374655A1 (en) Data summarization for training machine learning models
CN111127485B (en) Method, device and equipment for extracting target area in CT image
CN111401935B (en) Resource allocation method, device and storage medium
CN114723516A (en) User similarity calculation method and system based on form data
WO2021029835A1 (en) A method and system for clustering performance evaluation and increment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination