WO2024041399A1 - Data processing method, system, electronic device, and computer storage medium - Google Patents
Data processing method, system, electronic device, and computer storage medium
- Publication number
- WO2024041399A1 (PCT/CN2023/112558)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- freight
- product
- data
- feature
- products
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0283—Price estimation or determination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
Definitions
- the present invention relates to the field of data processing technology, and in particular, to a data processing method, system, electronic equipment and computer storage medium.
- embodiments of the present invention provide a data processing method, system, electronic device, and computer storage medium to solve the problems in the prior art that processing and analysis are slow and data confusion is prone to occur.
- the first aspect of the embodiment of the present invention shows a data processing method.
- the method includes:
- the basic product information includes product definition and product attributes
- analyzing the freight rate data of each freight rate product to determine the value of each feature in the freight rate product;
- the freight rate data set and basic product information in the freight rate system are processed to determine the freight rate data corresponding to each freight rate product, including:
- Each freight rate data in the freight rate data set is classified and processed based on the product tag and the product attribute of the product basic information, and the freight rate data belonging to each freight rate product is determined.
- analyzing the freight data of the freight product to determine the value of each feature in the freight product includes:
- for the freight data of each freight product, obtaining the features of the freight product based on the freight data of the freight product and the rules corresponding to the freight data, the number of features being at least one;
- Calculation is performed based on the quantity set of each value range of the feature to obtain the value of each feature of the freight product.
- determining a value that satisfies a preset threshold based on the value of each feature includes:
- determining the freight products of the same product type and aggregating the freight products of the same product type includes:
- the second aspect of the embodiment of the present invention shows a data processing system.
- the system includes:
- the product classification component is used to process the freight data set and basic product information in the freight system to determine the freight data corresponding to each freight product.
- the basic product information includes product definitions and product attributes;
- the feature extraction component is used to analyze the freight data of each freight product together with the rules corresponding to the freight data, and determine the value of each feature in the freight product; based on the value of each feature, determine the value that satisfies the preset threshold; and use the feature corresponding to the value that satisfies the preset threshold as the decision-making feature;
- a product clustering component is used to determine the freight products of the same product type and aggregate the freight products of the same product type
- the business model analysis component is used to process the feature set of all freight products of the same product type and the freight data of all freight products of the same product type, and generate an analysis report corresponding to the freight products.
- the product classification component includes a product marking module, a product grouping module and a product family construction module;
- a product marking module configured to mark the product label of each freight rate data in the freight rate data set according to the product definition fields in the product basic information
- the product grouping module and the product family construction module are used to classify each freight rate data in the freight rate data set based on the product tag and the product attributes of the product basic information, and determine the freight rate data belonging to each freight rate product.
- the product clustering component includes a feature matching module and a product merging module;
- the feature matching module is used to compare the decision-making features of each freight product; freight products with the same decision-making features are regarded as freight products of the same product type;
- the product merging module is used to aggregate the freight products of the same product type.
- the third aspect of the embodiments of the present invention shows an electronic device, the electronic device being used to run a program, wherein when the program runs, it performs any one of the data processing methods shown in the first aspect of the embodiments of the present invention.
- the fourth aspect of the embodiments of the present invention shows a computer storage medium.
- the storage medium includes a stored program, wherein when the program runs, the device where the storage medium is located is controlled to execute any one of the data processing methods described in the first aspect of the embodiments of the present invention.
- the data processing method, system, electronic device and computer storage medium provided by the embodiments of the present invention include: processing the freight rate data set and basic product information in the freight rate system to determine the freight rate data corresponding to each freight rate product, the basic product information including product definitions and product attributes; for the freight data of each freight product and the rules corresponding to that freight data, analyzing the freight data of the freight product to determine the value of each feature in the freight product; determining, based on the value of each feature, the values that meet the preset threshold; using the features corresponding to those values as decision-making features; determining the freight products of the same product type and aggregating them; and processing the feature sets and the freight data of all freight products of the same product type to generate an analysis report for the corresponding freight products.
- in the embodiments of the present invention, the feature extraction method from pattern recognition is applied to the freight rate data set and the other data in the training set to calculate the decision-making features of different freight rate products; freight rate products of the same product type are determined and, in combination with the data in the training set, aggregated; this makes it possible to analyze the key functional points of freight products under different product types, as well as the potential business intent behind how the freight products are defined.
- Figure 1 is a schematic structural diagram of a data processing system according to an embodiment of the present invention.
- FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present invention.
- Referring to FIG. 1, a schematic structural diagram of a data processing system according to an embodiment of the present invention is shown.
- the system includes a product classification component 10, a feature extraction component 20, a product clustering component 30, and a business model analysis component 40.
- the product classification component 10 is connected to the feature extraction component 20, the feature extraction component 20 is connected to the product clustering component 30, and the product clustering component 30 is connected to the business model analysis component 40.
- the product classification component 10 and the feature extraction component 20 are also connected to the training set, that is, the database.
- the training set is mainly used to store freight data, authority data, rule data and route data.
- freight data defines what to sell and for how much, mainly involving dimensions such as airline, sales time range, origin, destination, cabin, price, and freight basis; authority data defines who sells, mainly involving dimensions such as channels, groups, terminal configurations, and agent accounts; rule data and route data define how to sell, to whom, and when, involving dimensions such as passenger identity, sales week, combination conditions, flights, transfer points, stopping points, and cabins.
- the product classification component 10 is used to process the freight rate data set and basic product information in the freight rate system to determine the freight rate data corresponding to each freight rate product.
- the basic product information includes product definitions and product attributes.
- the product classification component 10 includes a product marking module 11, a product grouping module 12 and a product family construction module 13.
- the product marking module 11 marks the product label of each freight rate data in the freight rate data set according to the product definition field in the product basic information; then, the product grouping module 12 and the product family construction module 13 classify each freight rate data in the freight rate data set based on the product label and the product attributes of the product basic information, and determine the freight rate data belonging to each freight rate product.
- the product marking module 11 first obtains, from the prepared training set (that is, the database), the freight data set of the freight system as well as the product definitions and product attributes related to the freight products; it then analyzes the freight data set and marks each piece of freight data with a product label based on the product definition field.
- the product grouping module 12 and the product family construction module 13 perform product classification processing on the freight data according to the product labels. Specifically, the freight data carrying the same product label is grouped together and regarded as the freight data of one freight product, and so on, until the freight data belonging to each freight product is determined. Then, according to the airline attribute under which the freight is published, all products containing freight data are classified into the product family of that airline.
- the freight data set includes multiple pieces of freight data.
- the freight data consists of the identification ID, airline name, sales department, agreement number of the freight product, origin, destination, ticket price and other data.
- the freight data set refers to the freight data of the freight system in the same airline.
- Product labels correspond to product definitions, and product definitions are written based on the theoretical basis of freight products.
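The marking step performed by the product marking module 11 can be sketched in Python as follows. This is only an illustrative sketch: the `FreightRecord` class, its field names, the `mark_product_labels` helper, and the airline, department, route and price values are all hypothetical. It also assumes that the agreement number serves as the product definition field; the two label values follow the worked example given later in the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FreightRecord:
    """One piece of freight data (field names are illustrative)."""
    record_id: int
    airline: str
    sales_department: str
    agreement_number: str
    origin: str
    destination: str
    ticket_price: float
    product_label: Optional[str] = None  # filled in by the marking step


def mark_product_labels(records: List[FreightRecord],
                        label_by_agreement: Dict[str, str]) -> List[FreightRecord]:
    """Attach a product label to each record via its product definition field.

    Assumption: the agreement number is the product definition field.
    Records with an unknown agreement number keep the label None.
    """
    for record in records:
        record.product_label = label_by_agreement.get(record.agreement_number)
    return records


# Hypothetical records; only the agreement numbers and labels come from the text.
records = [
    FreightRecord(1, "XX", "Sales-A", "F21052114", "CGQ", "CAN", 1000.0),
    FreightRecord(2, "XX", "Sales-A", "F21040212", "CGQ", "HAK", 1200.0),
]
labels = {"F21052114": "1", "F21040212": "3"}
marked = mark_product_labels(records, labels)
```

Once every record carries a label, grouping into freight products reduces to collecting records that share the same label.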
- the feature extraction component 20 is used to analyze the freight data of each freight product together with the rules corresponding to that freight data and determine the value of each feature in the freight product; based on the value of each feature, determine the value that satisfies the preset threshold; and use the feature corresponding to the value that satisfies the preset threshold as the decision-making feature.
- the feature extraction component 20 includes a data integration module 21, a feature value analysis module 22 and a decision determination module 23.
- for the freight data of each freight product, the features of the freight product are obtained from the freight data and the corresponding rules, and the number of features is at least one; the data integration module 21 determines the quantity set of each value range of the feature based on the value range of the feature; the feature value analysis module 22 calculates the value of each feature of the freight product from the quantity set of each value range of the feature. Based on the value of each feature, it is determined whether there is a value that satisfies the preset threshold; if so, the feature corresponding to that value is used as a decision-making feature.
- the decision determination module 23 first obtains, from the prepared training set (that is, the database), the rules in the rule data that correspond to the freight data, associates the freight data with the corresponding rules, and obtains the features of the freight product from the freight data and the corresponding rules.
- the data integration module 21 determines the value range and base of each feature in the freight product based on the number of values the feature takes in the freight product; it combines the freight product, the corresponding feature, and the value range to form the related data sets, counts the records in each data set, and determines the quantity set corresponding to each value range of the feature in the freight product.
- the feature value analysis module 22 first calculates the standard deviation $\sigma_{ij}$ of the quantity set $Z_{ij}$ of each value range of the feature, substitutes $\sigma_{ij}$ and $Z_{ij}$ into formula (1), and calculates the value of each feature $f_j$ in the freight product $S_i$. It then determines whether the value of each feature is greater than or equal to the preset threshold, that is, screens out the valuable features, and uses each valuable feature screened out as a decision-making feature.
- in formula (1), the result is the value of the feature $f_j$ in the freight product $S_i$, $L$ is the base of the value range, $\max(Z_{ij})$ is the maximum value in $Z_{ij}$, and $\mathrm{submax}(Z_{ij})$ is the second-largest value in $Z_{ij}$.
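The quantities that feed formula (1) can be computed from a quantity set as in the Python sketch below. Formula (1) itself is not reproduced in this text, so the sketch stops at its inputs. The function name `quantity_set_statistics` is hypothetical, the use of the population standard deviation (rather than the sample standard deviation) is an assumption, and returning `None` for the second-largest value when the set has only one entry is likewise an assumption.

```python
from statistics import pstdev
from typing import Dict, List, Optional, Union

Number = Union[int, float]


def quantity_set_statistics(quantity_set: List[int]) -> Dict[str, Optional[Number]]:
    """Compute the inputs named in the description of formula (1).

    quantity_set holds, for one feature of one freight product, the number
    of freight records observed for each value of the feature.
    """
    counts = sorted(quantity_set, reverse=True)
    return {
        "L": len(counts),            # base of the value range
        "sigma": pstdev(counts),     # assumption: population standard deviation
        "max": counts[0],            # largest count
        "submax": counts[1] if len(counts) > 1 else None,  # second-largest count
    }


# Illustrative quantity set: two values observed 2 and 3 times.
stats = quantity_set_statistics([2, 3])
```

How these quantities are combined into a single feature value depends on formula (1) and its coefficients, which are left out here.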
- the product clustering component 30 is used to determine the freight products of the same product type and aggregate the freight products of the same product type.
- the product clustering component 30 includes a feature matching module 31 and a product merging module 32.
- the feature matching module 31 compares the decision-making features of each freight product; freight products with the same decision-making features are regarded as freight products of the same product type; the product merging module 32 then aggregates the freight products of the same product type.
- the feature matching module 31 compares and matches the decision-making features of all freight products within the same airline, and merges the freight products with the same decision-making features into one product type; the product merging module 32 then aggregates the freight products merged into the same product type, so that the product type contains multiple freight products together with the freight data of those freight products.
- a product feature set matrix can be generated based on the product type and its decision-making features.
- the product feature set matrix means that different product types involve different feature sets, so the product feature set matrix includes different product types and their corresponding different feature sets.
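One possible layout for the product feature set matrix is sketched below in Python. The text only states that the matrix relates product types to their feature sets, so the binary row-per-type, column-per-feature encoding is an assumption, as are the function name `build_feature_matrix` and the product type and feature names.

```python
from typing import Dict, List, Set, Tuple


def build_feature_matrix(
    type_features: Dict[str, Set[str]]
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build a binary product-type x feature matrix.

    Each row corresponds to a product type; each column to a feature.
    A cell is 1 if the feature is a decision-making feature of the type.
    """
    columns = sorted({f for feats in type_features.values() for f in feats})
    matrix = {
        ptype: [1 if column in feats else 0 for column in columns]
        for ptype, feats in type_features.items()
    }
    return columns, matrix


# Hypothetical product types and feature names.
columns, matrix = build_feature_matrix({
    "type_1": {"team_restriction", "advance_sales_restriction"},
    "type_2": {"team_restriction", "advance_sales_restriction", "week_restriction"},
})
```

With this encoding, two product types differ exactly where their rows differ, which makes the distinguishing features easy to read off.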
- the business model analysis component 40 is used to process the feature set of all freight products of the same product type and the freight data of all freight products of the same product type, and generate an analysis report corresponding to the freight products.
- the business model analysis component 40 includes a product information extraction module 41, a product definition module 42 and a product statistics module 43.
- for each product type, the product information extraction module 41 analyzes the origin and destination of each freight rate data in the product type; the product statistics module 43 counts the origins and destinations that account for a high proportion of the product type, that is, the popular routes; the product definition module 42 determines the popular-route product definition model based on the decision-making features of all freight products under this product type, and then generates the corresponding freight product analysis report so that airlines can quickly adjust routes and freight data.
- in the embodiments of the present invention, the feature extraction method from pattern recognition is applied to the freight rate data set and the other data in the training set to calculate the decision-making features of different freight rate products; freight rate products of the same product type are determined and, in combination with the data in the training set, aggregated; this makes it possible to analyze the key functional points of freight products under different product types, as well as the potential business intent behind how the freight products are defined.
- the embodiment of the present invention also shows a data processing method. FIG. 2 is a schematic flowchart of the data processing method shown in the embodiment of the present invention. The method includes:
- Step S201: Process the freight rate data set and basic product information in the freight rate system to determine the freight rate data corresponding to each freight rate product.
- In step S201, the basic product information includes product definitions and product attributes.
- Step S201 includes the following steps:
- Step S11: Mark the product label of each freight rate data in the freight rate data set according to the product definition field in the product basic information.
- In step S11, first obtain, from the prepared training set (that is, the database), the freight rate data set of the freight rate system as well as the product definitions and product attributes related to the freight rate products; then analyze the freight rate data set and mark each piece of freight data with a product label based on the product definition field.
- the freight data set includes multiple pieces of freight data.
- the freight data consists of the identification ID, airline name, sales department, agreement number of the freight product, origin, destination, ticket price and other data.
- the freight data set refers to the freight data of the freight system in the same airline.
- Step S12: Classify each freight rate data in the freight rate data set based on the product tag and the product attributes of the product basic information, and determine the freight rate data belonging to each freight rate product.
- In step S12, product classification processing is performed on the freight data according to the product label. Specifically, the freight data carrying the same product label is grouped together and regarded as the freight data of one freight product, and so on, until the freight data belonging to each freight product is determined.
- the product label corresponds to the product definition, and the product definition is written based on the theoretical basis of freight products.
- for example, Table (1) lists the 12 pieces of freight data in freight data set A, with the identification ID, airline name, sales department, agreement number, origin, destination, and ticket price of each.
- the freight data with the agreement number F21052114 is marked with the product label "1", and the freight data with the agreement number F21040212 is marked with the product label "3".
- freight data with the same product label is grouped together: freight data 1 to 5, with the agreement number F21052114, form one group and serve as the freight data of freight product F21052114; freight data 6 to 8, with the agreement number CA080516H, form one group and serve as the freight data of freight product CA080516H; and freight data 9 to 12, with the agreement number F21040212, form one group and serve as the freight data of freight product F21040212, as shown in Table (2).
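The grouping in this example can be reproduced with a short Python sketch. The function name `group_by_key` is hypothetical. Because the text does not state the label of every product, the sketch groups on the agreement number as a stand-in for the product label, which is an assumption. The record numbers and agreement numbers are taken from the example.

```python
from collections import defaultdict
from typing import Dict, List

# Record number -> agreement number, as described in the example
# (records 1-5, 6-8 and 9-12 respectively).
agreement_of: Dict[int, str] = {}
agreement_of.update({i: "F21052114" for i in range(1, 6)})
agreement_of.update({i: "CA080516H" for i in range(6, 9)})
agreement_of.update({i: "F21040212" for i in range(9, 13)})


def group_by_key(key_of_record: Dict[int, str]) -> Dict[str, List[int]]:
    """Group record numbers that share the same key into one freight product."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record_id, key in sorted(key_of_record.items()):
        groups[key].append(record_id)
    return dict(groups)


products = group_by_key(agreement_of)
```

Each resulting group corresponds to one freight product, with 5, 3 and 4 pieces of freight data respectively.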
- Step S202: Analyze the freight data of each freight product together with the rules corresponding to the freight data to determine the value of each feature in the freight product.
- Step S202 includes the following steps:
- Step S21: For the freight data of each freight product, obtain the features of the freight product based on the freight data of the freight product and the rules corresponding to the freight data.
- In step S21, the number of features is at least one.
- In step S21, the rules in the rule data that correspond to the freight rate data are obtained from the prepared training set (that is, the database), the freight rate data is associated with the corresponding rules, and the features of the freight product are obtained from the freight data and the corresponding rules.
- the training set mainly includes freight data, authority data, rule data and route data.
- Freight data defines what to sell and for how much, mainly involving dimensions such as airline, sales time range, origin, destination, cabin, price, and freight basis.
- Authority data defines who sells, mainly involving dimensions such as channels, groups, terminal configurations, and agent accounts.
- Rule data and route data define how to sell, to whom, and when, involving dimensions such as passenger identity, sales week, combination conditions, flights, transfer points, stops, and cabins.
- Step S22: Based on the value range of the feature in the freight product, determine the quantity set corresponding to each value range of the feature in the freight product.
- In step S22, the value range and base of each feature in the freight product are determined based on the number of values the feature takes in the freight product; the freight product, the corresponding feature, and the value range are combined to form the related data sets, the records in each data set are counted, and the quantity set corresponding to each value range of the feature in the freight product is determined.
- the freight product $S$ is a subset of the freight data set $A$. If there are $I$ freight products, then for a given freight product $S_i$ the value range of the feature $f_j$ is denoted $D_{ij}$, where $i \le I$, $j \le J$, and $J$ is the number of features.
- each value $d_l$ in the value range of the feature $f_j$ in the freight product $S_i$ has a related data set, identified by the triple $(S_i, f_j, d_l)$ and abbreviated with the subscript $ij$.
- for example, suppose the freight product $S_i$ contains 5 pieces of freight data and a feature $f_j$ takes two values, "1" and "2". The value range $D_{ij}$ of the feature $f_j$ in the freight product $S_i$ is then $\{1, 2\}$, and the base $L$ is 2.
- the freight product $S_i$, the corresponding feature $f_j$, and the value range $D_{ij}$ are combined to form the related data sets.
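Deriving the value range, its base, and the quantity set from the records of one freight product can be sketched in Python as below. The function name `value_range_and_quantity_set` is hypothetical, and the assignment of the five records to the values "1" and "2" (two and three records) is an illustrative assumption.

```python
from collections import Counter
from typing import List, Tuple


def value_range_and_quantity_set(
    feature_values: List[str],
) -> Tuple[List[str], int, List[int]]:
    """Return (value range, base, quantity set) for one feature.

    feature_values holds the value of the feature in each freight record
    of a single freight product.
    """
    counts = Counter(feature_values)
    value_range = sorted(counts)                    # distinct values of the feature
    base = len(value_range)                         # number of distinct values
    quantity_set = [counts[v] for v in value_range]  # records per value
    return value_range, base, quantity_set


# Five records of one freight product; the split across values is illustrative.
value_range, base, quantity_set = value_range_and_quantity_set(
    ["1", "1", "2", "2", "2"]
)
```

The quantity set is the input from which the standard deviation, maximum and second-largest count are later derived.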
- Step S23: Calculate the value of each feature of the freight product based on the quantity set of each value range of the feature.
- In step S23, first calculate the standard deviation $\sigma_{ij}$ of the quantity set $Z_{ij}$ of each value range of the feature, substitute $\sigma_{ij}$ and $Z_{ij}$ into formula (1), and calculate the value of each feature $f_j$ in the freight product $S_i$.
- ⁇ i ⁇ f j
- formula (1) also involves a constant coefficient and a value threshold; both default to 1 and can be customized to other values.
- for example, obtain from the prepared training set (that is, the database) the rule B in the rule data that corresponds to the freight rate data set A, and associate the freight rate data set A with the corresponding rule B to form the structured data shown in Table (3); the features of the freight products are then obtained from the 12 pieces of freight data and the corresponding rule B, specifically three features: team restriction, advance sales restriction, and week restriction.
- the team restriction feature has 2 values; that is, the 5 pieces of freight rate data are divided into 2 groups, containing 2 and 3 pieces of freight data respectively.
- based on the number of values the team restriction feature takes, the value range and base of each feature in the freight product can be determined using the pre-built mathematical model or parameter calculation method.
- Step S203: Based on the value of each feature, determine whether there is a value that satisfies the preset threshold. If so, perform steps S204 to S206; features whose value does not satisfy the preset threshold are discarded.
- Step S203 includes the following steps:
- Step S31: Determine whether any feature has a value greater than or equal to the preset threshold. If so, perform steps S204 to S206; otherwise, discard the feature.
- In step S31, determining whether the value of each feature is greater than or equal to the preset threshold amounts to screening out the valuable features from all features.
- the preset threshold is set in advance according to the actual situation. For example, it can be set to a positive integer greater than or equal to 1.
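The screening in step S31 amounts to a simple filter, sketched here in Python. The function name `select_decision_features`, the feature names and the feature values are hypothetical, and the default threshold of 1 follows the example threshold mentioned above.

```python
from typing import Dict, Set


def select_decision_features(feature_values: Dict[str, float],
                             threshold: float = 1.0) -> Set[str]:
    """Keep the features whose value is greater than or equal to the threshold.

    Features below the threshold are discarded; the rest become
    decision-making features.
    """
    return {name for name, value in feature_values.items() if value >= threshold}


# Hypothetical feature values for one freight product.
decision_features = select_decision_features({
    "team_restriction": 1.6,
    "advance_sales_restriction": 1.0,
    "week_restriction": 0.4,
})
```

Because the comparison is inclusive, a feature whose value equals the threshold is retained.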
- Step S204: Use the features corresponding to the values that meet the preset threshold as decision-making features.
- each valuable feature screened out is used as a decision-making feature.
- for example, the decision-making feature set of freight product F21052114 is $\{f_{\text{team restriction}}, f_{\text{advance sales restriction}}\}$, the decision-making feature set of freight product CA080516H is $\{f_{\text{team restriction}}, f_{\text{advance sales restriction}}\}$, and the decision-making feature set of freight product F21040212 is $\{f_{\text{team restriction}}, f_{\text{advance sales restriction}}, f_{\text{week restriction}}\}$.
- Step S205: Determine the freight products of the same product type, and aggregate the freight products of the same product type.
- Step S205 includes the following steps:
- Step S41: Compare the decision-making features of each freight product. Freight products with the same decision-making features are regarded as freight products of the same product type.
- In step S41, within the same airline, the decision-making features of all freight products are compared and matched, and the freight products with the same decision-making features are merged into one product type.
- Step S42: Aggregate the freight products of the same product type.
- In step S42, the freight products merged into the same product type are aggregated, so that the product type contains multiple freight products together with the freight data of those freight products.
- for example, the decision-making feature set of freight product F21052114 is $\{f_{\text{team restriction}}, f_{\text{advance sales restriction}}\}$, the decision-making feature set of freight product CA080516H is $\{f_{\text{team restriction}}, f_{\text{advance sales restriction}}\}$, and the decision-making feature set of freight product F21040212 is $\{f_{\text{team restriction}}, f_{\text{advance sales restriction}}, f_{\text{week restriction}}\}$.
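Steps S41 and S42 can be sketched in Python by keying each freight product on its decision-making feature set. The function name `cluster_by_decision_features` and the snake_case feature names are hypothetical; the three feature sets are those of the example. Exact set equality is used as the matching criterion, as the text describes.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def cluster_by_decision_features(
    decision_features: Dict[str, Set[str]]
) -> List[List[str]]:
    """Merge freight products with identical decision-making feature sets.

    Each returned list is one product type, in first-seen order.
    """
    clusters: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for product, features in decision_features.items():
        clusters[frozenset(features)].append(product)
    return list(clusters.values())


product_types = cluster_by_decision_features({
    "F21052114": {"team_restriction", "advance_sales_restriction"},
    "CA080516H": {"team_restriction", "advance_sales_restriction"},
    "F21040212": {"team_restriction", "advance_sales_restriction",
                  "week_restriction"},
})
```

Using a `frozenset` as the dictionary key makes the comparison independent of the order in which features were extracted.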
- Step S206 Process the feature set of all freight products of the same product type and the freight data of all freight products of the same product type, and generate an analysis report corresponding to the freight products.
- step S206 For the same product type, analyze the origin and destination of each freight rate data in the product type, and count the origin and destination with a high proportion of the product type, that is, the popular routes; Based on the decision-making characteristics of all freight products under this product type, the popular route product definition model is determined, and then the corresponding freight product analysis report is generated so that airlines can quickly adjust routes and freight data.
- For example, product type 1 includes freight product F21052114 and freight product CA080516H, with 8 freight data records in total. Of these, 7 records have the city pair CGQ (Changchun) and CAN (Guangzhou), and only one record has the city pair CGQ (Changchun) and HAK (Haikou). The popular route of airline product type 1 is therefore Changchun to Guangzhou (and vice versa), and the decision feature set of this product type is {f_team restriction, f_advance sales restriction}.
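The popular-route count in this example can be sketched in Python as a frequency count over city pairs. This is only an illustration: the function name `popular_route`, the `directed` flag, and the direction assigned to each of the 8 records are assumptions, since the text gives only the totals (7 and 1).

```python
from collections import Counter

def popular_route(city_pairs, directed=False):
    """Return the most frequent city pair and its share of all records.

    city_pairs: non-empty list of (origin, destination) tuples.
    directed: if False, A->B and B->A count as the same route.
    """
    if not directed:
        city_pairs = [tuple(sorted(pair)) for pair in city_pairs]
    counts = Counter(city_pairs)
    route, n = counts.most_common(1)[0]
    return route, n / len(city_pairs)

# 7 records on Changchun-Guangzhou and 1 on Changchun-Haikou, as in the text
pairs = [("CGQ", "CAN")] * 7 + [("CGQ", "HAK")]
route, share = popular_route(pairs)
```

With `directed=False` the pair is normalized by sorting the two airport codes, which matches the "and vice versa" reading of the example.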
- In this embodiment of the present invention, the decision features of the different freight products are calculated from the freight data set and other training-set data using feature extraction methods from pattern recognition; freight products of the same product type are identified and aggregated; and the key functional points of the freight products under each product type, as well as the business intent underlying the freight product definitions, are thereby analyzed.
- An embodiment of the present invention also discloses an electronic device for running a database stored procedure, wherein the data processing method disclosed in Figure 2 is executed when the database stored procedure runs.
- An embodiment of the present invention also discloses a computer storage medium.
- The storage medium includes a stored database stored procedure, wherein, when the database stored procedure runs, the device on which the storage medium resides is controlled to execute the data processing method disclosed in Figure 2.
- In the context of the present disclosure, a computer storage medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
- More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Landscapes
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Primary Health Care (AREA)
- Tourism & Hospitality (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
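The present invention provides a data processing method, system, electronic device, and computer storage medium. The method includes: processing the freight data set and product basic information in a freight system to determine the freight data corresponding to each freight product; for each freight product, analyzing its freight data according to that data and the rules corresponding to it, to determine the value of each feature of the freight product; determining, based on the value of each feature, the values that satisfy a preset threshold; taking the features corresponding to those values as decision features; determining the freight products of the same product type; and processing the feature sets and the freight data of all freight products of the same product type to generate an analysis report for the corresponding freight products. In this way, data processing can be completed within the specified time, which speeds up processing and analysis, and the data can be analyzed accurately.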
Description
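This application claims priority to Chinese patent application No. 202211014877.7, filed with the China Patent Office on August 23, 2022 and entitled "Data processing method, system, electronic device and computer storage medium", the entire contents of which are incorporated herein by reference.
The present invention relates to the field of data processing technology, and in particular to a data processing method, system, electronic device, and computer storage medium.
With the rapid development of air service business, the air ticket business has grown quickly. To respond to market changes under the current epidemic situation, airlines need to change their strategies for defining and selling freight products frequently, so existing freight data needs to be analyzed.
At present, freight data is usually analyzed and combined manually. To describe a freight product completely and accurately, the relationships between function points must be considered; these form a mesh-like functional structure in which changing one part affects the whole. As a result, manual processing and analysis is slow and prone to data confusion.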
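Summary of the invention
In view of this, embodiments of the present invention provide a data processing method, system, electronic device, and computer storage medium to solve the prior-art problems of slow processing and analysis and a tendency toward data confusion.
To achieve the above objective, embodiments of the present invention provide the following technical solutions: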
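A first aspect of the embodiments of the present invention provides a data processing method, the method including:
processing the freight data set and product basic information in the freight system to determine the freight data corresponding to each freight product, the product basic information including product definitions and product attributes;
for each freight product, analyzing the freight data of the freight product according to that freight data and the rules corresponding to it, to determine the value of each feature of the freight product;
determining, based on the value of each feature, the values that satisfy a preset threshold;
taking the features corresponding to the values that satisfy the preset threshold as decision features;
determining the freight products of the same product type, and aggregating the freight products of the same product type;
processing the feature sets of all freight products of the same product type, together with the freight data of all freight products of the same product type, to generate an analysis report for the corresponding freight products.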
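Optionally, processing the freight data set and product basic information in the freight system to determine the freight data corresponding to each freight product includes:
marking a product label on each freight data record in the freight data set according to the fields of the product definition in the product basic information;
classifying each freight data record in the freight data set based on the product labels and the product attributes in the product basic information, to determine the freight data belonging to each freight product.
Optionally, analyzing the freight data of each freight product according to that freight data and the rules corresponding to it, to determine the value of each feature of the freight product, includes:
for the freight data of each freight product, obtaining the features of the freight product based on that freight data and the rules corresponding to it, the number of features being at least one;
determining, based on the value domain of each feature, the quantity set over the values in that domain;
calculating, based on the quantity set, the value of each feature of the freight product.
Optionally, determining, based on the value of each feature, the values that satisfy a preset threshold includes:
judging whether the values of the features include a value greater than or equal to the preset threshold;
if so, performing the step of taking the features corresponding to the values that satisfy the preset threshold as decision features.
Optionally, determining the freight products of the same product type and aggregating them includes:
comparing every decision feature of every freight product, and if some freight products have identical decision features, treating those freight products as freight products of the same product type;
aggregating the freight products of the same product type.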
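A second aspect of the embodiments of the present invention provides a data processing system, the system including:
a product classification component, configured to process the freight data set and product basic information in the freight system to determine the freight data corresponding to each freight product, the product basic information including product definitions and product attributes;
a feature extraction component, configured to analyze, for each freight product, the freight data of the freight product according to that freight data and the rules corresponding to it, to determine the value of each feature of the freight product; to determine, based on the value of each feature, the values that satisfy a preset threshold; and to take the features corresponding to the values that satisfy the preset threshold as decision features;
a product clustering component, configured to determine the freight products of the same product type and aggregate them;
a business model analysis component, configured to process the feature sets of all freight products of the same product type, together with the freight data of all freight products of the same product type, to generate an analysis report for the corresponding freight products.
Optionally, the product classification component includes a product marking module, a product grouping module, and a product family construction module;
the product marking module is configured to mark a product label on each freight data record in the freight data set according to the fields of the product definition in the product basic information;
the product grouping module and the product family construction module are configured to classify each freight data record in the freight data set based on the product labels and the product attributes in the product basic information, to determine the freight data belonging to each freight product.
Optionally, the product clustering component includes a feature matching module and a product merging module;
the feature matching module is configured to compare every decision feature of every freight product and, if some freight products have identical decision features, to treat those freight products as freight products of the same product type;
the product merging module is configured to aggregate the freight products of the same product type.
A third aspect of the embodiments of the present invention provides an electronic device configured to run a program, wherein the program, when run, executes any of the data processing methods of the first aspect of the embodiments of the present invention.
A fourth aspect of the embodiments of the present invention provides a computer storage medium, the storage medium including a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute any of the data processing methods of the first aspect of the embodiments of the present invention.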
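Based on the data processing method, system, electronic device, and computer storage medium provided by the above embodiments of the present invention, the method includes: processing the freight data set and product basic information in the freight system to determine the freight data corresponding to each freight product, the product basic information including product definitions and product attributes; for each freight product, analyzing its freight data according to that data and the rules corresponding to it, to determine the value of each feature of the freight product; determining, based on the value of each feature, the values that satisfy a preset threshold; taking the features corresponding to those values as decision features; determining the freight products of the same product type and aggregating them; and processing the feature sets and the freight data of all freight products of the same product type to generate an analysis report for the corresponding freight products. In the embodiments of the present invention, the decision features of the different freight products are calculated from the freight data set and other training-set data using feature extraction methods from pattern recognition; freight products of the same product type are identified and aggregated; and the key functional points of the freight products under each product type, as well as the business intent underlying the freight product definitions, are thereby analyzed. In this way, data processing can be completed within the specified time, which speeds up processing and analysis, and the data can be analyzed accurately.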
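To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic structural diagram of a data processing system according to an embodiment of the present invention;
Figure 2 is a schematic flowchart of a data processing method according to an embodiment of the present invention.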
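The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and so on (if any) in the description, the claims, and the above drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
It should be noted that descriptions involving "first", "second", and the like in the present invention are for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, but only on the basis that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination shall be deemed not to exist and falls outside the protection scope claimed by the present invention.
In this application, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.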
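Referring to Figure 1, which is a schematic structural diagram of a data processing system according to an embodiment of the present invention, the system includes a product classification component 10, a feature extraction component 20, a product clustering component 30, and a business model analysis component 40.
The product classification component 10 is connected to the feature extraction component 20, the feature extraction component 20 is connected to the product clustering component 30, and the product clustering component 30 is connected to the business model analysis component 40.
The product classification component 10 and the feature extraction component 20 are also connected to the training set, that is, the database.
The training set mainly stores freight data, permission data, rule data, and route data.
Freight data defines what is sold and how much of it, and mainly involves dimensions such as airline, permitted sales period, origin, destination, cabin class, price, and freight rate basis. Permission data defines who sells, and mainly involves dimensions such as channel, group, terminal configuration, and agent account. Rule data and route data define how to sell, to whom, and when, and involve dimensions such as passenger identity, day of the week of sale, combination conditions, flight, transfer point, stopover point, and cabin class.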
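The product classification component 10 is configured to process the freight data set and product basic information in the freight system to determine the freight data corresponding to each freight product, the product basic information including product definitions and product attributes.
The product classification component 10 includes a product marking module 11, a product grouping module 12, and a product family construction module 13.
The product marking module 11 marks a product label on each freight data record in the freight data set according to the fields of the product definition in the product basic information. The product grouping module 12 and the product family construction module 13 then classify each freight data record in the freight data set based on the product labels and the product attributes in the product basic information, to determine the freight data belonging to each freight product.
Specifically, the product marking module 11 first obtains, from the prepared training set (the database), the freight data set of the freight system together with the product definitions and product attributes related to the freight products; it analyzes the freight data set and marks a product label on each freight data record according to the fields of the product definition. The product grouping module 12 and the product family construction module 13 classify the freight data by product according to the product labels. Specifically, freight data records with the same product label are placed in one group, that is, the records sharing a product label are taken as the freight data of one freight product, and so on until the freight data belonging to each freight product has been determined. In other words, according to the airline attribute under which the freight data was published, all products containing freight data are assigned to that airline's product family.
It should be noted that the freight data set includes multiple freight data records, and each record consists of data such as an identification ID, airline name, sales office, agreement number of the freight product, origin, destination, and ticket price.
The freight data set refers to the freight data in the freight system of a single airline.
Product labels correspond to product definitions, and product definitions are written according to the theoretical basis of the freight products.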
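The feature extraction component 20 is configured to analyze, for each freight product, the freight data of the freight product according to that freight data and the rules corresponding to it, to determine the value of each feature of the freight product; to determine, based on the value of each feature, the values that satisfy a preset threshold; and to take the features corresponding to the values that satisfy the preset threshold as decision features.
The feature extraction component 20 includes a data integration module 21, a feature value analysis module 22, and a decision determination module 23.
For the freight data of each freight product, the decision determination module 23 obtains the features of the freight product based on that freight data and the rules corresponding to it, the number of features being at least one. The data integration module 21 determines, based on the value domain of each feature, the quantity set over the values in that domain. The feature value analysis module 22 calculates the value of each feature of the freight product from the quantity set. Based on the value of each feature, it is determined whether any value satisfies the preset threshold; if so, the features corresponding to the values that satisfy the preset threshold are taken as decision features.
Specifically, the decision determination module 23 first obtains, from the prepared training set (the database), the rules in the rule data that correspond to the freight data, associates the freight data with the corresponding rules, and obtains the features of the freight product from the freight data and its associated rules. The data integration module 21 determines the value domain and cardinality of each feature of the freight product from the number of distinct values the feature takes; it then combines the freight product, each feature, and each value in the feature's domain to form the related data subsets, counts the records in each subset, and thereby determines the quantity set corresponding to the values in the feature's domain. The feature value analysis module 22 first calculates the standard deviation σ_ij of the quantity set Z_ij, then substitutes σ_ij and Z_ij into Formula (1) to calculate the value τ_ij of each feature f_j of freight product S_i. It is then determined whether any feature's value is greater than or equal to the preset threshold, that is, the valuable features are screened out from all features, and each valuable feature is taken as a decision feature.
Formula (1):
where τ_ij is the value of feature f_j of freight product S_i, L is the cardinality of the value domain, max(Z_ij) is the largest element of Z_ij, and submax(Z_ij) is the second-largest element of Z_ij.
It should be noted that if σ_ij = 0, feature f_j of freight product S_i splits the data into evenly balanced groups, and its value is set to the maximum value MaxValue. If |D_ij| = 1, feature f_j of freight product S_i has only one value and no grouping ability, and its value is set to 0.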
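The two special cases above can be sketched in Python as a thin wrapper around the value calculation. Because the body of Formula (1) is not reproduced in this text, the sketch takes it as a caller-supplied function; the names `feature_value`, `formula`, and `MAX_VALUE`, and the use of infinity for MaxValue, are assumptions made for illustration.

```python
import statistics

# Stand-in for MaxValue; the source does not state its magnitude
MAX_VALUE = float("inf")

def feature_value(counts, formula):
    """Apply the two special cases, then delegate to Formula (1).

    counts: the quantity set Z_ij (number of records per distinct feature value).
    formula: caller-supplied callable (sigma, counts) -> value, standing in
             for Formula (1), which is not reproduced here.
    """
    if len(counts) == 1:
        # |D_ij| = 1: a single value cannot split the data
        return 0.0
    sigma = statistics.pstdev(counts)  # population standard deviation
    if sigma == 0:
        # All groups have the same size: balanced grouping
        return MAX_VALUE
    return formula(sigma, counts)
```

The single-value case is checked first because a one-element quantity set also has a standard deviation of 0 and would otherwise be scored as MaxValue instead of 0.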
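The product clustering component 30 is configured to determine the freight products of the same product type and aggregate them.
The product clustering component 30 includes a feature matching module 31 and a product merging module 32.
The feature matching module 31 compares every decision feature of every freight product, and if some freight products have identical decision features, treats them as freight products of the same product type; the product merging module 32 then aggregates the freight products of the same product type.
Specifically, within the same airline, the feature matching module 31 compares and matches the decision features of all freight products and merges freight products with the same decision features into one product type; the product merging module 32 then aggregates the freight products merged into the same product type, so that the product type contains multiple freight products together with their freight data.
Optionally, after the freight products of the same product type are aggregated, a product feature set matrix can be generated from the product types and their decision features.
It should be noted that different product types involve different feature sets, so the product feature set matrix contains the different product types and the feature set corresponding to each.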
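One plausible layout for such a product feature set matrix is a binary table with one row per product type and one column per feature. The Python sketch below assumes that layout; the text does not specify the encoding, and the function name `feature_matrix` and the type names are illustrative.

```python
def feature_matrix(product_types):
    """Build a binary product-type x feature membership matrix.

    product_types: dict mapping product type name -> set of decision features.
    Returns (sorted feature names, {type name: [0/1 per feature]}).
    """
    # Column order: every feature seen in any product type, sorted by name
    features = sorted(set().union(*product_types.values()))
    rows = {
        name: [1 if f in feats else 0 for f in features]
        for name, feats in product_types.items()
    }
    return features, rows

types = {
    "type 1": {"team restriction", "advance sales restriction"},
    "type 2": {"team restriction", "advance sales restriction", "week restriction"},
}
columns, matrix = feature_matrix(types)
```

A fixed column order makes rows directly comparable, so two product types with the same row have the same decision feature set.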
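The business model analysis component 40 is configured to process the feature sets of all freight products of the same product type, together with the freight data of all freight products of the same product type, to generate an analysis report for the corresponding freight products.
The business model analysis component 40 includes a product information extraction module 41, a product definition module 42, and a product statistics module 43.
Specifically, for each product type, the product information extraction module 41 analyzes the origin and destination of every freight data record in the product type; the product statistics module 43 counts the origins and destinations that account for a high proportion of the product type, that is, the popular routes; and the product definition module 42 determines the product definition pattern for the popular routes from the decision features of all freight products under the product type, and then generates the corresponding freight product analysis report so that the airline can quickly adjust routes and freight data.
In the embodiments of the present invention, the decision features of the different freight products are calculated from the freight data set and other training-set data using feature extraction methods from pattern recognition; freight products of the same product type are identified and aggregated; and the key functional points of the freight products under each product type, as well as the business intent underlying the freight product definitions, are thereby analyzed. In this way, data processing can be completed within the specified time, which speeds up processing and analysis, and the data can be analyzed accurately.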
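Based on the data processing system described above, an embodiment of the present invention also provides a corresponding data processing method. Figure 2 is a schematic flowchart of the data processing method according to an embodiment of the present invention. The method includes:
Step S201: Process the freight data set and product basic information in the freight system to determine the freight data corresponding to each freight product.
In step S201, the product basic information includes product definitions and product attributes.
It should be noted that the specific implementation of step S201 includes the following steps:
Step S11: Mark a product label on each freight data record in the freight data set according to the fields of the product definition in the product basic information.
In implementing step S11, first obtain, from the prepared training set (the database), the freight data set of the freight system together with the product definitions and product attributes related to the freight products; then analyze the freight data set and mark a product label on each freight data record according to the fields of the product definition.
It should be noted that the freight data set includes multiple freight data records, and each record consists of data such as an identification ID, airline name, sales office, agreement number of the freight product, origin, destination, and ticket price.
The freight data set refers to the freight data in the freight system of a single airline.
Step S12: Classify each freight data record in the freight data set based on the product labels and the product attributes in the product basic information, to determine the freight data belonging to each freight product.
In implementing step S12, first classify the freight data by product according to the product labels. Specifically, freight data records with the same product label are placed in one group, that is, the records sharing a product label are taken as the freight data of one freight product, and so on until the freight data belonging to each freight product has been determined.
It should be noted that product labels correspond to product definitions, and product definitions are written according to the theoretical basis of the freight products.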
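For example, Table (1) contains a freight data set A of 12 records; the identification ID, airline name, sales office, agreement number, origin, destination, and ticket price of each record in freight data set A are shown in Table (1). Freight data set A is analyzed, and a product label is marked on each freight data record according to the fields of the product definition. For instance, the freight data whose agreement number is F21052114 is marked "1", that is, its product label is "1"; the freight data whose agreement number is CA080516H is marked "2", that is, its product label is "2"; and the freight data whose agreement number is F21040212 is marked "3", that is, its product label is "3".
Freight data records with the same product label are placed in one group. That is, freight data 1 to 5, whose agreement number is F21052114, form one group and are taken as the freight data of freight product F21052114; freight data 6, 7, and 8, whose agreement number is CA080516H, form one group and are taken as the freight data of freight product CA080516H; and freight data 9 to 12, whose agreement number is F21040212, form one group and are taken as the freight data of freight product F21040212, as shown in Table (2).
Table (1):
Table (2):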
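The labeling and grouping of steps S11 and S12 can be sketched in Python as follows. The record layout (the `id` and `agreement_no` keys) and the function name `group_by_product` are hypothetical; only the agreement numbers, labels, and record numbering 1 to 12 come from the example in the text.

```python
from collections import defaultdict

def group_by_product(records, label_of):
    """Tag each record with a product label and group record IDs by label.

    records: list of dicts with "id" and "agreement_no" keys (hypothetical names).
    label_of: dict mapping agreement number -> product label.
    """
    grouped = defaultdict(list)
    for record in records:
        label = label_of[record["agreement_no"]]  # step S11: mark the label
        grouped[label].append(record["id"])       # step S12: group by label
    return dict(grouped)

labels = {"F21052114": "1", "CA080516H": "2", "F21040212": "3"}
records = (
    [{"id": i, "agreement_no": "F21052114"} for i in range(1, 6)]
    + [{"id": i, "agreement_no": "CA080516H"} for i in range(6, 9)]
    + [{"id": i, "agreement_no": "F21040212"} for i in range(9, 13)]
)
by_product = group_by_product(records, labels)
```

A record whose agreement number has no label raises `KeyError` in this sketch; how unlabeled records should be handled is not described in the text.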
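Step S202: For each freight product, analyze the freight data of the freight product according to that freight data and the rules corresponding to it, to determine the value of each feature of the freight product.
It should be noted that the specific implementation of step S202 includes the following steps:
Step S21: For the freight data of each freight product, obtain the features of the freight product based on that freight data and the rules corresponding to it.
In step S21, the number of features is at least one.
In implementing step S21, first obtain, from the prepared training set (the database), the rules in the rule data that correspond to the freight data, associate the freight data with the corresponding rules, and obtain the features of the freight product from the freight data and its associated rules.
It should be noted that the training set mainly includes freight data, permission data, rule data, and route data.
Freight data defines what is sold and how much of it, and mainly involves dimensions such as airline, permitted sales period, origin, destination, cabin class, price, and freight rate basis.
Permission data defines who sells, and mainly involves dimensions such as channel, group, terminal configuration, and agent account.
Rule data and route data define how to sell, to whom, and when, and involve dimensions such as passenger identity, day of the week of sale, combination conditions, flight, transfer point, stopover point, and cabin class.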
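Step S22: Based on the value domain of each feature of the freight product, determine the quantity set corresponding to the values in that domain.
In implementing step S22, determine the value domain and cardinality of each feature of the freight product from the number of distinct values the feature takes; then combine the freight product, each feature, and each value in the feature's domain to form the related data subsets, count the records in each subset, and thereby determine the quantity set corresponding to the values in the feature's domain.
It should be noted that a freight product S is a subset of freight data set A. If there are I freight products, a given freight product is denoted S_i, the value domain of feature f_j in freight product S_i is denoted D_ij, and |D_ij| is the cardinality of that domain.
Here i is less than or equal to I, j is less than or equal to J, and J is the number of features.
It should be noted that |D_ij| = L, and the set of data related to a value d_l in the domain of feature f_j of freight product S_i is Δ(S_i, f_j, d_l). For convenience, Δ(S_i, f_j, d_l) is abbreviated as Δ_ijl, so that |Δ_ijl| is the amount of data in freight product S_i whose feature f_j takes the value d_l.
For example, freight product S_i contains 5 freight data records and a feature f_j takes 2 distinct values, "1" and "2"; the value domain D_ij of feature f_j in freight product S_i is {1, 2} and the cardinality |D_ij| is 2. The freight product S_i, the feature f_j, and the values in the domain D_ij are combined to form the related data amounts |Δ_ij1|, |Δ_ij2|, ..., |Δ_ijL|, and these counts determine the quantity set Z_ij corresponding to the values in the feature's domain, where Z_ij = {|Δ_ij1|, |Δ_ij2|, ..., |Δ_ijL|}.
For example, if 3 records in freight product S_i have feature f_j equal to d_1 = "1", then |Δ_ij1| is 3.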
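The construction of the value domain D_ij and the quantity set Z_ij in step S22 is a count of records per distinct feature value. A minimal Python sketch under that reading follows; the function name `quantity_set` is illustrative, and the input mirrors the example of 5 records with values "1" and "2".

```python
from collections import Counter

def quantity_set(values):
    """Return (value domain D_ij, quantity set Z_ij) for one feature of one product.

    values: the feature's value in each freight data record of the product.
    """
    counts = Counter(values)
    domain = sorted(counts)  # distinct values d_1 ... d_L
    return domain, [counts[d] for d in domain]

# 5 records: 3 with value "1", so the remaining 2 have value "2"
domain, z = quantity_set(["1", "1", "1", "2", "2"])
cardinality = len(domain)  # |D_ij| = L
```

Here `z[0]` corresponds to |Δ_ij1| and `z[1]` to |Δ_ij2|.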
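Step S23: Calculate the value of each feature of the freight product from the quantity set over the values in the feature's domain.
In implementing step S23, first calculate the standard deviation σ_ij of the quantity set Z_ij, then substitute σ_ij and Z_ij into Formula (1) to calculate the value τ_ij of each feature f_j of freight product S_i.
It should be noted that if σ_ij = 0, feature f_j of freight product S_i splits the data into evenly balanced groups, and its value is set to the maximum value MaxValue. If |D_ij| = 1, feature f_j of freight product S_i has only one value and no grouping ability, and its value is set to 0.
Optionally, the method further includes collecting the valuable features of a freight product into a set θ_i, which can be expressed as θ_i = {f_j | α·τ_ij ≥ λ, i ∈ [1, I], j ∈ [1, n], |D_ij| > 1, f_j ∈ F}.
Here α is a constant coefficient and λ is the value threshold; α and λ default to 1 and can also be set to other values.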
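The set θ_i can be sketched in Python as a filter over already-computed feature values. The function name `valuable_features` is illustrative. In the sample data, 12.25 is the value the text reports for the team restriction feature, while the entry for the week restriction feature (a single distinct value, hence value 0) is an assumption used to show a rejected feature.

```python
def valuable_features(values, domain_sizes, alpha=1.0, lam=1.0):
    """Select features f_j with alpha * tau_ij >= lambda and |D_ij| > 1.

    values: dict feature name -> tau_ij for one freight product.
    domain_sizes: dict feature name -> |D_ij| for the same product.
    """
    return {
        f for f, tau in values.items()
        if domain_sizes[f] > 1 and alpha * tau >= lam
    }

tau = {"team restriction": 12.25, "week restriction": 0.0}
sizes = {"team restriction": 2, "week restriction": 1}
theta = valuable_features(tau, sizes)
```

Raising `lam` or lowering `alpha` tightens the filter, which is how the text's "can also be set to other values" would be used in practice.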
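For example, the rules B corresponding to freight data set A are obtained from the rule data in the prepared training set (the database), and freight data set A is associated with the corresponding rules B to form the structured data shown in Table (3). The features of the freight products are obtained from the 12 freight data records and their associated rules B; three features are involved: team restriction, advance sales restriction, and week restriction.
For freight product F21052114, f_team restriction takes 2 distinct values, that is, it divides the 5 freight data records into 2 groups containing 2 and 3 records respectively. Based on the number of distinct values of f_team restriction in the freight product, the value domain and cardinality of each feature can be calculated with a pre-built mathematical model or parameter calculation method, which gives |D_team restriction| = 2 and σ_team restriction = 0.5; the value calculated by Formula (1) is 12.25.
Table (3):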
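As a check on the numbers in this example, the figure σ = 0.5 is reproduced by the population standard deviation of the group sizes {2, 3}; the sample standard deviation would give about 0.707 instead. That the population form is intended is an inference from this arithmetic, not something the text states.

```latex
\[
\bar{z} = \frac{2 + 3}{2} = 2.5, \qquad
\sigma = \sqrt{\frac{(2 - 2.5)^2 + (3 - 2.5)^2}{2}} = \sqrt{0.25} = 0.5
\]
```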
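Step S203: Based on the value of each feature, determine whether any value satisfies the preset threshold. If so, perform steps S204 to S206; any value that does not satisfy the preset threshold is discarded.
It should be noted that the specific implementation of step S203 includes the following steps:
Step S31: Judge whether the values of the features include a value greater than or equal to the preset threshold. If so, perform steps S204 to S206; if not, discard it.
In implementing step S31, determine whether the values of the features include a value greater than or equal to the preset threshold, that is, screen out the valuable features from all features. If such a value exists, perform steps S204 to S206; if not, discard it.
It should be noted that the preset threshold is set in advance according to the actual situation; for example, it can be set to a positive integer greater than or equal to 1.
Step S204: Take the features corresponding to the values that satisfy the preset threshold as decision features.
In implementing step S204, each valuable feature that was screened out is taken as a decision feature.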
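Steps S203 and S204 are illustrated with the following example.
For example, based on Table (3), |D_team restriction| is 2, σ_team restriction is 0.5, and the value calculated by Formula (1) is 12.25. Because the value of f_team restriction is greater than the preset threshold of 1, f_team restriction is selected as a decision feature. In the same way, f_advance sales restriction can be selected as a decision feature. f_week restriction has no ability to discriminate the data and therefore cannot serve as a decision feature. The decision feature set of freight product F21052114 is therefore {f_team restriction, f_advance sales restriction}.
By analogy, the decision feature set of freight product CA080516H is {f_team restriction, f_advance sales restriction}, and the decision feature set of freight product F21040212 is {f_team restriction, f_advance sales restriction, f_week restriction}.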
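Step S205: Determine the freight products of the same product type, and aggregate them.
It should be noted that the specific implementation of step S205 includes the following steps:
Step S41: Compare every decision feature of every freight product. If some freight products have identical decision features, treat them as freight products of the same product type.
In implementing step S41, the decision features of all freight products within the same airline are compared and matched, and freight products with the same decision features are merged into one product type.
Step S42: Aggregate the freight products of the same product type.
In implementing step S42, the freight products merged into the same product type are aggregated, so that the product type contains multiple freight products together with their freight data.
For example, based on Table (2) and on steps S203 and S204, the decision feature set of freight product F21052114 is {f_team restriction, f_advance sales restriction}, the decision feature set of freight product CA080516H is {f_team restriction, f_advance sales restriction}, and the decision feature set of freight product F21040212 is {f_team restriction, f_advance sales restriction, f_week restriction}. Within the same airline CA, the decision features of all freight products are compared and matched; freight products F21052114 and CA080516H have the same decision feature set {f_team restriction, f_advance sales restriction}, so the two freight products can be merged into one product type. The resulting structure is shown in Table (4), from which the corresponding product feature set matrix can be generated and stored.
Table (4):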
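Step S206: Process the feature sets of all freight products of the same product type, together with the freight data of all freight products of the same product type, to generate an analysis report for the corresponding freight products.
In implementing step S206, for each product type, analyze the origin and destination of every freight data record in the product type and count the origins and destinations that account for a high proportion of the product type, that is, the popular routes. Based on the decision features of all freight products under the product type, determine the product definition pattern for the popular routes, and then generate the corresponding freight product analysis report so that the airline can quickly adjust routes and freight data.
For example, based on Table (4), product type 1 includes freight product F21052114 and freight product CA080516H, with 8 freight data records in total. Taking the order of origin and destination into account, 7 records have the city pair CGQ (Changchun) and CAN (Guangzhou), and only one record has the city pair CGQ (Changchun) and HAK (Haikou). The popular route of airline product type 1 is therefore Changchun to Guangzhou (and vice versa), and the decision feature set of this product type is {f_team restriction, f_advance sales restriction}. It follows that the freight products on this popular route are mainly defined by "team restriction" and "advance sales restriction". In the same way, "airline product type 2" is mainly defined by "team restriction", "advance sales restriction", and "week restriction". The corresponding freight product analysis report is generated from this analysis so that the airline can quickly adjust routes and freight data.
Optionally, the specific implementation of steps S201 to S206 can take the form of a mathematical model.
In the embodiments of the present invention, the decision features of the different freight products are calculated from the freight data set and other training-set data using feature extraction methods from pattern recognition; freight products of the same product type are identified and aggregated; and the key functional points of the freight products under each product type, as well as the business intent underlying the freight product definitions, are thereby analyzed. In this way, data processing can be completed within the specified time, which speeds up processing and analysis, and the data can be analyzed accurately.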
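An embodiment of the present invention also discloses an electronic device for running a database stored procedure, wherein the data processing method disclosed in Figure 2 is executed when the database stored procedure runs.
An embodiment of the present invention also discloses a computer storage medium, the storage medium including a stored database stored procedure, wherein, when the database stored procedure runs, the device on which the storage medium resides is controlled to execute the data processing method disclosed in Figure 2.
In the context of the present disclosure, a computer storage medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are basically similar to the method embodiments, they are described briefly, and the relevant parts of the method embodiments may be consulted. The systems and system embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
Professionals will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. The present invention is therefore not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.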
Claims (10)
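- A data processing method, characterized in that the method includes: processing the freight data set and product basic information in the freight system to determine the freight data corresponding to each freight product, the product basic information including product definitions and product attributes; for each freight product, analyzing the freight data of the freight product according to that freight data and the rules corresponding to it, to determine the value of each feature of the freight product; determining, based on the value of each feature, the values that satisfy a preset threshold; taking the features corresponding to the values that satisfy the preset threshold as decision features; determining the freight products of the same product type, and aggregating the freight products of the same product type; processing the feature sets of all freight products of the same product type, together with the freight data of all freight products of the same product type, to generate an analysis report for the corresponding freight products.
- The method according to claim 1, characterized in that processing the freight data set and product basic information in the freight system to determine the freight data corresponding to each freight product includes: marking a product label on each freight data record in the freight data set according to the fields of the product definition in the product basic information; classifying each freight data record in the freight data set based on the product labels and the product attributes in the product basic information, to determine the freight data belonging to each freight product.
- The method according to claim 1, characterized in that analyzing the freight data of each freight product according to that freight data and the rules corresponding to it, to determine the value of each feature of the freight product, includes: for the freight data of each freight product, obtaining the features of the freight product based on that freight data and the rules corresponding to it, the number of features being at least one; determining, based on the value domain of each feature, the quantity set over the values in that domain; calculating, based on the quantity set, the value of each feature of the freight product.
- The method according to claim 1, characterized in that determining, based on the value of each feature, the values that satisfy a preset threshold includes: judging whether the values of the features include a value greater than or equal to the preset threshold; if so, performing the step of taking the features corresponding to the values that satisfy the preset threshold as decision features.
- The method according to claim 1, characterized in that determining the freight products of the same product type and aggregating them includes: comparing every decision feature of every freight product, and if some freight products have identical decision features, treating those freight products as freight products of the same product type; aggregating the freight products of the same product type.
- A data processing system, characterized in that the system includes: a product classification component, configured to process the freight data set and product basic information in the freight system to determine the freight data corresponding to each freight product, the product basic information including product definitions and product attributes; a feature extraction component, configured to analyze, for each freight product, the freight data of the freight product according to that freight data and the rules corresponding to it, to determine the value of each feature of the freight product, to determine, based on the value of each feature, the values that satisfy a preset threshold, and to take the features corresponding to the values that satisfy the preset threshold as decision features; a product clustering component, configured to determine the freight products of the same product type and aggregate them; a business model analysis component, configured to process the feature sets of all freight products of the same product type, together with the freight data of all freight products of the same product type, to generate an analysis report for the corresponding freight products.
- The system according to claim 6, characterized in that the product classification component includes a product marking module, a product grouping module, and a product family construction module; the product marking module is configured to mark a product label on each freight data record in the freight data set according to the fields of the product definition in the product basic information; the product grouping module and the product family construction module are configured to classify each freight data record in the freight data set based on the product labels and the product attributes in the product basic information, to determine the freight data belonging to each freight product.
- The system according to claim 6, characterized in that the product clustering component includes a feature matching module and a product merging module; the feature matching module is configured to compare every decision feature of every freight product and, if some freight products have identical decision features, to treat those freight products as freight products of the same product type; the product merging module is configured to aggregate the freight products of the same product type.
- An electronic device, characterized in that the electronic device is configured to run a program, wherein the program, when run, executes the data processing method according to any one of claims 1-5.
- A computer storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the data processing method according to any one of claims 1-5.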
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211014877.7A CN115409549B (zh) | 2022-08-23 | 2022-08-23 | 一种数据处理方法、系统、电子设备及计算机存储介质 |
CN202211014877.7 | 2022-08-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024041399A1 (zh) | 2024-02-29 |
Family
ID=84161364
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/112558 WO2024041399A1 (zh) | 2022-08-23 | 2023-08-11 | 一种数据处理方法、系统、电子设备及计算机存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115409549B (zh) |
WO (1) | WO2024041399A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115409549B (zh) * | 2022-08-23 | 2024-05-14 | 中国民航信息网络股份有限公司 | 一种数据处理方法、系统、电子设备及计算机存储介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8615422B1 (en) * | 2011-11-10 | 2013-12-24 | American Airlines, Inc. | Airline pricing system and method |
US20150310570A1 (en) * | 2014-04-28 | 2015-10-29 | Duetto Research, Inc. | Open pricing and pricing rules |
CN105654338A (zh) * | 2015-12-25 | 2016-06-08 | 中国民航信息网络股份有限公司 | 规则运价计算方法及装置、系统 |
CN107451872A (zh) * | 2017-08-10 | 2017-12-08 | 中国民航信息网络股份有限公司 | 航班运价的管理方法及装置 |
CN109978619A (zh) * | 2019-03-25 | 2019-07-05 | 携程旅游网络技术(上海)有限公司 | 机票定价策略筛选的方法、系统、设备以及介质 |
CN115409549A (zh) * | 2022-08-23 | 2022-11-29 | 中国民航信息网络股份有限公司 | 一种数据处理方法、系统、电子设备及计算机存储介质 |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107683469A (zh) * | 2015-12-30 | 2018-02-09 | 中国科学院深圳先进技术研究院 | 一种基于深度学习的产品分类方法及装置 |
KR101805402B1 (ko) * | 2017-03-23 | 2018-01-11 | 대한민국(해양수산부장관) | 아시아 컨테이너 해상운임지수를 제공하기 위한 해상운임지수 산출 방법 및 그를 이용한 해운정보 중개서비스 방법 |
CN107480187A (zh) * | 2017-07-10 | 2017-12-15 | 北京京东尚科信息技术有限公司 | 基于聚类分析的用户价值分类方法和装置 |
CN108595566A (zh) * | 2018-04-13 | 2018-09-28 | 中国民航信息网络股份有限公司 | 信息聚类方法及装置 |
CN110737665B (zh) * | 2019-10-21 | 2023-07-18 | 中国民航信息网络股份有限公司 | 一种数据处理方法及装置 |
CN112307065B (zh) * | 2020-10-30 | 2024-06-07 | 中国民航信息网络股份有限公司 | 一种数据处理方法、装置及服务器 |
KR102639188B1 (ko) * | 2020-11-16 | 2024-02-22 | 씨제이올리브네트웍스 주식회사 | 딥러닝 기반의 동적 가격 산정 방법 및 동적 가격 산정 시스템 |
CN112434067A (zh) * | 2020-11-24 | 2021-03-02 | 携程旅游网络技术(上海)有限公司 | 国际运价的缓存数据处理方法、系统、设备及介质 |
CN113807456B (zh) * | 2021-09-26 | 2024-04-09 | 大连交通大学 | 一种基于互信息的特征筛选和关联规则多标记分类方法 |
CN114861084A (zh) * | 2022-03-29 | 2022-08-05 | 携程商旅信息服务(上海)有限公司 | 数据处理方法、装置及存储介质 |
-
2022
- 2022-08-23 CN CN202211014877.7A patent/CN115409549B/zh active Active
-
2023
- 2023-08-11 WO PCT/CN2023/112558 patent/WO2024041399A1/zh unknown
Non-Patent Citations (1)
Title |
---|
WANG WENFANG, LUO LIANGSHENG: "Research on the strategy of airline brand freight system based on the concept of differentiated service revenue", CHINA MANAGEMENT INFORMATIONIZATION., vol. 22, no. 14, 1 July 2019 (2019-07-01), pages 106 - 107, XP093144378, DOI: 10.3969/j.issn.1673-0194.2019.14.051 * |
Also Published As
Publication number | Publication date |
---|---|
CN115409549B (zh) | 2024-05-14 |
CN115409549A (zh) | 2022-11-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022126971A1 (zh) | 基于密度的文本聚类方法、装置、设备及存储介质 | |
WO2024041399A1 (zh) | 一种数据处理方法、系统、电子设备及计算机存储介质 | |
WO2020238677A1 (zh) | 数据处理方法、装置和计算机可读存储介质 | |
WO2018014610A1 (zh) | 基于c4.5决策树算法的特定用户挖掘系统及其方法 | |
CN101650746B (zh) | 一种对排序结果进行验证的方法和系统 | |
WO2020029412A1 (zh) | 标签推荐方法、装置、计算机设备及计算机可读存储介质 | |
CN109508373B (zh) | 企业舆情指数的计算方法、设备及计算机可读存储介质 | |
CN107368592B (zh) | 一种用于网络安全报告的文本特征模型建模方法及装置 | |
TW201835789A (zh) | 評分模型的建立、用戶信用的評估方法及裝置 | |
CN106682878A (zh) | 一种设计师匹配平台及方法 | |
CN106897459A (zh) | 一种基于半监督学习的文本敏感信息识别方法 | |
US20150379469A1 (en) | Consolidated client onboarding system | |
CN111563074B (zh) | 一种基于多维标签的数据质量检测方法和系统 | |
CN102129470A (zh) | 标签聚类方法和系统 | |
CN107729917A (zh) | 一种标题的分类方法及装置 | |
CN107248023B (zh) | 一种对标企业名单的筛选方法和装置 | |
CN107229614A (zh) | 用于分类数据的方法和装置 | |
CN108510396A (zh) | 投保校验的方法、装置、计算机设备及存储介质 | |
CN109858974A (zh) | 已购车用户识别模型构建方法及识别方法 | |
WO2018090643A1 (zh) | 客户分类方法、电子装置及存储介质 | |
CN109784352A (zh) | 一种评估分类模型的方法和装置 | |
Ma et al. | The approach to detect abnormal access behavior based on naive bayes algorithm | |
CN111210321B (zh) | 一种基于合同管理的风险预警方法及系统 | |
CN111625578A (zh) | 适用于文化科技融合领域时间序列数据的特征提取方法 | |
CN107992613A (zh) | 一种基于机器学习的文本挖掘技术消费维权指标分析方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23856489 Country of ref document: EP Kind code of ref document: A1 |