CN115375380A - Service data processing method and processing device based on attribute classification - Google Patents

Info

Publication number
CN115375380A
Authority
CN
China
Prior art keywords
metadata
data
entity
server
root
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211314753.0A
Other languages
Chinese (zh)
Other versions
CN115375380B (en)
Inventor
洪葵
胡盛利
钟天生
黄隆辉
龚晖
周涛
熊新宇
薛萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Huilian Network Technology Co ltd
Original Assignee
Nanchang Huilian Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Huilian Network Technology Co ltd
Priority to CN202211314753.0A
Publication of CN115375380A
Application granted
Publication of CN115375380B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G06F16/288 Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a service data processing method and processing device based on attribute classification. The method is applied to an e-commerce outsourcing platform that connects to a plurality of e-commerce platforms to obtain source data. Multi-source service data is preprocessed to obtain the main features of a first entity part and of a second entity part; the first entity part is divided based on attribute weight and the second entity part based on information entropy, yielding the corresponding parameter columns. Different root features are preset according to the characteristics of the multi-source service data, the parameter columns are classified by traversing the root features, and the results are stored as entity matching tables in a metadata base of a front-end server of the e-commerce platform, from which SKU metadata, after-sales metadata, order metadata, inventory metadata and promotion metadata are respectively constructed.

Description

Business data processing method and processing device based on attribute classification
Technical Field
The present invention relates to a technology for processing service information, and in particular, to a method and an apparatus for processing service data based on attribute classification.
Background
Data extraction and fusion are basic tools in applied computer information technology. Web-based data extraction generally uses Deep Web crawlers with breadth-first search. This approach only obtains data resources from the Web front end; for e-commerce platforms with encrypted URLs, crawling is inefficient and part of the content cannot be displayed directly. In the prior art, several system software vendors provide data exchange and fusion systems such as DataStage, SQL Server DTS and Primeton DI. These systems have very large architectures, are typically deployed in large-scale software engineering projects, and are not suited to the development plans of small projects. For data extraction and fusion in small projects, the prior art focuses on data mining with distributed systems, neural networks and the like; for example, CN108804528B discloses a training model for data fusion that can be trained during data extraction and makes the data fusion converge. Processing business data for an e-commerce outsourcing platform is a typical small-project application of data extraction and fusion. To let the outsourcing platform process business data efficiently, the prior art needs further improvement: processing methods and devices should be built around the characteristics of the business data, so that e-commerce platforms and outsourcing platforms can interact more efficiently and more deeply in specific data-exchange scenarios.
Disclosure of Invention
To address these problems, the invention provides a business data processing method and processing device based on attribute classification. According to the demand types of the outsourcing platform, the data of an e-commerce platform is divided into five types: SKU data, after-sales data, order data, inventory data and promotion data. By data type, the data is further divided into digital data and text data. Text data and digital data in the front-end server database are extracted to form a first entity part and a second entity part, for which corresponding parameter columns are constructed through attribute-weight classification and information-entropy classification respectively. The parameter columns are then traversed using preset text root features and digital root features and classified into different metadata, namely SKU metadata, after-sales metadata, order metadata, inventory metadata and promotion metadata. Metadata base index identifiers are assigned, and when an authorized outsourcing server requests access, the corresponding database call permission is granted according to the request identifier of the previously opened API.
The object of the invention is achieved by the following technical solution:
A business data processing method based on attribute classification comprises the following steps:
Step 1: The e-commerce server issues a digital certificate to the outsourcing server and assigns a unique request identifier to the authorized terminal;
Step 2: Service data is extracted from the front-end server to obtain a first entity part and a second entity part, and the main features of the first entity part and the second entity part are obtained;
Step 3: The attribute weight of the first entity part and the information entropy of the second entity part are calculated, and a first parameter column K_i is set based on the attribute weight and a second parameter column K_j based on the information entropy;
Step 4: Text root features of the service data are preset, each first parameter column K_i is traversed based on the text root features, and the first parameter columns that meet the text root feature criterion are included in a first entity matching table;
Step 5: Digital root features of the service data are preset, each second parameter column K_j is traversed based on the digital root features, and the second parameter columns that meet the digital root feature criterion are included in a second entity matching table;
Step 6: The first entity matching table and the second entity matching table are stored in a metadata base of the front-end server;
Step 7: The first entity matching table of the metadata base is classified with a naive Bayes classifier and the second entity matching table with an entropy classifier to obtain SKU metadata, after-sales metadata, order metadata, inventory metadata and promotion metadata, and a unique index identifier is provided for each metadata item;
Step 8: The front-end server receives an access request from at least one outsourcing server and verifies the request identifier;
Step 9: Access to the metadata base is opened to the outsourcing server according to the request identifier, and the outsourcing server loads service flow information based on the metadata opened by the e-commerce server.
In the invention, the request identifier comprises an identity code of the outsourcing server and a hash digest of a digital signature. The identity code is the number of the child table, under the parent table in the e-commerce platform database, that stores the outsourcing platform's data; the hash digest of the digital signature is the hash function value generated from the digital signature using a pseudo-random number.
In the invention, the first entity part is a text feature vector, and the second entity part is a digital feature vector.
In the invention, the text feature vector and the digital feature vector are feature vectors obtained by vector feature processing of the preprocessed text data and the preprocessed digital data respectively.
In the invention, the text root feature and the digital root feature are each at least one feature value of the service data, and they describe the degree of repetition within the same parameter column and the similarity between different parameter columns.
In the present invention, the first parameter column K_i and the second parameter column K_j are the data columns obtained under the attribute classification condition and the information entropy classification condition respectively, each containing at least one item of text data or digital data from the e-commerce platform database.
In the present invention, a data set D = {x_1, x_2, …, x_n} in the second entity part is extracted and expressed by its information entropy, and each data set in the second entity part is traversed to obtain the corresponding second parameter column.
In the invention, the index identifier is formed from the identity code of the outsourcing platform and the ID field that serves as the primary key of the generated metadata parent table, and each index identifier corresponds to one unique request identifier.
A processing device for the service data processing method based on attribute classification comprises an e-commerce server, an outsourcing server and a front-end server, wherein the front-end server comprises an extraction unit, a first storage unit, a second storage unit, a judgment unit and a TCAM main control unit.
In the invention, the TCAM main control unit is composed of an N-TCAM chip and a W-TCAM chip, and the second storage unit, the N-TCAM chip and the W-TCAM chip form a communication loop for full-duplex communication.
The business data processing method and processing device based on attribute classification have the following beneficial effects. Based on how an e-commerce outsourcing platform acquires information, the service data is classified into several kinds of metadata and an authorized metadata base interface is opened to the outsourcing platform. With the e-commerce platform's authorization, the outsourcing platform can obtain only the data related to the outsourced work and cannot obtain all information of the e-commerce platform merchant, which protects the merchant's data security and improves the efficiency with which the outsourcing platform obtains service data. In addition, for multi-source business data, e-commerce platforms differ in type: first-class and second-class e-commerce platforms use different storage modes and data formats, and the different commodity coding schemes and underlying database design logic limit the efficiency of data extraction and fusion across e-commerce platforms.
Drawings
FIG. 1 is a flowchart of a method for processing service data based on attribute classification according to the present invention;
FIG. 2 is a schematic diagram of matching domain processing by information entropy values in accordance with the present invention;
FIG. 3 is a diagram illustrating a second entity portion after matching domain processing by information entropy;
FIG. 4 is a diagram of a first parameter column and a second parameter column according to the present invention;
FIG. 5 is a hardware block diagram of a service data processing apparatus based on attribute classification according to the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Consider the data extraction requirements of the e-commerce merchant and the outsourcing platform from the perspective of their cooperation. The outsourcing platform needs to obtain the relevant information of the e-commerce platform merchant and to call and change that data in real time, so that data is shared dynamically. The e-commerce merchant platform wants to provide only a limited part of its data to the outsourcing platform, so that the outsourcing platform can complete the outsourced work from that partial data without other information containing confidential business data being disclosed. E-commerce outsourcing requirements mainly cover five areas: after-sales customer service outsourcing, order delivery outsourcing, commodity management outsourcing, commodity promotion outsourcing and inventory management outsourcing. A single outsourcing platform often needs to obtain source data from several e-commerce platforms. By type, e-commerce platforms are divided into first-class and second-class e-commerce platforms, and the outsourcing platform needs to integrate source data from platforms of different types.
For the business data of e-commerce platforms, the underlying database development logic differs between platforms, and so does the design of the database tables. If a general method is used to call the databases of multiple platforms, process the data and then output a result, data errors or data loss are likely. Starting from what the outsourcing platform fundamentally needs to complete the outsourced e-commerce work, the multi-source data is divided into two entity classes, the text entities and data entities are divided into independent parameter columns, and root features are preset to extract the valid data in the parameter columns according to the data types required by the e-commerce outsourcing platform. This solution is applied in the service data processing method and processing device based on attribute classification detailed in this embodiment, which, with reference to FIG. 1, comprises the following steps:
Step 1: The e-commerce platform issues digital signature authentication to the outsourcing server, opens an API (application programming interface), and assigns a unique request identifier to the outsourcing server. The e-commerce platform authorizes the outsourcing platform and assigns it an identity code according to a pseudo-random seed sequence; a hash function performs digital signature authentication using the pseudo-random seed sequence to obtain a hash digest, and the hash digest and the identity code together form the request identifier. A database table based on the outsourcing platform's identity code is written into the e-commerce platform database, and the outsourcing platform can access the front-end database after identity authentication through the API.
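The composition of the request identifier in Step 1 can be illustrated with a minimal Python sketch. The function name, the 8-digit identity code format, the 128-bit nonce and the choice of SHA-256 are all assumptions made for illustration; the source does not name a hash function or a code format. A seeded `random.Random` stands in for the pseudo-random seed sequence so the example is reproducible; it is not a cryptographically secure generator.

```python
import hashlib
import random


def make_request_identifier(seed: int, signature: bytes) -> dict:
    """Build a request identifier from an identity code and a hash digest.

    Hypothetical sketch: the code format and SHA-256 are assumptions.
    """
    rng = random.Random(seed)  # stands in for the pseudo-random seed sequence
    # Identity code: assumed to be an 8-digit child-table number.
    identity_code = f"{rng.randrange(10**8):08d}"
    # Pseudo-random number mixed into the hash of the digital signature.
    nonce = rng.getrandbits(128).to_bytes(16, "big")
    hash_digest = hashlib.sha256(nonce + signature).hexdigest()
    return {"identity_code": identity_code, "hash_digest": hash_digest}


identifier = make_request_identifier(seed=2022, signature=b"example-signature")
print(identifier["identity_code"], identifier["hash_digest"][:16])
```

The same seed and signature always yield the same identifier, which is what allows the front-end server to recompute and compare it later.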
Step 2: The service data in the front-end server is extracted to obtain a first entity part and a second entity part, and the main features of the first entity part and the second entity part are obtained.
E-commerce commodity information contains much information with identifying meaning, and the uniquely identifying information for the sale of any commodity is treated as entity information. In this embodiment, the metadata base is divided into several categories, and across these categories the metadata involves two kinds of entity, text entities and data entities. SKU metadata and after-sales metadata require text entities; order metadata and inventory metadata require only data entities; promotion metadata requires both text entities and data entities. The multi-source data of the e-commerce platform is divided into a first entity and a second entity, where the first entity is a text feature vector and the second entity is a data feature vector.
The data point features of the e-commerce platform are extracted by a layered extraction method. First, the descriptive texts contained in the front-end server, which include numbers, letters, symbols and the like, are extracted and stored in the data structure model_des_model; each phrase is segmented and word frequencies are counted preliminarily. Second, the remaining descriptive text extracted from the front-end database is segmented at the word segmentation marks, split at spaces and brackets, and stored in the data structure commodity_des_keyword. Finally, the text content is segmented according to grammar and tagged with parts of speech, and this is repeated over several cycles until the text granularity suits the subsequent text vectorization. In this embodiment, the criterion is whether a split piece is at most 6 characters long: if so, no further segmentation cycle is run on it; if not, the segmentation cycle continues.
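The repeated splitting with the length-6 stopping rule can be sketched as follows. This is an illustration under stated assumptions: the grammar-based segmenter and part-of-speech tagging are replaced by two simple delimiter passes, and the delimiter sets in `PASSES` are hypothetical. Only the split at spaces and brackets and the threshold of 6 come from the text.

```python
import re

MAX_LEN = 6  # length threshold stated in the embodiment

# Successive delimiter passes (assumed): spaces and brackets first,
# then common separators as a stand-in for grammar-based segmentation.
PASSES = [r"[\s()\[\]{}（）【】]+", r"[-_/,;:|]+"]


def segment(text: str) -> list[str]:
    """Split text pass by pass until every piece is at most MAX_LEN characters."""
    pieces = [text]
    for pattern in PASSES:
        if all(len(p) <= MAX_LEN for p in pieces):
            break  # granularity already fine enough, stop cycling
        next_pieces = []
        for piece in pieces:
            if len(piece) <= MAX_LEN:
                next_pieces.append(piece)  # short pieces are left alone
            else:
                next_pieces.extend(q for q in re.split(pattern, piece) if q)
        pieces = next_pieces
    return pieces


print(segment("Nike shoes (red/blue) size-42"))
# → ['Nike', 'shoes', 'red', 'blue', 'size', '42']
```

Pieces that are still longer than 6 characters after the last pass are returned unchanged; a real segmenter would continue with dictionary or grammar rules at that point.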
After text vectorization, text features are extracted by the TF-IDF method. During word frequency statistics, terms related to e-commerce categories are given higher weights; even when an e-commerce noun has a low word frequency it is still assigned a higher weight, so that e-commerce nouns are more likely to be selected into the text feature vector. In this embodiment, the e-commerce noun directory is based on word frequencies obtained from the 2019 version of the Amazon data directory.
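A minimal sketch of TF-IDF with a boost for e-commerce terms is given below. The multiplicative `boost` factor, the smoothed IDF formula and the function name are assumptions; the source only states that e-commerce nouns receive higher weight, not how the weight is applied.

```python
import math
from collections import Counter


def boosted_tfidf(docs, domain_terms, boost=2.0):
    """Return one {term: weight} dict per tokenized document.

    Terms in domain_terms (a hypothetical e-commerce noun directory)
    have their TF-IDF weight multiplied by `boost`.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append({})
            continue
        tf = Counter(doc)
        row = {}
        for term, count in tf.items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            weight = (count / len(doc)) * idf
            if term in domain_terms:
                weight *= boost  # favour e-commerce nouns
            row[term] = weight
        scores.append(row)
    return scores


docs = [["brand", "red", "shoe"], ["red", "shoe", "shoe"]]
weights = boosted_tfidf(docs, domain_terms={"brand"})
print(sorted(weights[0], key=weights[0].get, reverse=True))
```

With the boost applied, a directory term such as "brand" ranks above equally frequent general terms in the same document.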
Step 3: The attribute weight of the first entity part and the information entropy of the second entity part are calculated, and a first parameter column K_i is set based on the attribute weight and a second parameter column K_j based on the information entropy.
Data packet header information is added to the data of the second entity part. N preset rules form a rule set λ, and the different data packet headers are matched against these N rules to obtain the information entropy of the data packet header:
$$H = -\sum_{i=1}^{N} p_i \,\mathrm{lb}\, p_i$$
where p_i is the probability that the i-th rule occurs for an arbitrary data packet header and lb denotes the base-2 logarithm. In this embodiment, if the different metadata classes have no significantly different requirements on the data, all rule weights are assumed to be equally distributed, i.e. p_i = 1/N, and the information entropy of the data packet header is H = lb N.
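The entropy calculation and its uniform special case can be checked numerically with a short sketch. The function name `header_entropy` is illustrative, and the formula used is the standard Shannon entropy, which is consistent with the stated result H = lb N for p_i = 1/N.

```python
import math


def header_entropy(probabilities):
    """Shannon entropy in bits of a rule-occurrence distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


N = 8
uniform = [1 / N] * N
print(header_entropy(uniform), math.log2(N))  # both equal lb N = 3.0

skewed = [0.7, 0.1, 0.1, 0.1]
print(header_entropy(skewed) < math.log2(len(skewed)))  # True
```

A non-uniform rule distribution always gives an entropy below lb N, so the equal-weight assumption is the upper bound for a set of N rules.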
In the method preferred in this embodiment for extracting the second parameter column from the information entropy, the width of the data is reduced as far as possible by clipping matching fields, so that the data can later be classified accurately into the several metadata bases. The information entropy of the data packet header rules is used as the criterion for dividing parameter columns within a single matching field. Referring to FIG. 2, when the second entity is divided into second parameter columns, suppose the matching fields M_1, M_2 and M_3 are each considered alone for clipping and the data packet header information entropies satisfy H(M_1) > H(M_3) > H(M_2). Then clipping M_1 alone reduces the data packet header information entropy the most. M_1 therefore forms a parameter column on its own, separated from the first entity part. As shown in FIG. 3, M_1 is detached from the first entity part, and M_2 and M_3 lose the preset rules that originally belonged to M_1.
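Selecting the matching field to clip can be sketched as picking the field with the largest entropy. In this illustration the entropy of each field is estimated from the empirical frequency of the values observed in it, which is an assumption; the field names and sample values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy in bits of the values seen in one field."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def pick_field_to_clip(fields):
    """Return the name of the matching field with the highest entropy."""
    return max(fields, key=lambda name: entropy(fields[name]))


fields = {
    "M1": ["a", "b", "c", "d"],  # entropy 2 bits
    "M2": ["a", "a", "a", "a"],  # entropy 0 bits
    "M3": ["a", "a", "b", "b"],  # entropy 1 bit
}
print(pick_field_to_clip(fields))  # → M1
```

Here H(M1) > H(M3) > H(M2), matching the ordering in the text, so M1 is the field that is split off into its own parameter column.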
The first parameter column includes a first level parameter and a second level parameter. In the electronic commerce platform information acquired by the outsourcing platform, the information attributes contained by different electronic commerce platforms are different, and the information attributes of different shops in the same electronic commerce platform are different. Referring to fig. 4, the first parameter column includes two columns of parameters, wherein the first-level parameter is the parameter name of the class information, and the second-level parameter is the specific parameter content under the class information.
In this embodiment, the first parameter column is classified based on attribute weight, and the attribute classification conditions are fixed attributes, variable attributes and semi-variable attributes. The fixed attributes are commodity name, specification parameters and brand model; the variable attributes are SKU, commodity price and promotion information; the semi-variable attributes are after-sales information and inventory information.
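The three attribute classes can be represented as a simple lookup, sketched below. The column keys such as `commodity_name` are hypothetical spellings of the attributes listed in the text, and the source does not give numeric attribute weights, so none are modelled here.

```python
from enum import Enum


class AttributeClass(Enum):
    FIXED = "fixed"
    VARIABLE = "variable"
    SEMI_VARIABLE = "semi-variable"


# Attribute names follow the embodiment; the key spellings are assumed.
ATTRIBUTE_CLASSES = {
    "commodity_name": AttributeClass.FIXED,
    "specification": AttributeClass.FIXED,
    "brand_model": AttributeClass.FIXED,
    "sku": AttributeClass.VARIABLE,
    "price": AttributeClass.VARIABLE,
    "promotion_info": AttributeClass.VARIABLE,
    "after_sales_info": AttributeClass.SEMI_VARIABLE,
    "inventory_info": AttributeClass.SEMI_VARIABLE,
}


def group_columns(columns):
    """Group known column names by attribute class; unknown names are skipped."""
    groups = {cls: [] for cls in AttributeClass}
    for name in columns:
        cls = ATTRIBUTE_CLASSES.get(name)
        if cls is not None:
            groups[cls].append(name)
    return groups


print(group_columns(["commodity_name", "price", "inventory_info", "unknown"]))
```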
Step 4: Text root features of the business data are preset, each first parameter column K_i is traversed based on the text root features, and the first parameter columns that meet the text root feature criterion are incorporated into the first entity matching table.
In this embodiment, the text root features are set by the information gain method; a text root feature is a high-frequency word in an information category required by the e-commerce outsourcing platform. For example, after an e-commerce platform merchant outsources commodity listing, the authorized outsourcing platform needs to obtain SKU metadata, which includes but is not limited to "style", "color", "size", "brand", "model" and "applicable group"; a manually selected text word of a given category in the e-commerce platform serves as a preset feature element. Starting from an initially set root set, feature elements x are preset according to the type of database information required by the outsourcing platform, several feature words are selected from the first entity part to form the root set, the weight of each feature word in the root set is calculated, and the feature words whose weight exceeds a threshold are extracted as text root features. The threshold weight for judging a feature word to be a text root feature is
[Equation image not reproduced in the source text: threshold formula]
where d is the number of selected feature words, x is the type of the preset feature elements, and the remaining symbol (shown only as an image in the source) is the average of the weights of the feature values in the root set.
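The thresholding step can be sketched as follows. Because the threshold formula is not recoverable from the text, this sketch takes the threshold as a parameter and, when none is given, falls back to the mean weight of the root set; that fallback is a placeholder assumption, not the formula of the invention. The weights are assumed to have been computed beforehand, for example by information gain.

```python
from statistics import mean


def select_root_features(weights, threshold=None):
    """Return the feature words whose weight exceeds the threshold.

    weights: {feature_word: weight} for the root set.
    threshold: if None, the mean weight is used as a placeholder.
    """
    if threshold is None:
        threshold = mean(weights.values())  # placeholder, not the source formula
    return sorted(word for word, w in weights.items() if w > threshold)


root_set = {"brand": 0.9, "color": 0.7, "size": 0.5, "the": 0.1}
print(select_root_features(root_set))        # → ['brand', 'color']
print(select_root_features(root_set, 0.4))   # → ['brand', 'color', 'size']
```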
In this embodiment, the first entity matching table corresponds to the first parameter columns K_i. It comprises a database parent table and database child tables: the parent table holds the different e-commerce information types, and each child table holds the feature words contained in one e-commerce information type.
Step 5: Digital root features of the service data are preset, each second parameter column K_j is traversed based on the digital root features, and the second parameter columns that meet the digital root feature criterion are incorporated into the second entity matching table.
The digital information of the different platforms in multi-source e-commerce data follows certain regularities, and matching the data extraction and fusion method to these regularities improves the efficiency of extracting and fusing the digital information. In this embodiment, the digital root features are used to extract digital information from the order numbers, logistics numbers, commodity codes and SKU codes of the e-commerce platform. They are set by the information gain method in the same way as the text root features described above, which is not repeated here.
In this embodiment, the second entity matching table corresponds to the second parameter columns K_j. It comprises a database parent table and database child tables: the parent table holds the different e-commerce information types, and each child table holds the feature number combinations contained in one e-commerce information type.
Step 6: The first entity matching table and the second entity matching table are stored in the metadata base of the front-end server, and unique index identifiers are provided for the different types of metadata base information.
The first entity matching table and the second entity matching table are placed in the metadata base of the front-end server. Each metadata type comprises one database parent table and several database child tables. The primary key of each type in the database parent table, combined with the identity code of the outsourcing server, serves as the index identifier, which is used to correspond to the request identifier.
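The construction of index identifiers from the parent-table primary key and the identity code can be sketched with an in-memory SQLite table. The table name `metadata_parent`, the type names and the `identity-code-id` string format are assumptions for illustration; the source does not specify how the two parts are joined.

```python
import sqlite3

METADATA_TYPES = ["sku", "after_sales", "order", "inventory", "promotion"]


def build_index_identifiers(identity_code):
    """Return {metadata_type: index_identifier} for one outsourcing server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata_parent (id INTEGER PRIMARY KEY, type TEXT UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO metadata_parent (type) VALUES (?)",
        [(t,) for t in METADATA_TYPES],
    )
    rows = conn.execute(
        "SELECT id, type FROM metadata_parent ORDER BY id"
    ).fetchall()
    conn.close()
    # Index identifier: identity code joined with the parent-table primary key.
    return {type_: f"{identity_code}-{id_}" for id_, type_ in rows}


print(build_index_identifiers("00012345"))
```

Because the identity code is part of every index identifier, two outsourcing servers never share an identifier even when they are granted the same metadata type.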
Step 7: The first entity matching table is classified into metadata bases with a naive Bayes classifier and the second entity matching table with an entropy classifier, yielding SKU metadata, after-sales metadata, order metadata, inventory metadata and promotion metadata.
This embodiment preferably uses the HBase distributed database design, which provides multidimensional mapping for all database tables. In the metadata base, the database is partitioned according to the HBase distributed design and stored across different distributed nodes, and the first and second entity matching tables are sliced horizontally and vertically. For example, horizontal slicing is used when classifying the promotion metadata base: slice a and slice b are constructed according to the creation time of the promotion order. Slice a contains all promotion conditions in the current plan, including promotion state, promotion consumption amount, OCPX consumption amount and user-defined promotion consumption amount. Slice b contains the ROI calculated from slice a together with the basic product attributes, including promotion name, product number, product name, product state and ROI. The index identifier of any database table comprises a keyword and a timestamp; the timestamp is updated whenever the stored information in the database changes, and when the outsourcing platform accesses the data, the extraction time is consistent with the time recorded by the timestamp.
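The naive Bayes classification of the first entity matching table in Step 7 can be illustrated with a small multinomial classifier written from scratch. The training samples, label names and Laplace smoothing are assumptions; the source names the classifier but gives no training data or parameters, and the entropy classifier for the second table is not sketched here.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """samples: list of (feature_words, metadata_label) pairs."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in samples:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict_nb(model, words):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in words:
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb([
    (["color", "size", "brand"], "sku"),
    (["refund", "return", "complaint"], "after_sales"),
])
print(predict_nb(model, ["refund", "size", "return"]))  # → after_sales
```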
And 8: a front-end server of the e-commerce platform receives an access request from at least one outsource server, the front-end server verifying a request identifier.
In this embodiment, the index identifier checks the identity code and the digital signature carried by the request identifier. If the check finds them consistent, the outsourcing platform corresponding to the request identifier is allowed to call the metadata base information. If the check fails, the request identifier information is rejected and rejection information is sent to the outsourcing platform. If the check result is null, the request identifier information is rejected and date information is sent to the outsourcing platform. Preferably, this information exchange between the request identifier and the index identifier takes place in every request to access the front-end database.
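The three verification outcomes above (consistent, wrong, null) can be sketched as a small function. This is an illustration under stated assumptions: SHA-256 stands in for the unspecified hash function, the request is modelled as a dictionary with hypothetical keys `identity_code` and `digest`, and `registry` is a hypothetical lookup of the digests expected for each outsourcing server.

```python
import hashlib
import hmac
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow access to the metadata base"
    REJECT_MISMATCH = "reject and send rejection information"
    REJECT_EMPTY = "reject and send date information"


def signature_digest(signature: bytes) -> str:
    # SHA-256 is an assumption; the embodiment does not name the hash function.
    return hashlib.sha256(signature).hexdigest()


def verify(request, registry) -> Verdict:
    """Check the identity code and signature digest carried by a request identifier."""
    if not request or not request.get("identity_code") or not request.get("digest"):
        return Verdict.REJECT_EMPTY          # the check is null
    expected = registry.get(request["identity_code"])
    if expected is not None and hmac.compare_digest(expected, request["digest"]):
        return Verdict.ALLOW                 # the check is consistent
    return Verdict.REJECT_MISMATCH           # the check is wrong


registry = {"OS-01": signature_digest(b"signature-of-OS-01")}
good = {"identity_code": "OS-01", "digest": signature_digest(b"signature-of-OS-01")}
bad = {"identity_code": "OS-01", "digest": signature_digest(b"forged")}
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not depend on where the two digests first differ.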
Step 9: According to the request identifier, access authority to the metadata base is opened for the outsourcing platform and the access request is allowed; the outsourcing server loads service flow information based on the metadata opened by the e-commerce server. Data fusion and exchange are realized according to the interface configuration between the page components.
The outsourcing platform interface of this embodiment is preferably designed on JSP technology, and a B/S architecture is used to improve the data extraction capability and human-computer interaction of the outsourcing platform. The system architecture is divided into a presentation layer, a business logic layer, a data access layer and a metadata database. The front-end business logic of the system is written with the React framework, which builds the visualization of the page components, and the multi-source business data processing results are built in the presentation layer using the MyBatis framework. The system can switch login identities, so the outsourcing work of a plurality of e-commerce platforms is handled in the same page component, and the exchange of processed multi-source business data flows between different page components is not affected by modifications to the platform database.
Example two
The processing device for the business data comprises an e-commerce server, an outsourcing server and a front-end server, wherein the front-end server comprises an extraction unit, a first storage unit, a second storage unit, a judgment unit and a TCAM main control unit. In this embodiment, the TCAM main control unit is composed of an N-TCAM chip and a W-TCAM chip, and the second storage unit, the N-TCAM chip and the W-TCAM chip form a communication loop for full-duplex communication.
Referring to fig. 5, the TCAM main control unit is composed of an N-TCAM and a W-TCAM. The N-TCAM stores the flow table obtained after the matching field is cut, and the W-TCAM, which keeps the original flow table width, can store the first entity matching table and the second entity matching table. The main features of the first entity part and the second entity part are stored in the first storage unit. After the first parameter column and the second parameter column are formed by a matching field extraction circuit, extraction and classification are carried out by the TCAM main control unit; the resulting first entity matching table and second entity matching table are stored in the second storage unit, and the corresponding metadata base classification is output by the judgment unit.
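For readers unfamiliar with TCAM, the following software sketch illustrates ternary matching in general (each entry is a value plus a mask whose zero bits are "don't care") and the idea of a narrow table that matches on a cut-down key alongside a wide table that matches on the full key. It does not model the chips of this embodiment: the 8-bit and 4-bit widths, the entries and the function names `tcam_lookup` and `narrow` are hypothetical.

```python
from typing import List, Optional, Tuple

# (value, mask, result): mask bit 1 means "must match", 0 means "don't care".
Entry = Tuple[int, int, str]


def tcam_lookup(table: List[Entry], key: int) -> Optional[str]:
    """Return the result of the first entry whose masked bits equal the key's."""
    for value, mask, result in table:
        if (key & mask) == (value & mask):
            return result
    return None


def narrow(key: int, full_width: int, kept_bits: int) -> int:
    """Cut a full-width key down to its top `kept_bits` bits."""
    return key >> (full_width - kept_bits)


# Wide table: matches on the original 8-bit key, low 4 bits are don't-care.
WIDE: List[Entry] = [
    (0b10100000, 0b11110000, "first entity matching table"),
    (0b01010000, 0b11110000, "second entity matching table"),
]

# Narrow table: matches on the 4-bit key left after the matching field is cut.
NARROW: List[Entry] = [
    (0b1010, 0b1111, "first entity matching table"),
    (0b0101, 0b1111, "second entity matching table"),
]

key = 0b10101111
print(tcam_lookup(WIDE, key), "|", tcam_lookup(NARROW, narrow(key, 8, 4)))
```

A hardware TCAM compares all entries in parallel; the sequential loop here only reproduces the first-match result.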
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principles of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A service data processing method based on attribute classification is characterized by comprising the following steps:
step 1: the e-commerce server issues a digital certificate to the outsourcing server and assigns a unique request identifier to the authorized terminal;
step 2: extracting service data of a front-end server to obtain a first entity part and a second entity part, and acquiring main features in the first entity part and the second entity part;
step 3: respectively calculating the attribute weight of the first entity part and the information entropy value of the second entity part, and setting a first parameter column $K_i$ based on the attribute weight and a second parameter column $K_j$ based on the information entropy value;
step 4: presetting the textual root feature of the business data, traversing each first parameter column $K_i$ based on the textual root feature, and incorporating each first parameter column that meets the textual root feature criteria into a first entity matching table;
step 5: presetting the numeric root feature of the service data, traversing each second parameter column $K_j$ based on the numeric root feature, and incorporating each second parameter column that meets the numeric root feature criteria into a second entity matching table;
step 6: storing the first entity matching table and the second entity matching table to a metadata base of a front-end server;
and 7: classifying a first entity matching table of the metadata base based on a naive Bayesian classifier, classifying a second entity matching table of the metadata base based on an entropy classifier, obtaining SKU metadata, after-sales metadata, order metadata, inventory metadata and promotion metadata, and providing a unique index identifier for any metadata;
and step 8: the front-end server receives an access request from at least one outsourcing server, and verifies a request identifier;
and step 9: and opening a metadata base access right for the outsourcing server according to the request identifier, wherein the outsourcing server loads service flow information based on metadata opened by the E-business server.
2. The method for processing business data based on attribute classification as claimed in claim 1, wherein the request identifier comprises an identity code of the outsourcing server and a hash digest of the digital signature; the identity code is the child table number in the parent table, within the e-commerce platform database, that stores the outsourcing platform database; and the hash digest of the digital signature is a hash function value generated from the digital signature by means of a pseudo-random number.
3. The method as claimed in claim 1, wherein the first entity part is a text feature vector and the second entity part is a digital feature vector.
4. The method according to claim 3, wherein the text feature vector and the digital feature vector are feature vectors obtained by vector feature processing of preprocessed text data and digital data, respectively.
5. The method of claim 1, wherein the textual root feature and the numeric root feature are at least one feature value of the business data, and the textual root feature and the numeric root feature describe a repetition degree of a same parameter column and a similarity degree of different parameter columns.
6. The method of claim 1, wherein the first parameter column $K_i$ and the second parameter column $K_j$ are data columns obtained under the attribute classification condition and the information entropy classification condition, respectively, and contain at least one item of information among the text data and digital data in the e-commerce platform database.
7. The method of claim 1, wherein the data set $D = \{x_1, x_2, \dots, x_n\}$ in the second entity part is extracted, the data set $D$ is expressed by information entropy, and each data set in the second entity part is traversed to obtain the corresponding second parameter column.
8. The method of claim 1, wherein the index identifier is formed by an identity code of the outsourcing platform and an ID field of the primary key in the generated metadata parent table, and one index identifier corresponds to only one request identifier.
9. A processing device for the attribute classification-based service data processing method according to claim 1, comprising an e-commerce server, an outsourcing server, and a front-end server, wherein the front-end server comprises an extraction unit, a first storage unit, a second storage unit, a decision unit, and a TCAM main control unit.
10. The processing device according to claim 9, wherein the TCAM main control unit is composed of an N-TCAM chip and a W-TCAM chip, and the second storage unit, the N-TCAM chip and the W-TCAM chip form a communication loop for full-duplex communication.
CN202211314753.0A 2022-10-26 2022-10-26 Service data processing method and processing device based on attribute classification Active CN115375380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211314753.0A CN115375380B (en) 2022-10-26 2022-10-26 Service data processing method and processing device based on attribute classification

Publications (2)

Publication Number Publication Date
CN115375380A (en) 2022-11-22
CN115375380B (en) 2023-02-03

Family

ID=84073504

Country Status (1)

Country Link
CN (1) CN115375380B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794827A (en) * 2022-11-29 2023-03-14 广发银行股份有限公司 Data table structure management system and method
CN115794827B (en) * 2022-11-29 2023-07-21 广发银行股份有限公司 Data table structure management system and method
CN116304886A (en) * 2023-05-12 2023-06-23 江苏网进科技股份有限公司 Metadata intelligent classification method based on machine learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant