CN107025233B - Data feature processing method and device - Google Patents



Publication number
CN107025233B
Authority
CN
China
Prior art keywords
feature
plaintext
sample
field
fields
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610066847.9A
Other languages
Chinese (zh)
Other versions
CN107025233A (en)
Inventor
张研
杨冠军
蒋程诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen yunwangwandian e-commerce Co.,Ltd.
Original Assignee
Suning Cloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd
Priority to CN201610066847.9A
Publication of CN107025233A
Application granted
Publication of CN107025233B
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a data feature processing method and device, relates to the technical field of big data processing, and can reduce the cost of data extraction and improve the accuracy of data extraction. The method of the invention comprises the following steps: obtaining a plaintext sample from a service log, wherein the plaintext sample at least comprises a special field and a characteristic field, and the special field comprises a field for representing an execution command and an operation command; according to a pre-configured feature class, obtaining a feature plaintext from the feature field, and recording a sample signature, wherein special fields with the same content correspond to the same sample signature; extracting a special field corresponding to the sample signature, and splicing the obtained characteristic plaintext to the special field to obtain a spliced field; and outputting the spliced field as a feature sample. The method is suitable for data feature extraction in big data processing.

Description

Data feature processing method and device
Technical Field
The present invention relates to the field of big data processing technologies, and in particular, to a method and an apparatus for processing data characteristics.
Background
With the development of internet technology, the volume of online data has grown exponentially. To handle such massive data, many big data processing schemes have been developed to extract the required information from it.
Data from different fields and of different types varies widely in dimensions, formats and the like, and its sources are also complicated, so screening and extracting the required information from massive data occupies a large amount of computing resources. Existing schemes mainly extract effective data features through text processing or data tables, using some programming language, in order to perform data extraction.
However, the data features available from a data table are limited, and it is difficult to accurately describe the profile of the data the user actually needs, which degrades subsequent data analysis and modeling. In particular, in a service data processing system with a high refresh frequency, such as an advertisement system, large-scale, multidimensional advertisement data must be updated and modeled frequently; the cost is high, yet the accuracy of data extraction remains low.
Disclosure of Invention
Embodiments of the present invention provide a data feature processing method and apparatus, which can reduce data extraction cost and improve data extraction accuracy.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
In a first aspect, an embodiment of the present invention provides a method for processing data features, including:
obtaining a plaintext sample from a service log, wherein the plaintext sample at least comprises a special field and a characteristic field, and the special field comprises a field for representing an execution command and an operation command;
according to a pre-configured feature class, obtaining a feature plaintext from the feature field, and recording a sample signature, wherein special fields with the same content correspond to the same sample signature;
extracting a special field corresponding to the sample signature, and splicing the obtained characteristic plaintext to the special field to obtain a spliced field;
and outputting the spliced field as a feature sample.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the obtaining a plaintext sample from a service log includes:
reading a plaintext field in the service log;
culling a first type field from the plaintext fields; and/or converting characters of a second type field in the plaintext field into a specified form;
and storing the fields subjected to the elimination and/or conversion processing into a memory in a Map mode through a MapReduce framework.
With reference to the first aspect, in a second possible implementation manner of the first aspect, the obtaining a feature plaintext from the feature field according to a pre-configured feature class includes:
sequentially reading fields in the feature class, wherein the content of the fields in the feature class is the same as that of at least one field in the plaintext sample;
according to the content of the fields in the feature class, sequentially reading the fields with the same content from the plaintext sample as the feature fields;
recording the feature fields sequentially read from the plaintext samples in a feature set.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner, the outputting the spliced field as a feature sample includes:
importing the feature sample and the feature set into a Reduce stage through a MapReduce framework;
the recording the characteristic fields sequentially read from the plaintext sample in a characteristic set includes: outputting the same feature fields read from the plaintext samples to the same compute node.
With reference to the first aspect, in a fourth possible implementation manner of the first aspect, the method further includes:
reading a basic feature class and updating the basic feature class through a reflection mechanism;
and taking the basic feature class which is updated last time as the pre-configured feature class.
In a second aspect, an embodiment of the present invention provides a data feature processing apparatus, including:
an extraction unit, configured to obtain a plaintext sample from a service log, wherein the plaintext sample at least comprises a special field and a feature field, and the special field comprises a field for representing an execution command and an operation command;
the identification unit is used for acquiring a feature plaintext from the feature field according to a pre-configured feature class and recording a sample signature, wherein special fields with the same content correspond to the same sample signature;
the splicing unit is used for extracting a special field corresponding to the sample signature, and splicing the acquired feature plaintext to the special field to obtain a spliced field;
and the output unit is used for outputting the spliced field as a characteristic sample.
With reference to the second aspect, in a first possible implementation manner of the second aspect, the apparatus further includes a preprocessing unit, configured to read a plaintext field in the service log; and eliminating a first type field from the plaintext field; and/or converting characters of a second type field in the plaintext field into a specified form; and storing the field subjected to the elimination and/or conversion processing into a memory in a Map mode through a MapReduce framework.
With reference to the second aspect, in a second possible implementation manner of the second aspect, the identifying unit is specifically configured to sequentially read fields in the feature class, where the fields in the feature class have the same content as at least one field in the plaintext sample; reading fields with the same content from the plaintext sample in sequence as the characteristic fields according to the content of the fields in the characteristic class; and recording the characteristic fields read from the plaintext samples in a characteristic set.
With reference to the second possible implementation manner of the second aspect, in a third possible implementation manner, the output unit is specifically configured to import the feature sample and the feature set into a Reduce stage through a MapReduce framework; and outputting the same characteristic field read from the plaintext sample to the same compute node.
With reference to the second aspect, in a fourth possible implementation manner of the second aspect, the apparatus further includes a feature class management unit, configured to read a basic feature class and update the basic feature class through a reflection mechanism, and to use the most recently updated basic feature class as the pre-configured feature class.
According to the data feature processing method and device provided by the embodiments of the invention, feature plaintext is obtained from the feature field of the plaintext sample according to the pre-configured feature class, a sample signature is recorded, the special field corresponding to the sample signature is extracted, the feature plaintext is spliced with the special field, and the spliced field is output as the feature sample used for data extraction. Compared with the prior art, extracting the required features from massive data in this way addresses both the difficulty of extracting large-scale, multi-dimensional data and the need for frequent model updates, thereby reducing the cost of data extraction and improving its accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for processing data characteristics according to an embodiment of the present invention;
FIG. 3a, FIG. 3b and FIG. 3c are schematic structural diagrams of a data feature processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
This embodiment may adopt a MapReduce-based distributed processing framework (also referred to as a MapReduce framework); a specific architecture of the MapReduce framework used in this embodiment may be as shown in FIG. 1. During execution, the data to be processed is stored in memory in map form. If a Hadoop-based MapReduce framework is adopted, then for feature extraction, the feature fields and special fields of the data are extracted and output in the map stage, and identical feature fields are accumulated in the reduce stage; for samples, sample extraction is carried out in the map stage, and the feature samples recorded with sample signatures are output in the reduce stage.
An embodiment of the present invention provides a data feature processing method, as shown in fig. 2, including:
S1: Obtain a plaintext sample from the service log.
The plaintext sample comprises at least a special field and a feature field, and the special field comprises fields representing an execution command and an operation command. The service log may be log data recorded while a service system runs, for example, log data recorded while an advertisement delivery system is running. The plaintext sample may consist of unencrypted characters in the service log. The obtained plaintext sample may specifically be tab-separated text that includes special fields indicating "display" and "click", such as "show" and "clk".
Steps S1 to S4 may be executed by a server in the map stage of the MapReduce framework.
S2: Obtain feature plaintext from the feature field according to the pre-configured feature class, and record a sample signature.
In this embodiment, the server in the map stage reads a pre-configured feature class. The feature class contains fields that were configured into it in a set order, and the content of each field in the feature class is the same as that of at least one field in the plaintext sample. Following the pre-configured feature class, the server in the map stage reads the input plaintext sample as key-value pairs and stores it in memory in map form. The memory described in this embodiment may be the memory of a user's local device or the memory of the server in the map stage.
The server in the map stage can strip the special fields indicating "display" and "click" from the plaintext sample, and then extract the feature fields from the plaintext sample in sequence according to the field contents recorded in the pre-configured feature class. The sample signature corresponds to the plaintext sample. Because the special fields indicating "display" and "click" are repeated many times in a plaintext sample, special fields with the same content in the same plaintext sample correspond to the same sample signature. The sample signature may be assigned by the server when the plaintext sample is stored in memory in map form, or may be pre-configured in the plaintext sample.
S3: Extract the special field corresponding to the sample signature, and splice the obtained feature plaintext to the special field to obtain a spliced field.
For example, for the plaintext samples "show clk A …, show clk B …, show clk C …, show clk D",
the special field is "show clk" and the feature fields are "A B C D", so the following features can be obtained: A show clk, B show clk, C show clk and D show clk. Splicing them yields the spliced field "show clk feaA feaB feaC feaD".
S4: Output the spliced field as a feature sample.
The server in the map stage can output the feature sample to the server in the reduce stage.
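The S1 to S4 flow for the example above can be sketched in plain Java. This is a minimal illustration rather than the patented implementation: the class name `SpliceSketch`, the method `splice`, the assumption that the first two tab-separated columns are the special fields, and the "fea" prefix (borrowed from the feaA to feaD names in the example) are all hypothetical. The special field content itself stands in for the sample signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SpliceSketch {
    // Assumption: the first two tab-separated columns are the special fields
    // ("show" and "clk"); every remaining column is a feature field.
    static final int SPECIAL_COLUMNS = 2;

    /** Groups feature plaintext under the special field it was read with, then splices. */
    static List<String> splice(List<String> plaintextSamples) {
        // Key: special field content, which plays the role of the sample signature here.
        Map<String, List<String>> bySignature = new LinkedHashMap<>();
        for (String sample : plaintextSamples) {
            String[] cols = sample.split("\t");
            if (cols.length <= SPECIAL_COLUMNS) {
                continue; // no feature field to extract
            }
            String special = cols[0] + "\t" + cols[1];
            List<String> features = bySignature.computeIfAbsent(special, k -> new ArrayList<>());
            for (int i = SPECIAL_COLUMNS; i < cols.length; i++) {
                features.add("fea" + cols[i]); // hypothetical feature-plaintext naming
            }
        }
        // Splice: special field first, then every feature recorded under its signature.
        List<String> featureSamples = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : bySignature.entrySet()) {
            featureSamples.add(e.getKey() + "\t" + String.join("\t", e.getValue()));
        }
        return featureSamples;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList(
                "show\tclk\tA", "show\tclk\tB", "show\tclk\tC", "show\tclk\tD");
        System.out.println(splice(samples));
    }
}
```

Because all four input lines share the same special field content, they collapse into a single spliced feature sample.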
In this embodiment, feature extraction requires obtaining feature plaintext from the feature field in the map stage according to a pre-configured feature class, and the pre-configured feature class can be obtained through the reflection mechanism in Java. As a result, for general requirements, a user extracting features does not need to develop a data-table-based feature extraction program as in the prior art; for special requirements, the user only needs the feature extraction framework of this embodiment (that is, the MapReduce framework that runs the execution flow of this embodiment) to extract the required features from massive data according to the pre-configured feature classes.
The reflection mechanism employed in this embodiment works as follows: which class needs to be loaded is not determined at compile time; instead, a specific class is loaded when the program runs, and the structural attributes of that class are then obtained. In other words, classes that are unknown at compile time can be used. For example, when a class is loaded, the Java virtual machine automatically generates a Class object, and information such as the methods, members and constructors of the loaded class, including their declarations and definitions, can be obtained through that Class object. Specifically, the process of obtaining the pre-configured feature class through the reflection mechanism in Java may include:
Using the Java reflection mechanism, a feature class factory class (Feature) is defined, as shown in the following code:
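[Code listing: reproduced only as images in the original publication (Figures BDA0000917831340000071 and BDA0000917831340000081); its content is not recoverable from this text.]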
When extracting features, the feature class name is configured under the user's own service configuration; configuring multiple slots and multiple features is supported, and no early loading is required.
Then, when the user configuration file is called, it is parsed to obtain the feature class name for each slot number, and the corresponding feature parsing class is instantiated by reflection for the feature extraction program to use. In this way, feature extraction service classes of any type can be added according to specific service requirements: the feature class names are configured in the configuration file, and during feature extraction the user-written feature classes are used for the different slots. Furthermore, preprocessing classes are handled in the same way: a separate preprocessing factory class is defined so that it too uses the Java reflection mechanism.
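The patent's own factory listing is not available in this text. As a stand-in, a reflection-based factory of the kind described might look roughly like the sketch below. Every name in it (`FeatureExtractor`, `PrefixFeature`, `FeatureFactorySketch`, `register`, `forSlot`) is hypothetical and not taken from the patent; only `Class.forName`, `getDeclaredConstructor` and `newInstance` are standard Java reflection calls.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical contract that every user-written feature class implements. */
interface FeatureExtractor {
    String extract(String fieldValue);
}

/** Example user-written feature class; the factory only ever sees its name. */
class PrefixFeature implements FeatureExtractor {
    @Override
    public String extract(String fieldValue) {
        return "fea" + fieldValue;
    }
}

/** Factory that resolves a configured class name to an instance at run time. */
class FeatureFactorySketch {
    // slot number -> feature class name, as it would be parsed from user configuration
    private final Map<Integer, String> classNameBySlot = new HashMap<>();

    void register(int slot, String className) {
        classNameBySlot.put(slot, className);
    }

    FeatureExtractor forSlot(int slot) throws ReflectiveOperationException {
        String className = classNameBySlot.get(slot);
        if (className == null) {
            throw new IllegalArgumentException("no feature class configured for slot " + slot);
        }
        // The class is located and instantiated only now, not at compile time.
        Class<?> loaded = Class.forName(className);
        Object instance = loaded.getDeclaredConstructor().newInstance();
        return (FeatureExtractor) instance;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        FeatureFactorySketch factory = new FeatureFactorySketch();
        // In practice the name would be a string read from the configuration file.
        factory.register(1, PrefixFeature.class.getName());
        System.out.println(factory.forSlot(1).extract("A"));
    }
}
```

The design point is that adding a new feature class requires only a new configuration entry; the factory and the extraction program stay unchanged.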
According to the data feature processing method provided by this embodiment of the invention, feature plaintext is obtained from the feature field of the plaintext sample according to the pre-configured feature class, a sample signature is recorded, the special field corresponding to the sample signature is extracted, the feature plaintext is spliced with the special field, and the spliced field is output as the feature sample used for data extraction. Compared with the prior art, extracting the required features from massive data in this way addresses both the difficulty of extracting large-scale, multi-dimensional data and the need for frequent model updates, thereby reducing the cost of data extraction and improving its accuracy.
In this embodiment, the server in the map stage may also preprocess the plaintext sample stored in memory in map form, or preprocess the fields of the plaintext sample before it is stored in memory. For example, characters encoded with schemes such as URL encoding or Base64 can undergo preprocessing such as half-width/full-width conversion and upper/lower case conversion; a user-defined preprocessing step may also be included. Thus, obtaining the plaintext sample from the service log includes:
reading the plaintext fields in the service log; culling first type fields from the plaintext fields, and/or converting the characters of second type fields in the plaintext fields into a specified form; and storing the fields that have undergone culling and/or conversion in memory in map form through the MapReduce framework.
Here, a first type field is a field that contains a data error and cannot be read, or a character that indicates specific content (for example, a character indicating a modification date, or a separator). A second type field is a field that can be converted, for example by half-width/full-width conversion or upper/lower case conversion; the converted character form is a specified form preset by the user or a form pre-stored in the server in the map stage.
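The culling and conversion steps can be illustrated with a short Java sketch. It rests on assumptions not stated in the patent: a first type field is approximated as one that is empty or contains the Unicode replacement character U+FFFD, and the specified form is taken to be half-width, lower-case text. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PreprocessSketch {
    /** Converts full-width ASCII variants (U+FF01..U+FF5E) and U+3000 to half-width. */
    static String toHalfWidth(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 0x3000) {
                sb.append(' ');                   // ideographic space -> ASCII space
            } else if (c >= 0xFF01 && c <= 0xFF5E) {
                sb.append((char) (c - 0xFEE0));   // fixed offset to the ASCII range
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Assumption: a "first type" field is empty or contains U+FFFD (unreadable data). */
    static boolean isFirstType(String field) {
        return field.isEmpty() || field.indexOf(0xFFFD) >= 0;
    }

    /** Culls first type fields and converts the rest to half-width lower case. */
    static List<String> preprocess(String logLine) {
        List<String> kept = new ArrayList<>();
        for (String field : logLine.split("\t", -1)) {
            if (isFirstType(field)) {
                continue;
            }
            kept.add(toHalfWidth(field).toLowerCase(Locale.ROOT));
        }
        return kept;
    }

    public static void main(String[] args) {
        // Full-width "Show", an empty field, upper-case "CLK", full-width "A1".
        System.out.println(preprocess("\uFF33\uFF48\uFF4F\uFF57\t\tCLK\t\uFF21\uFF11"));
    }
}
```

A user-defined preprocessing step would slot in alongside `toHalfWidth` in `preprocess`.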
In this embodiment, obtaining the feature plaintext from the feature field according to the pre-configured feature class includes:
sequentially reading the fields in the feature class; according to the content of the fields in the feature class, sequentially reading the fields with the same content from the plaintext sample as the feature fields; and recording the feature fields read from the plaintext sample in a feature set.
The content of each field in the feature class is the same as the content of at least one field in the plaintext sample. Specifically, the server in the map stage obtains a new plaintext sample set, initializes the pre-configured feature classes to be extracted, and calls the feature classes one by one to extract the configured features. For example:
the plaintext samples are: "show clk A B C D";
the pre-configured feature classes include:
Feaclass=featureclass1;dpd=A;slot=1,
Feaclass=featureclass2;dpd=B;slot=2,
Feaclass=featureclass3;dpd=C;slot=3,
Feaclass=featureclass4;dpd=D;slot=4,
the server can initialize featureclass1, featureclass2, featureclass3 and featureclass4, and then extract the features feaA, feaB, feaC and feaD in the configured order. The server obtains the feature set {feaA, feaB, feaC, feaD} and the plaintext sample show clk A B C D, and completes the splicing according to the relationship between the special fields and the feature fields, which may include {feaA show clk …}. The splicing finally yields the feature sample: show clk feaA feaB feaC feaD.
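The configuration-driven extraction in this example can be sketched in Java as follows. The patent does not define the keys `Feaclass`, `dpd` and `slot`; the sketch assumes `dpd` names the content of the plaintext field the feature reads, and it derives feature names with a "fea" prefix to match feaA to feaD. The class `FeatureConfigSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FeatureConfigSketch {
    /** One configuration entry. The meaning given to each key is an assumption. */
    static final class FeatureEntry {
        final String feaClass; // name of the feature class to initialize
        final String dpd;      // assumed: content of the plaintext field the feature reads
        final int slot;        // slot number associated with the feature

        FeatureEntry(String feaClass, String dpd, int slot) {
            this.feaClass = feaClass;
            this.dpd = dpd;
            this.slot = slot;
        }
    }

    /** Parses one line such as "Feaclass=featureclass1;dpd=A;slot=1,". */
    static FeatureEntry parse(String line) {
        String cleaned = line.trim();
        if (cleaned.endsWith(",")) {
            cleaned = cleaned.substring(0, cleaned.length() - 1);
        }
        Map<String, String> kv = new HashMap<>();
        for (String pair : cleaned.split(";")) {
            String[] parts = pair.split("=", 2);
            if (parts.length == 2) {
                kv.put(parts[0].trim(), parts[1].trim());
            }
        }
        return new FeatureEntry(kv.get("Feaclass"), kv.get("dpd"),
                Integer.parseInt(kv.get("slot")));
    }

    /** Reads matching fields from the sample in configuration order. */
    static List<String> extractFeatureSet(List<FeatureEntry> config, String plaintextSample) {
        List<String> fields = Arrays.asList(plaintextSample.split("\t"));
        List<String> featureSet = new ArrayList<>();
        for (FeatureEntry entry : config) {
            if (fields.contains(entry.dpd)) {
                featureSet.add("fea" + entry.dpd); // hypothetical naming, after feaA..feaD
            }
        }
        return featureSet;
    }

    public static void main(String[] args) {
        List<FeatureEntry> config = new ArrayList<>();
        config.add(parse("Feaclass=featureclass1;dpd=A;slot=1,"));
        config.add(parse("Feaclass=featureclass2;dpd=B;slot=2,"));
        config.add(parse("Feaclass=featureclass3;dpd=C;slot=3,"));
        config.add(parse("Feaclass=featureclass4;dpd=D;slot=4,"));
        System.out.println(extractFeatureSet(config, "show\tclk\tA\tB\tC\tD"));
    }
}
```

The order of the resulting feature set follows the configuration order, not the column order of the sample.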
In this embodiment, outputting the spliced field as a feature sample includes:
importing the feature sample and the feature set into the Reduce stage through the MapReduce framework. Recording the feature fields sequentially read from the plaintext sample in a feature set includes: outputting identical feature fields read from the plaintext samples to the same compute node.
For example, this embodiment may adopt the Hadoop MapReduce framework: the server in the map stage executes S1 to S4 and then outputs the execution result (which includes the feature sample and the feature set) to the server in the reduce stage. Specifically, if the output is a feature sample, it is passed directly to the reduce stage without further processing; if the output is a feature set, identical features are distributed to the same compute node using the bucketing (partitioning) principle of the MapReduce framework. When the server in the reduce stage receives a feature sample, it outputs the sample directly; when it receives a feature set, it accumulates the show and clk values corresponding to the feature set and outputs the totals.
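The bucketing and accumulation behaviour can be simulated in plain Java without a Hadoop dependency. The sketch below is illustrative only: the row layout {feature, show, clk} and the names `ReduceSketch`, `partition` and `accumulate` are assumptions. The partition formula mirrors the one used by Hadoop's default `HashPartitioner`, which is why identical features always land on the same node.

```java
import java.util.Map;
import java.util.TreeMap;

class ReduceSketch {
    /** Picks a compute node for a feature; equal features always map to the same node. */
    static int partition(String feature, int numNodes) {
        // Masking the sign bit keeps the result non-negative for any hash code.
        return (feature.hashCode() & Integer.MAX_VALUE) % numNodes;
    }

    /** Sums show and clk counts per feature; each input row is {feature, show, clk}. */
    static Map<String, long[]> accumulate(String[][] rows) {
        Map<String, long[]> totals = new TreeMap<>();
        for (String[] row : rows) {
            long[] t = totals.computeIfAbsent(row[0], k -> new long[2]);
            t[0] += Long.parseLong(row[1]); // show
            t[1] += Long.parseLong(row[2]); // clk
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"feaA", "1", "0"},
            {"feaB", "1", "1"},
            {"feaA", "1", "1"},
        };
        for (Map.Entry<String, long[]> e : accumulate(rows).entrySet()) {
            System.out.println(e.getKey() + "\tshow=" + e.getValue()[0]
                    + "\tclk=" + e.getValue()[1]
                    + "\tnode=" + partition(e.getKey(), 4));
        }
    }
}
```

In a real Hadoop job the grouping would be done by the framework's shuffle, and only the summation would live in the reducer.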
In this embodiment, the method further includes:
reading a basic feature class and updating the basic feature class through the reflection mechanism; and
taking the most recently updated basic feature class as the pre-configured feature class.
An embodiment of the present invention further provides a data feature processing apparatus which, if applied to a MapReduce framework, may run in a server in the map stage. As shown in FIG. 3a, the apparatus includes:
an extraction unit, configured to obtain a plaintext sample from the service log, where the plaintext sample comprises at least a special field and a feature field, and the special field comprises fields representing an execution command and an operation command;
an identification unit, configured to obtain feature plaintext from the feature field according to a pre-configured feature class and to record a sample signature, where special fields with the same content correspond to the same sample signature;
a splicing unit, configured to extract the special field corresponding to the sample signature and to splice the obtained feature plaintext to the special field to obtain a spliced field; and
an output unit, configured to output the spliced field as a feature sample.
In this embodiment, the identification unit is specifically configured to sequentially read the fields in the feature class, where the content of each field in the feature class is the same as that of at least one field in the plaintext sample; to sequentially read, according to the content of the fields in the feature class, the fields with the same content from the plaintext sample as the feature fields; and to record the feature fields read from the plaintext sample in a feature set.
In this embodiment, the output unit is specifically configured to import the feature sample and the feature set into the Reduce stage through the MapReduce framework, and to output identical feature fields read from the plaintext samples to the same compute node.
Further, as shown in FIG. 3b, the apparatus further includes a preprocessing unit, configured to read the plaintext fields in the service log; to cull first type fields from the plaintext fields, and/or convert the characters of second type fields in the plaintext fields into a specified form; and to store the fields that have undergone culling and/or conversion in memory in map form through the MapReduce framework.
Further, as shown in FIG. 3c, the apparatus further includes a feature class management unit, configured to read a basic feature class and update the basic feature class through the reflection mechanism, and to use the most recently updated basic feature class as the pre-configured feature class.
According to the data feature processing apparatus provided by this embodiment of the invention, feature plaintext is obtained from the feature field of the plaintext sample according to the pre-configured feature class, a sample signature is recorded, the special field corresponding to the sample signature is extracted, the feature plaintext is spliced with the special field, and the spliced field is output as the feature sample used for data extraction. Compared with the prior art, extracting the required features from massive data in this way addresses both the difficulty of extracting large-scale, multi-dimensional data and the need for frequent model updates, thereby reducing the cost of data extraction and improving its accuracy.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A method for processing data features, comprising:
obtaining a plaintext sample from a service log, wherein the plaintext sample comprises at least a special field and a feature field, and the special field comprises fields representing an execution command and an operation command;
obtaining a feature plaintext from the feature field according to a pre-configured feature class, and recording a sample signature, wherein special fields with the same content correspond to the same sample signature;
extracting the special field corresponding to the sample signature, and splicing the obtained feature plaintext to the special field to obtain a spliced field;
outputting the spliced field as a feature sample;
further comprising:
the plaintext sample consists of unencrypted characters in the service log, is expressed as tab-separated text, and comprises special fields representing 'whether shown' and 'click';
a server at the map stage reads the pre-configured feature class, wherein the feature class comprises fields configured into the feature class in sequence, and the content of each field in the feature class is the same as that of at least one field in the plaintext sample; the server at the map stage reads the input plaintext sample as key-value pairs according to the pre-configured feature class, and stores the plaintext sample in memory as a map;
the server at the map stage strips the special fields representing 'whether shown' and 'click' from the plaintext sample, and sequentially extracts feature fields from the plaintext sample according to the field contents recorded in the pre-configured feature class.
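The map-stage steps recited in claim 1 might look roughly like the following Python sketch. The column layout `COLUMNS`, the special-field names, and the contents of `FEATURE_CLASS` are hypothetical; the claim only requires tab-separated input, key-value reading into an in-memory map, stripping of the special fields, and extraction in the order configured in the feature class.

```python
# Hypothetical column layout of one tab-separated log line.
COLUMNS = ("show", "click", "user_id", "item_id", "page")
SPECIAL_FIELDS = ("show", "click")
FEATURE_CLASS = ("item_id", "user_id")  # this order drives extraction order


def map_record(line: str):
    # Read the line as key-value pairs and hold it in an in-memory map.
    values = line.rstrip("\n").split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    record = dict(zip(COLUMNS, values))

    # Strip the special fields out of the record.
    special = {name: record.pop(name) for name in SPECIAL_FIELDS}

    # Extract feature fields in the order the feature class lists them,
    # ignoring any columns the feature class does not mention.
    features = [(name, record[name]) for name in FEATURE_CLASS]
    return special, features
```

Because extraction iterates over the feature class rather than over the log columns, changing which features are produced only requires changing the configured field list, not the parsing code.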
2. The method according to claim 1, wherein the obtaining the feature plaintext from the feature field according to the pre-configured feature class comprises:
sequentially reading fields in the feature class, wherein the content of the fields in the feature class is the same as that of at least one field in the plaintext sample;
according to the content of the fields in the feature class, sequentially reading the fields with the same content from the plaintext sample as the feature fields;
recording the feature fields sequentially read from the plaintext samples in a feature set.
3. The method of claim 2, wherein outputting the spliced field as a feature sample comprises:
importing the feature sample and the feature set into a Reduce stage through a MapReduce framework;
and wherein recording the feature fields sequentially read from the plaintext sample in a feature set comprises: outputting identical feature fields read from the plaintext samples to the same compute node.
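Routing identical feature fields to the same compute node is the job a MapReduce framework's partitioning step normally performs. The Python sketch below imitates that behaviour under stated assumptions: the key format `name=value`, the function names `node_for` and `shuffle`, and the use of CRC32 as the hash are all illustrative choices, not details taken from the patent.

```python
import zlib
from collections import defaultdict


def node_for(feature_key: str, num_nodes: int) -> int:
    # crc32 is deterministic across processes, unlike Python's built-in
    # hash() on strings, so equal keys always land on the same node.
    return zlib.crc32(feature_key.encode("utf-8")) % num_nodes


def shuffle(pairs, num_nodes: int):
    """Group (feature_key, feature_sample) pairs by destination node."""
    nodes = defaultdict(list)
    for feature_key, feature_sample in pairs:
        nodes[node_for(feature_key, num_nodes)].append((feature_key, feature_sample))
    return dict(nodes)
```

With this routing, every sample carrying a given feature field reaches one node, so the Reduce stage can process that feature without exchanging data between nodes.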
4. The method of claim 1, further comprising:
reading a basic feature class and updating the basic feature class through a reflection mechanism;
and using the most recently updated basic feature class as the pre-configured feature class.
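Claim 4 does not say how the reflection-based update is carried out. As one possible reading, the Python sketch below inspects a base class at runtime and derives an updated class from it, then treats the latest version as the pre-configured feature class. `BaseFeatureClass`, its `fields` attribute, and `update_feature_class` are hypothetical names introduced for this example.

```python
class BaseFeatureClass:
    """Hypothetical basic feature class: an ordered list of field names."""
    fields = ("user_id", "item_id")


def update_feature_class(base: type, new_fields) -> type:
    # Reflection: read the existing attribute by name, then build a
    # derived class at runtime instead of editing the source code.
    current = getattr(base, "fields")
    merged = tuple(current) + tuple(f for f in new_fields if f not in current)
    return type(base.__name__, (base,), {"fields": merged})


# Keep every version; the last entry is the most recently updated class.
history = [BaseFeatureClass]
history.append(update_feature_class(history[-1], ["page"]))
history.append(update_feature_class(history[-1], ["city", "user_id"]))

# The most recently updated class serves as the pre-configured feature class.
preconfigured = history[-1]
```

Deriving a new class on each update leaves the original basic feature class untouched, so earlier configurations remain available if an update has to be rolled back.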
5. An apparatus for processing data features, comprising:
an extraction unit, configured to acquire a plaintext sample from a service log, wherein the plaintext sample comprises at least a special field and a feature field, and the special field comprises fields representing an execution command and an operation command;
an identification unit, configured to acquire a feature plaintext from the feature field according to a pre-configured feature class and to record a sample signature, wherein special fields with the same content correspond to the same sample signature;
a splicing unit, configured to extract the special field corresponding to the sample signature and to splice the acquired feature plaintext to the special field to obtain a spliced field;
an output unit, configured to output the spliced field as a feature sample;
further comprising:
the plaintext sample consists of unencrypted characters in the service log, is expressed as tab-separated text, and comprises special fields representing 'whether shown' and 'click';
a server at the map stage reads the pre-configured feature class, wherein the feature class comprises fields configured into the feature class in sequence, and the content of each field in the feature class is the same as that of at least one field in the plaintext sample; the server at the map stage reads the input plaintext sample as key-value pairs according to the pre-configured feature class, and stores the plaintext sample in memory as a map;
the server at the map stage strips the special fields representing 'whether shown' and 'click' from the plaintext sample, and sequentially extracts feature fields from the plaintext sample according to the field contents recorded in the pre-configured feature class.
6. The apparatus according to claim 5, wherein the identification unit is specifically configured to: sequentially read the fields in the feature class, wherein the content of each field in the feature class is the same as that of at least one field in the plaintext sample; sequentially read fields with the same content from the plaintext sample as the feature fields, according to the content of the fields in the feature class; and record the feature fields read from the plaintext sample in a feature set.
7. The apparatus according to claim 6, wherein the output unit is specifically configured to: import the feature sample and the feature set into a Reduce stage via a MapReduce framework; and output identical feature fields read from the plaintext samples to the same compute node.
8. The apparatus according to claim 5, further comprising a feature class management unit for reading a basic feature class and updating the basic feature class through a reflection mechanism; and using the basic feature class updated most recently as the pre-configured feature class.
CN201610066847.9A 2016-01-29 2016-01-29 Data feature processing method and device Active CN107025233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610066847.9A CN107025233B (en) 2016-01-29 2016-01-29 Data feature processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610066847.9A CN107025233B (en) 2016-01-29 2016-01-29 Data feature processing method and device

Publications (2)

Publication Number Publication Date
CN107025233A CN107025233A (en) 2017-08-08
CN107025233B true CN107025233B (en) 2020-04-28

Family

ID=59524525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610066847.9A Active CN107025233B (en) 2016-01-29 2016-01-29 Data feature processing method and device

Country Status (1)

Country Link
CN (1) CN107025233B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111224743B (en) * 2018-11-23 2022-11-15 中兴通讯股份有限公司 Detection method, terminal and computer readable storage medium
CN109934628B (en) * 2019-03-08 2021-03-19 智者四海(北京)技术有限公司 Feature processing method and device
CN111461253A (en) * 2020-04-17 2020-07-28 浙江百应科技有限公司 Automatic feature extraction system and method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079074A (en) * 2007-07-26 2007-11-28 杭州华三通信技术有限公司 Data storage and retrieving method and system
CN101483553A (en) * 2009-02-24 2009-07-15 中兴通讯股份有限公司 Audit apparatus and method for customer network behavior
CN103473306A (en) * 2013-09-10 2013-12-25 北京思特奇信息技术股份有限公司 Method and system for adopting structured query language (SQL) mark substitution method to achieve data self-extraction
CN104050269A (en) * 2014-06-23 2014-09-17 上海帝联信息科技股份有限公司 Log compression method and device and log decompression method and device
CN104717085A (en) * 2013-12-16 2015-06-17 中国移动通信集团湖南有限公司 Log parsing method and device

Also Published As

Publication number Publication date
CN107025233A (en) 2017-08-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200326

Address after: 210042 No. 1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing City, Jiangsu Province

Applicant after: Suning Cloud Computing Co.,Ltd.

Address before: 210042 No. 1 Suning Avenue, Suning Headquarters, Xuanwu District, Nanjing City, Jiangsu Province

Applicant before: SUNING COMMERCE GROUP Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210602

Address after: 518001 unit 3510-131, Luohu business center, 2028 Shennan East Road, Chengdong community, Dongmen street, Luohu District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen yunwangwandian e-commerce Co.,Ltd.

Address before: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210042

Patentee before: Suning Cloud Computing Co.,Ltd.