CN105183914A - Data characteristic formatting method and device - Google Patents

Data characteristic formatting method and device

Info

Publication number
CN105183914A
Authority
CN
China
Prior art keywords
attribute
feature
format
characteristic
configuration file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510660660.7A
Other languages
Chinese (zh)
Inventor
章岑
杨田
雷龙艳
周盛
潘柏宇
王冀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
1Verge Internet Technology Beijing Co Ltd
Original Assignee
1Verge Internet Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 1Verge Internet Technology Beijing Co Ltd
Priority to CN201510660660.7A
Publication of CN105183914A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data mining, and in particular discloses a data feature formatting method and device. The method comprises the following steps: acquiring a first configuration file, and determining, according to the switch settings in the first configuration file, the attributes to be processed in the formatting and the formatting order of each attribute; acquiring a second configuration file, and determining, according to the feature configuration of the attributes in the second configuration file, the feature sequence and the feature value meanings of the features to be formatted within each attribute; determining the feature sequence number of each feature according to the formatting order of the attributes and the feature sequence of the features to be formatted within each attribute; determining the feature value of the corresponding feature according to the attribute values of actual samples and the feature value meanings; and formatting the actual samples into feature vectors according to the feature sequence numbers and the feature values. With this technical solution, no fixed order has to be set for every feature in advance in order to assign feature sequence numbers, the attributes/features processed can be added and deleted at any time, and the efficiency of feature formatting can be greatly improved.

Description

Data characteristic formatting method and device
Technical field
The present invention relates to the field of data mining, and in particular to a data feature formatting method and device.
Background
In a networked big-data environment, the main task of data mining is to discover the common features of data within massive amounts of information so that the data can be counted and analyzed. Mining big data manually is clearly impractical, while mining by machine has inherent weaknesses in recognition accuracy; the prior art therefore relies mainly on machine learning based on model training to improve the recognition rate of automatic mining. In machine learning, it is often necessary to extract a number of features from raw data to represent a sample, and then to express the feature set of each sample in a form the algorithm can recognize, so that the algorithm can read the sample features and train a model.
At present, existing machine learning libraries such as LIBSVM, XGBoost and Spark MLlib all format training data according to a consensus format. In the consensus format, sequence numbers are first assigned to all features; each feature is then represented numerically, and each sample is recorded as a series of "feature sequence number:feature value" pairs. To save space, usually only the features whose value is not 0 need to be stored; correspondingly, however, the sequence number and meaning of every feature must be fixed, so that the real meaning of a feature can be determined from its sequence number.
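To make the "feature sequence number:feature value" representation concrete, the following minimal Python sketch converts a dense feature list into that sparse text form and back. The function names `to_sparse` and `from_sparse`, the 1-based numbering and the space separator are illustrative assumptions and are not taken from any of the libraries named above.

```python
def to_sparse(dense):
    """Serialize a dense list as 'number:value' pairs, skipping zeros (1-based numbers)."""
    return " ".join(f"{i}:{float(v)}" for i, v in enumerate(dense, start=1) if v != 0)


def from_sparse(text, dim):
    """Rebuild a dense list of length `dim` from 'number:value' pairs."""
    dense = [0.0] * dim
    for pair in text.split():
        number, value = pair.split(":")
        dense[int(number) - 1] = float(value)
    return dense


row = [0, 1, 0, 0, 2.5]
encoded = to_sparse(row)           # only the 2nd and 5th features are stored
decoded = from_sparse(encoded, 5)  # needs the fixed dimension count to restore zeros
```

Note that decoding only works if both sides agree on what each sequence number means, which is exactly the constraint the background section describes.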
In practical engineering, however, the feature space is very large (hundreds or thousands of dimensions, or even trillion-dimensional features, are common), so fixing a predetermined order for the features of every sample before formatting is very difficult. Features may also be added or deleted at any time during actual data processing. Determining features with the consensus format of the prior art therefore costs a great deal of time and effort, and formatting features efficiently remains a difficult problem.
Summary of the invention
In view of the defects of the prior art, the object of the present invention is to provide a data feature formatting method and device that format the features of data efficiently.
According to one aspect of the present invention, a data feature formatting method is provided, comprising the steps of:
obtaining a first configuration file, and determining, according to the switch settings in the first configuration file, the attributes to be processed in this formatting run and the formatting order of each attribute;
obtaining a second configuration file, and determining, according to the feature configuration of each attribute in the second configuration file, the feature sequence and the feature value meanings of the features to be formatted within the attribute;
determining the feature sequence number of each feature according to the formatting order of the attributes and the feature sequence of the features to be formatted within each attribute, and determining the feature value of the corresponding feature according to the attribute value of an actual sample and the feature value meanings;
formatting each actual sample into a feature vector according to the feature sequence numbers and the feature values.
Preferably, the switch settings comprise attribute switch flags or attribute records; the formatting order follows the natural attribute order of the raw data of the actual samples, or is freely specified according to the needs of model training.
Preferably, the feature configuration comprises a discretization switch and the formatting mode of the attribute.
Preferably, the discretization switch and the formatting mode of the attribute are freely set according to the requirements of the algorithm model used in model training.
Preferably, only the features whose feature value is not 0 are selected for storage in the feature vector.
According to another aspect of the present invention, a data feature formatting device is also provided, comprising:
a first configuration module, configured to obtain a first configuration file and to determine, according to the switch settings in the first configuration file, the attributes to be processed in this formatting run and the formatting order of each attribute;
a second configuration module, configured to obtain a second configuration file and to determine, according to the feature configuration of each attribute in the second configuration file, the feature sequence and the feature value meanings of the features to be formatted within the attribute;
a feature processing module, configured to determine the feature sequence number of each feature according to the formatting order of the attributes and the feature sequence of the features to be formatted within each attribute, and to determine the feature value of the corresponding feature according to the attribute value of an actual sample and the feature value meanings;
a formatting module, configured to format each actual sample into a feature vector according to the feature sequence numbers and the feature values.
Preferably, the first configuration module comprises:
an attribute switch module, configured to determine the attributes to be processed in this formatting run according to attribute switch flags or attribute records;
an attribute order module, configured to determine the formatting order of each attribute according to the natural attribute order of the raw data of the actual samples, or according to an order freely specified to suit the needs of model training.
Preferably, the second configuration module comprises:
a discretization switch module, configured to determine, according to the discretization switch, whether discretization is needed;
a format configuration module, configured to configure the formatting mode of the attribute.
Preferably, the discretization switch module and the format configuration module are freely set according to the requirements of the algorithm model used in model training.
Preferably, the formatting module comprises a vector processing module, configured to select only the features whose feature value is not 0 to generate and store the feature vector.
The embodiments of the present invention provide a data feature formatting method and device. Through a two-level configuration, the technical solution allows the attributes to be processed and their feature representation to be set freely, so that feature formatting and model training can be performed on demand. Because the technical solution does not require a predetermined order to be set for every feature in advance in order to fix the feature sequence numbers, and the attributes/features processed can be added or deleted at any time, the efficiency of feature formatting can be greatly improved.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the data feature formatting method in one embodiment of the present invention;
Fig. 2 is a schematic diagram of the module structure of the data feature formatting device in one embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings. It should be understood that these descriptions are only exemplary and are not intended to limit the scope of the invention. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the present invention.
In model training, data feature formatting is often an indispensable step: only data that has undergone feature formatting can be identified, classified and analyzed quickly and efficiently during model training. Feature formatting in the prior art is mainly based on the consensus format. As the name suggests, using the consensus format presupposes that a consensus has been reached on all features; that is, before use, all features must be identified and a sequence number arranged for each of them. This puts great pressure on feature formatting and seriously affects its efficiency.
In the embodiments of the present invention, feature configuration files are used to help identify features and to determine how features and their attributes are applied, so that features can be selected flexibly and formatted relatively freely, which improves the efficiency of data feature formatting. As shown in Fig. 1, in an embodiment of the present invention the data feature formatting method comprises the steps of:
S1, obtaining a first configuration file, and determining, according to the switch settings in the first configuration file, the attributes to be processed in this formatting run and the formatting order of each attribute;
S2, obtaining a second configuration file, and determining, according to the feature configuration of each attribute in the second configuration file, the feature sequence and the feature value meanings of the features to be formatted within the attribute;
S3, determining the feature sequence number of each feature according to the formatting order of the attributes and the feature sequence of the features to be formatted within each attribute, and determining the feature value of the corresponding feature according to the attribute value of an actual sample and the feature value meanings;
S4, formatting each actual sample into a feature vector according to the feature sequence numbers and the feature values.
Specifically, in the embodiments of the present invention, multiple actual samples need to be formatted into multiple feature vectors respectively. The raw data of each actual sample is represented by several attributes with specific attribute values; for example, the raw data of sample "user A" is "sex: male; age: 24; client type: PC". Each feature vector is a numerical representation made up of several items of the form "feature sequence number:feature value"; for example, the feature vector of sample "user A" after formatting may be "2:1.0 6:1.0 13:1.0". Formatting one sample requires converting its raw data into this numerical representation, and formatting all samples consistently requires a unified conversion scheme.
First, in step S1, the first configuration file is preferably a feature switch configuration file, which provides switches for the attributes of a sample that need to be processed. The first configuration file may contain switches for all attributes: for example, during initialization, the switch flag of each attribute to be processed in this formatting run is set to the on state (for example, set to 1), and the switch flag of each attribute that does not need to be processed is set to the off state (for example, set to 0). Alternatively, the file may record only the attributes to be processed in this formatting run, with any attribute not recorded regarded as not requiring processing. The first configuration file also sets the formatting order of the attributes, and during formatting the features are arranged into the feature vector in this order. The formatting order may follow the natural attribute order of the sample raw data, or may be freely specified according to the needs of model training.
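The two styles of switch setting described for step S1 can be sketched as follows. This is only an illustration: the patent does not specify a file format, so the in-memory structures `SWITCH_CONFIG` and `RECORD_CONFIG`, the attribute names and the helper `enabled_attributes` are all hypothetical.

```python
# Hypothetical first configuration, style 1: every attribute with a switch flag,
# listed in the desired formatting order (1 = on, 0 = off).
SWITCH_CONFIG = [
    ("sex", 1),
    ("age", 1),
    ("video_duration", 0),  # switched off: skipped in this formatting run
    ("client_type", 1),
]

# Hypothetical first configuration, style 2: only the attributes to process are recorded.
RECORD_CONFIG = ["sex", "age", "client_type"]


def enabled_attributes(config):
    """Return the attributes to process, in formatting order, for either style."""
    result = []
    for entry in config:
        if isinstance(entry, tuple):
            name, flag = entry
            if flag == 1:
                result.append(name)
        else:
            result.append(entry)  # a recorded attribute is implicitly switched on
    return result
```

Both styles yield the same ordered attribute list here, so the later steps do not need to know which style was used.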
In step S2, the second configuration file is preferably the feature configuration file of each attribute. The feature configuration file first specifies whether the feature of the attribute needs to be discretized (for example, by setting the discretization switch to 1); when discretization is needed, it further specifies the formatting mode of the attribute: the number of feature dimensions, the feature meaning corresponding to each attribute value, and the internal order of the corresponding features. For example, for the "sex" attribute, its feature configuration file first specifies that the feature of this attribute needs to be discretized, and then specifies that after discretization the feature of this attribute occupies 3 dimensions, where 0 means female, 1 means male and 2 means unknown; when the feature vector is generated, the dimension corresponding to the actual attribute value of the sample is set to 1. If discretization is not needed (for example, the discretization switch is set to 0), the formatting mode of the attribute is: the feature has only 1 dimension (its position in the feature sequence is 0 or is left as the default), and the attribute value is the actual feature value. For example, when the "age" attribute does not need to be discretized, the feature value of "age: 24" is "24". If discretization is needed, suppose further that the discretized attribute occupies 8 dimensions, where 0 means the age cannot be segmented, 1 is under 18, 2 is 18-24, 3 is 25-29, 4 is 30-34, 5 is 35-39, 6 is 40-49 and 7 is 50 or over; the feature of "age: 24" is then expressed by setting dimension 2 (that is, the 3rd dimension) to 1.
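As a rough illustration of the formatting modes described for step S2, the sketch below encodes the "sex" and "age" examples in Python. The helper names `sex_feature` and `age_feature`, the English attribute values and the return shape (dimension count plus a mapping from internal index to feature value) are assumptions made for this example only.

```python
from bisect import bisect_right

SEX_INDEX = {"female": 0, "male": 1}         # any other value maps to 2 (unknown)
AGE_LOWER_BOUNDS = [18, 25, 30, 35, 40, 50]  # lower bounds of buckets 2..7


def sex_feature(value):
    """Discretized 'sex': 3 dimensions; returns (dims, {internal index: feature value})."""
    return 3, {SEX_INDEX.get(value, 2): 1.0}


def age_feature(value, discretize):
    """'age' with the discretization switch on (8 dimensions) or off (1 dimension)."""
    if not discretize:
        return 1, {0: float(value)}          # the attribute value is the feature value
    if value is None or value < 0:
        return 8, {0: 1.0}                   # bucket 0: age cannot be segmented
    return 8, {bisect_right(AGE_LOWER_BOUNDS, value) + 1: 1.0}
```

With the switch on, `age_feature(24, True)` activates internal index 2 (the 18-24 bucket); with the switch off, the same attribute contributes a single dimension holding 24.0.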
In step S3, feature sequence numbers are assigned one after another according to the formatting order of the attributes and the feature sequence within each attribute; at the same time, each attribute value is associated with the feature value of a specific feature sequence number according to the correspondence between attribute values and feature value meanings. For example, suppose the attribute formatting order is "sex" → "age" → "client type" and all three attributes are discretized, with "sex" and "age" discretized as described above and "client type" discretized into 3 dimensions, where 0 is the mobile App, 1 is PC and 2 is unknown. Then, in the feature vector, dimensions 1-3 hold the features of the "sex" attribute and are assigned feature sequence numbers 1-3; dimensions 4-11 hold the features of the "age" attribute and are assigned feature sequence numbers 4-11; and dimensions 12-14 hold the features of the "client type" attribute and are assigned feature sequence numbers 12-14. A feature value of 1 at a given feature sequence number indicates that the actual attribute value matches that feature sequence number/dimension.
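The assignment of feature sequence numbers in step S3 amounts to a running offset over the attributes in formatting order. A minimal sketch, assuming 1-based numbering as in the example above (the function name `assign_sequence_numbers` is hypothetical):

```python
def assign_sequence_numbers(order, dims):
    """Map each attribute to the inclusive range of feature sequence numbers it occupies."""
    ranges = {}
    next_number = 1
    for name in order:
        ranges[name] = (next_number, next_number + dims[name] - 1)
        next_number += dims[name]
    return ranges


ranges = assign_sequence_numbers(
    ["sex", "age", "client_type"],
    {"sex": 3, "age": 8, "client_type": 3},
)
```

Because the ranges are derived from the two configuration inputs each time, changing the attribute order or the dimension counts simply produces a new numbering without any manual bookkeeping.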
In step S4, each sample is formatted into a feature vector in the manner described above. Taking the "user A" sample as an example: from the attribute value "sex: male", the feature value of the 2nd dimension (feature sequence number 2) is set to 1; from "age: 24", the feature value of the 6th dimension (feature sequence number 6) is set to 1; and from "client type: PC", the feature value of the 13th dimension (feature sequence number 13) is set to 1. In each case the feature sequence number is the first sequence number of the attribute plus the internal index of the matching feature (1 + 1, 4 + 2 and 12 + 1 respectively). Only the features whose value is not 0 are stored, so the feature vector of the "user A" sample after formatting is "2:1.0 6:1.0 13:1.0".
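The "user A" example can be reproduced end to end with a short Python sketch. It assumes all three attributes are discretized as in the text; the names `ORDER`, `DIMS`, `internal_index` and `format_sample`, and the English attribute values, are illustrative and not part of the patent.

```python
from bisect import bisect_right

ORDER = ["sex", "age", "client_type"]          # formatting order (first configuration)
DIMS = {"sex": 3, "age": 8, "client_type": 3}  # dimensions per attribute (second configuration)


def internal_index(name, value):
    """Position of the active dimension inside one attribute."""
    if name == "sex":
        return {"female": 0, "male": 1}.get(value, 2)
    if name == "client_type":
        return {"mobile_app": 0, "pc": 1}.get(value, 2)
    # age: 0 = cannot be segmented, 1 = under 18, 2 = 18-24, ..., 7 = 50 or over
    if value is None or value < 0:
        return 0
    return bisect_right([18, 25, 30, 35, 40, 50], value) + 1


def format_sample(sample):
    """Format one sample as sparse 'feature sequence number:feature value' pairs."""
    pairs, offset = [], 1
    for name in ORDER:
        pairs.append(f"{offset + internal_index(name, sample.get(name))}:1.0")
        offset += DIMS[name]
    return " ".join(pairs)


user_a = {"sex": "male", "age": 24, "client_type": "pc"}
vector = format_sample(user_a)
```

Since every discretized attribute activates exactly one dimension, only three of the 14 dimensions are written out for each sample.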
In the embodiments of the present invention, the attributes to be processed and the feature representation of each attribute can be set freely, so that specific features can be chosen freely for statistical analysis according to the needs of model training. More importantly, there is no need to set a predetermined order for every feature in advance and fix its feature sequence number, and the attributes/features processed can be added or deleted at any time, so the efficiency of feature formatting can be greatly improved.
Specifically, many machine learning problems, such as click-through rate prediction, use a wide variety of attribute features. Some attributes are discrete by nature, such as "sex"; others are continuous, such as "age" or "video duration". How a continuous feature is formatted depends on the choice of algorithm model. Taking the "video duration" attribute as an example, two different formatting modes for a continuous feature are described here. In the first, discretization is required. For example, the duration of an ad material generally ranges from 5 seconds to 1 minute, so the duration can be discretized into 5-second segments, and the second configuration file (that is, the feature configuration file of this attribute) specifies the number of dimensions, the feature meanings and the internal order used for discretization: 0 is 0-4 seconds, 1 is 5-9 seconds, 2 is 10-14 seconds, 3 is 15-19 seconds, ..., 11 is 55-59 seconds and 12 is 1 minute or more. The feature of this attribute finally occupies 13 dimensions in the feature vector space, and in each sample exactly one of these 13 dimensions has a feature value of 1. In the second, discretization is not required. In this case the material duration is added directly to the feature vector as the feature value of a single feature dimension; a table mapping material IDs to durations can be written in the configuration file, and the specific duration feature value is obtained by looking up this table during feature extraction.
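Both formatting modes for "video duration" can be sketched in a few lines of Python. The lookup table `MATERIAL_DURATION`, its IDs and the function names are hypothetical; durations are assumed to be non-negative numbers of seconds.

```python
# Hypothetical lookup table from material ID to duration in seconds.
MATERIAL_DURATION = {"ad_001": 15, "ad_002": 30, "ad_003": 75}


def duration_bucket(seconds):
    """Internal index for 5-second segments: 0 = 0-4 s, ..., 11 = 55-59 s, 12 = 1 minute or more."""
    return min(int(seconds) // 5, 12)


def duration_feature(material_id, discretize):
    """Return (dims, {internal index: feature value}) for one material."""
    seconds = MATERIAL_DURATION[material_id]
    if discretize:
        return 13, {duration_bucket(seconds): 1.0}
    return 1, {0: float(seconds)}  # the duration itself is the feature value
```

The discretized mode spends 13 dimensions on this attribute, while the continuous mode spends one, which is the dimensionality trade-off the text goes on to discuss.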
Discretization facilitates classification statistics during model training, whereas continuous features that are not discretized allow samples to be analyzed precisely and also reduce the dimensionality of the feature vector. Specifically, if a linear model such as a logistic regression model is used, continuous features need to be discretized; if a nonlinear model such as a tree model is used, discretization may be skipped. In the embodiments of the present invention, whether and how to discretize is set freely in the configuration file, so formatting and model training can be carried out for different algorithm requirements, which also greatly improves the flexibility and applicability of feature formatting.
As shown in Fig. 2, an embodiment of the present invention also provides a data feature formatting device 1, comprising:
a first configuration module 101, configured to obtain a first configuration file and to determine, according to the switch settings in the first configuration file, the attributes to be processed in this formatting run and the formatting order of each attribute;
a second configuration module 102, configured to obtain a second configuration file and to determine, according to the feature configuration of each attribute in the second configuration file, the feature sequence and the feature value meanings of the features to be formatted within the attribute;
a feature processing module 103, configured to determine the feature sequence number of each feature according to the formatting order of the attributes and the feature sequence of the features to be formatted within each attribute, and to determine the feature value of the corresponding feature according to the attribute value of an actual sample and the feature value meanings;
a formatting module 104, configured to format each actual sample into a feature vector according to the feature sequence numbers and the feature values.
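One possible way to mirror the four modules in software is sketched below. The class names, constructor arguments and the restriction to discretized attributes are simplifying assumptions for illustration; the patent leaves the implementation of the modules open.

```python
class FirstConfigModule:
    """Module 101: reads the switch settings and returns enabled attributes in order."""
    def __init__(self, switches):
        self.switches = switches        # list of (attribute, flag) pairs

    def attributes(self):
        return [name for name, flag in self.switches if flag == 1]


class SecondConfigModule:
    """Module 102: holds, per attribute, the attribute value -> internal index mapping."""
    def __init__(self, value_index):
        self.value_index = value_index

    def dims(self, name):
        return len(self.value_index[name])


class FeatureProcessingModule:
    """Module 103: derives feature sequence numbers and feature values for a sample."""
    def __init__(self, first, second):
        self.first, self.second = first, second

    def features(self, sample):
        result, offset = {}, 1
        for name in self.first.attributes():
            mapping = self.second.value_index[name]
            result[offset + mapping[sample[name]]] = 1.0
            offset += self.second.dims(name)
        return result


class FormattingModule:
    """Module 104: writes the non-zero features as a sparse vector string."""
    def format(self, features):
        return " ".join(f"{n}:{v}" for n, v in sorted(features.items()))


first = FirstConfigModule([("sex", 1), ("age", 0), ("client_type", 1)])
second = SecondConfigModule({
    "sex": {"female": 0, "male": 1, "unknown": 2},
    "client_type": {"mobile_app": 0, "pc": 1, "unknown": 2},
})
processor = FeatureProcessingModule(first, second)
text = FormattingModule().format(processor.features({"sex": "male", "client_type": "pc"}))
```

In this usage, "age" is switched off, so "client type" moves up to sequence numbers 4-6 without any other change to the configuration.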
Those skilled in the art will appreciate that, corresponding to the method above, the device of the embodiments of the present invention also has functional modules corresponding to each method step, which are not described again here. In practical applications, the data feature formatting device may be an independent computing device, a separate functional unit carried by a computing device, or a virtual/physical unit implemented directly by a computing device. Likewise, each module in the device may be implemented by a central processing unit (CPU), microprocessor (MPU), digital signal processor (DSP) or field-programmable gate array (FPGA) located in the computing device, and the means of implementing the device and its modules should not be regarded as limiting the specific embodiments of the present invention.
The embodiments of the present invention provide a data feature formatting method and device. Through a two-level configuration, the technical solution allows the attributes to be processed and their feature representation to be set freely, so that feature formatting and model training can be performed on demand. Because the technical solution does not require a predetermined order to be set for every feature in advance in order to fix the feature sequence numbers, and the attributes/features processed can be added or deleted at any time, the efficiency of feature formatting can be greatly improved.
It should be understood that the above embodiments of the present invention are only intended to illustrate or explain the principles of the invention and do not limit it. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention shall fall within its scope of protection. In addition, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or equivalents of such scope and boundaries.

Claims (10)

1. A data feature formatting method, characterized in that the method comprises the steps of:
obtaining a first configuration file, and determining, according to the switch settings in the first configuration file, the attributes to be processed in this formatting run and the formatting order of each attribute;
obtaining a second configuration file, and determining, according to the feature configuration of each attribute in the second configuration file, the feature sequence and the feature value meanings of the features to be formatted within the attribute;
determining the feature sequence number of each feature according to the formatting order of the attributes and the feature sequence of the features to be formatted within each attribute, and determining the feature value of the corresponding feature according to the attribute value of an actual sample and the feature value meanings;
formatting each actual sample into a feature vector according to the feature sequence numbers and the feature values.
2. The method according to claim 1, characterized in that the switch settings comprise attribute switch flags or attribute records;
the formatting order follows the natural attribute order of the raw data of the actual samples, or is freely specified according to the needs of model training.
3. The method according to claim 1, characterized in that the feature configuration comprises a discretization switch and the formatting mode of the attribute.
4. The method according to claim 3, characterized in that the discretization switch and the formatting mode of the attribute are freely set according to the requirements of the algorithm model used in model training.
5. The method according to claim 1, characterized in that only the features whose feature value is not 0 are selected for storage in the feature vector.
6. A data feature formatting device, characterized in that the device comprises:
a first configuration module, configured to obtain a first configuration file and to determine, according to the switch settings in the first configuration file, the attributes to be processed in this formatting run and the formatting order of each attribute;
a second configuration module, configured to obtain a second configuration file and to determine, according to the feature configuration of each attribute in the second configuration file, the feature sequence and the feature value meanings of the features to be formatted within the attribute;
a feature processing module, configured to determine the feature sequence number of each feature according to the formatting order of the attributes and the feature sequence of the features to be formatted within each attribute, and to determine the feature value of the corresponding feature according to the attribute value of an actual sample and the feature value meanings;
a formatting module, configured to format each actual sample into a feature vector according to the feature sequence numbers and the feature values.
7. The device according to claim 6, characterized in that the first configuration module comprises:
an attribute switch module, configured to determine the attributes to be processed in this formatting run according to attribute switch flags or attribute records;
an attribute order module, configured to determine the formatting order of each attribute according to the natural attribute order of the raw data of the actual samples, or according to an order freely specified to suit the needs of model training.
8. The device according to claim 6, characterized in that the second configuration module comprises:
a discretization switch module, configured to determine, according to the discretization switch, whether discretization is needed;
a format configuration module, configured to configure the formatting mode of the attribute.
9. The device according to claim 8, characterized in that the discretization switch module and the format configuration module are freely set according to the requirements of the algorithm model used in model training.
10. The device according to claim 6, characterized in that the formatting module comprises:
a vector processing module, configured to select only the features whose feature value is not 0 to generate and store the feature vector.
CN201510660660.7A 2015-10-14 2015-10-14 Data characteristic formatting method and device Pending CN105183914A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510660660.7A CN105183914A (en) 2015-10-14 2015-10-14 Data characteristic formatting method and device


Publications (1)

Publication Number Publication Date
CN105183914A true CN105183914A (en) 2015-12-23

Family

ID=54905995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510660660.7A Pending CN105183914A (en) 2015-10-14 2015-10-14 Data characteristic formatting method and device

Country Status (1)

Country Link
CN (1) CN105183914A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110262329A (en) * 2019-06-11 2019-09-20 华强方特文化科技集团股份有限公司 Manned device data acquisition system and data format storage method
CN110995815A (en) * 2019-11-27 2020-04-10 大连民族大学 Information transmission method based on Gaia big data analysis system
CN113610239A (en) * 2016-09-27 2021-11-05 第四范式(北京)技术有限公司 Feature processing method and feature processing system for machine learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059151A1 (en) * 2006-09-01 2008-03-06 Microsoft Corporation Identifying language of origin for words using estimates of normalized appearance frequency
CN101655914A (en) * 2008-08-18 2010-02-24 索尼(中国)有限公司 Training device, training method and detection method
CN102629904A (en) * 2012-02-24 2012-08-08 安徽博约信息科技有限责任公司 Detection and determination method of network navy
CN103942191A (en) * 2014-04-25 2014-07-23 中国科学院自动化研究所 Horrific text recognizing method based on content
CN104239539A (en) * 2013-09-22 2014-12-24 中科嘉速(北京)并行软件有限公司 Microblog information filtering method based on multi-information fusion

Non-Patent Citations (1)

Title
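Huang Xianglin (黄祥林): "Principles and Practice of Image Retrieval" (《图像检索原理与实践》), 30 June 2014, Communication University of China Press (中国传媒大学出版社) *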

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN113610239A (en) * 2016-09-27 2021-11-05 第四范式(北京)技术有限公司 Feature processing method and feature processing system for machine learning
CN113610239B (en) * 2016-09-27 2024-04-12 第四范式(北京)技术有限公司 Feature processing method and feature processing system for machine learning
CN110262329A (en) * 2019-06-11 2019-09-20 华强方特文化科技集团股份有限公司 Manned device data acquisition system and data format storage method
CN110995815A (en) * 2019-11-27 2020-04-10 大连民族大学 Information transmission method based on Gaia big data analysis system
CN110995815B (en) * 2019-11-27 2022-08-05 大连民族大学 Information transmission method based on Gaia big data analysis system

Similar Documents

Publication Publication Date Title
CN110532369A (en) A kind of generation method of question and answer pair, device and server
CN109857803B (en) Data synchronization method, device, equipment, system and computer readable storage medium
CN111523324B (en) Named entity recognition model training method and device
CN107392655A (en) Reward voucher method for pushing, system, storage medium, electronic equipment and shunt method
CN107343223A (en) The recognition methods of video segment and device
CN109783624A (en) Answer generation method, device and the intelligent conversational system in knowledge based library
CN110929520B (en) Unnamed entity object extraction method and device, electronic equipment and storage medium
CN105975466A (en) Method and device for machine manuscript writing aiming at short newsflashes
CN107516516B (en) Instrument intelligent control method and system based on interactive voice
CN109446689A (en) DC converter station electrical secondary system drawing recognition methods and system
CN110275963A (en) Method and apparatus for output information
CN113094512B (en) Fault analysis system and method in industrial production and manufacturing
CN109933671A (en) Construct method, apparatus, computer equipment and the storage medium of personal knowledge map
CN111611239A (en) Method, device, equipment and storage medium for realizing automatic machine learning
CN111444677A (en) Reading model optimization method, device, equipment and medium based on big data
CN105183914A (en) Data characteristic formatting method and device
CN113190694A (en) Knowledge management platform of knowledge graph
CN115757124A (en) Test case generation method based on neural network
CN113312924A (en) Risk rule classification method and device based on NLP high-precision analysis label
CN107491484A (en) A kind of data matching method, device and equipment
CN109785818A (en) A kind of music music method and system based on deep learning
CN117609278A (en) Multi-mode power data management method and system based on deep measurement learning
CN114842982B (en) Knowledge expression method, device and system for medical information system
CN111221967A (en) Language data classification storage system based on block chain architecture
CN105468658B (en) Data cleaning method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20151223
