CN109657247A - Method and device for implementing a custom grammar for machine learning - Google Patents

Method and device for implementing a custom grammar for machine learning

Info

Publication number
CN109657247A
Authority
CN
China
Prior art keywords
grammar
execution plan
customized
machine learning
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811566818.4A
Other languages
Chinese (zh)
Other versions
CN109657247B (en)
Inventor
郭庆
宋怀明
谢莹莹
蒋丹东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Dawning International Information Industry Co Ltd
Original Assignee
Zhongke Dawning International Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Dawning International Information Industry Co Ltd
Priority to CN201811566818.4A
Publication of CN109657247A
Application granted
Publication of CN109657247B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present invention provides a method and device for implementing a custom grammar for machine learning. The method comprises: performing lexical analysis and syntactic analysis on the custom grammar and converting it into an abstract syntax tree; performing semantic analysis based on the abstract syntax tree to construct a logical execution plan for the grammar; constructing a distributed physical execution plan based on the logical execution plan and with reference to the distribution of the data; and, based on the distributed physical execution plan, calling the relevant machine learning library through a reflection mechanism and training and testing the model by distributed in-memory computing. The present invention can lower the barrier to using machine learning and reduce users' coding and development costs.

Description

Method and device for implementing a custom grammar for machine learning
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a method and device for implementing a custom grammar for machine learning.
Background
Machine learning is a branch of artificial intelligence that, over the past 30-plus years, has developed into a multidisciplinary field drawing on probability theory, statistics, approximation theory, convex analysis, computational complexity theory and other disciplines. A machine learning algorithm automatically analyzes data to derive rules and uses those rules to make predictions about unknown data. Machine learning is widely applied in data mining, computer vision, natural language processing, biometric recognition, search engines, medical diagnosis, credit card fraud detection and other fields.
Using common machine learning algorithms requires learning a specific programming language and a specific compiler and writing complex code. This places high demands on researchers' coding ability and requires considerable time to learn the relevant computer knowledge.
Summary of the Invention
The method and device for implementing a custom grammar for machine learning provided by the present invention can lower the barrier to using machine learning and reduce users' coding and development costs.
In a first aspect, the present invention provides a method for implementing a custom grammar for machine learning, comprising:
performing lexical analysis and syntactic analysis on the custom grammar, and converting it into an abstract syntax tree;
performing semantic analysis based on the abstract syntax tree, and constructing a logical execution plan for the grammar;
constructing a distributed physical execution plan based on the logical execution plan and with reference to the distribution of the data;
based on the distributed physical execution plan, calling the relevant machine learning library through a reflection mechanism, and training and testing the model by distributed in-memory computing.
Optionally, performing semantic analysis based on the abstract syntax tree and constructing the logical execution plan of the grammar comprises: analyzing the abstract syntax tree and, according to custom reflection rules, using the reflection capability of the Java Virtual Machine to construct the logical execution plan of the grammar.
Optionally, the lexical analysis is: converting a character sequence into a token sequence.
Optionally, the syntactic analysis is: analyzing an input text composed of a word sequence according to a given formal grammar and determining its syntactic structure.
In a second aspect, the present invention provides a device for implementing a custom grammar for machine learning, comprising:
a converting unit, configured to perform lexical analysis and syntactic analysis on the custom grammar and convert it into an abstract syntax tree;
a first construction unit, configured to perform semantic analysis based on the abstract syntax tree and construct a logical execution plan for the grammar;
a second construction unit, configured to construct a distributed physical execution plan based on the logical execution plan and with reference to the distribution of the data;
a computing unit, configured to call, based on the distributed physical execution plan, the relevant machine learning library through a reflection mechanism, and to train and test the model by distributed in-memory computing.
Optionally, the first construction unit is configured to analyze the abstract syntax tree and, according to custom reflection rules, use the reflection capability of the Java Virtual Machine to construct the logical execution plan of the grammar.
Optionally, the lexical analysis is: converting a character sequence into a token sequence.
Optionally, the syntactic analysis is: analyzing an input text composed of a word sequence according to a given formal grammar and determining its syntactic structure.
The method and device provided by the embodiments of the present invention define a new grammar that covers commonly used machine learning algorithms. By entering only a few statements, the user can build, train and analyze the results of most machine learning algorithms, which lowers the barrier to using machine learning and reduces coding effort as well as researchers' learning and development costs.
Brief Description of the Drawings
Fig. 1 is a flow chart of the method for implementing a custom grammar for machine learning provided by an embodiment of the present invention;
Fig. 2 is an execution block diagram of the method for implementing a custom grammar for machine learning provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the device for implementing a custom grammar for machine learning provided by an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a method for implementing a custom grammar for machine learning. As shown in Fig. 1, the method comprises:
S11: performing lexical analysis and syntactic analysis on the custom grammar, and converting it into an abstract syntax tree.
S12: performing semantic analysis based on the abstract syntax tree, and constructing a logical execution plan for the grammar.
S13: constructing a distributed physical execution plan based on the logical execution plan and with reference to the distribution of the data.
S14: based on the distributed physical execution plan, calling the relevant machine learning library through a reflection mechanism, and training and testing the model by distributed in-memory computing.
The reflection mechanism may be the Java reflection mechanism, but is not limited thereto.
The machine learning library may be a Spark machine learning library, but is not limited thereto.
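As a non-limiting illustration, steps S11 to S14 can be read as a pipeline in which each stage consumes the output of the previous one. The Python sketch below only shows that data flow. The statement form `train <algorithm> on <dataset>`, all function names, the plan dictionaries and the toy `REGISTRY` standing in for a machine learning library are assumptions made for the example and do not come from the patent; a lookup by name in `REGISTRY` stands in for JVM reflection.

```python
from typing import Any, Callable, Dict, List, Tuple

# Toy "machine learning library" (hypothetical): each entry pairs a
# per-partition function with a function that merges the partial results.
def partial_mean(rows: List[float]) -> Tuple[float, int]:
    return sum(rows), len(rows)

def combine_mean(parts: List[Tuple[float, int]]) -> float:
    total = sum(s for s, _ in parts)
    count = sum(n for _, n in parts)
    return total / count

REGISTRY: Dict[str, Tuple[Callable, Callable]] = {"mean": (partial_mean, combine_mean)}

# Dataset name -> list of partitions (the "distribution of the data").
DataLayout = Dict[str, List[List[float]]]

def parse(statement: str) -> Dict[str, Any]:
    """S11: lexical and syntactic analysis, producing a tiny AST."""
    tokens = statement.split()  # lexical analysis
    if len(tokens) != 4 or tokens[0] != "train" or tokens[2] != "on":
        raise SyntaxError(f"cannot parse: {statement!r}")
    return {"type": "train", "algorithm": tokens[1], "dataset": tokens[3]}

def build_logical_plan(ast: Dict[str, Any]) -> Dict[str, Any]:
    """S12: semantic analysis, checking that the algorithm is known."""
    if ast["algorithm"] not in REGISTRY:
        raise NameError(f"unknown algorithm: {ast['algorithm']}")
    return {"op": "fit", "func": ast["algorithm"], "input": ast["dataset"]}

def build_physical_plan(logical: Dict[str, Any], layout: DataLayout) -> List[Dict[str, Any]]:
    """S13: one task per partition of the input dataset."""
    partitions = layout[logical["input"]]
    return [
        {"func": logical["func"], "input": logical["input"], "partition": i}
        for i in range(len(partitions))
    ]

def execute(tasks: List[Dict[str, Any]], layout: DataLayout) -> float:
    """S14: resolve the function by name and run it on every partition."""
    if not tasks:
        raise ValueError("empty physical plan")
    partial, combine = REGISTRY[tasks[0]["func"]]  # lookup by name
    results = [partial(layout[t["input"]][t["partition"]]) for t in tasks]
    return combine(results)

def run(statement: str, layout: DataLayout) -> float:
    ast = parse(statement)
    logical = build_logical_plan(ast)
    physical = build_physical_plan(logical, layout)
    return execute(physical, layout)

layout = {"sales": [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]}
print(run("train mean on sales", layout))
```

Keeping the logical plan free of partition details means the same statement can be re-planned when the data layout changes, which is the separation that S12 and S13 describe.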
The method provided by the embodiment of the present invention defines a new grammar that covers commonly used machine learning algorithms. By entering only a few statements, the user can build, train and analyze the results of most machine learning algorithms, which lowers the barrier to using machine learning and reduces coding effort as well as researchers' learning and development costs.
The method for implementing a custom grammar for machine learning according to the embodiment of the present invention is described in detail below.
As shown in Fig. 2, this scheme performs lexical analysis and syntactic analysis on the custom grammar and converts it into an abstract syntax tree. Semantic analysis is performed on the abstract syntax tree to construct the logical plan of the grammar. A distributed physical execution plan is then constructed with reference to the distribution of the data, the relevant Spark machine learning library is called through Java reflection, and the model is trained and tested by distributed in-memory computing.
Lexical analysis is the process, in computer science, of converting a character sequence into a sequence of tokens.
Syntactic analysis is the process of analyzing an input text composed of a word sequence (for example, a sequence of English words) according to a given formal grammar and determining its syntactic structure.
The abstract syntax tree is a tree structure commonly used in syntactic analysis, typically to store the result of that analysis.
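As a non-limiting illustration of these three notions, the Python sketch below tokenizes a statement, parses it with a small hand-written recursive-descent parser and stores the result in an abstract syntax tree. The statement form `TRAIN PCA(k=3, inputCol="features") ON iris`, the token classes and the node types `Call` and `TrainStmt` are hypothetical; the scheme itself relies on ANTLR4 rather than a hand-written parser, and its actual grammar is not reproduced here.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Token classes for the hypothetical statement form (assumed for illustration).
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<STRING>"[^"]*")
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<PUNCT>[(),=])
  | (?P<WS>\s+)
""", re.VERBOSE)

Token = Tuple[str, str]  # (kind, text)

def tokenize(text: str) -> List[Token]:
    """Lexical analysis: character sequence -> token sequence."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":  # whitespace is skipped
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

@dataclass
class Call:
    """AST node: an algorithm name with its named arguments."""
    name: str
    args: Dict[str, Union[int, str]] = field(default_factory=dict)

@dataclass
class TrainStmt:
    """AST root: train the given algorithm on the given dataset."""
    algorithm: Call
    dataset: str

class Parser:
    """Syntactic analysis: token sequence -> abstract syntax tree."""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.i = 0

    def peek(self) -> Token:
        return self.tokens[self.i] if self.i < len(self.tokens) else ("EOF", "")

    def expect(self, kind: str, value: str = None) -> str:
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok[1]!r}")
        self.i += 1
        return tok[1]

    def parse_train(self) -> TrainStmt:
        self.expect("IDENT", "TRAIN")
        name = self.expect("IDENT")
        args: Dict[str, Union[int, str]] = {}
        self.expect("PUNCT", "(")
        while self.peek() != ("PUNCT", ")"):
            key = self.expect("IDENT")
            self.expect("PUNCT", "=")
            kind, text = self.peek()
            if kind == "NUMBER":
                args[key] = int(text)
            elif kind == "STRING":
                args[key] = text[1:-1]  # strip the quotes
            else:
                raise SyntaxError(f"expected a value, got {text!r}")
            self.i += 1
            if self.peek() == ("PUNCT", ","):
                self.i += 1
        self.expect("PUNCT", ")")
        self.expect("IDENT", "ON")
        dataset = self.expect("IDENT")
        self.expect("EOF")
        return TrainStmt(Call(name, args), dataset)

tree = Parser(tokenize('TRAIN PCA(k=3, inputCol="features") ON iris')).parse_train()
print(tree)
```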
Using the custom grammar is relatively simple. The user first specifies the data set to operate on and then specifies the machine learning algorithm with which to train on that data. The original data can also be divided into a test set and a training set, so that the model is trained on the training set and its performance is evaluated on the test set.
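As a non-limiting illustration of this workflow, the Python sketch below splits a data set into a training set and a test set, trains a model on the former and measures accuracy on the latter. The nearest-mean classifier, the 80/20 ratio, the fixed seed and all function names are assumptions chosen to keep the example small; they are not the algorithms or defaults of the scheme.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[float, int]  # (feature value, class label)

def split(data: List[Sample], train_ratio: float = 0.8, seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle deterministically and divide into training and test sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

def train(train_set: List[Sample]) -> Dict[int, float]:
    """Toy model: the mean feature value of each class."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, label in train_set:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model: Dict[int, float], x: float) -> int:
    """Assign the class whose mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))

def evaluate(model: Dict[int, float], test_set: List[Sample]) -> float:
    """Fraction of test samples classified correctly."""
    correct = sum(1 for x, label in test_set if predict(model, x) == label)
    return correct / len(test_set)

# Two well-separated classes: values near 0 and values near 10.
data = [(i * 0.1, 0) for i in range(10)] + [(10 + i * 0.1, 1) for i in range(10)]
train_set, test_set = split(data)
model = train(train_set)
print(len(train_set), len(test_set), evaluate(model, test_set))
```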
This scheme performs lexical and syntactic analysis based on ANTLR4. The custom syntax structure is as follows:
Performing semantic analysis based on the abstract syntax tree and constructing the logical execution plan of the grammar specifically means: the abstract syntax tree is analyzed and, according to custom reflection rules, the reflection capability of the JVM (Java Virtual Machine) is used to construct the logical execution plan of the grammar. The reflection rules are as follows: a configuration file with the structure below maps nodes of the custom syntax tree to functions in a machine learning library (such as Spark MLlib):
-
  func.name: PCA
  func.path: "org.apache.spark.ml.feature.PCA"
  func.args:
    -
      arg.spark.funcName: setInputCol
      arg.ausname: inputCol
      arg.nullable: false
      arg.type: "java.lang.String"
    -
      arg.spark.funcName: setOutputCol
      arg.ausname: outputCol
      arg.nullable: false
      arg.type: "java.lang.String"
    -
      arg.spark.funcName: setK
      arg.nullable: false
      arg.ausname: k
      arg.type: int
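As a non-limiting illustration of how such a rule could be applied, the Python sketch below mirrors the configuration as a dictionary, checks the user-supplied arguments against `arg.nullable` and `arg.type`, and invokes each setter by name. Python's `importlib` and `getattr` stand in for JVM reflection here. The `FakePCA` class, the `TYPE_MAP` table, the `resolver` parameter and the function names are assumptions introduced for the example; only the keys and values of `RULE` are taken from the configuration above.

```python
import importlib
from typing import Any, Callable, Dict

# The reflection rule above, written as a Python dictionary.
RULE: Dict[str, Any] = {
    "func.name": "PCA",
    "func.path": "org.apache.spark.ml.feature.PCA",
    "func.args": [
        {"arg.spark.funcName": "setInputCol", "arg.ausname": "inputCol",
         "arg.nullable": False, "arg.type": "java.lang.String"},
        {"arg.spark.funcName": "setOutputCol", "arg.ausname": "outputCol",
         "arg.nullable": False, "arg.type": "java.lang.String"},
        {"arg.spark.funcName": "setK", "arg.ausname": "k",
         "arg.nullable": False, "arg.type": "int"},
    ],
}

# Assumed mapping from the type names in the rule to Python types.
TYPE_MAP = {"java.lang.String": str, "int": int}

def load_class(path: str) -> type:
    """Resolve 'package.module.ClassName' to a class at run time."""
    module_name, _, class_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)

class FakePCA:
    """Hypothetical stand-in exposing the setter names listed in the rule."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}

    def setInputCol(self, value: str) -> "FakePCA":
        self.params["inputCol"] = value
        return self

    def setOutputCol(self, value: str) -> "FakePCA":
        self.params["outputCol"] = value
        return self

    def setK(self, value: int) -> "FakePCA":
        self.params["k"] = value
        return self

def instantiate(rule: Dict[str, Any], user_args: Dict[str, Any],
                resolver: Callable[[str], type] = load_class) -> Any:
    """Create the target object and apply each argument through its setter."""
    obj = resolver(rule["func.path"])()
    for spec in rule["func.args"]:
        name = spec["arg.ausname"]
        if name not in user_args:
            if not spec["arg.nullable"]:
                raise ValueError(f"missing required argument: {name}")
            continue
        value = user_args[name]
        expected = TYPE_MAP[spec["arg.type"]]
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be {spec['arg.type']}")
        getattr(obj, spec["arg.spark.funcName"])(value)  # call the setter by name
    return obj

# The Spark class is not importable here, so the demo resolves to the stand-in.
pca = instantiate(RULE, {"inputCol": "features", "outputCol": "pca", "k": 3},
                  resolver=lambda path: FakePCA)
print(pca.params)
```

Because the target class and its setters are named only in the rule, supporting another algorithm in this sketch means adding another rule entry rather than writing new dispatch code.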
Based on the logical execution plan and with reference to the distribution of the data, this scheme constructs a distributed physical execution plan, ensuring that the in-memory computation is carried out efficiently.
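As a non-limiting illustration, one simple way to take the data distribution into account is to create one task per data partition and assign it to the node that already holds that partition. The Python sketch below shows this idea only; the `Task` type, the `block_locations` mapping and the node and partition names are hypothetical, and the patent does not specify how its physical plan is actually derived.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Task:
    """One unit of work in the physical plan (hypothetical structure)."""
    op: str         # operation taken from the logical plan
    partition: str  # data partition the task reads
    node: str       # node on which the task is scheduled

def build_physical_plan(logical_op: str, block_locations: Dict[str, str]) -> Dict[str, List[Task]]:
    """Group one task per partition under the node that stores the partition."""
    plan: Dict[str, List[Task]] = {}
    for partition, node in sorted(block_locations.items()):
        plan.setdefault(node, []).append(Task(logical_op, partition, node))
    return plan

# Assumed data distribution: partition name -> node holding it.
locations = {"part-0": "node-a", "part-1": "node-b", "part-2": "node-a"}
for node, tasks in build_physical_plan("fit_pca", locations).items():
    print(node, [t.partition for t in tasks])
```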
Compared with scikit-learn, this scheme has a clear advantage: it eliminates a large amount of coding and, being built on an underlying distributed in-memory computing platform, supports training and testing on massive data. Compared with Spark MLlib, the custom grammar is more concise and does not require the user to deal with distributed computing, which reduces user development cost.
An embodiment of the present invention also provides a device for implementing a custom grammar for machine learning. As shown in Fig. 3, the device comprises:
a converting unit 11, configured to perform lexical analysis and syntactic analysis on the custom grammar and convert it into an abstract syntax tree;
a first construction unit 12, configured to perform semantic analysis based on the abstract syntax tree and construct a logical execution plan for the grammar;
a second construction unit 13, configured to construct a distributed physical execution plan based on the logical execution plan and with reference to the distribution of the data;
a computing unit 14, configured to call, based on the distributed physical execution plan, the relevant machine learning library through a reflection mechanism, and to train and test the model by distributed in-memory computing.
Optionally, the first construction unit 12 is configured to analyze the abstract syntax tree and, according to custom reflection rules, use the reflection capability of the Java Virtual Machine to construct the logical execution plan of the grammar.
Optionally, the lexical analysis is: converting a character sequence into a token sequence.
Optionally, the syntactic analysis is: analyzing an input text composed of a word sequence according to a given formal grammar and determining its syntactic structure.
The device provided by the embodiment of the present invention defines a new grammar that covers commonly used machine learning algorithms. By entering only a few statements, the user can build, train and analyze the results of most machine learning algorithms, which lowers the barrier to using machine learning and reduces coding effort as well as researchers' learning and development costs.
The device of this embodiment can be used to carry out the technical solution of the above method embodiment. Its implementation principle and technical effect are similar and are not repeated here.
A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A method for implementing a custom grammar for machine learning, characterized by comprising:
performing lexical analysis and syntactic analysis on the custom grammar, and converting it into an abstract syntax tree;
performing semantic analysis based on the abstract syntax tree, and constructing a logical execution plan for the grammar;
constructing a distributed physical execution plan based on the logical execution plan and with reference to the distribution of the data;
based on the distributed physical execution plan, calling the relevant machine learning library through a reflection mechanism, and training and testing the model by distributed in-memory computing.
2. The method according to claim 1, characterized in that performing semantic analysis based on the abstract syntax tree and constructing the logical execution plan of the grammar comprises: analyzing the abstract syntax tree and, according to custom reflection rules, using the reflection capability of the Java Virtual Machine to construct the logical execution plan of the grammar.
3. The method according to claim 1 or 2, characterized in that the lexical analysis is: converting a character sequence into a token sequence.
4. The method according to claim 1 or 2, characterized in that the syntactic analysis is: analyzing an input text composed of a word sequence according to a given formal grammar and determining its syntactic structure.
5. A device for implementing a custom grammar for machine learning, characterized by comprising:
a converting unit, configured to perform lexical analysis and syntactic analysis on the custom grammar and convert it into an abstract syntax tree;
a first construction unit, configured to perform semantic analysis based on the abstract syntax tree and construct a logical execution plan for the grammar;
a second construction unit, configured to construct a distributed physical execution plan based on the logical execution plan and with reference to the distribution of the data;
a computing unit, configured to call, based on the distributed physical execution plan, the relevant machine learning library through a reflection mechanism, and to train and test the model by distributed in-memory computing.
6. The device according to claim 5, characterized in that the first construction unit is configured to analyze the abstract syntax tree and, according to custom reflection rules, use the reflection capability of the Java Virtual Machine to construct the logical execution plan of the grammar.
7. The device according to claim 5 or 6, characterized in that the lexical analysis is: converting a character sequence into a token sequence.
8. The device according to claim 5 or 6, characterized in that the syntactic analysis is: analyzing an input text composed of a word sequence according to a given formal grammar and determining its syntactic structure.
CN201811566818.4A 2018-12-19 2018-12-19 Method and device for realizing self-defined grammar of machine learning Active CN109657247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811566818.4A CN109657247B (en) 2018-12-19 2018-12-19 Method and device for realizing self-defined grammar of machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811566818.4A CN109657247B (en) 2018-12-19 2018-12-19 Method and device for realizing self-defined grammar of machine learning

Publications (2)

Publication Number Publication Date
CN109657247A 2019-04-19
CN109657247B 2023-05-23

Family

ID=66115308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811566818.4A Active CN109657247B (en) 2018-12-19 2018-12-19 Method and device for realizing self-defined grammar of machine learning

Country Status (1)

Country Link
CN (1) CN109657247B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001500A (en) * 2020-08-13 2020-11-27 星环信息科技(上海)有限公司 Model training method, device and storage medium based on longitudinal federated learning system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090144229A1 (en) * 2007-11-30 2009-06-04 Microsoft Corporation Static query optimization for linq
US20090328012A1 (en) * 2008-06-27 2009-12-31 Microsoft Corporation Compiler in a managed application context
US20100241828A1 (en) * 2009-03-18 2010-09-23 Microsoft Corporation General Distributed Reduction For Data Parallel Computing
US20120226639A1 (en) * 2011-03-01 2012-09-06 International Business Machines Corporation Systems and Methods for Processing Machine Learning Algorithms in a MapReduce Environment
US20150378696A1 (en) * 2014-06-27 2015-12-31 International Business Machines Corporation Hybrid parallelization strategies for machine learning programs on top of mapreduce
US20170177312A1 (en) * 2015-12-18 2017-06-22 International Business Machines Corporation Dynamic recompilation techniques for machine learning programs
CN106970819A (en) * 2017-03-28 2017-07-21 清华大学 C program coding standard checking device based on the PRDL rule description language

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
K R NEERAJ ET AL: "A domain specific language for business transaction processing", 《2017 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, INFORMATICS, COMMUNICATION AND ENERGY SYSTEMS》 *
MIKE INNES ET AL: "On Machine Learning and Programming Languages", 《SYSML》 *
ZOLTAN A. KOCSIS ET AL: "Automatic Improvement of Apache Spark Queries using Semantics-preserving Program Reduction", 《PROCEEDINGS OF THE 2016 ON GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION》 *
梁国蓉 (Liang Guorong): "Design and Implementation of a Dataflow-Based Big Data Query Engine System", 《China Master's Theses Full-text Database, Information Science and Technology》 *

Also Published As

Publication number Publication date
CN109657247B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
Ren et al. Lego: Latent execution-guided reasoning for multi-hop question answering on knowledge graphs
Ma et al. Prompt for extraction? PAIE: Prompting argument interaction for event argument extraction
CN109033063B (en) Machine inference method based on knowledge graph, electronic device and computer readable storage medium
CN109783618A (en) Pharmaceutical entities Relation extraction method and system based on attention mechanism neural network
CN109920540A (en) Construction method, device and the computer equipment of assisting in diagnosis and treatment decision system
CN116049831A (en) Software vulnerability detection method based on static analysis and dynamic analysis
CN111797241B (en) Event Argument Extraction Method and Device Based on Reinforcement Learning
EP3968245A1 (en) Automatically generating a pipeline of a new machine learning project from pipelines of existing machine learning projects stored in a corpus
EP3968244A1 (en) Automatically curating existing machine learning projects into a corpus adaptable for use in new machine learning projects
Levy et al. Learning to align the source code to the compiled object code
CN108595165A (en) A kind of code completion method, apparatus and storage medium based on code intermediate representation
CN115146279A (en) Program vulnerability detection method, terminal device and storage medium
CN110428907A (en) A kind of text mining method and system based on unstructured electronic health record
CN116580849A (en) Medical data acquisition and analysis system and method thereof
Jha et al. Does data augmentation improve generalization in NLP?
CN116956896A (en) Text analysis method, system, electronic equipment and medium based on artificial intelligence
CN109657247A (en) Method and device for implementing a custom grammar for machine learning
EP3965024A1 (en) Automatically labeling functional blocks in pipelines of existing machine learning projects in a corpus adaptable for use in new machine learning projects
Patrick et al. An active learning process for extraction and standardisation of medical measurements by a trainable FSA
CN115130545A (en) Data processing method, electronic device, program product, and medium
CN113761875A (en) Event extraction method and device, electronic equipment and storage medium
KR20210051252A (en) Apparatus and method for providing translation for a word with multiple meanings
Araujo A parallel evolutionary algorithm for stochastic natural language parsing
Cottone et al. Gl-learning: an optimized framework for grammatical inference
Mimi et al. Text Prediction Zero Probability Problem Handling with N-gram Model and Laplace Smoothing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant