CN108491887A - Method for acquiring commodity tax classification codes - Google Patents

Method for acquiring commodity tax classification codes

Info

Publication number
CN108491887A
Authority
CN
China
Prior art keywords
data
acquisition methods
commodity
forces
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810273206.XA
Other languages
Chinese (zh)
Inventor
李海波
陆军
李正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ANHUI AISINO Co Ltd
Original Assignee
ANHUI AISINO Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ANHUI AISINO Co Ltd
Priority to CN201810273206.XA
Publication of CN108491887A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a method for acquiring commodity tax classification codes, comprising the following steps: synchronizing the invoice data and commodity tax code data held in the electronic ledger system to an invoice data platform by means of a synchronization program, with newly added data synchronized daily; subjecting the data from the electronic ledger to collection, denoising, transformation, and loading; and performing online incremental learning of the model on the Spark engine. The invention classifies well with a low error rate, helps ensure the accuracy of the assigned tax codes, reduces the number of misclassifications, improves classification accuracy, and is suitable for noisy, non-separable data.

Description

Method for acquiring commodity tax classification codes
Technical field
The invention belongs to the field of taxation, and in particular relates to a method for acquiring commodity tax classification codes.
Background art
Because value-added tax (VAT) special invoice and general invoice data reside in the electronic ledger system, the data must be synchronized to the invoice data platform for loading and storage. Existing products mainly use a Bayesian algorithm. A Bayes classifier starts from the prior probability of each class, then uses Bayes' formula together with the independence assumption to compute the class-conditional probabilities of the attributes and the posterior probability of the object, that is, the probability that the object belongs to a given class; the class with the maximum posterior probability is chosen as the object's class. In other words, the prior probability of each tax classification code is estimated from historical data, the posterior probability of the current commodity is computed for each code, and the commodity's tax code is decided by comparing these probabilities.
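The Bayesian decision rule described in the background can be sketched as follows. This is a minimal illustration and not the implementation of any existing product: the toy invoice items, the code labels "FOOD" and "OFFICE", and the function names are all hypothetical, and add-one smoothing is an assumption added here to avoid zero probabilities.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """samples: list of (tokens, tax_code). Returns the counts used at prediction time."""
    class_counts = Counter()             # how often each tax code occurs (prior)
    token_counts = defaultdict(Counter)  # per-code token counts (likelihood)
    vocab = set()
    for tokens, code in samples:
        class_counts[code] += 1
        for tok in tokens:
            token_counts[code][tok] += 1
            vocab.add(tok)
    return class_counts, token_counts, vocab


def predict(model, tokens):
    """Return the tax code with the largest (log) posterior score."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_code, best_score = None, -math.inf
    for code, n in class_counts.items():
        score = math.log(n / total)  # log prior of the code
        denom = sum(token_counts[code].values()) + len(vocab)
        for tok in tokens:
            # Independence assumption: token likelihoods simply multiply.
            score += math.log((token_counts[code][tok] + 1) / denom)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Hypothetical historical data: segmented item names with their tax codes.
history = [
    (["rice", "bag"], "FOOD"),
    (["wheat", "flour"], "FOOD"),
    (["printer", "paper"], "OFFICE"),
    (["ink", "printer"], "OFFICE"),
]
model = train_naive_bayes(history)
```

Calling `predict(model, ["printer", "ink"])` selects the code whose prior times token likelihoods is largest. The independence assumption is visible in the inner loop, which is where correlated attributes would distort the score.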
However, the traditional algorithm has the following disadvantages:
1) In theory, the naive Bayes model has the lowest error rate among classification methods. In practice this is not always so, because naive Bayes assumes that, given the output class, the attributes are mutually independent. This assumption often fails in real applications; when there are many attributes or the attributes are strongly correlated, classification performance is poor.
2) The prior probability must be known, and it often depends on assumptions. Many prior models are possible, so a poorly chosen prior model can lead to poor predictions.
3) Because the class is decided from a posterior probability determined by the prior and the data, the classification decision carries a certain error rate.
Summary of the invention
The object of the invention is to overcome the above problems of the prior art by providing a method for acquiring commodity tax classification codes that ensures the accuracy of the tax codes.
To achieve the above technical purpose and effect, the invention is realized through the following technical solution:
A method for acquiring commodity tax classification codes, comprising the following steps:
Step 1: Data synchronization: a synchronization program synchronizes the invoice data and commodity tax code data in the electronic ledger system to the invoice data platform; newly added data are synchronized daily;
Step 2: Data processing: the data from the electronic ledger undergo collection, denoising, transformation, and loading;
Step 3: Online incremental learning of the model is performed on the Spark engine.
Further, during the data synchronization of step 1, synchronization stops once the model has matured and is switched on again when the tax codes are updated.
Further, in the data processing of step 2, feature selection uses dimensionality reduction to lower the feature dimension and remove a certain amount of noise. The processing steps are as follows:
First step: build a commodity dictionary, then perform intelligent word segmentation on the commodity names on the invoices;
Second step: compute term-frequency statistics;
Third step: apply feature hashing.
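The synchronization rule just described (daily incremental sync, paused once the model matures, resumed on a code update) might be sketched as below. The patent does not say how maturity or a code update is detected, so the `model_mature` flag, the integer code-table version, and the record fields are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SyncState:
    model_mature: bool = False    # assumption: set by some external maturity check
    synced_code_version: int = 0  # assumption: version of the tax-code table last synced


def should_sync(state, current_code_version):
    """Sync daily until the model matures; resume only when the codes are updated."""
    if not state.model_mature:
        return True
    return current_code_version > state.synced_code_version


def new_records(records, last_sync_day):
    """Incremental sync: keep only records added after the last synchronized day."""
    return [r for r in records if r["day"] > last_sync_day]


# Hypothetical ledger rows; only the second is newer than the last sync day.
records = [
    {"invoice_no": "A001", "day": date(2018, 3, 28)},
    {"invoice_no": "A002", "day": date(2018, 3, 29)},
]
to_sync = new_records(records, date(2018, 3, 28))
```

Keeping the decision (`should_sync`) separate from the selection (`new_records`) lets the daily job skip the ledger query entirely while synchronization is paused.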
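The three processing steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than the patented implementation: forward maximum matching stands in for the unspecified segmentation method, the dictionary entries and the bucket count are hypothetical, and an ASCII string is used in place of a Chinese commodity name (the same character-level matching applies to Chinese text).

```python
import hashlib
from collections import Counter


def segment(name, dictionary, max_word_len=7):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(name):
        for size in range(min(max_word_len, len(name) - i), 0, -1):
            piece = name[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def hashed_term_frequencies(tokens, num_buckets=16):
    """Count tokens, then hash each token into one of num_buckets feature slots."""
    vec = [0.0] * num_buckets
    for tok, count in Counter(tokens).items():
        # md5 gives a hash that is stable across runs (built-in hash() is salted).
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % num_buckets
        vec[bucket] += count
    return vec


commodity_dictionary = {"laser", "print", "printer"}  # hypothetical dictionary
tokens = segment("laserprinter", commodity_dictionary)
features = hashed_term_frequencies(tokens + ["laser"])
```

The hashed vector has a fixed length regardless of vocabulary size, which is how feature hashing bounds the feature dimension. Spark MLlib offers a `HashingTF` transformer for the same purpose, though the text does not state which implementation is used.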
Further, the specific steps of the online incremental learning in step 3 are:
S1: Receive training data sequentially, then learn from the first batch of data to obtain a learning model;
S2: Obtain the second batch of data and, according to the model or rules, make a decision and output the result;
S3: Correct the model weight vector W according to the true results;
S4: Then receive the third batch of data and repeat steps S2 and S3.
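The predict-then-correct loop of S1 to S4 can be sketched as follows. For brevity the sketch uses a binary label in {+1, -1} and a plain perceptron-style correction, both of which are assumptions made here; actual tax coding is multi-class, and the function names and toy batches are hypothetical.

```python
def predict(w, x):
    """Decide with the current model: sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1


def online_learn(batches, dim, lr=1.0):
    """Process batches in arrival order, correcting W after each wrong decision."""
    w = [0.0] * dim
    mistakes_per_batch = []
    for batch in batches:           # S1/S4: data are received batch by batch
        mistakes = 0
        for x, y in batch:
            if predict(w, x) != y:  # S2: decide with the current model
                mistakes += 1
                # S3: correct the weight vector W using the true label
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        mistakes_per_batch.append(mistakes)
    return w, mistakes_per_batch


# Three identical toy batches: the model errs once, then stays correct.
batch = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
w, mistakes = online_learn([batch, batch, batch], dim=2)
```

Because the model is updated in place from each new batch, no earlier data need to be retained or retrained on, which is the point of incremental learning.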
The beneficial effects of the invention are:
The invention classifies well with a low error rate, helps ensure the accuracy of the assigned tax codes, reduces the number of misclassifications, improves classification accuracy, and is suitable for noisy, non-separable data.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow diagram of the invention;
Fig. 2 is a schematic diagram of the principle of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, a method for acquiring commodity tax classification codes comprises the following steps:
Step 1: Data synchronization: a synchronization program synchronizes the invoice data and commodity tax code data in the electronic ledger system to the invoice data platform; newly added data are synchronized daily. Synchronization stops once the model has matured and is switched on again when the tax codes are updated;
Step 2: Data processing: the data from the electronic ledger undergo collection, denoising, transformation, and loading. During data processing, feature selection uses dimensionality reduction to lower the feature dimension and remove a certain amount of noise. The processing steps are as follows:
First step: build a commodity dictionary, then perform intelligent word segmentation on the commodity names on the invoices;
Second step: compute term-frequency statistics;
Third step: apply feature hashing;
Step 3: Online incremental learning of the model is performed on the Spark engine.
The specific steps of the learning are:
S1: Receive training data sequentially, then learn from the first batch of data to obtain a learning model;
S2: Obtain the second batch of data and, according to the model or rules, make a decision and output the result;
S3: Correct the model weight vector W according to the true results;
S4: Then receive the third batch of data and repeat steps S2 and S3.
The method uses an online Passive-Aggressive (PA) learning algorithm. The PA algorithm originates from the large-margin theory of the SVM. Put simply, the SVM is a binary classification model whose basic form is the linear classifier with the largest margin in feature space; that is, the learning strategy of the support vector machine is margin maximization (large-margin theory), which ultimately reduces to solving a convex quadratic programming problem. Each update step of the PA algorithm is obtained as the analytic solution of a simple optimization problem. This problem requires not only that the updated weight vector classify the current sample correctly, but also that it stay as close as possible to the weight vector before the update. The weight vector finally obtained is expected to have minimal norm, which satisfies the large-margin theory of the SVM (finding the optimal separating surface that maximizes the geometric distance to the points nearest to it) and yields good generalization performance. Model correction mainly adjusts the weights, and a parameter τ_t is introduced: when the prediction is correct, the weights need not be adjusted; when the prediction is wrong, the weights are actively adjusted. The advantage is fewer misclassifications, higher classification accuracy, and suitability for noisy, non-separable data.
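A single PA update step can be sketched as below, following the standard Passive-Aggressive formulation with hinge loss. The step size `tau` appears to correspond to the parameter the text introduces; the optional cap `C` (the PA-I variant, commonly used for noisy, non-separable data) and the binary labels are assumptions, since the text gives no formula. Note that in this standard formulation the weights are also adjusted when a sample is classified correctly but with a margin below 1.

```python
def pa_update(w, x, y, C=None):
    """One Passive-Aggressive step for label y in {+1, -1}. Returns (new_w, tau)."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss on the current sample
    if loss == 0.0:
        return w, 0.0              # passive: margin is large enough, keep W
    sq_norm = sum(xi * xi for xi in x)
    if sq_norm == 0.0:
        return w, 0.0              # an all-zero sample gives no update direction
    tau = loss / sq_norm           # smallest step that removes the loss
    if C is not None:
        tau = min(C, tau)          # PA-I: cap the step to limit the effect of noise
    # Aggressive: move W just far enough toward classifying x correctly.
    return [wi + tau * y * xi for wi, xi in zip(w, x)], tau


w0 = [0.0, 0.0]
w1, tau1 = pa_update(w0, [3.0, 4.0], 1)          # uncapped step
w2, tau2 = pa_update(w0, [3.0, 4.0], 1, C=0.01)  # capped step
```

With `w0` at zero the sample `[3.0, 4.0]` has loss 1 and squared norm 25, so the uncapped step size is 1/25. The closed form for `tau` is the analytic solution of the per-step optimization problem: it yields the closest weight vector to the old one that attains margin 1 on the current sample.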
In this specification, reference to terms such as "one embodiment", "example", or "specific example" means that the particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such expressions do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
The basic principles, main features, and advantages of the invention have been shown and described above. Those skilled in the art should understand that the invention is not limited to the above embodiments, which together with the description only illustrate its principles. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention.

Claims (4)

1. A method for acquiring commodity tax classification codes, characterized in that the method comprises the following steps:
Step 1: Data synchronization: a synchronization program synchronizes the invoice data and commodity tax code data in the electronic ledger system to the invoice data platform; newly added data are synchronized daily;
Step 2: Data processing: the data from the electronic ledger undergo collection, denoising, transformation, and loading;
Step 3: Online incremental learning of the model is performed on the Spark engine.
2. The method for acquiring commodity tax classification codes according to claim 1, characterized in that: during the data synchronization of step 1, synchronization stops once the model has matured and is switched on again when the tax codes are updated.
3. The method for acquiring commodity tax classification codes according to claim 1, characterized in that: in the data processing of step 2, feature selection uses dimensionality reduction to lower the feature dimension and remove a certain amount of noise, with the following processing steps:
First step: build a commodity dictionary, then perform intelligent word segmentation on the commodity names on the invoices;
Second step: compute term-frequency statistics;
Third step: apply feature hashing.
4. The method for acquiring commodity tax classification codes according to claim 1, characterized in that: the specific steps of the online incremental learning in step 3 are:
S1: Receive training data sequentially, then learn from the first batch of data to obtain a learning model;
S2: Obtain the second batch of data and, according to the model or rules, make a decision and output the result;
S3: Correct the model weight vector W according to the true results;
S4: Then receive the third batch of data and repeat steps S2 and S3.
CN201810273206.XA 2018-03-29 2018-03-29 Method for acquiring commodity tax classification codes Pending CN108491887A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810273206.XA CN108491887A (en) 2018-03-29 2018-03-29 Method for acquiring commodity tax classification codes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810273206.XA CN108491887A (en) 2018-03-29 2018-03-29 Method for acquiring commodity tax classification codes

Publications (1)

Publication Number Publication Date
CN108491887A true CN108491887A (en) 2018-09-04

Family

ID=63317391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810273206.XA Pending CN108491887A (en) 2018-03-29 2018-03-29 Method for acquiring commodity tax classification codes

Country Status (1)

Country Link
CN (1) CN108491887A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7764830B1 (en) * 2000-03-02 2010-07-27 Science Applications International Corporation Machine learning of document templates for data extraction
US20110317700A1 (en) * 2010-06-28 2011-12-29 Avaya Inc. Method for real-time synchronization of arp record in rsmlt cluster
US20130262069A1 (en) * 2012-03-29 2013-10-03 Platte River Associates, Inc. Targeted site selection within shale gas basins
CN103617466A (en) * 2013-12-13 2014-03-05 李敬泉 Comprehensive evaluation method for commodity demand predication model
CN104573740A (en) * 2014-12-22 2015-04-29 山东鲁能软件技术有限公司 SVM classification model-based equipment fault diagnosing method
CN106933814A (en) * 2015-12-28 2017-07-07 航天信息股份有限公司 Tax data exception analysis method and system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275476A (en) * 2018-12-05 2020-06-12 北京京东尚科信息技术有限公司 Logistics storage service quotation method and device
CN111275476B (en) * 2018-12-05 2023-11-03 北京京东振世信息技术有限公司 Quotation method and device for logistics storage service
CN109871861A (en) * 2018-12-27 2019-06-11 航天信息股份有限公司 System and method for providing coding for target data
CN109871861B (en) * 2018-12-27 2023-05-23 航天信息股份有限公司 System and method for providing coding for target data
CN110287218A (en) * 2019-06-26 2019-09-27 浙江诺诺网络科技有限公司 Method, system and device for matching tax classification codes
CN110443313A (en) * 2019-08-08 2019-11-12 山东浪潮商用系统有限公司 Invoice product name collecting method based on machine learning algorithm
CN111210329A (en) * 2019-12-31 2020-05-29 航天信息软件技术有限公司 Accounting document generation method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN108491887A (en) Method for acquiring commodity tax classification codes
Liu et al. A generic first-order algorithmic framework for bi-level programming beyond lower-level singleton
Grandvalet et al. Support vector machines with a reject option
Pham et al. An incremental K-means algorithm
CN111401300B (en) Face clustering archiving method and device and storage medium
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
CN103886396A (en) Method for determining mixing optimizing of artificial fish stock and particle swarm
Mukhopadhyay et al. Multiobjective genetic clustering with ensemble among pareto front solutions: Application to MRI brain image segmentation
CN107578101B (en) Data stream load prediction method
CN116503676A (en) Picture classification method and system based on knowledge distillation small sample increment learning
CN109842614B (en) Network intrusion detection method based on data mining
CN111695011A (en) Tensor expression-based dynamic hypergraph structure learning classification method and system
CN108416168B (en) Terrain adaptive area selection scheme based on layered decision
CN107666403A (en) The acquisition methods and device of a kind of achievement data
Liu et al. A fast information-theoretic approximation of joint mutual information feature selection
US20210365617A1 (en) Design and optimization algorithm utilizing multiple networks and adversarial training
CN112347842B (en) Offline face clustering method based on association graph
CN107704969A (en) A kind of Forecast of Logistics Demand method based on Weighted naive bayes algorithm
Kliegr Quantitative CBA: Small and Comprehensible Association Rule Classification Models
Lan et al. A new model of combining multiple classifiers based on neural network
CN109886340B (en) Remote sensing image classification method
CN113763710A (en) Short-term traffic flow prediction method based on nonlinear adaptive system
CN108595843B (en) Dynamically self-adaptive crowd-sourced design scheme data optimization method
Lvovich et al. Algorithmic procedures for selection control options for electric power systems
Liu et al. Synthetic aperture radar image target recognition based on improved fusion of R-FCN and SRC

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180904
