CN108491887A - Method for obtaining commodity tax codes - Google Patents
- Publication number
- CN108491887A (application CN201810273206.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention discloses a method for obtaining commodity tax codes, comprising the following steps: a synchronization program synchronizes the invoice data and commodity tax code data in the electronic ledger to an invoice data platform, with newly added data synchronized daily; the data from the electronic ledger are processed through collection, denoising, transformation, and loading; and online incremental learning of the model is performed on the Spark engine. The invention classifies well with a low error rate, helps ensure the accuracy of the tax codes, reduces the number of misclassifications, improves classification accuracy, and remains applicable to noisy, non-separable data.
Description
Technical field
The invention belongs to the field of taxation and in particular relates to a method for obtaining commodity tax codes.
Background art
Because special and ordinary value-added tax (VAT) invoice data reside in the electronic ledger system, the data must be synchronized to the invoice data platform for loading and storage. Existing products mainly use Bayesian algorithms. A Bayes classifier starts from the prior probability of each class, then applies Bayes' formula together with an independence assumption over the attributes to compute the posterior probability of an object, that is, the probability that the object belongs to a given class, and assigns the object to the class with the maximum posterior probability. In this setting, the prior probability of each tax classification code is estimated from historical data, the posterior probability of the current commodity is computed for every code, and the commodity's tax code is chosen according to these probabilities.
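Written out, the decision rule described above takes the following form. Here c ranges over the tax classification codes and x_1, ..., x_n are the attributes of a commodity; treating the words of the invoice commodity name as the attributes is an assumption made for illustration, since the text does not say which attributes the existing products use. The product form follows from the independence assumption, and the denominator P(x_1, ..., x_n) is dropped because it is the same for every c.

```latex
\hat{c} \;=\; \arg\max_{c}\; P(c \mid x_1,\dots,x_n)
        \;=\; \arg\max_{c}\; P(c)\,\prod_{i=1}^{n} P(x_i \mid c)
```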
However, the traditional algorithm has the following drawbacks:
1) In theory, the naive Bayes model has the lowest error rate among classification methods. In practice this does not always hold, because naive Bayes assumes that the attributes are mutually independent given the output class. This assumption often fails in real applications, and classification degrades when there are many attributes or when the attributes are strongly correlated.
2) The prior probabilities must be known, and they often rest on assumptions. Many prior models are possible, so a poorly chosen prior model can make the predictions poor.
3) Because the class is decided from a posterior probability that is determined by the prior and the data, the classification decision carries a certain error rate.
Summary of the invention
The object of the invention is to overcome the above problems of the prior art by providing a method for obtaining commodity tax codes that ensures the accuracy of the tax codes.
To achieve this technical object and technical effect, the invention is realized through the following technical solution:
A method for obtaining commodity tax codes, comprising the following steps:
Step 1: Data synchronization: a synchronization program synchronizes the invoice data and commodity tax code data in the electronic ledger to the invoice data platform, and newly added data are synchronized daily;
Step 2: Data processing: the data from the electronic ledger are processed through collection, denoising, transformation, and loading;
Step 3: Online incremental learning of the model is performed on the Spark engine.
Further, in the data synchronization of Step 1, synchronization is stopped once the model has matured and is resumed when the codes are updated.
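The synchronization policy above (daily increments, paused once the model is mature, resumed on a code update) can be sketched as follows. This is a minimal illustration, not the patented synchronization program: the `SyncState` fields, the `issued_on` row key, the date-based high-water mark, and the choice to clear the update flag after one sync are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SyncState:
    last_synced: date            # assumed high-water mark of the last completed sync
    model_mature: bool = False   # set by whoever judges the model to be mature
    codes_updated: bool = False  # set when the tax code table has changed


def should_sync(state):
    """Sync while the model is still immature, or whenever the codes changed."""
    return (not state.model_mature) or state.codes_updated


def daily_increment(ledger_rows, state, today):
    """Return only the rows added since the last sync, or nothing if paused."""
    if not should_sync(state):
        return []
    new_rows = [r for r in ledger_rows
                if state.last_synced < r["issued_on"] <= today]
    state.last_synced = today
    state.codes_updated = False  # assumption: one sync handles one code update
    return new_rows
```

Keeping the decision in `should_sync` separates the policy from the data movement, so the criterion for "mature" can change without touching the increment logic.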
Further, the data processing of Step 2 reduces the feature dimension during feature selection by means of dimensionality reduction, which removes a certain amount of noise. The processing steps are as follows:
First: build a commodity dictionary, then perform intelligent word segmentation on the invoice commodity names;
Second: compute term-frequency statistics;
Third: apply feature hashing.
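The three processing steps above can be sketched in plain Python. The text does not specify the segmentation method or the hash function, so forward maximum matching against the commodity dictionary and an MD5-based bucket index are assumptions chosen for illustration; `segment`, `hash_features`, `max_len`, and `n_features` are hypothetical names.

```python
import hashlib
from collections import Counter


def segment(name, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary
    word, falling back to a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(name):
        for size in range(min(max_len, len(name) - i), 0, -1):
            piece = name[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def hash_features(tokens, n_features=1024):
    """Count term frequencies, then fold each term into one of n_features
    buckets. Colliding terms simply add their counts together."""
    vec = {}
    for tok, tf in Counter(tokens).items():
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:8], "big") % n_features
        vec[idx] = vec.get(idx, 0) + tf
    return vec
```

Feature hashing fixes the vector length at `n_features` regardless of vocabulary size, which is what lets the feature dimension stay bounded as new commodity names arrive.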
Further, the online incremental learning in Step 3 proceeds as follows:
S1: receive training data sequentially, then learn from the first batch of data to obtain a learned model;
S2: obtain the second batch of data and, according to the model or rules, make a decision and output the result;
S3: correct the model weight vector W according to the true results;
S4: receive the third batch of data and repeat steps S2 and S3.
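The S1 to S4 protocol can be sketched as a predict-then-correct loop over batches. This sketch shows only the control flow: the multiclass linear model and its mistake-driven update are a simple stand-in assumed for illustration, not the update rule of the invention, and `OnlineLinearModel` and `learn_online` are hypothetical names. Inputs are sparse feature dictionaries mapping an index to a value. For brevity the first batch is handled by the same loop, starting from an empty model.

```python
class OnlineLinearModel:
    """One sparse weight vector W[code] per tax code (illustrative stand-in)."""

    def __init__(self):
        self.W = {}

    def score(self, code, x):
        w = self.W[code]
        return sum(w.get(i, 0.0) * v for i, v in x.items())

    def predict(self, x):
        """S2: decide with the current model; None if no code is known yet."""
        if not self.W:
            return None
        return max(sorted(self.W), key=lambda c: self.score(c, x))

    def correct(self, x, true_code, predicted):
        """S3: adjust W only when the decision disagrees with the true code."""
        if predicted == true_code:
            return
        up = self.W.setdefault(true_code, {})
        for i, v in x.items():
            up[i] = up.get(i, 0.0) + v
        if predicted is not None:
            down = self.W[predicted]
            for i, v in x.items():
                down[i] = down.get(i, 0.0) - v


def learn_online(model, batches):
    """S1/S4: consume batches in order; return the mistake count per batch."""
    mistakes = []
    for batch in batches:
        wrong = 0
        for x, true_code in batch:
            predicted = model.predict(x)
            wrong += int(predicted != true_code)
            model.correct(x, true_code, predicted)
        mistakes.append(wrong)
    return mistakes
```

Because each example is used once and then discarded, the model can keep learning from newly synchronized invoices without retraining on the full history.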
The beneficial effects of the invention are as follows:
Classifying quality of the present invention is preferable, and error rate is relatively low, can ensure that the accurate performance of tax revenue coding reduces mistake classification
Number, improve nicety of grading, and be suitable for inseparable noise situations.
Brief description of the drawings
The drawings described here provide a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow diagram of the invention;
Fig. 2 is a schematic diagram of the principle of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, a method for obtaining commodity tax codes comprises the following steps:
Step 1: Data synchronization: a synchronization program synchronizes the invoice data and commodity tax code data in the electronic ledger to the invoice data platform, and newly added data are synchronized daily. Synchronization is stopped once the model has matured and is resumed when the codes are updated;
Step 2: Data processing: the data from the electronic ledger are processed through collection, denoising, transformation, and loading. The data processing reduces the feature dimension during feature selection by means of dimensionality reduction, which removes a certain amount of noise. The processing steps are as follows:
First: build a commodity dictionary, then perform intelligent word segmentation on the invoice commodity names;
Second: compute term-frequency statistics;
Third: apply feature hashing;
Step 3: Online incremental learning of the model is performed on the Spark engine.
The learning proceeds as follows:
S1: receive training data sequentially, then learn from the first batch of data to obtain a learned model;
S2: obtain the second batch of data and, according to the model or rules, make a decision and output the result;
S3: correct the model weight vector W according to the true results;
S4: receive the third batch of data and repeat steps S2 and S3.
The online Passive-Aggressive (PA) learning algorithm is used. The PA algorithm originates from the large-margin theory of the SVM. Put simply, an SVM is a binary classification model whose basic form is the linear classifier with the maximum margin in feature space; that is, the learning strategy of the support vector machine is margin maximization (large-margin theory), which can ultimately be converted into solving a convex quadratic programming problem. Each update step of the PA algorithm is the closed-form solution of a simple optimization problem. This problem requires not only that the updated weight vector classify the current sample correctly, but also that it stay close to the weight vector before the update. The weight vector that is finally obtained is expected to have minimum norm, which satisfies the large-margin theory of the SVM (finding the optimal separating hyperplane that maximizes the geometric distance to the points nearest to it) and yields good generalization performance. Correcting the model mainly means correcting the weights. A parameter Tt is introduced: when the prediction is correct, the weights need not be adjusted; when the prediction is wrong, the weights are actively adjusted. The advantage is that the number of misclassifications is reduced, classification accuracy is improved, and the method is applicable to noisy, non-separable cases.
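A single PA update can be sketched as below for the binary case with labels in {-1, +1}. The text names only "PA" and the parameter Tt (conventionally written τ_t in the PA literature); the PA-I variant with an aggressiveness cap `C`, the hinge loss with unit margin, and the dense list representation are assumptions made for this sketch. Note that in the standard formulation the update also fires when the sign is right but the margin is below 1, which is slightly broader than "only when the prediction is wrong".

```python
def pa_update(w, x, y, C=1.0):
    """One PA-I step. w, x: equal-length lists of floats; y: -1 or +1.
    Returns the new weight vector and the step size tau."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)          # hinge loss on the current sample
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return w, 0.0                       # passive: margin already satisfied
    tau = min(C, loss / norm_sq)            # aggressive: smallest sufficient step, capped by C
    return [wi + tau * y * xi for wi, xi in zip(w, x)], tau
```

The step `loss / norm_sq` is the smallest change to `w` that brings the current sample to margin 1, which is how the update stays close to the previous weights. The cap `C` limits how far a single noisy or mislabeled sample can move the model, which is what makes this variant usable on non-separable data.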
In this specification, references to the terms "one embodiment", "example", "specific example" and the like mean that the particular features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the invention. Such expressions do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
The basic principles, main features, and advantages of the invention have been shown and described above. Those skilled in the art should understand that the invention is not limited to the embodiments above; the embodiments and the description merely illustrate its principle. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of protection claimed.
Claims (4)
1. A method for obtaining commodity tax codes, characterized in that the method comprises the following steps:
Step 1: Data synchronization: a synchronization program synchronizes the invoice data and commodity tax code data in the electronic ledger to the invoice data platform, and newly added data are synchronized daily;
Step 2: Data processing: the data from the electronic ledger are processed through collection, denoising, transformation, and loading;
Step 3: Online incremental learning of the model is performed on the Spark engine.
2. The method for obtaining commodity tax codes according to claim 1, characterized in that, in the data synchronization of Step 1, synchronization is stopped once the model has matured and is resumed when the codes are updated.
3. The method for obtaining commodity tax codes according to claim 1, characterized in that the data processing of Step 2 reduces the feature dimension during feature selection by means of dimensionality reduction, which removes a certain amount of noise, the processing steps being as follows:
First: build a commodity dictionary, then perform intelligent word segmentation on the invoice commodity names;
Second: compute term-frequency statistics;
Third: apply feature hashing.
4. The method for obtaining commodity tax codes according to claim 1, characterized in that the online incremental learning in Step 3 proceeds as follows:
S1: receive training data sequentially, then learn from the first batch of data to obtain a learned model;
S2: obtain the second batch of data and, according to the model or rules, make a decision and output the result;
S3: correct the model weight vector W according to the true results;
S4: receive the third batch of data and repeat steps S2 and S3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810273206.XA CN108491887A (en) | 2018-03-29 | 2018-03-29 | Method for obtaining commodity tax codes |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810273206.XA CN108491887A (en) | 2018-03-29 | 2018-03-29 | Method for obtaining commodity tax codes |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108491887A (en) | 2018-09-04 |
Family
ID=63317391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810273206.XA Pending CN108491887A (en) | Method for obtaining commodity tax codes | 2018-03-29 | 2018-03-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108491887A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7764830B1 (en) * | 2000-03-02 | 2010-07-27 | Science Applications International Corporation | Machine learning of document templates for data extraction |
US20110317700A1 (en) * | 2010-06-28 | 2011-12-29 | Avaya Inc. | Method for real-time synchronization of arp record in rsmlt cluster |
US20130262069A1 (en) * | 2012-03-29 | 2013-10-03 | Platte River Associates, Inc. | Targeted site selection within shale gas basins |
CN103617466A (en) * | 2013-12-13 | 2014-03-05 | 李敬泉 | Comprehensive evaluation method for commodity demand predication model |
CN104573740A (en) * | 2014-12-22 | 2015-04-29 | 山东鲁能软件技术有限公司 | SVM classification model-based equipment fault diagnosing method |
CN106933814A (en) * | 2015-12-28 | 2017-07-07 | 航天信息股份有限公司 | Tax data exception analysis method and system |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111275476A (en) * | 2018-12-05 | 2020-06-12 | 北京京东尚科信息技术有限公司 | Logistics storage service quotation method and device |
CN111275476B (en) * | 2018-12-05 | 2023-11-03 | 北京京东振世信息技术有限公司 | Quotation method and device for logistics storage service |
CN109871861A (en) * | 2018-12-27 | 2019-06-11 | 航天信息股份有限公司 | System and method for providing coding for target data |
CN109871861B (en) * | 2018-12-27 | 2023-05-23 | 航天信息股份有限公司 | System and method for providing coding for target data |
CN110287218A (en) * | 2019-06-26 | 2019-09-27 | 浙江诺诺网络科技有限公司 | Method, system and equipment for matching tax classification codes |
CN110443313A (en) * | 2019-08-08 | 2019-11-12 | 山东浪潮商用系统有限公司 | Invoice product name collecting method based on machine learning algorithm |
CN111210329A (en) * | 2019-12-31 | 2020-05-29 | 航天信息软件技术有限公司 | Accounting document generation method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180904 |