CN111985204B - Method for predicting tax numbers of customs import and export commodities - Google Patents
- Publication number
- CN111985204B
- Authority
- CN
- China
- Prior art keywords
- commodity
- network
- tax
- export
- customs
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Business, Economics & Management (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Accounting & Taxation (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Finance (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method for predicting tax numbers of customs import and export commodities, comprising the following steps. Step 1: preprocess the customs import and export commodity text to obtain element names and element contents. Step 2: split the element contents obtained in step 1, then use an auxiliary network to select differential elements. Step 3: feed the differential elements obtained in step 2 into a CNN network for feature extraction, and at the same time extract element name features with a DPCNN network and element content features with an SSCNN network. Step 4: fuse the differential element features, element name features and element content features obtained in step 3, then perform classification to obtain the commodity tax number. By exploiting customs-specific corpus resources, the method predicts tax numbers for customs import and export commodity text even when differences in declaration element length dilute the features of short elements, and thereby improves the accuracy of tax number prediction.
Description
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a method for predicting tax numbers of customs import and export commodities based on a hybrid convolutional neural network and an auxiliary network.
Background
Customs duties are a major source of tax revenue in many countries. At present, China Customs relies mainly on manual checks of the tax rates of import and export commodities, which can cover only a small part of the massive volume of such commodities. Because customs duties are levied mainly on the basis of the text information of commodities, natural language processing can be used to classify the commodity text and determine the tax from the resulting category, which makes it possible to automate tax risk prevention and control. Tax number prediction for commodities can therefore be cast as a Chinese text classification problem.
Chinese text classification is the process of automatically classifying and labeling a set of texts (or other entities or objects) according to a given classification system or standard. From a labeled set of training documents, a model of the relationship between document features and document categories is learned, and the learned model is then used to assign categories to new documents. Text classification is gradually moving from knowledge-based methods to statistical and machine-learning-based methods. Many classification models achieve good results on Chinese text classification tasks. Compared with ordinary Chinese text, however, a customs import and export declaration is a linear sequence of several elements and has no continuous contextual semantics. At present there has been no attempt to apply artificial intelligence specifically to the classification of customs import and export declaration text; instead, the task is treated as a conventional text classification problem. The TextCNN convolutional model proposed by Yoon Kim extracts text features well and classifies text using combinations of those features, and the BERT model proposed by Google improves the precision of text classification tasks by using a large-scale pre-training corpus and a very large number of model parameters. Because customs text is domain-specific and distinctive, however, these general-purpose models perform poorly on customs commodity classification.
Disclosure of Invention
The purpose of the application is to provide a method for predicting tax numbers of customs import and export commodities that uses customs-specific corpus resources to predict tax numbers for customs import and export commodity text even when differences in declaration element length dilute the features of short elements, thereby improving the accuracy of tax number prediction.
In order to achieve the above purpose, the technical scheme of the application is as follows: a customs import and export commodity tax number prediction method specifically comprises the following steps:
step 1: preprocess the customs import and export commodity text to obtain element names and element contents;
step 2: split the element contents obtained in step 1, then use an auxiliary network to select differential elements;
step 3: feed the differential elements obtained in step 2 into a CNN network for feature extraction, and at the same time extract element name features with a DPCNN network and element content features with an SSCNN network;
step 4: fuse the differential element features, element name features and element content features obtained in step 3, then perform classification to obtain the commodity tax number.
Further, the specific implementation manner of the step 2 is as follows:
step 21, group the obtained element contents so that records with the same commodity major class form one paragraph;
step 22, count the commodity subclasses in each paragraph, feed each paragraph into the auxiliary network by commodity subclass, and train the network to classify the subclasses; when training on each paragraph, replace the content of each element with its element name, one element at a time in order, to obtain a loss value for each element;
step 23, rank the loss values of the elements in each paragraph from largest to smallest and select the top 2 as the differential elements.
Further, the specific implementation manner of the step 3 is as follows:
step 31, feed the differential elements into a CNN network, extract features with a convolution layer, and sparsify the features with a max-pooling layer;
step 32, feed the element names into a DPCNN network, extract features with a convolution layer, and use a downsampling layer to shorten the sequence and thereby enlarge the receptive field;
step 33, feed the element contents into an SSCNN network and extract shallow features with a structured convolution layer.
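The convolution and max-pooling of step 31 can be pictured with a small sketch. It is only an illustration under stated assumptions: the embeddings, filter weights and window sizes are invented, the ReLU activation and max-over-time pooling follow the usual TextCNN pattern rather than anything specified here, and plain Python lists stand in for tensors.

```python
from typing import List

Vector = List[float]


def conv1d(embeddings: List[Vector], kernel: List[Vector], bias: float = 0.0) -> List[float]:
    """Slide one filter over a token sequence and apply ReLU.

    embeddings: sequence of token vectors, shape (seq_len, dim)
    kernel:     filter weights, shape (window, dim)
    """
    window = len(kernel)
    outputs = []
    for start in range(len(embeddings) - window + 1):
        total = bias
        for offset in range(window):
            token = embeddings[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], token))
        outputs.append(max(0.0, total))  # ReLU
    return outputs


def max_over_time(feature_map: List[float]) -> float:
    """Max pooling keeps only the strongest response of a filter."""
    return max(feature_map)


# Toy 2-dimensional embeddings for a 5-token differential element (values invented).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 1.0]]
# Two filters with window sizes 2 and 3 (weights invented).
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
]
# One pooled value per filter: the sparse feature vector for this element.
features = [max_over_time(conv1d(tokens, k)) for k in filters]
print(features)  # [2.0, 2.5]
```

Each filter contributes a single number regardless of how long the element is, which is why pooling yields a compact, sparse representation.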
Further, the specific implementation manner of the step 4 is as follows:
step 41, splice together the differential element features, the element name features and the element content features;
step 42, feed the spliced features into two fully connected layers, in which every node is connected to all nodes of the preceding layer and which serve to integrate the extracted features; the output dimension of the first fully connected layer is the number of commodity major classes and the output dimension of the second fully connected layer is the number of commodity minor classes, and the predicted major class and minor class are spliced together to obtain the commodity tax number.
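One possible reading of steps 41 and 42 is sketched below. It assumes the two fully connected layers are stacked, so that the first produces one score per major class and the second maps those scores to one score per minor class; the text does not spell out this wiring. The label sets, weights and feature values are all hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: every output node is connected to every input node."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def argmax(values: Vector) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Step 41: splice (concatenate) the three feature vectors. Values are invented.
differential_features = [0.2, 0.9]
name_features = [0.4]
content_features = [0.1, 0.7]
fused = differential_features + name_features + content_features

# Hypothetical label sets: two major classes and three minor classes.
MAJOR_CLASSES = ["84122", "84123"]
MINOR_CLASSES = ["10090", "90000", "90090"]

# Step 42: layer 1 outputs one score per major class, layer 2 one per minor class.
w1 = [[1.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0],
      [0.0, 1.0],
      [0.5, 0.5]]
b2 = [0.0, 0.0, 0.0]

major_scores = linear(fused, w1, b1)
minor_scores = linear(major_scores, w2, b2)

# Splice the predicted major class and minor class into the full tax number.
tax_number = MAJOR_CLASSES[argmax(major_scores)] + MINOR_CLASSES[argmax(minor_scores)]
print(tax_number)  # 8412390000
```

In a trained model the weights would of course be learned, and the two heads could equally be arranged in parallel; the sketch only shows how a major-class prediction and a minor-class prediction combine into one code.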
With the above technical scheme, the invention achieves the following technical effects: by integrating several convolutional networks, using customs-specific corpus resources, and taking the characteristics of customs text into account, it addresses both the weak distinguishability of commodities within the same class and the dilution of short-element features caused by differences in declaration element length; it strengthens the independence and importance of shorter content elements within the overall features, and improves the accuracy of tax number prediction for customs import and export commodities.
Drawings
FIG. 1 is a flow chart of a method for predicting tax numbers of customs import and export commodities.
Detailed Description
The invention is described in further detail below with reference to the drawing and a specific embodiment, which serves as an example of the present application.
Example 1
When predicting the tax numbers of customs import and export commodities, the method exploits the fact that each element has inherent boundaries: the degree of semantic fusion between elements should be reduced, and sufficiently salient features must be extracted to obtain the correct commodity tax number. Based on the characteristics of customs text and the problems that arise in the tax number prediction task, and referring to FIG. 1, the application provides a method for predicting tax numbers of customs import and export commodities. First, the commodity declaration text is preprocessed and segmented into words, and the element name corresponding to each piece of element content is looked up in the declaration element catalog. Next, an auxiliary convolutional network is used to find the decisive differential elements of commodities within the same major class, and a hybrid convolutional neural network is used to predict the tax number from the commodity text. The hybrid convolutional neural network processes different parts of the commodity text with three kinds of convolution: an ordinary convolutional neural network (Convolution Neural Network, CNN) extracts features from the differential elements, a shallow structured convolutional network (Shallow Structured Convolution Neural Network, SSCNN) extracts features from the element contents, and a deep pyramid convolutional network (Deep Pyramid Convolution Neural Network, DPCNN) extracts features from the element names. After the three feature extractions, a fully connected network performs classification to obtain the commodity tax number.
The method effectively addresses two problems in customs import and export commodity tax number prediction: commodities within the same major class are hard to distinguish, and short-element features are diluted when declaration elements differ greatly in length. Its accuracy is significantly higher than that of other current mainstream deep learning methods.
The present invention will be described in detail below with reference to examples and drawings so as to enable one of ordinary skill in the art to practice the same, with reference to the present description.
In this embodiment, PyCharm is used as the development platform and Python as the development language. The method is applied to a corpus of 1,400,000 sentences of real customs data. The specific process is as follows:
Step 1: preprocess the customs import and export commodity text to obtain element names and element contents.
Step 2: split the element contents obtained in step 1, then use an auxiliary network to select differential elements, specifically as follows:
Step 21: group the element contents obtained in step 1 so that records with the same commodity major class form one paragraph; for example, consider the following data:
Data A: "8412390000|pneumatic actuator|4|3|converts pneumatic power to mechanical power|for pneumatic valve|InGERGOLLAND|94695194"
Data B: "8412310090|ram air actuator|4|3|provides air pressure linear force|aircraft power system|HONEYWELL|676000141"
Both commodity declaration records belong to major class "84123", and their fixed declaration elements are both "commodity name|brand type|export preference status|principle|use|brand|model", so they are gathered into one paragraph.
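The grouping in step 21 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each record is written as a tax number followed by seven pipe-delimited elements, that the major class is the first five digits of the tax number (as with "84123" here), and that the element labels are English paraphrases of the declaration elements. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Element labels paraphrased from the fixed declaration elements (assumed wording).
ELEMENT_NAMES = ["commodity name", "brand type", "export preference status",
                 "principle", "use", "brand", "model"]

Parsed = Tuple[str, Dict[str, str]]  # (tax number, element name -> element content)


def parse_record(line: str) -> Parsed:
    """Split a pipe-delimited declaration into its tax number and named elements."""
    fields = [f.strip() for f in line.split("|")]
    tax_number, contents = fields[0], fields[1:]
    if len(contents) != len(ELEMENT_NAMES):
        raise ValueError(f"expected {len(ELEMENT_NAMES)} elements, got {len(contents)}")
    return tax_number, dict(zip(ELEMENT_NAMES, contents))


def group_by_major_class(lines: List[str], prefix_len: int = 5) -> Dict[str, List[Parsed]]:
    """Gather records that share a major class (tax-number prefix) into one paragraph."""
    paragraphs: Dict[str, List[Parsed]] = defaultdict(list)
    for line in lines:
        tax_number, elements = parse_record(line)
        paragraphs[tax_number[:prefix_len]].append((tax_number, elements))
    return dict(paragraphs)


records = [
    "8412390000|pneumatic actuator|4|3|converts pneumatic power to mechanical power"
    "|for pneumatic valve|InGERGOLLAND|94695194",
    "8412310090|ram air actuator|4|3|provides air pressure linear force"
    "|aircraft power system|HONEYWELL|676000141",
]
paragraphs = group_by_major_class(records)
# The distinct tax numbers in a paragraph are the subclasses counted in step 22.
subclasses = {major: sorted({tax for tax, _ in recs}) for major, recs in paragraphs.items()}
print(subclasses)  # {'84123': ['8412310090', '8412390000']}
```

Keeping the element names alongside the contents at this stage makes the later masking step straightforward, since each content can be swapped for its own name.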
Step 22: count the commodity subclasses in each paragraph, feed each paragraph into the auxiliary network by commodity subclass, and train the network to classify the subclasses. When training on each paragraph, replace the content of each element with its element name, one element at a time in order, to obtain a loss value for each element.
Step 23: rank the loss values of the elements in each paragraph from largest to smallest and select the top 2 as the differential elements.
For the two records above, the differential elements computed by the auxiliary network are the commodity name and the principle.
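The masking-and-ranking idea of steps 22 and 23 can be sketched as below. It is a hedged illustration only: `toy_loss` is a hypothetical stand-in for the auxiliary network's classification loss, the importance values are invented so that the example reproduces the result stated above, and the records are abridged to four elements with placeholder brands.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]  # element name -> element content


def mask_element(record: Record, name: str) -> Record:
    """Replace one element's content with its element name (step 22)."""
    masked = dict(record)
    masked[name] = name
    return masked


def select_differential_elements(
    paragraph: List[Record],
    loss_fn: Callable[[List[Record]], float],
    top_k: int = 2,
) -> List[str]:
    """Rank elements by the loss seen when each is masked and keep the top_k (step 23)."""
    element_names = list(paragraph[0].keys())
    losses = {}
    for name in element_names:  # mask one element at a time, in order
        masked = [mask_element(r, name) for r in paragraph]
        losses[name] = loss_fn(masked)
    ranked = sorted(element_names, key=lambda n: losses[n], reverse=True)
    return ranked[:top_k]


# Hypothetical stand-in for the auxiliary network: the loss is assumed to grow
# more when a more informative element has been masked out.
ASSUMED_IMPORTANCE = {"commodity name": 0.9, "principle": 0.7, "use": 0.3, "brand": 0.1}


def toy_loss(paragraph: List[Record]) -> float:
    loss = 0.05
    for record in paragraph:
        for name, content in record.items():
            if content == name:  # this element was masked
                loss += ASSUMED_IMPORTANCE[name]
    return loss / len(paragraph)


paragraph = [
    {"commodity name": "pneumatic actuator",
     "principle": "converts pneumatic power to mechanical power",
     "use": "for pneumatic valve", "brand": "brand A"},
    {"commodity name": "ram air actuator",
     "principle": "provides air pressure linear force",
     "use": "aircraft power system", "brand": "brand B"},
]
selected = select_differential_elements(paragraph, toy_loss)
print(selected)  # ['commodity name', 'principle']
```

The intuition is that an element whose removal hurts subclass classification the most is the one that best separates commodities within the same major class.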
Step 3: feed the differential elements obtained in step 2 into a CNN network to extract features, and at the same time extract the element name features and element content features with the DPCNN and SSCNN networks respectively, specifically as follows:
Step 31: feed the differential elements into a CNN network, extract features with a convolution layer, and sparsify the features with a max-pooling layer;
Step 32: feed the element names into a DPCNN network, extract features with a convolution layer, and use a downsampling layer to shorten the sequence layer by layer and thereby enlarge the receptive field;
Step 33: feed the element contents into an SSCNN network and extract shallow features with a structured convolution layer.
For example, for the data above, the element names and element contents are passed unchanged to their respective convolutional neural network models for feature extraction, while the differential elements of data A, "pneumatic actuator|converts pneumatic power to mechanical power", and of data B, "ram air actuator|provides air pressure linear force", are passed to the textCNN model for feature extraction.
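The effect of the downsampling layer in step 32 can be seen in a short sketch. The pool size of 3 and stride of 2 follow the usual DPCNN design and are assumptions here, since the text does not give these values; the input values are invented.

```python
from typing import List


def downsample(sequence: List[float], pool_size: int = 3, stride: int = 2) -> List[float]:
    """Max-pool with stride 2 so the output is about half as long as the input."""
    padded = sequence + [float("-inf")] * (pool_size - 1)  # pad the right edge
    return [max(padded[i:i + pool_size]) for i in range(0, len(sequence), stride)]


# One feature channel over a 16-token element name (values invented).
channel = [float(i % 5) for i in range(16)]
lengths = [len(channel)]
while len(channel) > 1:
    channel = downsample(channel)
    lengths.append(len(channel))
print(lengths)  # [16, 8, 4, 2, 1]
```

Because the sequence halves at every block, a convolution of fixed width applied after each downsampling covers twice as many original tokens as the one before it, which is how the receptive field grows with depth.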
Step 4: fuse the differential element features, element name features and element content features obtained in step 3, then perform classification.
Step 41: splice together the differential element features, the element name features and the element content features;
Step 42: feed the spliced features into two fully connected layers for classification, where the output dimension of the first layer is the number of commodity major classes and the output dimension of the second layer is the number of commodity minor classes; the predicted major class and minor class are spliced together to obtain the commodity tax number.
For example, for the data above, the probabilities of all classes are obtained, and the class with the highest probability is selected as the final prediction of the model.
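Choosing the most probable class can be sketched as follows. The use of softmax to turn scores into probabilities is an assumption, since the text says only that the probabilities of all classes are obtained; the scores themselves are invented.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores to probabilities (the max is subtracted for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores: Dict[str, float]) -> str:
    """Return the label whose probability is highest."""
    labels = list(scores)
    probs = softmax([scores[label] for label in labels])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best]


# Hypothetical scores for two candidate tax numbers (values invented).
scores = {"8412310090": 0.4, "8412390000": 2.1}
probabilities = dict(zip(scores, softmax(list(scores.values()))))
prediction = predict(scores)
print(prediction)  # 8412390000
```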
Following the above steps, the classification performance of the method is compared with that of the DPCNN, Transformer, BERT and RoBERTa models. As can be seen from Table 1, the proposed method is significantly better than the other methods in classification accuracy, precision and F1 value.
Table 1. Comparison of the customs import and export commodity classification performance of different models
The invention also verifies the influence of different auxiliary networks on the final commodity classification. As shown in Table 2, using the textCNN model as the auxiliary network greatly improves the accuracy of customs import and export commodity classification.
Table 2. Influence of different auxiliary networks on the classification of customs import and export commodities
While the invention has been described with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (3)
1. A customs import and export commodity tax number prediction method is characterized by comprising the following steps:
step 1: preprocess the customs import and export commodity text to obtain element names and element contents;
step 2: split the element contents obtained in step 1, then use an auxiliary network to select differential elements;
step 3: feed the differential elements obtained in step 2 into a CNN network for feature extraction, and at the same time extract element name features with a DPCNN network and element content features with an SSCNN network;
step 4: fuse the differential element features, element name features and element content features obtained in step 3, then perform classification to obtain the commodity tax number;
the specific implementation of step 2 is as follows:
step 21, group the obtained element contents so that records with the same commodity major class form one paragraph;
step 22, count the commodity subclasses in each paragraph, feed each paragraph into the auxiliary network by commodity subclass, and train the network to classify the subclasses; when training on each paragraph, replace the content of each element with its element name, one element at a time in order, to obtain a loss value for each element;
step 23, rank the loss values of the elements in each paragraph from largest to smallest and select the top 2 as the differential elements.
2. The customs import and export commodity tax prediction method according to claim 1, wherein the specific implementation manner of the step 3 is as follows:
step 31, feed the differential elements into a CNN network, extract features with a convolution layer, and sparsify the features with a max-pooling layer;
step 32, feed the element names into a DPCNN network, extract features with a convolution layer, and use a downsampling layer to shorten the sequence and thereby enlarge the receptive field;
step 33, feed the element contents into an SSCNN network and extract shallow features with a structured convolution layer.
3. The customs import and export commodity tax prediction method according to claim 1, wherein the specific implementation manner of the step 4 is as follows:
step 41, splice together the differential element features, the element name features and the element content features;
step 42, feed the spliced features into two fully connected layers, which integrate the extracted features; the output dimension of the first fully connected layer is the number of commodity major classes and the output dimension of the second fully connected layer is the number of commodity minor classes, and the predicted major class and minor class are spliced together to obtain the commodity tax number.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010744808.6A CN111985204B (en) | 2020-07-29 | 2020-07-29 | Method for predicting tax numbers of customs import and export commodities |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010744808.6A CN111985204B (en) | 2020-07-29 | 2020-07-29 | Method for predicting tax numbers of customs import and export commodities |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111985204A (en) | 2020-11-24 |
CN111985204B (en) | 2023-06-02 |
Family
ID=73445564
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010744808.6A Active CN111985204B (en) | 2020-07-29 | 2020-07-29 | Method for predicting tax numbers of customs import and export commodities |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111985204B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113343640B (en) * | 2021-05-26 | 2024-02-20 | 南京大学 | Method and device for classifying customs commodity HS codes |
CN113705188B (en) * | 2021-08-19 | 2023-06-06 | 大连大学 | Intelligent evaluation method for customs import and export commodity specification declaration |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104008536A (en) * | 2013-11-04 | 2014-08-27 | 无锡金帆钻凿设备股份有限公司 | Multi-focus noise image fusion method based on CS-CHMT and IDPCNN |
CN109034154A (en) * | 2018-07-23 | 2018-12-18 | 西安电子科技大学昆山创新研究院 | The extraction and recognition methods of Invoice Seal duty paragraph |
CN110175235A (en) * | 2019-04-23 | 2019-08-27 | 苏宁易购集团股份有限公司 | Intelligence commodity tax sorting code number method and system neural network based |
Non-Patent Citations (1)
Title |
---|
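Analysis of key points in laboratory testing for the commodity classification of coal carburizers; Qiu Yue; Yang Lifei; Wang Haixian; China Port Science and Technology (No. 01); full text * |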
Also Published As
Publication number | Publication date |
---|---|
CN111985204A (en) | 2020-11-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | Inventors after: Zhang Qiang, Zhou Chengjie, Che Chao. Inventors before: Che Chao, Zhou Chengjie, Zhang Qiang |
GR01 | Patent grant | ||