CN111985204A - Customs import and export commodity tax number prediction method - Google Patents
- Publication number
- CN111985204A
- Authority
- CN
- China
- Prior art keywords
- commodity
- network
- import
- export
- customs
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
Abstract
The invention discloses a method for predicting the tax number of customs import and export commodities, comprising the following steps. Step 1: preprocess the customs import and export commodity text to obtain element names and element contents. Step 2: split the element contents obtained in step 1, then select difference elements using an auxiliary network. Step 3: feed the difference elements obtained in step 2 into a CNN network for feature extraction, and at the same time extract element name features and element content features using a DPCNN network. Step 4: fuse the difference element features, element name features and element content features obtained in step 3, then perform classification to obtain the commodity tax number. By exploiting corpus resources specific to customs, the method predicts the tax number of customs import and export commodity text even when differences in declaration element length dilute the features of short elements, and it improves the accuracy of tax number prediction.
Description
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a customs import and export commodity tax number prediction method based on a hybrid convolutional neural network and an auxiliary network.
Background
Customs duties are a major source of tax revenue in many countries. At present, China Customs relies mainly on manual review of the tax rates of import and export commodities, which can cover only a small fraction of the massive volume of goods. Customs taxation is based mainly on the text information of a commodity; by classifying the commodity text with natural language processing and determining the tax according to the category, tax risk prevention and control can be automated. Tax prediction for commodities can thus be cast as a Chinese text classification problem.
Chinese text classification is the process of automatically classifying and labelling a set of texts (or other entities or objects) according to a given classification system or standard. From a labelled set of training documents it learns a model of the relation between document features and document categories, and then uses the learned model to judge the category of new documents. Text classification is gradually shifting from knowledge-based methods to methods based on statistics and machine learning. Many classification models achieve good results on Chinese text classification tasks. Compared with ordinary Chinese text, however, a single customs text is a linear concatenation of several elements and has no continuous contextual semantics. So far there has been no attempt to apply artificial intelligence to the classification of customs import and export declaration text. If the task is abstracted into a conventional text classification problem, the TextCNN convolutional model proposed by Yoon Kim et al. extracts text features well and classifies text using combinations of those features, and the BERT model proposed by Google improves the precision of text classification by means of large-scale pre-training corpora and a very large number of model parameters. Nevertheless, because customs text is domain-specific and has its own peculiarities, these general-purpose models perform poorly on the customs commodity classification task.
Disclosure of Invention
The application aims to provide a customs import and export commodity tax number prediction method that, by exploiting corpus resources specific to customs, predicts the tax number of customs import and export commodity text even when differences in declaration element length dilute the features of short elements, and that improves the accuracy of tax number prediction.
To achieve this purpose, the technical scheme of the application is a customs import and export commodity tax number prediction method comprising the following steps:
Step 1: preprocess the customs import and export commodity text to obtain element names and element contents;
Step 2: split the element contents obtained in step 1, then select difference elements using an auxiliary network;
Step 3: feed the difference elements obtained in step 2 into a CNN network for feature extraction, and at the same time extract element name features and element content features using a DPCNN network;
Step 4: fuse the difference element features, element name features and element content features obtained in step 3, then perform classification to obtain the commodity tax number.
Further, step 2 is implemented as follows:
Step 21: group the obtained element contents so that data of the same commodity category form one paragraph;
Step 22: count the number of commodity subclasses in each paragraph, and feed each paragraph into the auxiliary network, configured with that number of subclasses, for subclass classification training; while training on each paragraph, replace the content of each element in turn with its element name to obtain a loss value for each element;
Step 23: sort the loss values of the elements of each paragraph in descending order and select the top 2 as difference elements.
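The selection in steps 21 to 23 can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: `paragraph_loss` is a hypothetical callable standing in for the auxiliary network's training loss, and `toy_loss` is invented purely so that the example runs.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]  # element name -> element content


def mask_element(records: List[Record], element: str) -> List[Record]:
    """Replace the content of one element with its own name in every record."""
    return [{k: (k if k == element else v) for k, v in r.items()} for r in records]


def select_difference_elements(
    records: List[Record],
    paragraph_loss: Callable[[List[Record]], float],
    top_k: int = 2,
) -> List[str]:
    """Rank elements by the loss observed when their content is masked."""
    elements = list(records[0].keys())
    losses = {e: paragraph_loss(mask_element(records, e)) for e in elements}
    # A larger loss means the masked element mattered more for separating subclasses.
    return sorted(elements, key=lambda e: losses[e], reverse=True)[:top_k]


def toy_loss(records: List[Record]) -> float:
    """Invented stand-in: the loss grows as fewer elements still differ."""
    names = list(records[0].keys())
    differing = sum(1 for n in names if len({r[n] for r in records}) > 1)
    return float(len(names) - differing)


paragraph = [
    {"commodity name": "pneumatic actuator",
     "principle": "converting pneumatic power to mechanical power",
     "brand type": "4"},
    {"commodity name": "ram air actuator",
     "principle": "providing pneumatic linear force",
     "brand type": "4"},
]
selected = select_difference_elements(paragraph, toy_loss)
```

In a real setting the loss would come from training or evaluating the auxiliary classifier on the masked paragraph; only the ranking logic is shown here.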
Further, step 3 is implemented as follows:
Step 31: feed the difference elements into a CNN network, extract features with the convolutional layers, and sparsify the features with a max-pooling layer;
Step 32: feed the element names into a DPCNN network, extract features with the convolutional layers, and compress the sequence length with the down-sampling layers to enlarge the receptive field;
Step 33: feed the element contents into the SSCNN network and extract shallow features with the structured convolutional layer.
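The convolution followed by max pooling in step 31 can be illustrated with a small pure-Python sketch. The embeddings, the single filter and the ReLU activation are assumptions chosen for readability; the source does not specify filter sizes, counts or activations.

```python
from typing import List

Vector = List[float]


def conv1d(embeddings: List[Vector], kernel: List[Vector], bias: float = 0.0) -> List[float]:
    """Slide one filter spanning len(kernel) tokens over the token embeddings."""
    width = len(kernel)
    outputs = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        total = bias + sum(
            w * x
            for k_row, e_row in zip(kernel, window)
            for w, x in zip(k_row, e_row)
        )
        outputs.append(max(0.0, total))  # ReLU activation (assumed)
    return outputs


def max_over_time(feature_map: List[float]) -> float:
    """Keep only the strongest response of the filter across all positions."""
    return max(feature_map)


tokens = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.0, 0.0]]  # 4 tokens, dimension 2
kernel = [[1.0, 0.0], [0.0, 1.0]]  # one filter spanning 2 tokens
feature_map = conv1d(tokens, kernel)
pooled = max_over_time(feature_map)
```

Max pooling keeps one value per filter regardless of text length, which is why a short but distinctive element can still dominate the pooled feature.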
Further, step 4 is implemented as follows:
Step 41: concatenate the difference element features, the element name features and the element content features;
Step 42: feed the concatenated features into two fully connected layers, a fully connected layer being a network in which every node is connected to all nodes of the previous layer and which serves to integrate the extracted features; the output of the first layer corresponds to the commodity major-class number and the output of the second fully connected layer to the commodity subclass number, and the major-class number and the subclass number are concatenated to obtain the commodity tax number.
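Steps 41 and 42 can be sketched as follows. The sketch assumes that both fully connected layers read the same fused vector; the source does not say whether the second layer instead consumes the output of the first. All weights, feature values and code lists are hypothetical.

```python
from typing import List, Sequence, Tuple

Head = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (weights, bias)


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """One fully connected layer: every output sees every input."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=scores.__getitem__)


def predict_tax_number(diff_feat, name_feat, content_feat,
                       major_head: Head, minor_head: Head,
                       major_codes: List[str], minor_codes: List[str]) -> str:
    # Step 41: concatenate the three feature vectors.
    fused = list(diff_feat) + list(name_feat) + list(content_feat)
    # Step 42: one layer scores major classes, the other scores subclasses.
    major = major_codes[argmax(linear(fused, *major_head))]
    minor = minor_codes[argmax(linear(fused, *minor_head))]
    return major + minor  # concatenated codes form the tax number


tax_number = predict_tax_number(
    [1.0], [0.0], [2.0],
    major_head=([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0]),
    minor_head=([[0.0, 0.0, 0.1], [0.0, 0.0, 1.0]], [0.0, 0.0]),
    major_codes=["84123", "84124"],
    minor_codes=["90000", "10090"],
)
```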
With this technical scheme, the invention achieves the following technical effects. By fusing several convolutional networks, exploiting corpus resources specific to customs and taking the characteristics of customs text into account, the method addresses both the weak distinctiveness of commodities under the same catalogue entry and the dilution of short element features caused by differences in declaration element length. It strengthens the independence and importance of shorter content elements within the overall features and improves the accuracy of customs import and export commodity tax number prediction.
Drawings
FIG. 1 is a flow chart of a method for predicting the commodity tax number of customs import and export.
Detailed Description
The invention is described in further detail below with reference to the drawing and a specific example.
Example 1
When predicting the tax number of customs import and export commodities, the fact that each element has an inherent boundary should be fully exploited: the degree of semantic fusion between elements needs to be reduced, and sufficiently salient features need to be extracted, in order to obtain the correct tax number of the commodity. Based on the characteristics of customs text and the problems in the tax number prediction task, and referring to FIG. 1, the application provides the following method. First, the customs import and export commodity declaration text is preprocessed and segmented into words, and the element name corresponding to each piece of element content is found by looking it up in the declaration element catalog. Next, an auxiliary convolutional network finds the decisive difference elements among commodities in the same major class, and a hybrid convolutional neural network predicts the tax number of the commodity text. The hybrid network uses three kinds of convolution to process different parts of the commodity text: an ordinary Convolutional Neural Network (CNN) extracts features of the difference elements, Shallow Structured Convolution (SSCNN) extracts features of the element contents, and Deep Pyramid Convolution (DPCNN) extracts features of the element names. The three kinds of features are concatenated and classified by a fully connected network to obtain the commodity tax number.
The method effectively addresses two problems in customs tax number prediction: commodities in the same category are hard to distinguish, and short element features are diluted when declaration elements differ greatly in length. Its accuracy is markedly higher than that of other current mainstream deep learning methods.
The present invention is described in detail below with reference to examples and the accompanying drawings so that those skilled in the art can implement the invention by referring to the description.
In this embodiment, PyCharm is used as the development platform and Python as the development language. The method is run on real customs data, a corpus of 1,400,000 sentences. The specific process is as follows:
Step 1: preprocess the customs import and export commodity text to obtain the element names and element contents.
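As a rough illustration of step 1, the sketch below pairs each piece of element content with its element name for a labelled training record. `ELEMENT_CATALOG` is a hypothetical stand-in for the declaration element catalog, the element names are illustrative English renderings, and the record is adapted from the example data in this embodiment; word segmentation is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical catalog: major class -> ordered declaration element names.
ELEMENT_CATALOG: Dict[str, List[str]] = {
    "84123": ["commodity name", "brand type", "export preference status",
              "principle", "use", "brand", "model"],
}


def preprocess(record: str,
               catalog: Dict[str, List[str]] = ELEMENT_CATALOG,
               prefix_len: int = 5) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a pipe-delimited declaration and pair contents with element names."""
    fields = [f.strip() for f in record.split("|")]
    tax_number, contents = fields[0], fields[1:]
    names = catalog[tax_number[:prefix_len]]
    if len(names) != len(contents):
        raise ValueError("record does not match the catalog entry")
    return tax_number, list(zip(names, contents))


record = ("8412310090 | ram air actuator | 4 | 3 | providing pneumatic linear force"
          " | for aircraft powertrain systems | HONEYWELL | 676000141")
tax_number, pairs = preprocess(record)
```

The length check makes malformed declarations fail loudly rather than silently shifting contents onto the wrong element names.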
Step 2: split the element contents obtained in step 1, then select difference elements using an auxiliary network, as follows.
Step 21: group the element contents obtained in step 1 so that data of the same commodity category form one paragraph. For example, take the data:
Data A: "8412390000 | pneumatic actuator | 4 | 3 | converting pneumatic power to mechanical power | pneumatic valve | INGERSOLL RAND | 94695194"
Data B: "8412310090 | ram air actuator | 4 | 3 | providing pneumatic linear force | for aircraft powertrain systems | HONEYWELL | 676000141"
Both commodity declaration records belong to the major class "84123", and both have the fixed declaration elements "commodity name | brand type | export preference status | principle | use | brand | model", so they are grouped into one paragraph.
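The grouping into paragraphs can be sketched with the standard library alone. The five-digit prefix follows the "84123" example above; treating the prefix length as a parameter, and the shortened sample records, are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, List


def group_into_paragraphs(records: List[str], prefix_len: int = 5) -> Dict[str, List[str]]:
    """Group pipe-delimited records by the leading digits of their tax number."""
    paragraphs = defaultdict(list)
    for record in records:
        tax_number = record.split("|", 1)[0].strip()
        paragraphs[tax_number[:prefix_len]].append(record)
    return dict(paragraphs)


def count_subclasses(paragraph: List[str]) -> int:
    """Number of distinct full tax numbers inside one paragraph."""
    return len({r.split("|", 1)[0].strip() for r in paragraph})


records = [
    "8412390000 | pneumatic actuator",
    "8412310090 | ram air actuator",
    "8501100000 | small motor",  # invented record from another major class
]
paragraphs = group_into_paragraphs(records)
```

The subclass count of each paragraph is what sizes the auxiliary classifier's output in the next step.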
Step 22: count the number of commodity subclasses in each paragraph, and feed each paragraph into the auxiliary network, configured with that number of subclasses, for subclass classification training. While training on each paragraph, replace the content of each element in turn with its element name to obtain a loss value for each element.
Step 23: sort the loss values of the elements of each paragraph in descending order and select the top 2 as difference elements.
For the two records above, the difference elements computed by the auxiliary network are the commodity name and the principle.
Step 3: feed the difference elements obtained in step 2 into a CNN network to extract features, and at the same time use the DPCNN and SSCNN networks to extract element name features and element content features respectively, as follows.
Step 31: feed the difference elements into the CNN network, extract features with the convolutional layers, and sparsify the features with a max-pooling layer.
Step 32: feed the element names into the DPCNN network, extract features with the convolutional layers, and compress the sequence length with the down-sampling layers to enlarge the receptive field.
Step 33: feed the element contents into the SSCNN network and extract shallow features with the structured convolutional layer.
For example, for the data above, the element names and element contents are left unchanged and fed into their respective convolutional neural network models for feature extraction, while the difference elements, "pneumatic actuator | converting pneumatic power to mechanical power" for Data A and "ram air actuator | providing pneumatic linear force" for Data B, are fed into the TextCNN model for feature extraction.
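The down-sampling in step 32 can be illustrated with size-3, stride-2 max pooling, the setting used in the published DPCNN design. The patent text itself does not state the pooling parameters, so treat them as an assumption; the input is a toy scalar sequence rather than real feature maps.

```python
from typing import List


def downsample(sequence: List[float], size: int = 3, stride: int = 2) -> List[float]:
    """Max pooling over overlapping windows; roughly halves the sequence length."""
    return [
        max(sequence[start:start + size])
        for start in range(0, len(sequence) - size + 1, stride)
    ]


features = [1.0, 5.0, 2.0, 4.0, 3.0, 9.0, 0.0, 7.0, 6.0]  # length 9
once = downsample(features)   # length 4
twice = downsample(once)      # length 1
```

Because each pooled position summarizes a window of the previous level, a fixed-width convolution applied after pooling covers about twice as many original tokens, which is the enlargement of the receptive field that the step refers to.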
Step 4: fuse the difference element features, the element name features and the element content features obtained in step 3, then perform classification.
Step 41: concatenate the difference element features, the element name features and the element content features.
Step 42: feed the concatenated features into two fully connected layers for classification, where the output of the first layer corresponds to the commodity major-class number and the output of the second fully connected layer to the commodity subclass number, and concatenate the major-class number and the subclass number to obtain the commodity tax number.
For example, for the data above, the class with the highest probability among all classes is selected as the final predicted class of the model.
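Selecting the class with the highest probability can be sketched as a softmax followed by an argmax. The logits and the third candidate label below are invented for illustration and are not results from the patent.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: List[float], labels: List[str]) -> Tuple[str, float]:
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


labels = ["8412390000", "8412310090", "8412399090"]  # hypothetical candidates
label, confidence = predict([2.0, 0.5, -1.0], labels)
```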
Following the steps above, the method is compared with the DPCNN, Transformer, BERT and RoBERTa models. As Table 1 shows, the proposed method clearly outperforms the other methods in classification accuracy, precision and F1 score.
TABLE 1 comparison of classification effect of different models for customs import and export commodities
The invention also verifies the influence of different auxiliary networks on the final commodity classification. As Table 2 shows, the TextCNN model chosen as the auxiliary network in the invention greatly improves the accuracy of customs import and export commodity classification.
TABLE 2 influence of different auxiliary networks on classification effect of customs import and export commodities
The above is only a preferred embodiment of the present invention, and the scope of the invention is not limited to it; any person skilled in the art may substitute or change the technical solution and the inventive concept of the present invention within the technical scope it discloses.
Claims (4)
1. A customs import and export commodity tax number prediction method, characterized by comprising the following steps:
Step 1: preprocess the customs import and export commodity text to obtain element names and element contents;
Step 2: split the element contents obtained in step 1, then select difference elements using an auxiliary network;
Step 3: feed the difference elements obtained in step 2 into a CNN network for feature extraction, and at the same time extract element name features using a DPCNN network and element content features using an SSCNN network;
Step 4: fuse the difference element features, the element name features and the element content features obtained in step 3, then perform classification to obtain the commodity tax number.
2. The customs import and export commodity tax number prediction method according to claim 1, wherein step 2 is implemented as follows:
Step 21: group the obtained element contents so that data of the same commodity category form one paragraph;
Step 22: count the number of commodity subclasses in each paragraph, and feed each paragraph into the auxiliary network, configured with that number of subclasses, for subclass classification training; while training on each paragraph, replace the content of each element in turn with its element name to obtain a loss value for each element;
Step 23: sort the loss values of the elements of each paragraph in descending order and select the top 2 as difference elements.
3. The customs import and export commodity tax number prediction method according to claim 1, wherein step 3 is implemented as follows:
Step 31: feed the difference elements into a CNN network, extract features with the convolutional layers, and sparsify the features with a max-pooling layer;
Step 32: feed the element names into a DPCNN network, extract features with the convolutional layers, and compress the sequence length with the down-sampling layers to enlarge the receptive field;
Step 33: feed the element contents into the SSCNN network and extract shallow features with the structured convolutional layer.
4. The customs import and export commodity tax number prediction method according to claim 1, wherein step 4 is implemented as follows:
Step 41: concatenate the difference element features, the element name features and the element content features;
Step 42: feed the concatenated features into two fully connected layers, which integrate the extracted features; the output of the first layer corresponds to the commodity major-class number and the output of the second fully connected layer to the commodity subclass number, and the major-class number and the subclass number are concatenated to obtain the commodity tax number.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010744808.6A CN111985204B (en) | 2020-07-29 | 2020-07-29 | Method for predicting tax numbers of customs import and export commodities |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010744808.6A CN111985204B (en) | 2020-07-29 | 2020-07-29 | Method for predicting tax numbers of customs import and export commodities |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111985204A (en) | 2020-11-24 |
CN111985204B (en) | 2023-06-02 |
Family
ID=73445564
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010744808.6A Active CN111985204B (en) | 2020-07-29 | 2020-07-29 | Method for predicting tax numbers of customs import and export commodities |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111985204B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113343640A (en) * | 2021-05-26 | 2021-09-03 | 南京大学 | Customs clearance commodity HS code classification method and device |
CN113705188A (en) * | 2021-08-19 | 2021-11-26 | 大连大学 | Intelligent evaluation method for customs import and export commodity specification declaration |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104008536A (en) * | 2013-11-04 | 2014-08-27 | 无锡金帆钻凿设备股份有限公司 | Multi-focus noise image fusion method based on CS-CHMT and IDPCNN |
CN109034154A (en) * | 2018-07-23 | 2018-12-18 | 西安电子科技大学昆山创新研究院 | The extraction and recognition methods of Invoice Seal duty paragraph |
CN110175235A (en) * | 2019-04-23 | 2019-08-27 | 苏宁易购集团股份有限公司 | Intelligence commodity tax sorting code number method and system neural network based |
Non-Patent Citations (1)
Title |
---|
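Qiu Yue; Yang Lifei; Wang Haixian: "Analysis of key points in laboratory testing for the commodity classification of coal carburant", China Port Science and Technology, no. 01 * |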
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113343640A (en) * | 2021-05-26 | 2021-09-03 | 南京大学 | Customs clearance commodity HS code classification method and device |
CN113343640B (en) * | 2021-05-26 | 2024-02-20 | 南京大学 | Method and device for classifying customs commodity HS codes |
CN113705188A (en) * | 2021-08-19 | 2021-11-26 | 大连大学 | Intelligent evaluation method for customs import and export commodity specification declaration |
CN113705188B (en) * | 2021-08-19 | 2023-06-06 | 大连大学 | Intelligent evaluation method for customs import and export commodity specification declaration |
Also Published As
Publication number | Publication date |
---|---|
CN111985204B (en) | 2023-06-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110298037B (en) | Convolutional neural network matching text recognition method based on enhanced attention mechanism | |
CN107766371B (en) | Text information classification method and device | |
CN111325029B (en) | Text similarity calculation method based on deep learning integrated model | |
CN112347268A (en) | Text-enhanced knowledge graph joint representation learning method and device | |
CN108595708A (en) | A kind of exception information file classification method of knowledge based collection of illustrative plates | |
CN112434535B (en) | Element extraction method, device, equipment and storage medium based on multiple models | |
CN113312501A (en) | Construction method and device of safety knowledge self-service query system based on knowledge graph | |
CN111259153B (en) | Attribute-level emotion analysis method of complete attention mechanism | |
US10706030B2 (en) | Utilizing artificial intelligence to integrate data from multiple diverse sources into a data structure | |
CN110516239B (en) | Segmentation pooling relation extraction method based on convolutional neural network | |
CN114896388A (en) | Hierarchical multi-label text classification method based on mixed attention | |
CN113051914A (en) | Enterprise hidden label extraction method and device based on multi-feature dynamic portrait | |
CN116127090B (en) | Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction | |
CN112925904B (en) | Lightweight text classification method based on Tucker decomposition | |
CN113254507B (en) | Intelligent construction and inventory method for data asset directory | |
CN110969023B (en) | Text similarity determination method and device | |
CN111209362A (en) | Address data analysis method based on deep learning | |
CN114881043B (en) | Deep learning model-based legal document semantic similarity evaluation method and system | |
CN115827819A (en) | Intelligent question and answer processing method and device, electronic equipment and storage medium | |
CN111985204A (en) | Customs import and export commodity tax number prediction method | |
CN114564563A (en) | End-to-end entity relationship joint extraction method and system based on relationship decomposition | |
CN116089610A (en) | Label identification method and device based on industry knowledge | |
CN111831624A (en) | Data table creating method and device, computer equipment and storage medium | |
CN111178080A (en) | Named entity identification method and system based on structured information | |
CN113869054A (en) | Deep learning-based electric power field project feature identification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | Inventor after: Zhang Qiang, Zhou Chengjie, Che Chao. Inventor before: Che Chao, Zhou Chengjie, Zhang Qiang |
GR01 | Patent grant | ||