CN111985204A - Customs import and export commodity tax number prediction method - Google Patents

Info

Publication number
CN111985204A
Authority
CN
China
Prior art keywords
commodity
network
import
export
customs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010744808.6A
Other languages
Chinese (zh)
Other versions
CN111985204B (en)
Inventor
车超
周成杰
张强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University
Priority to CN202010744808.6A
Publication of CN111985204A
Application granted
Publication of CN111985204B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a customs import and export commodity tax number prediction method, which comprises the following steps. Step 1: preprocess the customs import and export commodity text to obtain element names and element contents. Step 2: split the element contents obtained in step 1, then select difference elements using an auxiliary network. Step 3: feed the difference elements obtained in step 2 into a CNN network for feature extraction, and at the same time extract element name features and element content features using a DPCNN network. Step 4: fuse the difference element features, element name features and element content features obtained in step 3, then perform classification to obtain the commodity tax number. By using the corpus resources specific to customs, the method predicts the tax number of customs import and export commodity texts even though differences in declaration element length dilute the features of short elements, and improves the accuracy of tax number prediction.

Description

Customs import and export commodity tax number prediction method
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a customs import and export commodity tax number prediction method based on a hybrid convolutional neural network and an auxiliary network.
Background
Customs duties are a major source of tax revenue in many countries. At present, China Customs relies mainly on manual review of the tax rates of import and export commodities, which can cover only a small fraction of the massive volume of such commodities. Customs taxation is based mainly on the text information of a commodity. By classifying commodity texts with natural language processing technology and determining the tax according to the category, tax risk prevention and control can be automated. Tax prediction for commodities can thus be cast as a Chinese text classification problem.
Chinese text classification is the process of automatically classifying and labelling a set of texts (or other entities or objects) according to a given classification system or standard. A relation model between document features and document categories is learned from a labelled set of training documents, and the learned model is then used to judge the category of new documents. Text classification is gradually shifting from knowledge-based methods to methods based on statistics and machine learning. Many classification models achieve fairly good results on Chinese text classification tasks. Compared with ordinary Chinese text, however, a single customs declaration text is a linear sequence of several elements and has no continuous contextual semantics. At present there has been no attempt to use artificial intelligence for the task of classifying customs import and export declaration texts. If the task is abstracted as a conventional text classification problem, the TextCNN convolutional model proposed by Yoon Kim extracts text features well and classifies text using combinations of those features, and the BERT model proposed by Google improves the precision of text classification by using large-scale pre-training corpora and a very large number of model parameters. However, because customs texts are domain-specific and have a peculiar structure, these general-purpose models perform poorly on the customs commodity classification task.
Disclosure of Invention
The application aims to provide a customs import and export commodity tax number prediction method that uses the corpus resources specific to customs to predict the tax number of customs import and export commodity texts even though differences in declaration element length dilute the features of short elements, and that improves the accuracy of tax number prediction.
To achieve this purpose, the technical solution of the application is a customs import and export commodity tax number prediction method comprising the following steps:
step 1: preprocessing the customs import and export commodity text to obtain element names and element contents;
step 2: splitting the element contents obtained in step 1, and then selecting difference elements using an auxiliary network;
step 3: feeding the difference elements obtained in step 2 into a CNN network for feature extraction, and at the same time extracting element name features and element content features using a DPCNN network;
step 4: fusing the difference element features, the element name features and the element content features obtained in step 3, and then performing classification to obtain the commodity tax number.
Further, step 2 is implemented as follows:
step 21: grouping the obtained element contents of data with the same commodity category into one paragraph;
step 22: counting the commodity subclasses in each paragraph, feeding each paragraph into the auxiliary network according to its number of subclasses, and training the network to classify the subclasses; during the training of each paragraph, replacing the content of each element in turn with its element name to obtain a loss value for each element;
step 23: sorting the loss values of the elements of each paragraph in descending order and selecting the top 2 elements as the difference elements.
Further, step 3 is implemented as follows:
step 31: feeding the difference elements into the CNN network, extracting features with the convolutional layer and sparsifying the features with the max-pooling layer;
step 32: feeding the element names into the DPCNN network, extracting features with the convolutional layers and compressing the sequence length with the downsampling layers to enlarge the receptive field;
step 33: feeding the element contents into the SSCNN network and extracting shallow features with the structured convolutional layer.
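The effect of the downsampling in step 32 can be illustrated with a short sketch. It assumes max pooling with a window of 3 and a stride of 2 and no padding, a configuration commonly used in DPCNN implementations but not stated in this application; the function name `downsample` and the toy feature values are hypothetical.

```python
def downsample(sequence, size=3, stride=2):
    """Max-pool a list of feature vectors along the sequence axis."""
    pooled = []
    for start in range(0, len(sequence) - size + 1, stride):
        window = sequence[start:start + size]
        # Element-wise maximum over the vectors inside the window.
        pooled.append([max(column) for column in zip(*window)])
    return pooled


# 16 positions, each a 2-dimensional feature vector (toy values).
features = [[float(i), float(-i)] for i in range(16)]
lengths = [len(features)]
while len(features) >= 3:
    features = downsample(features)
    lengths.append(len(features))
print(lengths)  # [16, 7, 3, 1]
```

Each pass roughly halves the sequence length, so a convolution of fixed width applied after a pass covers about twice as many of the original positions. This is the sense in which compressing the sequence enlarges the receptive field.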
Further, step 4 is implemented as follows:
step 41: concatenating the difference element features, the element name features and the element content features;
step 42: feeding the concatenated features into two fully connected layers, a fully connected layer being one in which each node is connected to all nodes of the previous layer, which serve to integrate the extracted features; the output dimension of the first layer corresponds to the commodity major-class number and the output dimension of the second fully connected layer corresponds to the commodity minor-class number, and the major-class number and the minor-class number are concatenated to obtain the commodity tax number.
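Steps 41 and 42 can be sketched as follows. The text does not say whether the second fully connected layer takes the output of the first or works in parallel with it; this sketch assumes the layers are stacked. All weights, feature values, the major code "84128" and the helper names `linear` and `argmax` are invented for illustration.

```python
def linear(weights, bias, vector):
    """Fully connected layer: every output node sees every input value."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical label vocabularies and toy feature vectors.
MAJOR_CODES = ["84123", "84128"]
MINOR_CODES = ["10090", "90000"]
difference_features = [1.0, 0.0]
name_features = [0.5]
content_features = [0.0, 2.0]

# Step 41: concatenate the three feature vectors.
fused = difference_features + name_features + content_features

# Step 42: two fully connected layers with hand-set toy weights.
W1 = [[1.0, 0.0, 0.0, 0.0, 0.5], [0.0, 1.0, 0.0, 0.0, 0.0]]
b1 = [0.0, 0.0]
W2 = [[0.0, 1.0], [1.0, 0.0]]
b2 = [0.0, 0.0]

major_logits = linear(W1, b1, fused)         # one score per major class
minor_logits = linear(W2, b2, major_logits)  # one score per minor class

# Concatenate the predicted major and minor codes into the tax number.
tax_number = MAJOR_CODES[argmax(major_logits)] + MINOR_CODES[argmax(minor_logits)]
print(tax_number)  # 8412390000
```

In a trained model the weights would be learned and the minor-class vocabulary would depend on the data; only the concatenate, classify and join structure is taken from the text.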
By adopting the above technical solution, the invention achieves the following technical effects: by fusing several kinds of convolutional networks, using the corpus resources specific to customs and taking the characteristics of customs texts into account, the method addresses two problems, namely that commodities under the same catalogue are hard to distinguish and that differences in declaration element length dilute the features of short elements. It strengthens the independence and importance of shorter content elements within the overall features and improves the accuracy of customs import and export commodity tax number prediction.
Drawings
FIG. 1 is a flow chart of the customs import and export commodity tax number prediction method.
Detailed Description
The invention is described in further detail below with reference to the drawing and a specific example, which serves to further illustrate the application.
Example 1
In predicting the tax number of customs import and export commodities, the fact that each element has an inherent boundary should be fully exploited: the degree of semantic fusion between elements needs to be reduced, and sufficiently prominent features need to be extracted, in order to obtain the correct tax number. Based on the characteristics of customs texts and the problems of the tax number prediction task, and referring to FIG. 1, the application provides a customs import and export commodity tax number prediction method. First, the customs import and export commodity declaration texts are preprocessed and segmented into words, and the element name corresponding to each element content of a declaration text is found by looking up the declaration element catalogue. Next, an auxiliary convolutional network finds the decisive difference elements of commodities within the same major class, and a hybrid convolutional neural network predicts the tax number of the commodity text. The hybrid convolutional neural network uses three kinds of convolution to process different parts of the commodity text: an ordinary Convolutional Neural Network (CNN) extracts the features of the difference elements, a Shallow Structured Convolution network (SSCNN) extracts the features of the element contents, and a Deep Pyramid Convolution network (DPCNN) extracts the features of the element names. The three kinds of features are concatenated and classified by a fully connected network to obtain the commodity tax number.
In the customs import and export commodity tax number prediction problem, the method effectively addresses the difficulty of distinguishing commodities within the same category and the dilution of short-element features caused by excessively large differences in declaration element length, and its accuracy is markedly higher than that of other current mainstream deep learning methods.
The present invention is described in detail below with reference to examples and the accompanying drawings so that those skilled in the art can implement the invention by referring to the description.
In this embodiment, PyCharm is used as the development platform and Python as the development language. Real customs data, a corpus of 1,400,000 sentences, are processed. The specific process is as follows:
Step 1: preprocess the customs import and export commodity text to obtain element names and element contents.
Step 2: split the element contents obtained in step 1, then select difference elements using an auxiliary network. The specific steps are as follows:
Step 21: group the element contents obtained in step 1 whose data have the same commodity category into one paragraph. For example, take the data:
Data A: "8412390000 | pneumatic actuator | 4 | 3 | converting pneumatic power to mechanical power | pneumatic valve | INGERSOLL RAND | 94695194"
Data B: "8412310090 | ram air actuator | 4 | 3 | providing pneumatic linear force | for aircraft powertrain systems | HONEYWELL | 676000141"
In both declaration records the major class is "84123" and the fixed declaration elements are "commodity name | brand type | export preferential treatment | principle | use | brand | model", so the two records are grouped into one paragraph.
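The splitting and grouping of steps 1 and 21 can be sketched in a few lines of Python. This is a minimal illustration, not the embodiment's code: the records are adapted from Data A and Data B with the fields normalised to seven elements, the element names are an English rendering of the fixed declaration elements, the major class is assumed to be the first five digits of the tax number, and records are grouped by major class only. The function names are hypothetical.

```python
from collections import defaultdict

ELEMENT_NAMES = ["commodity name", "brand type", "export preferential treatment",
                 "principle", "use", "brand", "model"]
MAJOR_CLASS_DIGITS = 5  # assumption: "84123" is the first five digits

RAW_RECORDS = [
    "8412390000|pneumatic actuator|4|3|converting pneumatic power to mechanical power"
    "|pneumatic valve|INGERSOLL RAND|94695194",
    "8412310090|ram air actuator|4|3|providing pneumatic linear force"
    "|for aircraft powertrain systems|HONEYWELL|676000141",
]


def parse_record(line):
    """Split one declaration line into its tax number and named elements."""
    tax_number, *contents = [field.strip() for field in line.split("|")]
    if len(contents) != len(ELEMENT_NAMES):
        raise ValueError(f"expected {len(ELEMENT_NAMES)} elements, got {len(contents)}")
    return {"tax_number": tax_number, "elements": dict(zip(ELEMENT_NAMES, contents))}


def group_into_paragraphs(records):
    """Group records that share a major class into one paragraph."""
    paragraphs = defaultdict(list)
    for record in records:
        paragraphs[record["tax_number"][:MAJOR_CLASS_DIGITS]].append(record)
    return dict(paragraphs)


paragraphs = group_into_paragraphs(parse_record(line) for line in RAW_RECORDS)
for major_class, records in paragraphs.items():
    # The number of distinct subclasses is what step 22 needs per paragraph.
    subclasses = sorted({r["tax_number"][MAJOR_CLASS_DIGITS:] for r in records})
    print(major_class, len(records), subclasses)  # 84123 2 ['10090', '90000']
```

Pairing each content with its element name at parse time keeps the element boundaries explicit, which the later steps rely on.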
Step 22: count the commodity subclasses in each paragraph, feed each paragraph into the auxiliary network according to its number of subclasses, and train the network to classify the subclasses. During the training of each paragraph, replace the content of each element in turn with its element name to obtain a loss value for each element.
Step 23: sort the loss values of the elements of each paragraph in descending order and select the top 2 elements as the difference elements.
For the two records above, the difference elements computed by the auxiliary network are the commodity name and the principle.
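The masking-and-ranking idea of steps 22 and 23 can be sketched as below. This is only an illustration under strong assumptions: the auxiliary network is replaced by a toy classifier whose hand-set evidence weights are invented, and the loss is measured on a fixed classifier rather than during training as the embodiment describes. The element names, records and function names are hypothetical renderings of the example data.

```python
import math

ELEMENT_NAMES = ["commodity name", "brand type", "export preferential treatment",
                 "principle", "use", "brand", "model"]
SUBCLASSES = ["10090", "90000"]

# Toy stand-in for a trained auxiliary network: each phrase votes for one
# subclass with a hand-set weight. These weights are invented for illustration.
EVIDENCE = {
    "pneumatic actuator": ("90000", 2.0),
    "ram air actuator": ("10090", 2.0),
    "converting pneumatic power to mechanical power": ("90000", 1.5),
    "providing pneumatic linear force": ("10090", 1.5),
    "pneumatic valve": ("90000", 0.5),
    "for aircraft powertrain systems": ("10090", 0.5),
}


def cross_entropy(fields, label):
    """Softmax cross-entropy of the toy classifier for one record."""
    scores = {subclass: 0.0 for subclass in SUBCLASSES}
    for text in fields:
        if text in EVIDENCE:
            subclass, weight = EVIDENCE[text]
            scores[subclass] += weight
    log_norm = math.log(sum(math.exp(s) for s in scores.values()))
    return log_norm - scores[label]


def select_difference_elements(records, labels, top_k=2):
    """Rank elements by the loss observed when their content is masked."""
    losses = {}
    for index, name in enumerate(ELEMENT_NAMES):
        total = 0.0
        for fields, label in zip(records, labels):
            masked = list(fields)
            masked[index] = name  # replace the element content with its name
            total += cross_entropy(masked, label)
        losses[name] = total
    return sorted(losses, key=losses.get, reverse=True)[:top_k]


RECORDS = [
    ["pneumatic actuator", "4", "3", "converting pneumatic power to mechanical power",
     "pneumatic valve", "INGERSOLL RAND", "94695194"],
    ["ram air actuator", "4", "3", "providing pneumatic linear force",
     "for aircraft powertrain systems", "HONEYWELL", "676000141"],
]
LABELS = ["90000", "10090"]
print(select_difference_elements(RECORDS, LABELS))  # ['commodity name', 'principle']
```

An element whose removal raises the loss most is the one the classifier depends on most to separate subclasses, which is why the largest loss values mark the difference elements.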
Step 3: feed the difference elements obtained in step 2 into a CNN network to extract features, and at the same time use the DPCNN and SSCNN networks to extract the element name features and the element content features respectively. The specific steps are as follows:
Step 31: feed the difference elements into the CNN network, extract features with the convolutional layer and sparsify the features with the max-pooling layer.
Step 32: feed the element names into the DPCNN network, extract features with the convolutional layers and compress the sequence length with the downsampling layers to enlarge the receptive field.
Step 33: feed the element contents into the SSCNN network and extract shallow features with the structured convolutional layer.
For example, for the data above, the element names and element contents are left unchanged and fed into their respective convolutional neural network models to extract features, while the difference elements are fed into the TextCNN model: "pneumatic actuator | converting pneumatic power to mechanical power" for Data A and "ram air actuator | providing pneumatic linear force" for Data B.
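The convolution and max pooling of step 31 can be shown with a minimal pure-Python sketch of a single TextCNN-style filter. The embeddings and the filter weights are toy values, the ReLU activation is an assumption, and the function name `conv1d_max_pool` is hypothetical; a real model learns many filters of several widths.

```python
def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one filter over the token sequence, apply ReLU, keep the maximum.

    Assumes the sequence is at least as long as the filter is wide.
    """
    width = len(kernel)
    activations = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        total = bias + sum(w * x
                           for kernel_row, token in zip(kernel, window)
                           for w, x in zip(kernel_row, token))
        activations.append(max(0.0, total))  # ReLU
    return max(activations)  # max-over-time pooling


# Four tokens with 2-dimensional toy embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One filter spanning two tokens.
kernel = [[1.0, 0.0], [0.0, 1.0]]
feature = conv1d_max_pool(tokens, kernel)
print(feature)  # 2.0
```

Max pooling keeps only the strongest response of each filter, so most positions contribute nothing to the output. This is one reading of the feature sparsification that step 31 attributes to the max-pooling layer.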
Step 4: fuse the difference element features, the element name features and the element content features obtained in step 3, then perform classification.
Step 41: concatenate the difference element features, the element name features and the element content features.
Step 42: feed the concatenated features into two fully connected layers for classification; the output dimension of the first layer corresponds to the commodity major-class number and the output dimension of the second fully connected layer corresponds to the commodity minor-class number, and the major-class number and the minor-class number are concatenated to obtain the commodity tax number.
For example, for the data above, the class with the highest probability among all classes is selected as the final predicted class of the model.
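Selecting the most probable class can be sketched as follows. The text does not say how the probabilities are produced; this sketch assumes a softmax over the classifier's output scores. The scores, the class code "99999" and the function names are invented for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def predict(logits, class_codes):
    """Return the class code with the highest probability, and that probability."""
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_codes[best], probabilities[best]


code, probability = predict([0.2, 2.0, -1.0], ["10090", "90000", "99999"])
print(code, round(probability, 3))
```

Because softmax preserves the ordering of the scores, the predicted class is the same as the one with the largest raw score; the probability is useful mainly as a confidence value.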
Following the above steps, the classification performance of the method is compared with that of the DPCNN, Transformer, BERT and RoBERTa models. As can be seen from Table 1, the proposed method is clearly superior to the other methods in classification accuracy, precision and F1 value.
TABLE 1 comparison of classification effect of different models for customs import and export commodities
[Table 1 is an image (BDA0002607979150000071) in the original publication and is not reproduced here.]
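For readers who want to compute the same kinds of metrics on their own predictions, the following sketch calculates accuracy, precision and F1. The averaging scheme used for Table 1 is not stated, so macro-averaging over classes is an assumption here, and the label lists are toy values, not results from this publication.

```python
def macro_scores(true_labels, predicted_labels):
    """Return accuracy, macro precision and macro F1 for single-label predictions."""
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, f1_values = [], []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        f1_values.append(f1)
    return accuracy, sum(precisions) / len(classes), sum(f1_values) / len(classes)


# Toy labels for illustration only.
truth = ["8412390000", "8412310090", "8412390000", "8412310090"]
predicted = ["8412390000", "8412390000", "8412390000", "8412310090"]
accuracy, precision, f1 = macro_scores(truth, predicted)
print(accuracy, round(precision, 4), round(f1, 4))  # 0.75 0.8333 0.7333
```

With thousands of tax numbers of very uneven frequency, macro and micro averages can differ substantially, so the averaging choice should be reported alongside any comparison.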
The invention also verifies the influence of different auxiliary networks on the final commodity classification. As shown in Table 2, choosing the TextCNN model as the auxiliary network greatly improves the classification accuracy for customs import and export commodities.
TABLE 2 influence of different auxiliary networks on classification effect of customs import and export commodities
[Table 2 is an image (BDA0002607979150000072) in the original publication and is not reproduced here.]
The above is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solution and inventive concept, falls within the scope of protection of the invention.

Claims (4)

1. A customs import and export commodity tax number prediction method, characterized by comprising the following steps:
step 1: preprocessing the customs import and export commodity text to obtain element names and element contents;
step 2: splitting the element contents obtained in step 1, and then selecting difference elements using an auxiliary network;
step 3: feeding the difference elements obtained in step 2 into a CNN network for feature extraction, and at the same time extracting element name features using a DPCNN network and element content features using an SSCNN network;
step 4: fusing the difference element features, the element name features and the element content features obtained in step 3, and then performing classification to obtain the commodity tax number.
2. The customs import and export commodity tax number prediction method according to claim 1, wherein step 2 is implemented as follows:
step 21: grouping the obtained element contents of data with the same commodity category into one paragraph;
step 22: counting the commodity subclasses in each paragraph, feeding each paragraph into the auxiliary network according to its number of subclasses, and training the network to classify the subclasses; during the training of each paragraph, replacing the content of each element in turn with its element name to obtain a loss value for each element;
step 23: sorting the loss values of the elements of each paragraph in descending order and selecting the top 2 elements as the difference elements.
3. The customs import and export commodity tax number prediction method according to claim 1, wherein step 3 is implemented as follows:
step 31: feeding the difference elements into the CNN network, extracting features with the convolutional layer and sparsifying the features with the max-pooling layer;
step 32: feeding the element names into the DPCNN network, extracting features with the convolutional layers and compressing the sequence length with the downsampling layers to enlarge the receptive field;
step 33: feeding the element contents into the SSCNN network and extracting shallow features with the structured convolutional layer.
4. The customs import and export commodity tax number prediction method according to claim 1, wherein step 4 is implemented as follows:
step 41: concatenating the difference element features, the element name features and the element content features;
step 42: feeding the concatenated features into two fully connected layers, which serve to integrate the extracted features; the output dimension of the first layer corresponds to the commodity major-class number and the output dimension of the second fully connected layer corresponds to the commodity minor-class number, and the major-class number and the minor-class number are concatenated to obtain the commodity tax number.
CN202010744808.6A 2020-07-29 2020-07-29 Method for predicting tax numbers of customs import and export commodities Active CN111985204B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010744808.6A CN111985204B (en) 2020-07-29 2020-07-29 Method for predicting tax numbers of customs import and export commodities

Publications (2)

Publication Number Publication Date
CN111985204A (en) 2020-11-24
CN111985204B (en) 2023-06-02

Family

ID=73445564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010744808.6A Active CN111985204B (en) 2020-07-29 2020-07-29 Method for predicting tax numbers of customs import and export commodities

Country Status (1)

Country Link
CN (1) CN111985204B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008536A (en) * 2013-11-04 2014-08-27 无锡金帆钻凿设备股份有限公司 Multi-focus noise image fusion method based on CS-CHMT and IDPCNN
CN109034154A (en) * 2018-07-23 2018-12-18 西安电子科技大学昆山创新研究院 The extraction and recognition methods of Invoice Seal duty paragraph
CN110175235A (en) * 2019-04-23 2019-08-27 苏宁易购集团股份有限公司 Intelligence commodity tax sorting code number method and system neural network based

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邱越;杨丽飞;王海仙;: "煤炭增碳剂商品归类化验关键点分析", 中国口岸科学技术, no. 01 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343640A (en) * 2021-05-26 2021-09-03 南京大学 Customs clearance commodity HS code classification method and device
CN113343640B (en) * 2021-05-26 2024-02-20 南京大学 Method and device for classifying customs commodity HS codes
CN113705188A (en) * 2021-08-19 2021-11-26 大连大学 Intelligent evaluation method for customs import and export commodity specification declaration
CN113705188B (en) * 2021-08-19 2023-06-06 大连大学 Intelligent evaluation method for customs import and export commodity specification declaration

Also Published As

Publication number Publication date
CN111985204B (en) 2023-06-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Zhang Qiang
Inventor after: Zhou Chengjie
Inventor after: Che Chao
Inventor before: Che Chao
Inventor before: Zhou Chengjie
Inventor before: Zhang Qiang
GR01 Patent grant