CN111985204B - Method for predicting tax numbers of customs import and export commodities - Google Patents

Info

Publication number
CN111985204B
CN111985204B CN202010744808.6A CN202010744808A CN111985204B CN 111985204 B CN111985204 B CN 111985204B CN 202010744808 A CN202010744808 A CN 202010744808A CN 111985204 B CN111985204 B CN 111985204B
Authority
CN
China
Prior art keywords
commodity
network
tax
export
customs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010744808.6A
Other languages
Chinese (zh)
Other versions
CN111985204A (en)
Inventor
张强
周成杰
车超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Abstract

The invention discloses a method for predicting the tax numbers of customs import and export commodities, which comprises the following steps. Step 1: preprocess the customs import and export commodity text to obtain element names and element contents. Step 2: split the element contents obtained in step 1, then use an auxiliary network to select differential elements. Step 3: send the differential elements obtained in step 2 into a CNN network for feature extraction, while extracting element name features with a DPCNN network and element content features with an SSCNN network. Step 4: fuse the differential element features, element name features and element content features obtained in step 3, then classify to obtain the commodity tax number. By exploiting customs-specific corpus resources, the method predicts tax numbers for customs import and export commodity texts even though differences in declaration element length dilute the features of short elements, and thereby improves the accuracy of tax number prediction.

Description

Method for predicting tax numbers of customs import and export commodities
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a method for predicting the tax numbers of customs import and export commodities based on a hybrid convolutional neural network and an auxiliary network.
Background
Customs duties are a major source of tax revenue in many countries. At present, China Customs relies mainly on manual checking of the tax rates of import and export commodities, which can cover only a small part of the massive volume of such commodities. Since customs duties are levied mainly on the basis of the text information of commodities, classifying commodity texts with natural language processing and determining the tax from the category makes it possible to automate tax risk prevention and control. Tax number prediction for commodities can thus be cast as a Chinese text classification problem.
Chinese text classification is the process of automatically classifying and labeling a set of texts (or other entities or objects) according to a given classification system or standard. From a labeled training document set, a model of the relation between document features and document categories is learned and then used to judge the category of new documents. Text classification is gradually moving from knowledge-based methods to statistical and machine learning methods. Many classification models achieve good results on Chinese text classification tasks. Compared with ordinary Chinese, however, a customs import and export declaration text is a linear concatenation of several elements and has no continuous contextual semantics. At present, no attempt has been made to classify customs import and export declaration texts with artificial intelligence. If the task is abstracted into a conventional text classification problem, the TextCNN convolutional model proposed by Yoon Kim extracts text features well and classifies using combinations of those features, and the BERT model proposed by Google uses a large-scale pre-training corpus and a very large number of model parameters to improve the accuracy of text classification. Because customs texts are domain-specific and unusual in form, however, these general-purpose models perform poorly on customs commodity classification.
Disclosure of Invention
The purpose of the application is to provide a method for predicting the tax numbers of customs import and export commodities that uses customs-specific corpus resources to predict tax numbers for customs import and export commodity texts even though differences in declaration element length dilute the features of short elements, thereby improving the accuracy of tax number prediction.
To achieve the above purpose, the technical scheme of the application is as follows: a method for predicting the tax numbers of customs import and export commodities, comprising the following steps:
step 1: preprocess the customs import and export commodity text to obtain element names and element contents;
step 2: split the element contents obtained in step 1, then use an auxiliary network to select differential elements;
step 3: send the differential elements obtained in step 2 into a CNN network for feature extraction, while extracting element name features with a DPCNN network and element content features with an SSCNN network;
step 4: fuse the differential element features, element name features and element content features obtained in step 3, then classify to obtain the commodity tax number.
Further, step 2 is implemented as follows:
step 21: gather the obtained element contents of data with the same commodity major class together to form a paragraph;
step 22: count the commodity subclasses in each paragraph, send each paragraph into the auxiliary network, and train it to classify the commodity subclasses; when training on each paragraph, replace the element contents with the corresponding element names one at a time, in order, to obtain a loss value for each element;
step 23: using the loss values of the elements in each paragraph, select the top 2 elements in descending order of loss as the differential elements.
Further, step 3 is implemented as follows:
step 31: send the differential elements into a CNN network, extract features with a convolution layer, and sparsify the features with a max-pooling layer;
step 32: send the element names into a DPCNN network, extract features with convolution layers, and shorten the sequence length with a downsampling layer to enlarge the receptive field;
step 33: send the element contents into an SSCNN network and extract shallow features with a structured convolution layer.
Further, step 4 is implemented as follows:
step 41: concatenate the differential element features, the element name features and the element content features;
step 42: send the concatenated features into two fully connected layers, in which every node is connected to all nodes of the preceding layer so as to integrate the extracted features; the output dimension of the first fully connected layer is the number of commodity major classes and the output dimension of the second is the number of commodity subclasses, and the predicted major class number and subclass number are concatenated to obtain the commodity tax number.
By adopting this technical scheme, the invention achieves the following technical effects: by integrating several convolutional networks, exploiting customs-specific corpus resources and taking the characteristics of customs texts into account, it addresses two problems, namely that commodities under the same major class are hard to distinguish and that differences in declaration element length dilute the features of short elements. It strengthens the independence and the weight of shorter content elements within the overall features and improves the accuracy of tax number prediction for customs import and export commodities.
Drawings
FIG. 1 is a flow chart of a method for predicting tax numbers of customs import and export commodities.
Detailed Description
The invention is described in further detail below with reference to the drawing and a specific embodiment, which serves as an example to further describe the application.
Example 1
Tax number prediction for customs import and export commodities exploits the fact that the declaration elements are inherently separated from one another, so the degree of semantic fusion between elements needs to be reduced, and sufficiently salient features need to be extracted to obtain the correct commodity tax number. Based on the characteristics of customs texts and the problems in the tax number prediction task, and referring to fig. 1, the application provides the following method. First, the commodity declaration texts of customs import and export are preprocessed and segmented into words, and the element name corresponding to each piece of element content is found by consulting the declaration element catalog. Next, an auxiliary convolutional network finds the decisive differential elements of commodities under the same major class, and a hybrid convolutional neural network predicts tax numbers for the commodity texts. The hybrid convolutional neural network processes the different parts of the commodity text with three kinds of convolution: an ordinary convolutional neural network (Convolution Neural Network, CNN) extracts features from the differential elements, a shallow structured convolutional network (Shallow Structured Convolution Neural Network, SSCNN) extracts features from the element contents, and a deep pyramid convolutional network (Deep Pyramid Convolution Neural Network, DPCNN) extracts features from the element names. After the three feature extractions, a fully connected network performs the classification and yields the commodity tax number.
The method effectively addresses two problems in tax number prediction for customs import and export commodities: commodities in the same major class are hard to distinguish, and overly large differences in declaration element length dilute the features of short elements. Its accuracy is markedly higher than that of other current mainstream deep learning methods.
The invention is described in detail below with reference to the embodiment and the drawing, so that a person of ordinary skill in the art can practice it by following this description.
In this embodiment, PyCharm is used as the development platform and Python as the development language. The method is applied to a corpus of 1,400,000 sentences of real customs data. The specific process is as follows:
Step 1: preprocess the customs import and export commodity text to obtain element names and element contents.
Step 2: split the element contents obtained in step 1, then use an auxiliary network to select differential elements, specifically as follows:
Step 21: gather the element contents obtained in step 1 for data with the same commodity major class together to form a paragraph; for example, take the following data:
Data A: "8412390000|pneumatic actuator|4|3|converts pneumatic power to mechanical power|for pneumatic valve|InGERGOLLAND|94695194"
Data B: "8412310090|ram air actuator|4|3|provides air pressure linear force|aircraft power system|HONEYWELL|676000141"
Both commodity declaration records belong to major class "84123", and their fixed declaration elements are both "commodity name|brand type|export preferential treatment status|principle|use|brand|model", so they are gathered into one paragraph.
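The grouping in step 21 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes each record is a pipe-delimited string with the 10-digit tax number first and the seven declaration elements after it, and that the major class is the first five digits (as with "84123" above). The function names, the element-name spellings and the `prefix_len` parameter are hypothetical.

```python
from collections import defaultdict

# Element names for this major class, paraphrased from the text (assumed order).
ELEMENT_NAMES = [
    "commodity name", "brand type", "export preferential treatment status",
    "principle", "use", "brand", "model",
]

def parse_record(line):
    """Split one pipe-delimited record into (tax_number, {element name: content})."""
    fields = [f.strip() for f in line.split("|")]
    tax_number, contents = fields[0], fields[1:]
    if len(contents) != len(ELEMENT_NAMES):
        raise ValueError(f"expected {len(ELEMENT_NAMES)} elements, got {len(contents)}")
    return tax_number, dict(zip(ELEMENT_NAMES, contents))

def group_by_major_class(lines, prefix_len=5):
    """Gather records that share the same leading digits into one paragraph."""
    paragraphs = defaultdict(list)
    for line in lines:
        tax_number, elements = parse_record(line)
        paragraphs[tax_number[:prefix_len]].append((tax_number, elements))
    return dict(paragraphs)

records = [
    "8412390000|pneumatic actuator|4|3|converts pneumatic power to mechanical power"
    "|for pneumatic valve|InGERGOLLAND|94695194",
    "8412310090|ram air actuator|4|3|provides air pressure linear force"
    "|aircraft power system|HONEYWELL|676000141",
]
paragraphs = group_by_major_class(records)
```

Here both records land in the single paragraph keyed by "84123", each paired with a name-to-content mapping that the later steps can consume.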
Step 22: count the commodity subclasses in each paragraph, send each paragraph into the auxiliary network, and train it to classify the commodity subclasses. When training on each paragraph, replace the element contents with the corresponding element names one at a time, in order, to obtain a loss value for each element;
Step 23: using the loss values of the elements in each paragraph, select the top 2 elements in descending order of loss as the differential elements.
For the two pieces of data above, the auxiliary network yields the commodity name and the principle as the differential elements.
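The ranking in steps 22 and 23 amounts to a leave-one-element-out comparison: swap one element's content for its name, measure the classification loss, and keep the two elements whose replacement hurts most. The sketch below illustrates only that ranking logic. The field-overlap scorer is a toy stand-in for the auxiliary network (the experiments in the text use textCNN), the loss is evaluated rather than obtained during training, and the paragraph is simplified so that only the name and principle differ between subclasses. All names and data are hypothetical.

```python
import math

ELEMENT_NAMES = ["commodity name", "brand type", "export preferential treatment status",
                 "principle", "use", "brand", "model"]

def overlap_scores(masked, train):
    """Per-subclass score: best count of matching fields among that subclass's records."""
    scores = {}
    for label, fields in train:
        hits = sum(1 for a, b in zip(masked, fields) if a == b)
        scores[label] = max(scores.get(label, 0), hits)
    return scores

def cross_entropy(scores, true_label):
    """Softmax cross-entropy of the true subclass given the per-class scores."""
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return log_z - scores[true_label]

def element_losses(paragraph, element_names):
    """Replace each element's content by its name in turn and total the loss."""
    losses = {}
    for i, name in enumerate(element_names):
        total = 0.0
        for label, fields in paragraph:
            masked = list(fields)
            masked[i] = name  # content swapped for the element name
            total += cross_entropy(overlap_scores(masked, paragraph), label)
        losses[name] = total
    return losses

def select_differential(losses, k=2):
    """Keep the k elements whose replacement produced the largest loss."""
    return sorted(losses, key=losses.get, reverse=True)[:k]

# Simplified paragraph: (subclass label, element contents in ELEMENT_NAMES order).
paragraph = [
    ("90000", ("pneumatic actuator", "4", "3",
               "converts pneumatic power to mechanical power", "valve", "brand x", "m1")),
    ("10090", ("ram air actuator", "4", "3",
               "provides air pressure linear force", "valve", "brand x", "m1")),
]
losses = element_losses(paragraph, ELEMENT_NAMES)
selected = select_differential(losses)
```

In this toy setting, hiding a shared element such as the brand removes no discriminative evidence, so its loss stays low. Hiding the commodity name or the principle narrows the margin between the two subclasses and raises the loss, which is why those two elements are the ones selected.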
Step 3: send the differential elements obtained in step 2 into a CNN network for feature extraction, while extracting element name features and element content features with the DPCNN and SSCNN networks respectively, specifically as follows:
Step 31: send the differential elements into a CNN network, extract features with a convolution layer, and sparsify the features with a max-pooling layer;
Step 32: send the element names into a DPCNN network, extract features with convolution layers, and shorten the sequence length layer by layer with downsampling layers to enlarge the receptive field;
Step 33: send the element contents into an SSCNN network and extract shallow features with a structured convolution layer.
For example, for the data above, the element names and element contents are each sent to their respective convolutional neural network models for feature extraction, while the differential elements, "pneumatic actuator|converts pneumatic power to mechanical power" for data A and "ram air actuator|provides air pressure linear force" for data B, are sent to the textCNN model for feature extraction.
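The two pooling behaviours named in steps 31 and 32 can be illustrated with a small dependency-free sketch: a convolution followed by max-over-time pooling (the textCNN pattern), and a strided pooling step that shortens the sequence (the DPCNN pattern). The embeddings, kernel weights, pool size 3 and stride 2 are assumptions chosen for illustration, since the text does not state them. The SSCNN branch is not sketched because the text does not define its structured convolution.

```python
def conv1d_relu(seq, kernel, bias=0.0):
    """Slide a (width x dim) kernel over a sequence of vectors; ReLU each response."""
    width = len(kernel)
    out = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        response = bias + sum(
            w * x
            for k_row, v_row in zip(kernel, window)
            for w, x in zip(k_row, v_row)
        )
        out.append(max(0.0, response))
    return out

def max_over_time(feature_map):
    """textCNN-style pooling: keep only the strongest response of the filter."""
    return max(feature_map)

def downsample(feature_map, size=3, stride=2):
    """DPCNN-style pooling: size-3 windows with stride 2 roughly halve the length."""
    return [
        max(feature_map[i:i + size])
        for i in range(0, len(feature_map) - size + 1, stride)
    ]

# Toy 2-dimensional embeddings for a 6-token element (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
kernel = [[1.0, -1.0], [0.5, 0.5]]  # one filter of width 2, hypothetical weights

feature_map = conv1d_relu(tokens, kernel)  # one response per window position
pooled = max_over_time(feature_map)        # a single feature for this filter
halved = downsample(feature_map)           # a shorter sequence for the next block
```

Max-over-time pooling collapses each filter to one value, which suits the short differential elements. Repeated strided pooling keeps a sequence but shortens it, so each later convolution covers a wider span of the original tokens.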
Step 4: fuse the differential element features, element name features and element content features obtained in step 3, then classify.
Step 41: concatenate the differential element features, the element name features and the element content features;
Step 42: send the concatenated features into two fully connected layers for classification, where the output dimension of the first layer is the number of commodity major classes and the output dimension of the second layer is the number of commodity subclasses, and concatenate the predicted major class number and subclass number to obtain the commodity tax number.
For example, the probabilities of all classes are computed for the data above, and the class with the highest probability is selected as the model's final prediction.
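Steps 41 and 42 can be sketched as a concatenation followed by two chained fully connected layers. This follows one reading of the text, in which the second layer takes the first layer's output as its input. The feature values, weights, label lists and function names are hypothetical and hand-set so the result can be checked by inspection.

```python
def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def argmax(values):
    """Index of the largest value; softmax is monotone, so this is also the
    index of the highest class probability."""
    return max(range(len(values)), key=values.__getitem__)

def predict_tax_number(diff_feat, name_feat, content_feat, params,
                       major_labels, minor_labels):
    # Step 41: concatenate the three feature vectors.
    fused = diff_feat + name_feat + content_feat
    # Step 42: first layer has one output per major class,
    # second layer has one output per subclass.
    major_logits = linear(fused, params["w1"], params["b1"])
    minor_logits = linear(major_logits, params["w2"], params["b2"])
    # Concatenate the predicted major class number and subclass number.
    return major_labels[argmax(major_logits)] + minor_labels[argmax(minor_logits)]

major_labels = ["84123", "84129"]   # hypothetical major class numbers
minor_labels = ["10090", "90000"]   # hypothetical subclass numbers
params = {
    "w1": [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], "b1": [0.0, 0.0],
    "w2": [[0.0, 1.0], [1.0, 0.0]],           "b2": [0.0, 0.0],
}
tax_number = predict_tax_number([1.0], [0.0], [2.0], params,
                                major_labels, minor_labels)
```

With these weights the first layer favours major class "84123" and the second favours subclass "90000", so the concatenated prediction is "8412390000".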
Following the above steps, the classification performance of the method is compared with that of the DPCNN, Transformer, BERT and RoBERTa models. As Table 1 shows, the proposed method is significantly better than the other methods in classification accuracy, precision and F1 value.
Table 1. Comparison of the classification performance of different models on customs import and export commodities
Figure BDA0002607979150000071
The invention also verifies the influence of different auxiliary networks on the final commodity classification. As shown in Table 2, using the textCNN model as the auxiliary network greatly improves the classification accuracy for customs import and export commodities.
Table 2. Influence of different auxiliary networks on the classification of customs import and export commodities
Figure BDA0002607979150000072
While the invention has been described with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (3)

1. A method for predicting the tax numbers of customs import and export commodities, characterized by comprising the following steps:
step 1: preprocessing the customs import and export commodity text to obtain element names and element contents;
step 2: splitting the element contents obtained in step 1, and then using an auxiliary network to select differential elements;
step 3: sending the differential elements obtained in step 2 into a CNN network for feature extraction, while extracting element name features with a DPCNN network and element content features with an SSCNN network;
step 4: fusing the differential element features, the element name features and the element content features obtained in step 3, and then classifying to obtain the commodity tax number;
wherein step 2 is implemented as follows:
step 21: gathering the obtained element contents of data with the same commodity major class together to form a paragraph;
step 22: counting the commodity subclasses in each paragraph, sending each paragraph into the auxiliary network, and training it to classify the commodity subclasses; when training on each paragraph, replacing the element contents with the corresponding element names one at a time, in order, to obtain a loss value for each element;
step 23: using the loss values of the elements in each paragraph, selecting the top 2 elements in descending order of loss as the differential elements.
2. The method for predicting the tax numbers of customs import and export commodities according to claim 1, wherein step 3 is implemented as follows:
step 31: sending the differential elements into a CNN network, extracting features with a convolution layer, and sparsifying the features with a max-pooling layer;
step 32: sending the element names into a DPCNN network, extracting features with convolution layers, and shortening the sequence length with a downsampling layer to enlarge the receptive field;
step 33: sending the element contents into an SSCNN network and extracting shallow features with a structured convolution layer.
3. The method for predicting the tax numbers of customs import and export commodities according to claim 1, wherein step 4 is implemented as follows:
step 41: concatenating the differential element features, the element name features and the element content features;
step 42: sending the concatenated features into two fully connected layers that integrate the extracted features; the output dimension of the first fully connected layer is the number of commodity major classes and the output dimension of the second is the number of commodity subclasses, and the predicted major class number and subclass number are concatenated to obtain the commodity tax number.
CN202010744808.6A 2020-07-29 2020-07-29 Method for predicting tax numbers of customs import and export commodities Active CN111985204B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010744808.6A CN111985204B (en) 2020-07-29 2020-07-29 Method for predicting tax numbers of customs import and export commodities

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010744808.6A CN111985204B (en) 2020-07-29 2020-07-29 Method for predicting tax numbers of customs import and export commodities

Publications (2)

Publication Number Publication Date
CN111985204A (en) 2020-11-24
CN111985204B (en) 2023-06-02

Family

ID=73445564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010744808.6A Active CN111985204B (en) 2020-07-29 2020-07-29 Method for predicting tax numbers of customs import and export commodities

Country Status (1)

Country Link
CN (1) CN111985204B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343640B (en) * 2021-05-26 2024-02-20 南京大学 Method and device for classifying customs commodity HS codes
CN113705188B (en) * 2021-08-19 2023-06-06 大连大学 Intelligent evaluation method for customs import and export commodity specification declaration

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008536A (en) * 2013-11-04 2014-08-27 无锡金帆钻凿设备股份有限公司 Multi-focus noise image fusion method based on CS-CHMT and IDPCNN
CN109034154A (en) * 2018-07-23 2018-12-18 西安电子科技大学昆山创新研究院 The extraction and recognition methods of Invoice Seal duty paragraph
CN110175235A (en) * 2019-04-23 2019-08-27 苏宁易购集团股份有限公司 Intelligence commodity tax sorting code number method and system neural network based

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
煤炭增碳剂商品归类化验关键点分析;邱越;杨丽飞;王海仙;;中国口岸科学技术(第01期);全文 *

Also Published As

Publication number Publication date
CN111985204A (en) 2020-11-24

Similar Documents

Publication Publication Date Title
CN109766277B (en) Software fault diagnosis method based on transfer learning and DNN
CN107766371B (en) Text information classification method and device
WO2018028077A1 (en) Deep learning based method and device for chinese semantics analysis
CN112347268A (en) Text-enhanced knowledge graph joint representation learning method and device
CN109753660B (en) LSTM-based winning bid web page named entity extraction method
CN108595708A (en) A kind of exception information file classification method of knowledge based collection of illustrative plates
CN112434535B (en) Element extraction method, device, equipment and storage medium based on multiple models
CN107122349A (en) A kind of feature word of text extracting method based on word2vec LDA models
CN114896388A (en) Hierarchical multi-label text classification method based on mixed attention
CN111259153B (en) Attribute-level emotion analysis method of complete attention mechanism
CN111460164B (en) Intelligent fault judging method for telecommunication work orders based on pre-training language model
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
CN111985204B (en) Method for predicting tax numbers of customs import and export commodities
CN108710894A (en) A kind of Active Learning mask method and device based on cluster representative point
CN112446215B (en) Entity relation joint extraction method
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
CN111966825A (en) Power grid equipment defect text classification method based on machine learning
CN112925904B (en) Lightweight text classification method based on Tucker decomposition
CN111143567A (en) Comment emotion analysis method based on improved neural network
CN116089610A (en) Label identification method and device based on industry knowledge
CN110969023A (en) Text similarity determination method and device
Li et al. Rethinking table structure recognition using sequence labeling methods
CN115146062A (en) Intelligent event analysis method and system fusing expert recommendation and text clustering
CN109543038A (en) A kind of sentiment analysis method applied to text data
CN112561530A (en) Transaction flow processing method and system based on multi-model fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Zhang Qiang, Zhou Chengjie, Che Chao
Inventor before: Che Chao, Zhou Chengjie, Zhang Qiang
GR01 Patent grant