CN112434691A - HS code matching and displaying method and system based on intelligent analysis and identification and storage medium - Google Patents

HS code matching and displaying method and system based on intelligent analysis and identification and storage medium

Info

Publication number
CN112434691A
CN112434691A (application CN202011404276.8A)
Authority
CN
China
Prior art keywords
text
judged
hierarchy
recognition
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011404276.8A
Other languages
Chinese (zh)
Inventor
张东峰
冯玉静
陆欢旺
万晓磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sandao Intelligent Technology Co ltd
Original Assignee
Shanghai Sandao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sandao Intelligent Technology Co ltd
Priority to CN202011404276.8A
Publication of CN112434691A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/903 Querying
    • G06F 16/90335 Query processing
    • G06F 16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of form generation and discloses an HS code matching and display method, system, and storage medium based on intelligent parsing and recognition. The method comprises the following steps: acquiring an object to be judged; correcting imaging problems; detecting the text in the object to be judged; recognizing the text content; extracting the required fields and/or elements from the text recognition result to generate description information of the object to be judged; and judging the category of the object to be judged according to the acquired description information and pre-trained atlas data, and entity-linking it with the atlas data. The pre-trained atlas data are generated by training a model on the provided HS code document data in combination with a semantic library, and the AI algorithm continuously learns and is optimized through external data feedback. The method and system can serve as an intelligent search knowledge engine that meets the need for rapid knowledge acquisition in the customs declaration and pre-classification business field and, combined with character recognition, can accurately map information to the corresponding columns.

Description

HS code matching and displaying method and system based on intelligent analysis and identification and storage medium
Technical Field
The application relates to the technical field of form generation, in particular to an HS code matching and displaying method, system and storage medium based on intelligent analysis and recognition.
Background
The HS code (Harmonized System code, the internationally unified commodity classification code) is established under the coding coordination system of the World Customs Organization and serves as the unified standard for managing import and export records and the duty and tax-rebate rates of various products.
HS codes cover a wide variety of categories, comprising 22 sections, 96 chapters, and tens of thousands of sub-classes in total. The information available for HS code classification at customs clearance is mainly the product name and the product specification (i.e., the declaration elements), and the total number of codes is on the order of ten thousand. Therefore, when a user has a knowledge requirement, it is natural to search a domain-specific knowledge base for the answer to the corresponding question. At present, knowledge in the field of customs declaration and classification is relatively mature, but comparatively outdated electronic documents are still used for knowledge representation, organization, and management; the relations among pieces of knowledge are not well established, and information islands exist between them. Meanwhile, in the pre-classification process many classification systems are quite similar, the same question may have similar answers in several chapters, and a traditional database search has difficulty matching the answer accurately. In addition, existing systems require commodity information to be entered manually: the system classifies the commodity according to the entered information and fills the available information into the columns corresponding to the declaration elements, yet manually entered commodity information is often incomplete, so the accuracy of the results given by the system cannot be kept at a high level.
Disclosure of Invention
In order to build an intelligent search knowledge engine that meets the need for rapid knowledge acquisition in the field of customs declaration and classification services, to provide knowledge management and knowledge-base maintenance that satisfy the requirement of intelligent knowledge updating, and to improve the accuracy of the output results, the present application provides an HS code matching and display method, system, and storage medium based on intelligent parsing and recognition.
In a first aspect, the present application provides a method for matching and displaying HS codes based on intelligent parsing and recognition, including:
acquiring the object to be judged, which includes a picture class and a non-picture class, converting the non-picture class into a picture format, and storing it together with the picture-class files;
file parsing, namely analyzing the type and format of the object to be judged;
image preprocessing, namely correcting the image imaging problem of the object to be judged;
character detection, detecting the position, range and layout of a text in an object to be judged;
character recognition, namely recognizing the text content on the basis of text detection;
text extraction, namely extracting required fields and/or elements from a text recognition result to generate object description information to be judged;
judging the category of the object to be judged according to the acquired description information of the object to be judged and the pre-trained atlas data, and performing entity link with the atlas data;
the pre-trained atlas data is combined with semantic library training to generate a model according to provided HS coding document data, and an AI algorithm is continuously learned and optimized through external data feedback.
By adopting this technical scheme, model training based on deep learning is assisted by natural language processing rather than simply querying a database, so new commodities (commodities that have not appeared in the database) can also be classified. Meanwhile, through external data feedback, the AI algorithm can keep learning on its own and the atlas data grow over time: as users use the system, it continuously learns, optimizes itself, and becomes stronger. Ultimately, an intelligent search knowledge engine is built that meets the need for rapid knowledge acquisition in the customs declaration and pre-classification business field, while also satisfying knowledge management and knowledge-base maintenance with intelligent knowledge updating. In addition, by combining character recognition, the method can accurately map information to the corresponding columns, avoiding situations where a client judges the information inaccurately and provides wrong information.
In some embodiments, the image pre-processing comprises:
inputting an image of a file to be processed into a pre-trained image correction network for geometric change and/or distortion correction to obtain a corrected first target image;
performing small-angle correction on the first target image through a CV algorithm and an affine transformation matrix to obtain a second target image;
removing the blur of the second target image through a denoising algorithm to obtain a third target image;
and carrying out binarization processing on the third target image to obtain a binarized image.
In some embodiments, the text detection comprises:
inputting the binary image into a pre-trained feature extraction network;
extracting output information of at least two convolution layers in the feature extraction network, and fusing the output information;
inputting the fused information into a full connection layer in the feature extraction network, and outputting 2k vertical direction coordinates and coordinate scores of k anchors corresponding to the text region of the binary image and k boundary regression results to realize text positioning and obtain a rectangular text box.
In some embodiments,
the character recognition comprises the following steps: performing character recognition on text contents in the rectangular text box through a pre-trained character recognition network to acquire text content information;
the text extraction comprises:
generating a basic semantic analysis engine based on a preset semantic database, wherein the semantic database comprises a field basic corpus, a field dictionary and a field knowledge map;
performing field analysis processing on the text content information based on a basic semantic analysis engine;
extracting the required fields and/or elements in the text content based on an extraction data set corresponding to the extraction requirements.
In some embodiments, judging the category of the object to be judged according to the acquired description information of the object to be judged and the pre-trained atlas data comprises:
dividing the categories of the object to be judged into hierarchies;
judging, from top to bottom, the category at each hierarchy according to the acquired object description information and a pre-trained hierarchical classification model corresponding to that hierarchy;
and linking to a unique entity in the atlas data.
In some embodiments, the hierarchical classification model corresponding to each hierarchy is trained by:
selecting a training sample, and extracting characteristic contents of description information of the sample as a query statement;
and matching the extracted query sentences and the corresponding hierarchy categories to train and obtain a hierarchy classification model corresponding to each hierarchy.
In some embodiments, the method further comprises training a hierarchical classification model corresponding to each hierarchy in the following manner:
extracting, based on description information of objects that have already been judged, the characteristic content of that description information as a query statement;
and matching the extracted query sentences and the corresponding hierarchy categories to train and obtain a hierarchy classification model corresponding to each hierarchy.
In some embodiments, determining the hierarchy type corresponding to the hierarchy includes calculating a degree of matching based on ranking learning and semantic features, and performing search ranking.
In a second aspect, the present application provides an HS code matching and displaying system based on intelligent parsing and recognition, including:
the acquisition unit is used for acquiring a file to be processed;
the file analysis unit is used for receiving the file to be processed and analyzing the type and the format of the file to be processed;
the image preprocessing unit is used for correcting the image imaging problem of the analyzed file to be processed;
the character detection unit is used for detecting the position, the range and the layout of the text in the file to be processed on the basis of correcting the image imaging problem;
the character recognition unit is used for recognizing the text content on the basis of text detection;
the text extraction unit extracts required fields and/or elements from the text recognition result and generates object description information to be judged;
the judging unit is used for judging the category of the object to be judged according to the acquired object description information to be judged and the pre-trained atlas data;
the display unit is used for displaying the judgment result of the judging unit; and
a memory and a processor, wherein the memory stores a computer program that can be loaded by the processor to execute the above HS code matching and display method based on intelligent parsing and recognition.
In a third aspect, the present application provides a computer-readable storage medium, which stores a computer program that can be loaded by a processor and execute the above HS code matching and displaying method based on intelligent parsing and recognition.
In summary, the HS code matching and display method, system, and storage medium based on intelligent parsing and recognition provided by the present application include at least one of the following beneficial technical effects:
1. after the knowledge graph is constructed, query paths are automatically extracted to generate corresponding templates, and the matching degree with the question is calculated based on features and techniques such as a learning-to-rank algorithm, semantic features, and knowledge-graph features (popularity), so that database query statements are generated and manually written rules are reduced;
2. the method is based on deep-learning model training and performs semantic recognition and result search with natural language processing, entity disambiguation, and typo-correction capabilities, rather than simply querying a database, so new commodities (commodities that have not appeared in the database) can be classified;
3. according to the provided document data, atlas data are generated by combining a Chinese semantic library with model training; meanwhile, through external data feedback, the AI algorithm keeps learning and growing on its own, and the atlas data are continuously learned and optimized as usage accumulates over time;
4. through character recognition, data can simply be uploaded without the user having to consider what specific content to enter, which reduces difficulty;
5. with the information acquired through character recognition, the system has clear labels for classification, can accurately map information to the corresponding columns, and avoids the client judging the information inaccurately and providing wrong information;
6. the range of information acquisition is broadened, improving the likelihood of obtaining complete information.
Drawings
Fig. 1 is a block diagram of a structure of an HS code matching and displaying system based on intelligent parsing and recognition provided in the present application.
In the figure: 1. acquisition unit; 2. file parsing unit; 3. image preprocessing unit; 4. character detection unit; 5. character recognition unit; 6. text extraction unit; 7. judging unit; 8. display unit; 9. memory; 10. processor.
Detailed Description
The present application is described in further detail below with reference to the attached drawings.
The embodiment of the application provides an HS code matching and displaying method, system and storage medium based on intelligent analysis and identification.
The application discloses an HS code matching and displaying method based on intelligent analysis and identification, which comprises the following steps:
the method comprises the steps of obtaining an object to be judged, wherein a file to be processed comprises a photo class and a non-photo class, the non-photo class comprises a photocopy and a PDF file, meanwhile, the non-photo class is converted into a picture format and is stored together with the photo class file, the input file to be processed is stored in a file library at the same time, and model training is carried out based on manual marking so as to obtain an image correction network, a feature extraction network, a character recognition network and a deep learning extraction data set.
In the embodiment of the application, the file analysis supports the processing of files with JPG, PNG, TIF and PDF formats.
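As an illustrative sketch of this unified picture-format storage (not the disclosed implementation), the conversion could be done with the Pillow and pdf2image libraries; the directory layout and function names below are assumptions:

```python
# Illustrative sketch: normalize JPG/PNG/TIF/PDF inputs into a single image store.
# Assumes pdf2image (which requires poppler) and Pillow are installed.
from pathlib import Path
from PIL import Image
from pdf2image import convert_from_path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}

def normalize_to_images(src: Path, out_dir: Path) -> list:
    """Convert a to-be-processed file into one or more PNG pages in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pages = []
    if src.suffix.lower() == ".pdf":
        # Render each PDF page to an image so downstream OCR sees pictures only.
        for i, page in enumerate(convert_from_path(str(src), dpi=300)):
            target = out_dir / f"{src.stem}_page{i}.png"
            page.save(target)
            pages.append(target)
    elif src.suffix.lower() in IMAGE_SUFFIXES:
        target = out_dir / f"{src.stem}.png"
        Image.open(src).convert("RGB").save(target)
        pages.append(target)
    else:
        raise ValueError(f"Unsupported file type: {src.suffix}")
    return pages
```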
Image preprocessing, namely correcting the image imaging problem of the file to be processed; the method specifically comprises the following steps:
inputting the image of the file to be processed into a pre-trained image correction network for geometric transformation and/or distortion correction to obtain a corrected first target image, namely:
regressing the network parameters of the space transformation corresponding to the first target image by utilizing a positioning network in the image correction network;
calculating the position of a pixel point in the corrected first target image in the first target image by using a grid generator in the image correction network and the network parameters;
outputting the corrected first target image by using a sampler in the image correction network and the calculated position;
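The localization-network, grid-generator, and sampler steps described above follow the spatial-transformer pattern; the following PyTorch sketch illustrates that pattern under assumed layer sizes (the disclosed image correction network is not specified in this detail):

```python
# Illustrative spatial-transformer-style correction: a localization network regresses
# affine parameters, a grid generator computes sampling positions, and a sampler
# produces the corrected image. All sizes here are placeholder assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrectionSTN(nn.Module):
    def __init__(self):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((3, 3)), nn.Flatten(),
            nn.Linear(10 * 3 * 3, 32), nn.ReLU(),
            nn.Linear(32, 6),            # six affine transformation parameters
        )
        # initialize to the identity transform so training starts from "no change"
        self.localization[-1].weight.data.zero_()
        self.localization[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.localization(x).view(-1, 2, 3)                   # localization network
        grid = F.affine_grid(theta, x.size(), align_corners=False)    # grid generator
        return F.grid_sample(x, grid, align_corners=False)            # sampler
```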
Then:
performing small-angle correction on the first target image through a CV algorithm and an affine transformation matrix to obtain a second target image;
removing the blur of the second target image through a denoising algorithm to obtain a third target image;
and carrying out binarization processing on the third target image to obtain a binarized image.
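The small-angle correction with an affine transformation matrix, the denoising, and the binarization can be sketched with OpenCV as follows; the skew-estimation heuristic and the parameter values are assumptions for illustration:

```python
# Illustrative OpenCV sketch of small-angle (affine) correction, denoising, and
# binarization. Threshold and parameter choices are assumptions.
import cv2
import numpy as np

def small_angle_deskew(gray: np.ndarray) -> np.ndarray:
    """Estimate a small skew angle from the dark (foreground) pixels and rotate it out."""
    coords = np.column_stack(np.where(gray < 200)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    angle = angle - 90 if angle > 45 else angle            # map to a small angle
    h, w = gray.shape
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)  # affine transformation matrix
    return cv2.warpAffine(gray, m, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)

def preprocess(image_path: str) -> np.ndarray:
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)    # first target image as input
    deskewed = small_angle_deskew(gray)                    # second target image
    denoised = cv2.fastNlMeansDenoising(deskewed, h=10)    # third target image
    _, binary = cv2.threshold(denoised, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarized image
    return binary
```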
After image preprocessing, the following steps are carried out:
the method comprises the following steps of character detection, wherein the position, the range and the layout of a text in a file to be processed are detected, the layout analysis, the character line detection and the like are generally included, and the character detection mainly solves the problems of where characters exist and how large the range of the characters exists. The method comprises the following specific steps:
inputting the binary image into a pre-trained feature extraction network;
extracting output information of at least two convolution layers in the feature extraction network, and fusing the output information;
inputting the fused information into a full-connection layer in the feature extraction network, and outputting 2k vertical direction coordinates and coordinate scores of k anchors corresponding to the text region of the binarized image and k boundary regression results to realize text positioning and obtain a rectangular text box;
the processing algorithm adopted by the character detection comprises the following steps: fast-RCNN, Mask-RCNN, FPN, PANET, Unet, IoUNet, YOLO, SSD.
Then the character recognition step is entered.
Character recognition identifies the text content on the basis of character detection; the problem it mainly solves is what each character is. In this embodiment of the application, character recognition is performed on the text content in the rectangular text boxes through a pre-trained character recognition network to obtain the text content information; adoptable processing algorithms include: CRNN, AttentionOCR, RNNLM, and BERT.
Then the required fields and/or elements are extracted from the text recognition result through text extraction, which comprises:
generating a basic semantic analysis engine based on a preset semantic database, wherein the semantic database comprises a field basic corpus, a field dictionary and a field knowledge map;
performing field analysis processing on the text content information based on a basic semantic analysis engine;
extracting the required fields and/or elements in the text content based on an extraction data set corresponding to the extraction requirements, wherein the extraction approaches include: sequence-labeling extraction, deep-learning extraction, and table extraction.
Processing algorithms adopted for text extraction include: CRF, HMM, HAN, DPCNN, BiLSTM+CRF, BERT+CRF, and Regex.
Finally, the result is output and the description information of the object to be judged is generated.
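As a minimal illustration of the Regex end of this extraction stage, the sketch below scans recognized text lines against a small domain dictionary of field patterns; the field names and patterns are assumptions, not the disclosed extraction data set:

```python
# Illustrative regex/dictionary extraction of declaration fields from recognized text.
# Field names and patterns are assumptions for illustration only.
import re

FIELD_PATTERNS = {
    "product_name": re.compile(r"(?:品名|Product\s*Name)[:：]\s*(.+)", re.IGNORECASE),
    "material":     re.compile(r"(?:材质|Material)[:：]\s*(.+)", re.IGNORECASE),
    "brand":        re.compile(r"(?:品牌|Brand)[:：]\s*(.+)", re.IGNORECASE),
    "model":        re.compile(r"(?:型号|Model)[:：]\s*(\S+)", re.IGNORECASE),
}

def extract_fields(recognized_lines: list) -> dict:
    """Scan OCR output line by line and keep the first match for each required field."""
    description = {}
    for line in recognized_lines:
        for field, pattern in FIELD_PATTERNS.items():
            if field not in description:
                m = pattern.search(line)
                if m:
                    description[field] = m.group(1).strip()
    return description

# Usage sketch with hypothetical OCR output
lines = ["品名: 不锈钢保温杯", "材质: 304不锈钢", "Brand: ACME"]
print(extract_fields(lines))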
The category of the object to be judged is then judged according to the acquired description information of the object to be judged and the pre-trained atlas data, and the object is entity-linked with the atlas data. The pre-trained atlas data are obtained by training a model, in combination with a semantic library, on the provided HS code document data, and the AI algorithm continuously learns and is optimized through external data feedback.
Judging the category of the object to be judged according to the acquired description information of the object to be judged and the pre-trained atlas data comprises the following steps:
dividing the categories of the object to be judged into hierarchies;
judging, from top to bottom along the divided hierarchies, the category at each hierarchy according to the acquired object description information and the pre-trained hierarchical classification model corresponding to that hierarchy. Judging the category at a hierarchy includes calculating the matching degree based on learning to rank and semantic features and performing search ranking, specifically using the following features:
the character- and word-based static embedding vectors of the description information of the object to be judged and of the query path, and their cosine similarity;
the context-dependent embedding vectors of the description information of the object to be judged and of the query path, and their cosine similarity;
the character- and word-based Jaccard similarity between the description information of the object to be judged and the query path;
the character- and word-based Levenshtein similarity between the description information of the object to be judged and the query path;
the graph embedding vector of the query subgraph;
and, based on the above judgment features, search ranking is performed with a learning-to-rank algorithm.
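The character- and word-level features listed above can be computed roughly as follows; the embedding function is left as an assumption (any character or word embedding model would do), and a trained learning-to-rank model would consume the resulting feature vectors:

```python
# Illustrative computation of matching features: character Jaccard similarity,
# Levenshtein-based similarity, and cosine similarity of embedding vectors.
import numpy as np

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def levenshtein(s: str, t: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def matching_features(description: str, query_path: str, embed) -> list:
    """Feature vector for one (object description, query path) pair; `embed` is any
    text-embedding function returning a numpy vector (an assumption here)."""
    lev = levenshtein(description, query_path)
    return [
        jaccard(set(description), set(query_path)),              # character Jaccard
        1.0 - lev / max(len(description), len(query_path), 1),   # Levenshtein similarity
        cosine(embed(description), embed(query_path)),           # embedding cosine similarity
    ]
```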
In this embodiment of the application, the capability of entity disambiguation and error correction is also included: for the mentions in the question, error correction, entity disambiguation, and entity linking are performed in a unified manner based on the knowledge graph and a text similarity model.
In addition, when the description information of the object to be judged is entity-linked with the atlas data, the following features are mined:
semantic similarity between the entity name and object description information to be determined;
semantic similarity between the entity two-degree subgraph and object description information to be judged;
semantic similarity between the entity type and the object description information to be determined;
the occurrence frequency and relationship types of the entity in the knowledge graph;
literal features of the mention in the question;
based on the above features, recall ranking is performed using a learning-to-rank algorithm, and the object is finally linked to a unique entity in the atlas data.
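Combining such features into a recall ranking can be sketched as a pointwise scoring model over candidate entities; the gradient-boosting model and the toy training data below are assumptions standing in for a learning-to-rank algorithm trained on real, confirmed links:

```python
# Illustrative pointwise ranking for entity linking: each candidate entity is scored
# from its features (similarity to the description, popularity in the graph, ...);
# the highest-scoring candidate becomes the linked entity.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# The ranker would normally be trained on historical, manually confirmed links;
# a tiny synthetic fit keeps this sketch runnable end to end.
ranker = GradientBoostingRegressor()
X_train = np.random.rand(100, 3)
y_train = X_train.mean(axis=1)                   # placeholder relevance labels
ranker.fit(X_train, y_train)

def link_entity(candidates: list) -> dict:
    """candidates: [{'entity': ..., 'features': [f1, f2, f3]}, ...] -> best candidate."""
    X = np.array([c["features"] for c in candidates])
    scores = ranker.predict(X)
    return candidates[int(np.argmax(scores))]

# Usage sketch with two hypothetical candidate entities
best = link_entity([
    {"entity": "stainless steel vacuum flask", "features": [0.7, 0.6, 0.8]},
    {"entity": "glass bottle", "features": [0.2, 0.3, 0.4]},
])
```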
In this embodiment of the present application, the hierarchical classification model corresponding to each hierarchical level is obtained by training in the following manner:
selecting training samples and extracting the characteristic content of the sample description information as query statements, and/or extracting the characteristic content of the description information of objects that have already been judged as query statements;
and matching the extracted query sentences and the corresponding hierarchy categories to train and obtain a hierarchy classification model corresponding to each hierarchy.
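Training one classifier per hierarchy level from (query statement, level category) pairs, and then predicting top-down, can be sketched with scikit-learn; the TF-IDF plus logistic-regression pipeline is an assumption standing in for the disclosed hierarchical classification model:

```python
# Illustrative top-down hierarchical classification: one text classifier per level,
# trained on (query statement, category-at-that-level) pairs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_level_models(samples: list) -> dict:
    """samples: [{'query': str, 'levels': [category at level 0, level 1, ...]}, ...]"""
    depth = len(samples[0]["levels"])
    models = {}
    for level in range(depth):
        texts = [s["query"] for s in samples]
        labels = [s["levels"][level] for s in samples]
        model = make_pipeline(
            TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
            LogisticRegression(max_iter=1000),
        )
        models[level] = model.fit(texts, labels)
    return models

def classify_top_down(models: dict, query: str) -> list:
    """Predict the category at each level from top to bottom."""
    return [models[level].predict([query])[0] for level in sorted(models)]
```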
When the method is applied, the commodity information to be queried is entered into a search box, and the search is triggered by clicking the search button or pressing Enter. During the search, search suggestions are provided below the search bar for quickly selecting the information the user expects to query. When the searched commodity is not an already-classified commodity in the database, the system performs natural language processing and automatically classifies the commodity according to the trained model.
The application also discloses HS code matching and display system based on intelligent analysis and recognition, which comprises:
an acquisition unit 1, used for acquiring the file to be processed;
the file analysis unit 2 is used for receiving the file to be processed and analyzing the type and the format of the file to be processed;
the image preprocessing unit 3 corrects the image imaging problem of the analyzed file to be processed;
the character detection unit 4 is used for detecting the position, the range and the layout of the text in the file to be processed on the basis of correcting the image imaging problem;
a character recognition unit 5 for recognizing the text content based on the text detection;
the text extraction unit 6 is used for extracting required fields and/or elements from the text recognition result and generating object description information to be judged;
the judging unit 7 is used for judging the category of the object to be judged according to the acquired object description information to be judged and the pre-trained atlas data;
a display unit 8 for displaying the result judged by the judging unit 7; and
a memory 9 and a processor 10, wherein the memory 9 stores a computer program that can be loaded by the processor 10 to execute the above HS code matching and display method based on intelligent parsing and recognition.
The embodiment of the present application also provides a storage medium storing an instruction set that can be loaded by the processor 10 to execute the steps of the above HS code matching and display method based on intelligent parsing and recognition.
Computer storage media include, for example, various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used to describe the technical solutions of the present application in detail and to help understand its method and core idea; they should not be construed as limiting the present application. Those skilled in the art should also appreciate that various modifications and substitutions can be made without departing from the scope of the present disclosure.

Claims (10)

1. The HS code matching and displaying method based on intelligent analysis and identification is characterized by comprising the following steps:
acquiring objects to be judged, including a picture class and a non-picture class, converting the non-picture class into a picture format, and storing the non-picture class and the picture class file in a unified manner;
analyzing the file, and analyzing the type and format of the object to be determined;
image preprocessing, namely correcting the image imaging problem of the object to be judged;
character detection, detecting the position, range and layout of a text in an object to be judged;
character recognition, namely recognizing the text content on the basis of text detection;
text extraction, namely extracting required fields and/or elements from a text recognition result to generate object description information to be judged;
judging the category of the object to be judged according to the acquired description information of the object to be judged and the pre-trained atlas data, and performing entity link with the atlas data;
the pre-trained atlas data is combined with semantic library training to generate a model according to provided HS coding document data, and an AI algorithm is continuously learned and optimized through external data feedback.
2. The HS code matching and displaying method based on intelligent analytic recognition according to claim 1, wherein the image preprocessing comprises:
inputting an image of a file to be processed into a pre-trained image correction network for geometric change and/or distortion correction to obtain a corrected first target image;
performing small-angle correction on the first target image through a CV algorithm and an affine transformation matrix to obtain a second target image;
removing the blur of the second target image through a denoising algorithm to obtain a third target image;
and carrying out binarization processing on the third target image to obtain a binarized image.
3. The HS code matching and displaying method based on intelligent parsing and recognition according to claim 1, wherein said text detection comprises:
inputting the binary image into a pre-trained feature extraction network;
extracting output information of at least two convolution layers in the feature extraction network, and fusing the output information;
inputting the fused information into a full connection layer in the feature extraction network, and outputting 2k vertical direction coordinates and coordinate scores of k anchors corresponding to the text region of the binary image and k boundary regression results to realize text positioning and obtain a rectangular text box.
4. The HS code matching and displaying method based on intelligent analytic identification according to claim 1,
the character recognition comprises the following steps: performing character recognition on text contents in the rectangular text box through a pre-trained character recognition network to acquire text content information;
the text extraction comprises:
generating a basic semantic analysis engine based on a preset semantic database, wherein the semantic database comprises a field basic corpus, a field dictionary and a field knowledge map;
performing field analysis processing on the text content information based on a basic semantic analysis engine;
extracting the required fields and/or elements in the text content based on the extraction requirement extraction data set.
5. The HS code matching and displaying method based on intelligent parsing and recognition according to claim 1, wherein determining the class of the object to be determined according to the obtained object description information to be determined and pre-trained atlas data comprises:
dividing the categories of the object to be judged into hierarchies;
judging the hierarchy category corresponding to each hierarchy from top to bottom according to the obtained object description information and a pre-trained hierarchy classification model corresponding to the hierarchy;
and linking to a unique entity in the atlas data.
6. The HS code matching and displaying method based on intelligent analytic recognition according to claim 5, wherein a hierarchical classification model corresponding to each hierarchical level is obtained by training in the following way:
selecting a training sample, and extracting characteristic contents of description information of the sample as a query statement;
and matching the extracted query sentences and the corresponding hierarchy categories to train and obtain a hierarchy classification model corresponding to each hierarchy.
7. The HS code matching and displaying method based on intelligent parsing and recognition according to claim 6, further comprising training a hierarchical classification model corresponding to each hierarchical level in the following manner:
extracting characteristic content of the determined object description information as a query statement based on the object description information to be determined;
and matching the extracted query sentences and the corresponding hierarchy categories to train and obtain a hierarchy classification model corresponding to each hierarchy.
8. The HS code matching and displaying method based on intelligent parsing and recognition of claim 5, wherein said determining the hierarchy type corresponding to the hierarchy comprises calculating matching degree based on ranking learning and semantic features, and performing search ranking.
9. An HS code matching and display system based on intelligent parsing and recognition, characterized by comprising:
the device comprises an acquisition unit (1) for acquiring a file to be processed;
the file analysis unit (2) is used for receiving the file to be processed and analyzing the type and the format of the file to be processed;
the image preprocessing unit (3) is used for correcting the image imaging problem of the analyzed file to be processed;
the character detection unit (4) is used for detecting the position, the range and the layout of the text in the file to be processed on the basis of correcting the image imaging problem;
a character recognition unit (5) for recognizing the text content on the basis of the text detection;
a text extraction unit (6) which extracts required fields and/or elements from the text recognition result and generates object description information to be judged;
the judging unit (7) is used for judging the type of the object to be judged according to the acquired object description information to be judged and the pre-trained atlas data;
a display unit (8) for displaying the result determined by the determination unit (7); and
a memory (9) and a processor (10), the memory (9) having stored thereon a computer program that can be loaded by the processor (10) and that executes the HS code matching, presentation method based on intelligent analytics recognition as claimed in any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that a computer program is stored which can be loaded by a processor (10) and which executes the HS code matching, presentation method based on intelligent analytics recognition as claimed in any one of claims 1 to 8.
CN202011404276.8A 2020-12-02 2020-12-02 HS code matching and displaying method and system based on intelligent analysis and identification and storage medium Withdrawn CN112434691A (en)

Priority Applications (1)

Application number: CN202011404276.8A; priority date: 2020-12-02; filing date: 2020-12-02; title: HS code matching and displaying method and system based on intelligent analysis and identification and storage medium

Applications Claiming Priority (1)

Application number: CN202011404276.8A; priority date: 2020-12-02; filing date: 2020-12-02; title: HS code matching and displaying method and system based on intelligent analysis and identification and storage medium

Publications (1)

Publication number: CN112434691A; publication date: 2021-03-02

Family

ID=74692663

Family Applications (1)

Application number: CN202011404276.8A; priority date: 2020-12-02; filing date: 2020-12-02; title: HS code matching and displaying method and system based on intelligent analysis and identification and storage medium

Country Status (1)

Country Link
CN (1) CN112434691A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966111A (en) * 2021-03-19 2021-06-15 北京星汉博纳医药科技有限公司 AI-based automatic classification method and system for object attribute text
CN113011144A (en) * 2021-03-30 2021-06-22 中国工商银行股份有限公司 Form information acquisition method and device and server
CN113051607A (en) * 2021-03-11 2021-06-29 天津大学 Privacy policy information extraction method
CN113343640A (en) * 2021-05-26 2021-09-03 南京大学 Customs clearance commodity HS code classification method and device
CN113486148A (en) * 2021-07-07 2021-10-08 中国建设银行股份有限公司 PDF file conversion method and device, electronic equipment and computer readable medium
CN113536771A (en) * 2021-09-17 2021-10-22 深圳前海环融联易信息科技服务有限公司 Element information extraction method, device, equipment and medium based on text recognition
CN114579712A (en) * 2022-05-05 2022-06-03 中科雨辰科技有限公司 Text attribute extraction and matching method based on dynamic model
CN114580429A (en) * 2022-01-26 2022-06-03 云捷计算机软件(江苏)有限责任公司 Artificial intelligence-based language and image understanding integrated service system
CN115171129A (en) * 2022-09-06 2022-10-11 京华信息科技股份有限公司 Character recognition error correction method and device, terminal equipment and storage medium
CN116127047A (en) * 2023-04-04 2023-05-16 北京大学深圳研究生院 Method and device for establishing enterprise information base
CN117542067A (en) * 2023-12-18 2024-02-09 北京长河数智科技有限责任公司 Region labeling form recognition method based on visual recognition

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051607A (en) * 2021-03-11 2021-06-29 天津大学 Privacy policy information extraction method
CN113051607B (en) * 2021-03-11 2022-04-19 天津大学 Privacy policy information extraction method
CN112966111A (en) * 2021-03-19 2021-06-15 北京星汉博纳医药科技有限公司 AI-based automatic classification method and system for object attribute text
CN113011144B (en) * 2021-03-30 2024-01-30 中国工商银行股份有限公司 Form information acquisition method, device and server
CN113011144A (en) * 2021-03-30 2021-06-22 中国工商银行股份有限公司 Form information acquisition method and device and server
CN113343640A (en) * 2021-05-26 2021-09-03 南京大学 Customs clearance commodity HS code classification method and device
CN113343640B (en) * 2021-05-26 2024-02-20 南京大学 Method and device for classifying customs commodity HS codes
CN113486148A (en) * 2021-07-07 2021-10-08 中国建设银行股份有限公司 PDF file conversion method and device, electronic equipment and computer readable medium
CN113536771A (en) * 2021-09-17 2021-10-22 深圳前海环融联易信息科技服务有限公司 Element information extraction method, device, equipment and medium based on text recognition
CN114580429A (en) * 2022-01-26 2022-06-03 云捷计算机软件(江苏)有限责任公司 Artificial intelligence-based language and image understanding integrated service system
CN114579712B (en) * 2022-05-05 2022-07-15 中科雨辰科技有限公司 Text attribute extraction and matching method based on dynamic model
CN114579712A (en) * 2022-05-05 2022-06-03 中科雨辰科技有限公司 Text attribute extraction and matching method based on dynamic model
CN115171129A (en) * 2022-09-06 2022-10-11 京华信息科技股份有限公司 Character recognition error correction method and device, terminal equipment and storage medium
CN116127047A (en) * 2023-04-04 2023-05-16 北京大学深圳研究生院 Method and device for establishing enterprise information base
CN116127047B (en) * 2023-04-04 2023-08-01 北京大学深圳研究生院 Method and device for establishing enterprise information base
CN117542067A (en) * 2023-12-18 2024-02-09 北京长河数智科技有限责任公司 Region labeling form recognition method based on visual recognition

Similar Documents

Publication Publication Date Title
CN112434691A (en) HS code matching and displaying method and system based on intelligent analysis and identification and storage medium
Kang et al. Convolve, attend and spell: An attention-based sequence-to-sequence model for handwritten word recognition
US11734328B2 (en) Artificial intelligence based corpus enrichment for knowledge population and query response
US11514698B2 (en) Intelligent extraction of information from a document
US20230206000A1 (en) Data-driven structure extraction from text documents
US10915788B2 (en) Optical character recognition using end-to-end deep learning
Mao et al. Document structure analysis algorithms: a literature survey
US8249344B2 (en) Grammatical parsing of document visual structures
CN109145260B (en) Automatic text information extraction method
JP6462970B1 (en) Classification device, classification method, generation method, classification program, and generation program
CN114612921B (en) Form recognition method and device, electronic equipment and computer readable medium
Tkaczyk New methods for metadata extraction from scientific literature
JP2022541199A (en) A system and method for inserting data into a structured database based on image representations of data tables.
Sharma et al. [Retracted] Optimized CNN‐Based Recognition of District Names of Punjab State in Gurmukhi Script
Al-Barhamtoshy et al. An arabic manuscript regions detection, recognition and its applications for OCRing
JP2006309347A (en) Method, system, and program for extracting keyword from object document
CN115294593A (en) Image information extraction method and device, computer equipment and storage medium
Blomqvist et al. Reading the ransom: Methodological advancements in extracting the swedish wealth tax of 1571
KR102467096B1 (en) Method and apparatus for checking dataset to learn extraction model for metadata of thesis
CN114003750A (en) Material online method, device, equipment and storage medium
CN112395429A (en) Method, system and storage medium for determining, pushing and applying HS (high speed coding) codes based on graph neural network
Khan et al. Analysis of Cursive Text Recognition Systems: A Systematic Literature Review
JP4466241B2 (en) Document processing method and document processing apparatus
Wu et al. Automatic semantic knowledge extraction from electronic forms
Kashevnik et al. An Approach to Engineering Drawing Organization: Title Block Detection and Processing

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WW01: Invention patent application withdrawn after publication

Application publication date: 20210302