CN112633283A - Method and system for identifying and translating English mail address - Google Patents


Info

Publication number
CN112633283A
Authority
CN
China
Prior art keywords
english
image
information
text
address
Legal status
Pending
Application number
CN202110248496.4A
Other languages
Chinese (zh)
Inventor
夏志鹏
丁明
李海荣
陈永辉
Current Assignee
Guangzhou Xuanwu Wireless Technology Co Ltd
Original Assignee
Guangzhou Xuanwu Wireless Technology Co Ltd
Application filed by Guangzhou Xuanwu Wireless Technology Co Ltd
Priority to CN202110248496.4A
Publication of CN112633283A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a method and system for recognizing and translating English mail addresses. The method comprises: acquiring a picture containing English mail address information and preprocessing it to obtain a first image containing that information, the preprocessing comprising denoising, angle correction and enhancement; performing text region detection on the first image to obtain the number and positions of its text regions; cropping the first image according to the number and positions of the text regions to obtain a second image containing only text information; and translating the English information in the second image to obtain Chinese address information. The invention automatically recognizes and translates the address information of English international letters written in different formats and integrates efficiently with a business system. It supports processing images of single or batched English international mail items, and supports both synchronous and asynchronous address recognition and translation.

Description

Method and system for identifying and translating English mail address
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a method and system for recognizing and translating English mail addresses.
Background
With the continued development of globalization, the volume of international letters exchanged between countries grows year by year, and the growth in mail sent to China from countries whose official language is English is especially pronounced. In the traditional sorting process for international mail, staff manually translate the English address and recipient information on each letter into Chinese, then write the Chinese translation on the letter or stamp it with the corresponding address stamp, and finally hand the letter to a courier for delivery.
International English letters follow no uniform address format, and their content mixes Chinese pinyin with English. Accurate translation therefore requires staff who are very familiar with the place names in the logistics company's service area. When letter volumes were small, manual translation was slow but could still meet business needs. Volumes have since risen sharply: in large cities with well-developed international trade, hundreds of thousands of international letters must be processed every day, manual translation imposes a huge labor cost on logistics companies, and a technical breakthrough that improves working efficiency is urgently needed.
Disclosure of Invention
The invention provides a method and system for recognizing and translating English mail addresses. Based on neural-network OCR (optical character recognition) and machine translation, it recognizes and translates English mail address information photographed by electronic devices such as mobile phones and cameras, passes the resulting Chinese address information to a business system, and thereby helps staff sort international letters for different areas accurately and efficiently.
One embodiment of the invention provides a method for recognizing and translating an English mail address, comprising the following steps:
acquiring a picture containing English mail address information, and preprocessing the picture to obtain a first image containing the English mail address information, wherein the preprocessing comprises denoising, angle correction and enhancement;
performing text region detection on the first image to obtain the number and position information of the text regions of the first image;
cropping the first image according to the number and position information of its text regions to obtain a second image containing only text information;
and translating the English information in the second image to obtain Chinese address information.
Further, translating the English information in the second image to obtain Chinese address information includes:
segmenting the English character strings in the second image with an English word segmentation and sentence processing model based on the Porter stemmer algorithm to generate English sentences;
translating the English sentences with a Chinese-English translation model trained on a deep neural network, and filtering out sentences unrelated to the address information according to keywords that denote regional levels, to generate Chinese address information;
and checking the Chinese address information against a preset Chinese address database.
Further, performing text region detection on the first image to obtain the number and position information of its text regions includes:
training a deep neural network on feature-annotated English address images to obtain a detection model, and using it to determine whether the first image contains an English address region;
and, when the first image is determined to contain an English address region, obtaining the number and position information of the text regions of the first image with a text detection model.
Further, before the English information in the second image is translated, the method also includes:
recognizing the English information with a recognition model obtained by training a deep neural network on English images and English character annotation data.
Further, the denoising is performed with a Gaussian filtering algorithm and a median filtering algorithm to eliminate noise pixels that clearly interfere with the text information in the picture;
the angle correction uses a text angle detection model based on a deep neural network to detect the tilt and/or flip angle of the picture, and the picture is then rotated to correct it;
and the enhancement uses super-resolution techniques to increase the resolution of the image.
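The two-filter denoising described above can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: it assumes an 8-bit grayscale image stored as a list of rows, 3x3 kernels, edge replication at the borders, and a median filter applied before the Gaussian filter. The patent specifies none of these details, and the function names are hypothetical.

```python
from statistics import median

Image = list[list[int]]  # grayscale image as rows of 0-255 values

# 3x3 Gaussian kernel whose weights sum to 16 (kernel size is an assumption).
GAUSS_3X3 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
OFFSETS = (-1, 0, 1)


def _pixel(img: Image, r: int, c: int) -> int:
    """Read a pixel, replicating the nearest edge pixel outside the image."""
    r = min(max(r, 0), len(img) - 1)
    c = min(max(c, 0), len(img[0]) - 1)
    return img[r][c]


def median_filter(img: Image) -> Image:
    """Replace each pixel by the median of its 3x3 neighborhood.

    Isolated specks (salt-and-pepper noise) are outvoted by their neighbors.
    """
    return [
        [int(median(_pixel(img, r + dr, c + dc) for dr in OFFSETS for dc in OFFSETS))
         for c in range(len(img[0]))]
        for r in range(len(img))
    ]


def gaussian_filter(img: Image) -> Image:
    """Weighted 3x3 average that smooths residual fine-grained noise."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            acc = sum(GAUSS_3X3[dr + 1][dc + 1] * _pixel(img, r + dr, c + dc)
                      for dr in OFFSETS for dc in OFFSETS)
            row.append(round(acc / 16))
        out.append(row)
    return out


def denoise(img: Image) -> Image:
    """Median filter first (removes specks), then Gaussian smoothing."""
    return gaussian_filter(median_filter(img))


# Usage: a flat patch with one bright speck comes back flat.
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255
clean = denoise(noisy)
```

Running the median filter first removes an isolated speck before the Gaussian filter can smear it into its neighbors. In production, an image library such as OpenCV would normally supply both filters.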
An embodiment of the present invention provides a system for recognizing and translating an English mail address, comprising:
a picture preprocessing module, configured to acquire a picture containing English mail address information and preprocess it to obtain a first image containing the English mail address information, wherein the preprocessing comprises denoising, angle correction and enhancement;
a text region detection module, configured to perform text region detection on the first image to obtain the number and position information of its text regions;
an image cropping module, configured to crop the first image according to the number and position information of its text regions to obtain a second image containing only text information;
and a translation module, configured to translate the English information in the second image to obtain Chinese address information;
wherein the translation module comprises:
a word segmentation submodule, configured to segment the English character strings in the second image with an English word segmentation and sentence processing model based on the Porter stemmer algorithm to generate English sentences;
a Chinese-English translation submodule, configured to translate the English sentences with a Chinese-English translation model trained on a deep neural network, filter out sentences unrelated to the address information according to keywords that denote regional levels, and generate Chinese address information;
and a checking submodule, configured to check the Chinese address information against a preset Chinese address database.
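One plausible way to implement the checking submodule is fuzzy matching against the preset database, so that small translation or recognition slips are corrected rather than rejected. The sketch below uses Python's standard `difflib.get_close_matches`; the database contents, the 0.6 similarity cutoff and the function name `check_component` are illustrative assumptions, not details from the patent.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical excerpt of a preset Chinese address database: street names
# in Tianhe District, Guangzhou. A real database would be much larger and
# organized by administrative level.
STREET_DB = ["紫荆街", "天河路", "体育东路", "中山大道"]


def check_component(candidate: str, known: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Validate one translated address component against the database.

    An exact hit is returned unchanged. Otherwise the most similar entry
    whose difflib similarity ratio reaches `cutoff` is returned as a
    correction, or None if nothing is close enough (flag for manual review).
    """
    if candidate in known:
        return candidate
    matches = get_close_matches(candidate, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# "紫金街" could come from a recognition slip such as ZIJINJIE for ZIJINGJIE.
print(check_component("紫金街", STREET_DB))  # prints 紫荆街
```

Returning None instead of the nearest entry when nothing clears the cutoff keeps the checker from silently "correcting" an address to an unrelated street.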
Further, the text region detection module comprises:
an English address region determination submodule, configured to train a deep neural network on feature-annotated English address images to obtain a detection model and determine whether the first image contains an English address region;
and a text region number and position acquisition submodule, configured to obtain the number and position information of the text regions of the first image with a text detection model when the first image is determined to contain an English address region.
Further, the system for recognizing and translating an English mail address also comprises:
a character recognition module, configured to recognize the English information with a recognition model obtained by training a deep neural network on English images and English character annotation data.
Further, the picture preprocessing module comprises:
a denoising submodule, configured to perform denoising with a Gaussian filtering algorithm and a median filtering algorithm and eliminate noise pixels that clearly interfere with the text information in the picture;
an angle correction submodule, configured to detect the tilt and/or flip angle of the picture with a text angle detection model based on a deep neural network and rotate the picture to correct it;
and an enhancement submodule, configured to perform enhancement with super-resolution techniques to increase the resolution of the image.
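Super-resolution proper relies on a trained network that the patent does not describe. To show where the enhancement step sits and what it must produce, the sketch below uses plain bilinear interpolation as a stand-in: it enlarges a grayscale image by an integer factor but, unlike a learned super-resolution model, cannot recover detail that is not already present. The function name and the `scale` parameter are assumptions.

```python
Image = list[list[float]]


def bilinear_upscale(img: Image, scale: int) -> Image:
    """Enlarge an image by an integer factor with bilinear interpolation.

    Uses align-corners sampling: the corners of the output grid map exactly
    onto the corners of the input grid, and every other output pixel is a
    weighted blend of the four nearest input pixels.
    """
    h, w = len(img), len(img[0])
    out_h, out_w = h * scale, w * scale

    def src(i: int, n_out: int, n_in: int) -> float:
        # Map output index i to a fractional input coordinate.
        return i * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0

    out = []
    for r in range(out_h):
        y = src(r, out_h, h)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for c in range(out_w):
            x = src(c, out_w, w)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


big = bilinear_upscale([[0, 30], [60, 90]], 2)  # 2x2 -> 4x4
```

A learned super-resolution model would replace `bilinear_upscale` behind the same interface: image in, larger image out.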
An embodiment of the present invention further provides an electronic device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor; when executing the computer program, the processor implements any of the above methods for recognizing and translating an English mail address.
An embodiment of the present invention provides a computer-readable storage medium comprising a stored computer program; when the computer program runs, the device on which the storage medium resides is controlled to perform any of the above methods for recognizing and translating an English mail address.
Compared with the prior art, the embodiments of the invention have the following beneficial effects:
one embodiment of the invention provides a method for recognizing and translating an English mail address that preprocesses (denoises, angle-corrects and enhances) a picture containing English mail address information, detects the number and positions of its text regions, crops those regions into a second image containing only text information, and translates the English information in that image into Chinese address information. The invention thus automatically recognizes and translates the address information of English international letters written in different formats and integrates efficiently with a business system. It supports processing images of single or batched English international mail items, and supports both synchronous and asynchronous address recognition and translation.
Drawings
To illustrate the technical solution of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for recognizing and translating an English mail address according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for recognizing and translating an English mail address according to another embodiment of the present invention;
Fig. 3 is a flowchart of a method for recognizing and translating an English mail address according to another embodiment of the present invention;
Fig. 4 is a flowchart of a method for recognizing and translating an English mail address according to another embodiment of the present invention;
Fig. 5 is a flowchart of English letter address recognition and translation according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of address recognition and cropping for an English letter according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of English letter address text recognition according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the English letter address text recognition result according to an embodiment of the present invention;
Fig. 9 is a block diagram of an English mail address recognition and translation system according to an embodiment of the present invention;
Fig. 10 is an apparatus diagram of an English mail address recognition and translation system according to another embodiment of the present invention;
Fig. 11 is an apparatus diagram of an English mail address recognition and translation system according to another embodiment of the present invention;
Fig. 12 is an apparatus diagram of an English mail address recognition and translation system according to another embodiment of the present invention;
Fig. 13 is an apparatus diagram of an English mail address recognition and translation system according to another embodiment of the present invention;
Fig. 14 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the step numbers used herein are for convenience of description only and are not intended as limitations on the order in which the steps are performed.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items.
Natural Language Processing (NLP) is an interdisciplinary field spanning computer science, artificial intelligence and linguistics; its goal is to let computers process or "understand" natural language in order to perform tasks such as translation and question answering.
Machine translation is one of the important technical directions of NLP and can generally be divided into three major categories: Rule-Based Machine Translation (RBMT), Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). With the rapid development of deep learning, neural machine translation has improved greatly. Compared with traditional SMT, NMT trains a model directly on sequences of source and target text, can output longer sequences, and performs very well in translation, dialogue and text summarization.
OCR (Optical Character Recognition) is the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then converts those shapes into computer text using a character recognition method.
In recent years, artificial intelligence has developed rapidly in computer vision, and deep learning has been applied to OCR. Mainstream OCR is now implemented as text detection followed by text recognition, both based on deep learning networks: a text detection model locates the text regions in an image, the detected regions are passed to a text recognition model that recognizes the characters, and the characters are then formatted and output.
The English mail address recognition and translation technology provided by the invention is based on neural-network OCR and machine translation. It recognizes and translates English mail address information photographed by electronic devices such as mobile phones and cameras, and passes the resulting Chinese address information to a business system, helping staff sort international mail for different areas accurately and efficiently.
First aspect.
Referring to Figs. 1-4, an embodiment of the present invention provides a method for recognizing and translating an English mail address, comprising:
S10: acquire a picture containing English mail address information, and preprocess the picture to obtain a first image containing the English mail address information, wherein the preprocessing comprises denoising, angle correction and enhancement.
In a specific embodiment, the denoising is performed with a Gaussian filtering algorithm and a median filtering algorithm to eliminate noise pixels that clearly interfere with the text information in the picture;
the angle correction uses a text angle detection model based on a deep neural network to detect the tilt and/or flip angle of the picture, and the picture is rotated to correct it;
and the enhancement uses super-resolution techniques to increase the resolution of the image.
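The rotation-correction part of S10 can be sketched as follows, assuming the angle model is a classifier that reports one of four orientations (0, 90, 180 or 270 degrees counter-clockwise from upright). The classifier itself is not shown, and `correct_orientation` is a hypothetical name. Correcting a small skew of a few degrees would additionally require an arbitrary-angle rotation, for example an affine warp, which this sketch omits.

```python
Image = list[list[int]]


def rotate_cw(img: Image) -> Image:
    """Rotate an image 90 degrees clockwise (the top row becomes the right column)."""
    return [list(row) for row in zip(*img[::-1])]


def correct_orientation(img: Image, detected_ccw_degrees: int) -> Image:
    """Undo the rotation reported by the (hypothetical) angle classifier.

    `detected_ccw_degrees` says how far the text is turned counter-clockwise
    from upright; rotating clockwise by the same amount restores it.
    """
    if detected_ccw_degrees % 90 != 0:
        raise ValueError("this sketch handles only quarter turns")
    for _ in range((detected_ccw_degrees // 90) % 4):
        img = rotate_cw(img)
    return img


upright = [[1, 2, 3], [4, 5, 6]]
turned = [[3, 6], [2, 5], [1, 4]]  # `upright` rotated 90 degrees counter-clockwise
restored = correct_orientation(turned, 90)
```

Treating orientation as a four-way classification keeps the model simple, since photographed envelopes are most often off by a whole quarter turn when the phone is held sideways or upside down.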
S20: perform text region detection on the first image to obtain the number and position information of its text regions.
In a specific embodiment, S20 includes:
S21: train a deep neural network on feature-annotated English address images to obtain a detection model, and determine whether the first image contains an English address region.
S22: when the first image is determined to contain an English address region, obtain the number and position information of its text regions with a text detection model.
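The two-stage detection in S21 and S22 amounts to a gate followed by a cascade: find the address block, stop if there is none, otherwise detect text lines inside it and report their count and positions. The sketch below shows only that control flow and the coordinate bookkeeping, assuming boxes are `(x, y, width, height)` tuples; both detectors are passed in as hypothetical callables standing in for the trained models, and all names are assumptions.

```python
from typing import Callable, Optional

Box = tuple[int, int, int, int]  # (x, y, width, height) in pixels
Image = list[list[int]]


class NoAddressFound(Exception):
    """No English address region was detected; the picture should be retaken."""


def crop(img: Image, box: Box) -> Image:
    x, y, w, h = box
    return [row[x:x + w] for row in img[y:y + h]]


def detect_text_regions(
    img: Image,
    find_address: Callable[[Image], Optional[Box]],
    find_text: Callable[[Image], list[Box]],
) -> tuple[int, list[Box]]:
    """Locate the address block, then the text regions inside it.

    Returns the number of text regions and their boxes in the coordinate
    system of the full image.
    """
    address_box = find_address(img)
    if address_box is None:
        raise NoAddressFound("no English address region detected; retake the picture")
    ax, ay, _, _ = address_box
    local_boxes = find_text(crop(img, address_box))
    # The text detector sees only the crop, so shift its boxes back by the crop offset.
    boxes = [(ax + x, ay + y, w, h) for x, y, w, h in local_boxes]
    return len(boxes), boxes
```

Raising a dedicated exception, rather than returning an empty list, lets the caller distinguish "no address on this picture, ask for a retake" from "address found but no readable lines".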
S30: crop the first image according to the number and position information of its text regions to obtain a second image containing only text information.
In a specific embodiment, the method further comprises:
S31: recognize the English information with a recognition model obtained by training a deep neural network on English images and English character annotation data.
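The cropping in S30 can be sketched as cutting one sub-image per detected box. The sketch assumes `(x, y, width, height)` boxes, clips boxes that overshoot the image border, and returns the crops in a simple top-to-bottom, left-to-right reading order so that the recognized lines stay in address order; the ordering rule and the function name are assumptions, and a real system may need a tolerance for lines whose tops are not perfectly aligned.

```python
Box = tuple[int, int, int, int]  # (x, y, width, height)
Image = list[list[int]]


def crop_text_regions(img: Image, boxes: list[Box]) -> list[Image]:
    """Cut one sub-image per text region, ordered by top edge, then left edge.

    Boxes are clipped to the image bounds so a detector box that slightly
    overshoots the border does not cause an error.
    """
    h, w = len(img), len(img[0])
    crops = []
    for x, y, bw, bh in sorted(boxes, key=lambda b: (b[1], b[0])):
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + bw, w), min(y + bh, h)
        if x0 < x1 and y0 < y1:  # skip boxes lying entirely outside the image
            crops.append([row[x0:x1] for row in img[y0:y1]])
    return crops


# A 5x10 test image whose pixel values encode their own (row, column).
img = [[r * 10 + c for c in range(10)] for r in range(5)]
crops = crop_text_regions(img, [(0, 3, 2, 1), (1, 0, 3, 2), (8, 4, 5, 5)])
```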
S40: translate the English information in the second image to obtain Chinese address information.
In a specific embodiment, S40 includes:
S41: segment the English character strings in the second image with an English word segmentation and sentence processing model based on the Porter stemmer algorithm to generate English sentences.
S42: translate the English sentences with a Chinese-English translation model trained on a deep neural network, and filter out sentences unrelated to the address information according to keywords that denote regional levels, to generate Chinese address information.
S43: check the Chinese address information against a preset Chinese address database.
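Step S41 assigns segmentation to a model based on the Porter stemmer, but the patent does not say how that model splits run-together strings such as "MRSFENGMEI"; Porter stemming itself reduces individual English words to their stems (for example, "addresses" becomes "address") rather than inserting word boundaries. The sketch below therefore swaps in a different, plainly named technique: dictionary-based segmentation by dynamic programming, which chooses the split into the fewest vocabulary words. The tiny vocabulary and the function names are hypothetical; a real system would load the full inventory of pinyin syllables plus common English address words.

```python
import re

# Hypothetical, deliberately tiny vocabulary of pinyin syllables and English words.
VOCAB = {
    "MRS", "MR", "MS", "CHINA",
    "TIAN", "HE", "QU", "ZI", "JING", "JIE", "HAO",
    "FANG", "GUANG", "ZHOU", "FENG", "MEI",
}
MAX_LEN = max(len(w) for w in VOCAB)


def segment_run(run: str) -> list[str]:
    """Split a run of letters into the fewest vocabulary words (dynamic programming)."""
    n = len(run)
    best: list = [None] * (n + 1)  # best[i] = shortest word list covering run[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_LEN), i):
            word = run[j:i]
            if best[j] is not None and word in VOCAB:
                candidate = best[j] + [word]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n] if best[n] is not None else [run]  # leave unsegmentable runs as-is


def segment_line(line: str) -> str:
    """Segment every letter run in a recognized line; digits and punctuation pass through."""
    out = []
    for chunk in line.upper().split():
        pieces = re.findall(r"[A-Z]+|[^A-Z]+", chunk)
        out.append("".join(" ".join(segment_run(p)) if p.isalpha() else p for p in pieces))
    return " ".join(out)


for line in ["MRSFENGMEI", "TIANHEQUZIJINGJIE 1 HAO", "901 FANG, GUANGZHOU", "CHINA"]:
    print(segment_line(line))
```

Choosing the fewest words, rather than greedily taking the longest prefix, avoids dead ends when a long prefix happens to be a valid syllable but leaves an unsegmentable remainder. A recognition error such as the missing G in "ZIJIGJIE" leaves the run unsegmentable with this vocabulary, in which case the sketch returns it unchanged.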
The invention automatically recognizes and translates the address information of English international letters written in different formats and integrates efficiently with a business system. It supports processing images of single or batched English international mail items, and supports both synchronous and asynchronous address recognition and translation.
In a specific embodiment, the overall implementation process of the invention is shown in Fig. 5 and proceeds as follows:
(1) Image data acquisition: staff use a mobile phone app or mini program to open the camera, photograph the address information on the mail item, and upload the pictures to an Alibaba Cloud server. Pictures can be uploaded singly or in batches, and recognition can run synchronously or asynchronously.
(2) The picture preprocessing module denoises, angle-corrects and enhances the uploaded pictures. Denoising uses Gaussian and median filtering to eliminate noise pixels that clearly interfere with the text information; angle correction uses a text angle detection model based on a deep neural network to detect the tilt or flip angle introduced during shooting and rotates the picture to correct it; enhancement uses super-resolution to process pictures taken at low resolution and increase their resolution.
(3) The English address detection module uses a detection model obtained by training a deep neural network on feature-annotated English address images. The pictures processed by the picture preprocessing module are passed to this module, which determines whether each image contains an English address region. If no English address region is detected, the recognition and translation process ends and the picture must be retaken; if one is detected, the module outputs the position of the English address region in the image.
(4) The text detection module consists of a detection model obtained by training a deep neural network on annotated English data. The picture is cropped according to the position of the English address region detected in step (3), and the cropped picture is passed to the text detection module to detect text regions; the image processing is shown in Fig. 6 and the detection result in Fig. 7. The text detection module yields the number and positions of the text regions in the image.
(5) The text recognition module consists of a recognition model obtained by training a deep neural network on English images and English character annotation data. The text regions detected in step (4) are passed to the text recognition module, which recognizes the characters in each region. For example, in Fig. 8 the text recognition module recognizes: "MRSFENGMEI", "TIANHEQUZIJIGJIE 1 HAO", "901 FANG, GUANGZHOU", "CHINA".
(6) The English word segmentation and sentence processing module segments the English character strings recognized in step (5) based on the Porter stemmer algorithm and generates a corresponding English sentence for each. The generated sentences are: "MRS FENG MEI", "TIAN HE QU ZI JING JIE1 HAO", "901 FANG, GUANG ZHOU", "CHINA".
(7) The machine translation system consists of a Chinese-English translation model trained on a deep neural network. It translates the English sentences passed from step (6), filters out sentences unrelated to the address information according to keywords that denote regional levels (such as province, city, district, street, road and residential community), and passes the sentences that belong to the address information to the prebuilt Chinese address database for matching and checking.
(8) The Chinese sentences processed in step (7) are output in descending order of regional level. The final output is the Chinese address for Room 901, No. 1 Zijing Street, Tianhe District, Guangzhou, China, written from the largest region to the smallest (China, Guangzhou City, Tianhe District, Zijing Street No. 1, Room 901). The result interface returned by the mini program is shown in Fig. 8.
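Steps (7) and (8) keep only the parts of the translation that carry a regional-level keyword and then order them from the largest region to the smallest. A minimal sketch of that filtering and ordering follows, assuming the translation has already been split into Chinese address parts. The suffix table, the relative ranks given to residential community, house number and room, and the input list are all assumptions for illustration.

```python
from typing import Optional

COUNTRY = "中国"  # China, rank 0

# (suffix, rank) pairs; a lower rank means a larger region. "小区"
# (residential community) is tested before "区" (district) because it
# also ends with "区". The ranks are illustrative assumptions.
LEVEL_RULES = [
    ("小区", 5),  # residential community
    ("省", 1),    # province
    ("市", 2),    # city
    ("区", 3),    # district
    ("街", 4),    # street
    ("路", 4),    # road
    ("号", 6),    # house number
    ("房", 7),    # room
]


def region_level(part: str) -> Optional[int]:
    """Return the regional rank of an address part, or None if it has no keyword."""
    if part == COUNTRY:
        return 0
    for suffix, rank in LEVEL_RULES:
        if part.endswith(suffix):
            return rank
    return None


def assemble_address(parts: list[str]) -> str:
    """Drop parts without a regional keyword (e.g. the recipient's name) and
    join the rest from the largest region to the smallest."""
    ranked = [(region_level(p), i, p) for i, p in enumerate(parts)]
    kept = sorted((rank, i, p) for rank, i, p in ranked if rank is not None)
    return "".join(p for _, _, p in kept)


# Hypothetical translated parts, in recognition order, including the recipient's name.
parts = ["冯梅女士", "天河区", "紫荆街", "1号", "901房", "广州市", "中国"]
address = assemble_address(parts)
```

Suffix rules like these are brittle (a personal name that happens to end in 市 or 路 would slip through), which is one reason the description pairs the filter with a check against the Chinese address database.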
Based on neural-network OCR and machine translation, the invention acquires image data on a mobile terminal (app or mini program), obtains the translated Chinese information through the algorithmic recognition system, and uploads it to the business system database in real time.
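Step (1) notes that pictures may be uploaded singly or in batches and that recognition may run synchronously or asynchronously. One way to model the two modes on the server side is sketched below with Python's standard `asyncio`; `recognize_and_translate` is a hypothetical stand-in for the whole pipeline, and running each picture in a worker thread is an assumption about how blocking model inference would be scheduled, not a detail from the patent.

```python
import asyncio
import time


def recognize_and_translate(picture: str) -> str:
    """Hypothetical stand-in for preprocessing, detection, cropping,
    recognition, translation and checking of one picture."""
    time.sleep(0.01)  # simulate blocking model inference
    return f"address for {picture}"


def process_sync(pictures: list[str]) -> list[str]:
    """Synchronous mode: the caller waits while pictures are handled one by one."""
    return [recognize_and_translate(p) for p in pictures]


async def process_async(pictures: list[str]) -> list[str]:
    """Asynchronous mode: pictures run concurrently in worker threads.

    asyncio.gather returns results in the same order as the inputs,
    so each result can be matched back to its picture.
    """
    tasks = [asyncio.to_thread(recognize_and_translate, p) for p in pictures]
    return await asyncio.gather(*tasks)


batch = ["letter-001.jpg", "letter-002.jpg"]
results = asyncio.run(process_async(batch))
```

In a deployed service the asynchronous mode would more likely accept the upload, return a job identifier at once and deliver results later; the sketch shows only the concurrency side of that design.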
Second aspect.
Referring to Figs. 9-13, an embodiment of the present invention provides a system for recognizing and translating an English mail address, comprising:
a picture preprocessing module 10, configured to acquire a picture containing English mail address information and preprocess it to obtain a first image containing the English mail address information, wherein the preprocessing comprises denoising, angle correction and enhancement.
In a specific embodiment, the picture preprocessing module 10 comprises:
a denoising submodule 11, configured to perform denoising with a Gaussian filtering algorithm and a median filtering algorithm and eliminate noise pixels that clearly interfere with the text information in the picture;
an angle correction submodule 12, configured to detect the tilt and/or flip angle of the picture with a text angle detection model based on a deep neural network and rotate the picture to correct it;
and an enhancement submodule 13, configured to perform enhancement with super-resolution techniques to increase the resolution of the image.
A text region detecting module 20, configured to perform text region detection on the first image, so as to obtain the number of text regions and position information of the first image.
In a specific embodiment, the text region detecting module 20 includes:
the english address region judgment sub-module 21 is configured to train feature labeling data of an english address image based on a deep neural network to obtain a detection model, and judge whether the first image contains an english address region.
And the text region number and position information obtaining sub-module 22 is configured to, when it is determined that the first image contains an english address region, obtain the text region number and position information of the first image according to a text detection model.
an image cropping module 30, configured to crop the first image according to the number and positions of its text regions to obtain a second image containing only text information.
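A minimal cropping sketch, assuming each region position is an axis-aligned box `(x, y, width, height)` in pixel coordinates and the image is a list of rows. It returns one crop per region; the text does not state whether the second image is a single stitched image or a set of crops. `Box` and `crop_regions` are illustrative names.

```python
from typing import List, NamedTuple, Sequence


class Box(NamedTuple):
    """Hypothetical region representation: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


def crop_regions(image: Sequence[Sequence[int]],
                 boxes: Sequence[Box]) -> List[List[List[int]]]:
    """Cut each box out of a row-major image, clipping boxes to the image bounds."""
    h, w = len(image), len(image[0])
    crops = []
    for b in boxes:
        x0, y0 = max(0, b.x), max(0, b.y)
        x1, y1 = min(w, b.x + b.width), min(h, b.y + b.height)
        if x0 >= x1 or y0 >= y1:
            continue  # box lies entirely outside the image
        crops.append([list(row[x0:x1]) for row in image[y0:y1]])
    return crops
```

Clipping keeps a slightly oversized detector box usable instead of raising an index error.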
a translation module 40, configured to translate the English information in the second image to obtain Chinese address information.
In a specific embodiment, the translation module 40 comprises:
a word segmentation submodule 41, configured to segment the English character string in the second image into words using an English word segmentation and sentence processing model based on the Porter stemmer algorithm, thereby generating English sentences;
a Chinese-English translation submodule 42, configured to translate the English sentences into Chinese using a Chinese-English translation model trained with a deep neural network, filter out sentences unrelated to the address information according to keywords denoting regional levels, and generate the Chinese address information; and
a verification submodule 43, configured to verify the Chinese address information against a preset Chinese address database.
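The keyword filtering of submodule 42 and the database check of submodule 43 might look roughly like the sketch below. The keyword list, the six-digit postcode rule, applying the filter to the English lines before translation, and representing the Chinese address database as a province-to-cities mapping are all assumptions for illustration. The translation model itself is not shown, nor is Porter stemming, which could normalize tokens before matching.

```python
import re
from typing import Dict, List, Set

# Hypothetical region-level keywords; the patent does not list its keyword set.
REGION_KEYWORDS = {
    "province", "city", "district", "county", "town", "village",
    "road", "street", "avenue", "lane", "building",
}
POSTCODE = re.compile(r"^\d{6}$")  # Chinese postal codes have six digits


def filter_address_lines(lines: List[str]) -> List[str]:
    """Keep lines containing a region-level keyword or a six-digit postcode."""
    kept = []
    for line in lines:
        tokens = re.findall(r"[a-z]+|\d+", line.lower())
        if any(t in REGION_KEYWORDS or POSTCODE.match(t) for t in tokens):
            kept.append(line)
    return kept


def check_address(province: str, city: str, db: Dict[str, Set[str]]) -> bool:
    """Verify that the translated city belongs to the translated province."""
    return city in db.get(province, set())
```

For example, from the lines `"Attn: Mr. Zhang"`, `"No. 5 Zhongshan Road, Tianhe District"`, `"Guangzhou City, Guangdong Province 510000"` and `"Thank you"`, only the middle two are kept.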
In a specific embodiment, the English mail address recognition and translation system further comprises:
a character recognition module 50, configured to recognize the English information using a recognition model obtained by training a deep neural network on English images and English character annotation data.
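The text does not say how the recognition model's output is decoded. A common design for DNN text-line recognizers, assumed here and not stated in the patent, is to emit per-frame label scores trained with CTC and decode them greedily: take the best label per frame, merge consecutive repeats, and drop the blank label. A minimal sketch, assuming label 0 is the blank:

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: str, blank: int = 0) -> str:
    """Greedy CTC decoding over per-frame label scores.

    `alphabet[i]` is the character for label i; the label at `blank` is ignored.
    """
    # Best-scoring label for each frame.
    best: List[int] = [max(range(len(row)), key=row.__getitem__)
                       for row in frame_scores]
    chars = []
    prev = None
    for label in best:
        # Emit only when the label changes and is not the blank.
        if label != prev and label != blank:
            chars.append(alphabet[label])
        prev = label
    return "".join(chars)
```

A blank between two identical labels is what allows doubled letters: per-frame labels `a a - b b b - b` decode to `abb`.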
In a third aspect, the present invention provides an electronic device, comprising:
a processor, a memory, and a bus;
the bus is used to connect the processor and the memory;
the memory is used to store operation instructions; and
the processor is configured to invoke the operation instructions, which cause the processor to perform the operations of the English mail address recognition and translation method described in the first aspect of this application.
In an optional embodiment, an electronic device is provided. As shown in Fig. 14, the electronic device 5000 comprises a processor 5001 and a memory 5003. The processor 5001 and the memory 5003 are connected, for example, via a bus 5002. Optionally, the electronic device 5000 may further include a transceiver 5004. Note that in practice the number of transceivers 5004 is not limited to one, and the structure of the electronic device 5000 does not limit the embodiments of this application.
The processor 5001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 5001 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 5002 may include a path for transferring information between the aforementioned components. It may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 14, but this does not mean that there is only one bus or one type of bus.
The memory 5003 may be, but is not limited to, a ROM or other type of static storage device capable of storing static information and instructions, a RAM or other type of dynamic storage device capable of storing information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 5003 stores the application program code for executing this solution, and its execution is controlled by the processor 5001. The processor 5001 executes the application program code stored in the memory 5003 to implement the content of any of the foregoing method embodiments.
The electronic device includes, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the English mail address recognition and translation method described in the first aspect of this application.
Yet another embodiment of this application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the corresponding content of the foregoing method embodiments.

Claims (10)

1. An English mail address recognition and translation method, characterized in that it comprises the following steps:
acquiring a picture containing English mail address information, and preprocessing the picture to obtain a first image containing the English mail address information, wherein the preprocessing comprises denoising, angle correction, and enhancement;
performing text region detection on the first image to obtain the number and positions of the text regions in the first image;
cropping the first image according to the number and positions of its text regions to obtain a second image containing only text information; and
translating the English information in the second image to obtain Chinese address information;
wherein translating the English information in the second image to obtain the Chinese address information comprises:
segmenting the English character string in the second image into words using an English word segmentation and sentence processing model based on the Porter stemmer algorithm to generate English sentences;
translating the English sentences into Chinese using a Chinese-English translation model trained with a deep neural network, filtering out sentences unrelated to the address information according to keywords denoting regional levels, and generating the Chinese address information; and
verifying the Chinese address information against a preset Chinese address database.
2. The English mail address recognition and translation method according to claim 1, wherein performing text region detection on the first image to obtain the number and positions of the text regions in the first image comprises:
obtaining a detection model by training a deep neural network on feature-annotated English address images, and determining whether the first image contains an English address region; and
when the first image is determined to contain an English address region, obtaining the number and positions of the text regions in the first image using a text detection model.
3. The English mail address recognition and translation method according to claim 1, wherein, before translating the English information in the second image, the method further comprises:
recognizing the English information using a recognition model obtained by training a deep neural network on English images and English character annotation data.
4. The English mail address recognition and translation method according to claim 1, wherein the denoising is performed using a Gaussian filtering algorithm and a median filtering algorithm to eliminate noise pixels in the picture that significantly interfere with the text information;
the angle correction is performed using a text angle detection model based on a deep neural network, which detects the tilt and/or flip angle of the picture, and the picture is rotated to correct it; and
the enhancement is performed using a super-resolution technique to increase the resolution of the image.
5. An English mail address recognition and translation system, comprising:
a picture preprocessing module, configured to acquire a picture containing English mail address information and preprocess the picture to obtain a first image containing the English mail address information, wherein the preprocessing comprises denoising, angle correction, and enhancement;
a text region detection module, configured to perform text region detection on the first image to obtain the number and positions of the text regions in the first image;
an image cropping module, configured to crop the first image according to the number and positions of its text regions to obtain a second image containing only text information; and
a translation module, configured to translate the English information in the second image to obtain Chinese address information;
wherein the translation module comprises:
a word segmentation submodule, configured to segment the English character strings in the second image into words using an English word segmentation and sentence processing model based on the Porter stemmer algorithm to generate English sentences;
a Chinese-English translation submodule, configured to translate the English sentences into Chinese using a Chinese-English translation model trained with a deep neural network, filter out sentences unrelated to the address information according to keywords denoting regional levels, and generate the Chinese address information; and
a verification submodule, configured to verify the Chinese address information against a preset Chinese address database.
6. The English mail address recognition and translation system according to claim 5, wherein the text region detection module comprises:
an English address region judgment submodule, configured to obtain a detection model by training a deep neural network on feature-annotated English address images, and to determine whether the first image contains an English address region; and
a text region count and position acquisition submodule, configured to obtain the number and positions of the text regions in the first image using a text detection model when the first image is determined to contain an English address region.
7. The English mail address recognition and translation system according to claim 5, further comprising:
a character recognition module, configured to recognize the English information using a recognition model obtained by training a deep neural network on English images and English character annotation data.
8. The English mail address recognition and translation system according to claim 5, wherein the picture preprocessing module comprises:
a denoising submodule, configured to perform denoising using a Gaussian filtering algorithm and a median filtering algorithm to eliminate noise pixels in the picture that significantly interfere with the text information;
an angle correction submodule, configured to perform angle correction using a text angle detection model based on a deep neural network, which detects the tilt and/or flip angle of the picture, and to rotate the picture to correct it; and
an enhancement submodule, configured to perform enhancement using a super-resolution technique to increase the resolution of the image.
9. An electronic device, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the English mail address recognition and translation method according to any one of claims 1 to 4.
10. A computer-readable storage medium, comprising a stored computer program, wherein, when the computer program runs, it controls the device on which the computer-readable storage medium resides to execute the English mail address recognition and translation method according to any one of claims 1 to 4.
CN202110248496.4A 2021-03-08 2021-03-08 Method and system for identifying and translating English mail address Pending CN112633283A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110248496.4A CN112633283A (en) 2021-03-08 2021-03-08 Method and system for identifying and translating English mail address

Publications (1)

Publication Number Publication Date
CN112633283A 2021-04-09

Family

ID=75297726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110248496.4A Pending CN112633283A (en) 2021-03-08 2021-03-08 Method and system for identifying and translating English mail address

Country Status (1)

Country Link
CN (1) CN112633283A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657374A (en) * 2021-06-29 2021-11-16 Central South University of Forestry and Technology English address recognition and analysis method for international mail list

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5249687A (en) * 1991-04-19 1993-10-05 International Business Machines Corporation Barcode translation for deferred optical character recognition mail processing
CN101482862A (en) * 2009-01-20 2009-07-15 Shanghai Postal Science Research Institute Chinese automatic translation method for English mail addresses
CN101639760A (en) * 2009-08-27 2010-02-03 Shanghai Hehe Information Technology Development Co., Ltd. Input method and input system for contact information
US20100150398A1 (en) * 2008-12-17 2010-06-17 Electronics And Telecommunications Research Institute Multilingual acceptance information processing method and system based on image recognition
CN202694339U (en) * 2012-05-10 2013-01-23 China Post Technology Co., Ltd. Automatic batch translation equipment for international letters
CN106970903A (en) * 2016-01-13 2017-07-21 Alibaba Group Holding Ltd. Method and device for processing address information in a logistics system
CN109271642A (en) * 2018-11-26 2019-01-25 iFlytek Co., Ltd. Text key-point detection method, device, equipment, storage medium and evaluation method
CN109598238A (en) * 2018-12-04 2019-04-09 Rajax Network Technology (Shanghai) Co., Ltd. Information processing method and device, storage medium and electronic equipment
CN110209771A (en) * 2019-06-14 2019-09-06 Harbin Bank Consumer Finance Co., Ltd. User geographic information analysis and text mining method and apparatus
CN110377897A (en) * 2018-04-13 2019-10-25 SF Technology Co., Ltd. Automatic detection method and system for Chinese and English addresses
CN110414632A (en) * 2019-06-27 2019-11-05 Bozhou Vocational and Technical College Information recognition and storage method for handwritten logistics documents
CN110533047A (en) * 2019-08-30 2019-12-03 Southwest University Denoising and binarization method for ancient book images
CN110659633A (en) * 2019-08-15 2020-01-07 Candela (Shenzhen) Technology Innovation Co., Ltd. Image text information recognition method and device and storage medium
US10657368B1 (en) * 2017-02-03 2020-05-19 Aon Risk Services, Inc. Of Maryland Automatic human-emulative document analysis
CN111259889A (en) * 2020-01-17 2020-06-09 Ping An Healthcare Management Co., Ltd. Image text recognition method and device, computer equipment and computer storage medium
CN111415399A (en) * 2020-03-19 2020-07-14 Beijing QIYI Century Science & Technology Co., Ltd. Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111666937A (en) * 2020-04-17 2020-09-15 Guangzhou Duoyi Network Co., Ltd. Method and system for recognizing text in an image
CN112257472A (en) * 2020-11-13 2021-01-22 Tencent Technology (Shenzhen) Co., Ltd. Training method for a text translation model, and text translation method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
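Tu Xiao: "Automatic recognition and translation of English addresses on letters", Journal of East China Normal University (Natural Science Edition) *
Zhang Tingting: "Research on OCR text recognition technology", Computer Technology and Development *
Wang Xialing: "International letter batch translation system based on image recognition and address translation", China Master's Theses Full-text Database, Information Science and Technology *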

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210409