CN115082919A - Address recognition method, electronic device and storage medium - Google Patents

Address recognition method, electronic device and storage medium

Info

Publication number
CN115082919A
Authority
CN
China
Prior art keywords
address
frame
line
box
key value
Prior art date
Legal status
Granted
Application number
CN202210864968.3A
Other languages
Chinese (zh)
Other versions
CN115082919B (en)
Inventor
苏沁宁
苏志锋
潘永通
孙铁
王琳婧
Current Assignee
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Bank Co Ltd
Priority to CN202210864968.3A
Publication of CN115082919A
Application granted
Publication of CN115082919B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/147 Determination of region of interest
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image

Abstract

The application provides an address recognition method, an electronic device, and a storage medium. The method comprises: acquiring an image to be recognized, wherein the image to be recognized comprises address information, and the address information comprises at least one line; acquiring an address first-line box in the address information and a key-value box corresponding to the address first-line box; translating the address first-line box and the key-value box to obtain an address second-line box located below the address first-line box in the address information; extracting text from the address first-line box and the address second-line box, and inputting the extracted text into an address context recognition model to judge whether the text of the address second-line box is a continuation of the text of the address first-line box; and, in response to the text of the address second-line box being a continuation of the text of the address first-line box, determining that the text of the address second-line box is part of the address information. With this scheme, multi-line addresses can be extracted completely and accurately.

Description

Address recognition method, electronic device and storage medium
Technical Field
The disclosed embodiments of the present application relate to the field of image processing, and more particularly, to an address recognition method, an electronic device, and a storage medium.
Background
An address is a key piece of information to be extracted after character recognition is performed on an image. In practice, addresses must be extracted in many text scenarios, which places high demands on the completeness and accuracy of address extraction.
At present, address information is usually extracted after image character recognition, by methods such as key-value matching and address self-structuring. However, existing character recognition methods only support single-line recognition. For longer addresses spanning multiple lines, the information is distributed over a wide area, so these techniques cannot extract the address information accurately and completely; the results of multi-line address recognition are therefore unsatisfactory, and the extracted addresses suffer from problems such as broken segments.
Disclosure of Invention
According to an embodiment of the present application, an address recognition method, an electronic device, and a storage medium are provided to solve the above problems.
According to a first aspect of the present application, an exemplary address recognition method is disclosed. The exemplary method comprises: acquiring an image to be recognized, wherein the image to be recognized comprises address information, and the address information comprises at least one line; acquiring an address first-line box in the address information and a key-value box corresponding to the address first-line box; translating the address first-line box and the key-value box to obtain an address second-line box located below the address first-line box in the address information; extracting text from the address first-line box and the address second-line box, and inputting the extracted text into an address context recognition model to judge whether the text of the address second-line box is a continuation of the text of the address first-line box; and, in response to the text of the address second-line box being a continuation of the text of the address first-line box, determining that the text of the address second-line box is part of the address information.
According to a second aspect of the present application, an exemplary electronic device is disclosed, which includes a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the address recognition method of the first aspect.
According to a third aspect of the present application, an exemplary non-transitory computer-readable storage medium is disclosed, having stored thereon program instructions that, when executed by a processor, implement the address recognition method of the first aspect.
According to these solutions, by setting the address first-line box and the key-value box and translating them, the address second-line box located below the address first-line box in the address information can be obtained; that is, multi-line addresses can be captured. Extracting the text of the address first-line box and the address second-line box and inputting it into the address context recognition model allows the semantic relation between the lines of the address to be recognized, improving the completeness and accuracy of multi-line address extraction.
These and other objects of the present application will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment, which is illustrated in the various figures and drawings.
Drawings
FIG. 1 is a schematic flowchart illustrating an embodiment of the address recognition method of the present application;
FIG. 2 is a diagram of an application scenario of an embodiment of the address recognition method of the present application;
FIG. 3 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 4 is a block diagram of one embodiment of a non-transitory computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation rather than limitation, specific details such as particular system structures, interfaces, and techniques are set forth in order to provide a thorough understanding of the present application.
The term "and/or" herein merely describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" herein generally indicates an "or" relationship between the preceding and following objects. The term "plurality" herein means two or more. The term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the group consisting of A, B, and C.
If the technical solution of the present application involves personal information, a product applying this solution clearly informs users of the personal information processing rules and obtains their individual consent before processing personal information. If the solution involves sensitive personal information, a product applying it obtains the individual's separate consent before processing and also satisfies the requirement of "express consent". For example, on a personal information collection device such as a camera, a clear and prominent sign informs people that they are entering a personal information collection range and that personal information will be collected; a person who voluntarily enters the collection range is deemed to have consented to the collection of their personal information. Alternatively, on a device that processes personal information, provided the processing rules are communicated through prominent signs or messages, personal authorization is obtained via a pop-up message, by asking the person to upload their own information, or by similar means. The personal information processing rules may include information such as the identity of the personal information processor, the purpose and method of processing, and the types of personal information processed.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an address recognition method according to an embodiment of the present application. The address recognition method may be executed by an address recognition apparatus, for example by a terminal device, a server, or other processing device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the address recognition method may be implemented by a processor calling computer-readable instructions stored in a memory.
Specifically, as shown in fig. 1, the method may include the steps of:
Step S11: acquire an image to be recognized, where the image to be recognized contains address information, and the address information occupies at least one line.
The image to be recognized may be an image within a preset range, for example a license image, a money transfer order image, or a mailing order image; the range can be determined according to the actual situation.
The image to be recognized includes address information occupying at least one line. The address information specifies the position of a target location; for example, a mailing order image includes address information such as province, city, district, street, and postal code, which may occupy one or more lines of the image depending on the number of characters.
It should be noted that technologies such as character recognition and character extraction can be used to identify and acquire the address information contained in the image; this is not limited here.
Step S12: acquire an address first-line box in the address information and a key-value box corresponding to the address first-line box.
The address first-line box is a bounding box obtained from the address information and contains the text of the first line of the address information. The key-value box is likewise a bounding box obtained from the address information and contains key-value text, i.e., information text other than the address itself, such as the label in "Address: Nanshan District, Shenzhen, Guangdong". Referring to fig. 2, fig. 2 is an application scenario diagram of an embodiment of the address recognition method of the present application, in which box 201 is a key-value box and box 202 is an address first-line box.
After the address information is identified and acquired, the address first-line box and the corresponding key-value box are obtained from it.
Step S13: translate the address first-line box and the key-value box to obtain an address second-line box located below the address first-line box in the address information.
The translation direction and translation distance of the address first-line box and the key-value box are determined, and both boxes are translated accordingly to obtain the address second-line box. The translation distance and direction are determined according to preset rules and can be chosen to suit the actual situation, for example by a preset algorithm and/or formula. The address second-line box contains address text other than the first-line text.
Step S14: extract the text of the address first-line box and of the address second-line box, and input the extracted text into an address context recognition model to judge whether the text of the address second-line box is a continuation of the text of the address first-line box.
Text extraction is performed on the obtained address first-line box and address second-line box, for example using optical character recognition (OCR): OCR determines the shape of each character by detecting light and dark patterns in the image and then translates the shapes into computer text by a character recognition method.
The extracted text is input into the address context recognition model, which recognizes whether the text in the address first-line box and the text in the address second-line box have a contextual (continuation) relationship.
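For illustration, a minimal sketch of this text-extraction step follows (Python is an editorial choice; pytesseract/PIL, the file name, the language pack, and the box coordinates are illustrative assumptions rather than part of the disclosed method):

```python
# A minimal sketch of the text-extraction step, assuming the line boxes have
# already been located; pytesseract/PIL stand in for whatever OCR engine an
# implementation actually uses.
from PIL import Image
import pytesseract

def extract_text(image_path, box):
    """Crop one line box (left, top, right, bottom) and OCR its contents."""
    image = Image.open(image_path)
    line = image.crop(box)
    # lang="chi_sim" assumes simplified-Chinese address text.
    return pytesseract.image_to_string(line, lang="chi_sim").strip()

# Hypothetical coordinates for the address first-line and second-line boxes.
first_line_text = extract_text("remittance_form.png", (120, 80, 620, 110))
second_line_text = extract_text("remittance_form.png", (120, 112, 620, 142))
```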
Step S15: in response to the text of the address second-line box being a continuation of the text of the address first-line box, determine that the text of the address second-line box is part of the address information.
If the context recognition model confirms that the content of the address second-line box is a continuation of the content of the address first-line box, the text of the address second-line box is determined to be part of the address information.
In this embodiment, by setting the address first-line box and the key-value box and translating them, the address second-line box located below the address first-line box can be obtained; that is, a multi-line address can be captured. Extracting the text of the address first-line box and the address second-line box and inputting it into the address context recognition model allows the semantic relation between the lines of the address to be recognized, improving the completeness and accuracy of multi-line address extraction.
The address first-line box and the corresponding key-value box are obtained from the address information. In some embodiments, obtaining the address first-line box in the address information and the key-value box corresponding to it includes: acquiring an identified first rectangular box, where the first rectangular box is identified according to the starting content of the address information; acquiring an identified key-value box, where the key-value box is identified according to a symbol located before and adjacent to the starting content; and, in response to the position information of the first rectangular box relative to the position information of the key-value box meeting a preset condition, determining that the first rectangular box is the address first-line box.
The starting content of the address information is the first line of the address information or a preset part of that first line. For example, if the first line of some address information reads "Address: Nanshan District, Shenzhen, Guangdong", then "Nanshan District, Shenzhen, Guangdong" is the starting content, or "Shenzhen, Guangdong" may be set as the starting content; in short, the range of the starting content can be set according to the actual situation.
The first rectangular box is identified according to the starting content of the address information and is used to select it: when the preset starting content is recognized, a first rectangular box containing at least the starting content is generated.
The key-value box is identified by a symbol that precedes and is adjacent to the starting content. Taking "Address: Nanshan District, Shenzhen, Guangdong" as an example again, "Nanshan District, Shenzhen, Guangdong" is the starting content from which the first rectangular box is generated, and by recognizing the colon before it, "Address" is determined to be key-value information, so a key-value box is generated. Generating a key-value box from other symbols proceeds similarly and is not repeated here.
The preset condition is used to screen out a first rectangular box that qualifies as the address first-line box. Whether the position information of the generated first rectangular box relative to the key-value box meets the preset condition is judged: if it does, the first rectangular box is determined to be the address first-line box; if it does not, first rectangular boxes continue to be generated and judged until one meeting the preset condition is obtained and taken as the address first-line box.
As described above, the preset condition is used to screen out a first rectangular box that qualifies as the address first-line box. In some embodiments, the position information of the first rectangular box includes its lower-right corner coordinates (x1i, y1i) and upper-left corner coordinates (x2i, y2i), and the position information of the key-value box includes its lower-right corner coordinates (x1, y1) and upper-left corner coordinates (x2, y2). The preset condition includes one of: y1i <= y2 <= y2i; y1i <= y1 <= y2i; and y1 <= y1i <= y2i <= y2.
That is, whether the position information of the first rectangular box relative to the key-value box meets the preset condition is judged by comparing the vertical coordinates of the two boxes; the numerical values of the coordinates depend on the coordinate system established, which is chosen according to the actual situation.
When y1i <= y2 <= y2i, the horizontal line on which the upper border of the key-value box lies falls between the horizontal lines on which the upper and lower borders of the first rectangular box lie;
when y1i <= y1 <= y2i, the horizontal line on which the lower border of the key-value box lies falls between the horizontal lines on which the upper and lower borders of the first rectangular box lie;
when y1 <= y1i <= y2i <= y2, the horizontal lines on which the upper and lower borders of the first rectangular box lie both fall between the horizontal lines on which the upper and lower borders of the key-value box lie.
When the generated first rectangular box satisfies any one of these conditions, its position information relative to the key-value box meets the preset condition; the first rectangular box then contains the first line of the address information, can serve as the address first-line box, and is saved as such.
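For illustration, a minimal Python sketch of this screening step follows; it encodes the three conditions above, using the text's convention that (x1, y1)/(x1i, y1i) are lower-right corners and (x2, y2)/(x2i, y2i) upper-left corners, with y increasing upward:

```python
# A sketch of the preset condition: the candidate first rectangular box
# qualifies as the address first-line box when it vertically overlaps the
# key-value box in one of the three ways described above.
def is_first_line_box(rect, key_box):
    (x1i, y1i), (x2i, y2i) = rect     # lower-right, upper-left of candidate box
    (x1, y1), (x2, y2) = key_box      # lower-right, upper-left of key-value box
    return (
        y1i <= y2 <= y2i              # key box's upper border inside candidate
        or y1i <= y1 <= y2i           # key box's lower border inside candidate
        or (y1 <= y1i and y2i <= y2)  # candidate vertically inside key box
    )
```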
After the address first-line box and the key-value box are obtained, they are translated. In some embodiments, translating the address first-line box and the key-value box to obtain the address second-line box located below the address first-line box includes: translating the key-value box to obtain a translated key-value box; translating the address first-line box by the same translation to obtain a translated address first-line box; acquiring an identified second rectangular box, and computing a first intersection-over-union (IoU) between the second rectangular box and the translated key-value box and a second IoU between the second rectangular box and the translated address first-line box; and, in response to the first IoU being less than a first threshold and the second IoU being greater than a second threshold, determining that the second rectangular box is the address second-line box.
Specifically, let the coordinates of the top left corner of the key-value box O be (topLeftOx, topLeftOy), the coordinates of the bottom left corner be (bottomLeftOx, bottomLeftOy), the translation direction vector be v, and the translation distance be h.
Then the translation distance calculation formula may be defined as follows:
[Translation distance formula rendered as an image in the original publication; as described below, h is computed from the width and height of the image to be recognized, scaled by the factor 0.015.]
The translation distance of the key-value box O is calculated according to the above formula. In the formula, width and height are the width and height of the image to be recognized, and 0.015 is an optional value measured on images to be recognized; it can be chosen according to the actual situation, for example adjusted according to the line spacing and font size of different images to tune the translation distance.
In addition, the translation direction vector calculation formula is defined as follows:
[Translation direction vector formula rendered as an image in the original publication; v is computed from the top-left and bottom-left corner coordinates of the key-value box O defined above and points toward the following line.]
The moving direction of the key-value box O is calculated according to the above formula and is determined from the result of the calculation.
The key-value box O is translated according to the results of the translation direction and translation distance formulas to obtain a translated key-value box NewO, and the address first-line box I is translated in the same way, i.e., by the same direction v and distance h as the key-value box O, to obtain a translated address first-line box NewI.
Correspondingly, if the translated key-value box NewO has top-left coordinates (topLeftNewOx, topLeftNewOy) and bottom-left coordinates (bottomLeftNewOx, bottomLeftNewOy), the calculation formula is as follows:
topLeftNewOx = topLeftOx + h·vx, topLeftNewOy = topLeftOy + h·vy
bottomLeftNewOx = bottomLeftOx + h·vx, bottomLeftNewOy = bottomLeftOy + h·vy
[reconstructed from the surrounding definitions, with v = (vx, vy); the original renders the formula as an image]
The top-left and bottom-left coordinates of the translated key-value box NewO can be calculated according to the above formula; the transformation formulas for its top-right and bottom-right coordinates, and for the coordinates of the address first-line box I, are obtained by transforming the above formula correspondingly and are not repeated here.
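For illustration, a minimal Python sketch of the translation step follows. Because the exact h and v formulas are rendered as images in the original publication, the sketch assumes h proportional to the image diagonal (reusing the 0.015 factor mentioned above) and v as the unit vector along the key-value box's left edge pointing toward the following line; both are labeled assumptions, not the disclosed formulas:

```python
import math

def translate_box(corners, v, h):
    """Shift every corner (x, y) of a box by distance h along direction v."""
    vx, vy = v
    return [(x + h * vx, y + h * vy) for (x, y) in corners]

width, height = 1240, 1754                         # image to be recognized (example)
h = 0.015 * math.hypot(width, height)              # assumed distance formula

top_left_o, bottom_left_o = (100, 80), (100, 110)  # key-value box O (example)
dx = bottom_left_o[0] - top_left_o[0]
dy = bottom_left_o[1] - top_left_o[1]
norm = math.hypot(dx, dy)
v = (dx / norm, dy / norm)                         # assumed direction formula

new_o = translate_box([top_left_o, bottom_left_o], v, h)  # translated box NewO
```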
After the translated address first-line box NewI is obtained, a second rectangular box is identified and acquired, and a first intersection-over-union (IoU) between the second rectangular box and the translated key-value box NewO and a second IoU between the second rectangular box and the translated address first-line box NewI are computed. The first IoU is the ratio of the intersection to the union of the second rectangular box and NewO, and the second IoU is the ratio of the intersection to the union of the second rectangular box and NewI; the larger the IoU value, the larger the area of the overlapping part of the two boxes, and vice versa.
Further, when the first IoU is smaller than the first threshold and the second IoU is larger than the second threshold, the second rectangular box is determined to be the address second-line box. This means that the overlap between the second rectangular box and the translated key-value box NewO is small, while its overlap with the translated address first-line box NewI is large.
As described above, a first IoU between the second rectangular box and the translated key-value box and a second IoU between the second rectangular box and the translated address first-line box are obtained. In some embodiments, the first IoU is (the two formulas below are rendered as images in the original publication and are reconstructed here from the variable definitions that follow):
iouO = innerArea1 / (areaN + areaNewO − innerArea1)
and the second IoU is:
iouI = innerArea2 / (areaN + areaNewI − innerArea2)
where innerArea1 represents the intersection area between the second rectangular box and the translated key-value box, areaNewO represents the area of the translated key-value box, innerArea2 represents the intersection area between the second rectangular box and the translated address first-line box, areaNewI represents the area of the translated address first-line box, and areaN represents the area of the second rectangular box.
The values of iouO and iouI are calculated; iouO is compared with the first threshold and iouI with the second threshold to determine whether the obtained second rectangular box meets the preset threshold conditions. When it does, the second rectangular box is taken as the address second-line box. For example: when the iouO value of the second rectangular box is less than 0.01 and its iouI value is greater than 0.1, the second rectangular box is saved as the address second-line box.
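For illustration, a minimal Python sketch of the IoU screening follows; it represents boxes as (left, top, right, bottom) tuples, an editorial convention, and uses the example thresholds 0.01 and 0.1 from the text:

```python
def area(box):
    """Area of an axis-aligned box given as (left, top, right, bottom)."""
    l, t, r, b = box
    return max(0, r - l) * max(0, b - t)

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    inner = (max(box_a[0], box_b[0]), max(box_a[1], box_b[1]),
             min(box_a[2], box_b[2]), min(box_a[3], box_b[3]))
    inner_area = area(inner)
    union = area(box_a) + area(box_b) - inner_area
    return inner_area / union if union > 0 else 0.0

def is_second_line_box(rect, new_o, new_i, t1=0.01, t2=0.1):
    # Small overlap with the translated key-value box NewO and large overlap
    # with the translated address first-line box NewI.
    return iou(rect, new_o) < t1 and iou(rect, new_i) > t2
```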
As described above, the address first-line box is translated to obtain the address second-line box. In some embodiments, the key-value box is translated again, and the address second-line box is translated according to this renewed translation, to obtain an address third-line box located below the address second-line box in the address information; text is extracted from the address second-line box and the address third-line box and input into the address context recognition model to judge whether the text of the address third-line box is a continuation of the text of the address second-line box; and, in response to the text of the address third-line box being a continuation of the text of the address second-line box, the text of the address third-line box is determined to be another part of the address information.
Referring again to fig. 2, the dashed boxes represent the translated key-value box, the translated address first-line box, and the translated address second-line box; box 201 is the key-value box, box 202 the address first-line box, box 203 the address second-line box, and box 204 the address third-line box.
As shown in fig. 2, the key-value box 201 is translated again, and the address second-line box 203 is translated accordingly, to obtain the address third-line box 204 located below the address second-line box 203. As fig. 2 also shows, boxes of different sizes can be generated within the preset range to meet the requirements of different images and text information.
The process of obtaining the address third-line box located below the address second-line box may follow the process of obtaining the address second-line box and is not repeated here; a sketch of the resulting iteration is given below.
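A minimal Python sketch of that iteration follows; `find_candidate_box`, `ocr`, and `is_continuation` are hypothetical stand-ins for the detection, text-extraction, and model-judgment steps described in this document:

```python
def shift(box, v, h):
    """Translate an axis-aligned box (left, top, right, bottom) by h along v."""
    l, t, r, b = box
    return (l + h * v[0], t + h * v[1], r + h * v[0], b + h * v[1])

def collect_address_lines(key_box, first_line_box, v, h,
                          find_candidate_box, ocr, is_continuation):
    """Gather successive address lines until the context model rejects one."""
    lines = [ocr(first_line_box)]
    anchor_key, anchor_line = key_box, first_line_box
    while True:
        anchor_key = shift(anchor_key, v, h)
        anchor_line = shift(anchor_line, v, h)
        candidate = find_candidate_box(anchor_key, anchor_line)
        if candidate is None:
            break
        text = ocr(candidate)
        if not is_continuation(lines[-1], text):
            break
        lines.append(text)
        anchor_line = candidate  # continue translating from the box actually found
    return "".join(lines)
```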
Text is extracted from the address second-line box and the address third-line box and input into the address context recognition model to judge whether the text in the address third-line box is a continuation of the text in the address second-line box. For example: the text in the two boxes is extracted by optical character recognition (OCR) and input into the address context recognition model, which performs feature extraction, analysis, matching, and other operations on it to judge whether the texts in the address third-line box and the address second-line box have a contextual relationship.
The address context recognition model recognizes whether a contextual relationship exists between the texts in the address third-line box and the address second-line box. Different models can serve as the address context recognition model depending on the actual situation, for example a BERT model, an OpenAI GPT model, or an ELMo model.
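For illustration, a minimal inference sketch follows, assuming a BERT next-sentence-prediction (NSP) head from the Hugging Face transformers library serves as the address context recognition model; the checkpoint name and example strings are assumptions, and in practice a model fine-tuned as described below would be loaded:

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForNextSentencePrediction.from_pretrained("bert-base-chinese")
model.eval()

second_line = "广东省深圳市南山区科技园"  # text from the address second-line box
third_line = "高新南一道88号"            # text from the address third-line box

encoding = tokenizer(second_line, third_line, return_tensors="pt")
with torch.no_grad():
    logits = model(**encoding).logits
# For the NSP head, label 0 means "segment B follows segment A".
is_continuation = logits.argmax(dim=-1).item() == 0
```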
As described above, the contextual relationship between the texts is identified by the address context recognition model. In some embodiments, obtaining the address context recognition model includes: initializing parameters of the address context recognition model; and inputting a training data set into the model for training and learning its parameters based on a loss function, thereby obtaining the address context recognition model.
The parameters of the address context recognition model, for example the learning rate and the decay coefficient, are initialized.
In the case of a BERT model, the model includes an embedding layer, convolutional layers, BN layers, fully connected layers, multi-head attention layers, a classifier, and so on. The parameters of the BERT model are initialized with a BERT pre-trained model to obtain initial values for every structure in the model. Depending on the actual situation, the number or positions of the embedding, convolutional, BN, fully connected, and multi-head attention layers can be adjusted; for example, several convolutional layers may be arranged in the model, or a multi-head attention layer may be connected after a fully connected layer, with another fully connected layer after the attention layer.
The training data set is input into the address context recognition model for training, and the parameters of the model are learned based on a loss function, thereby obtaining the address context recognition model. For example: the training data set is input into a BERT model for training, the parameters of the BERT model are learned based on the next-sentence-prediction (NSP) loss function, and the initial parameters of the BERT model are updated according to the training results to obtain the trained BERT model.
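For illustration, a minimal sketch of one training step follows, using the stock NSP head of the transformers library as a stand-in for the adapted BERT model described above; the checkpoint, optimizer, learning rate, and sample pairs are illustrative assumptions:

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForNextSentencePrediction.from_pretrained("bert-base-chinese")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One positive pair (label 0: segment B follows segment A) and one negative
# pair (label 1: segment B is unrelated), as built from the training data set.
pairs = [("广东省深圳市南山区", "科技园高新南一道88号", 0),
         ("广东省深圳市南山区", "平安银行股份有限公司", 1)]

model.train()
for seg_a, seg_b, label in pairs:
    encoding = tokenizer(seg_a, seg_b, return_tensors="pt")
    loss = model(**encoding, labels=torch.tensor([label])).loss  # NSP loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```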
As described above, the training data set is input into the address context recognition model for training. In some embodiments, the training data set includes positive samples and negative samples, where the positive samples comprise pairs of consecutive two-segment sentences obtained by splitting address sentences according to separation positions and separation counts, and the negative samples comprise parts of such pairs together with negative sample data.
A positive sample comprises pairs of consecutive two-segment sentences containing the required address information; positive samples are input into the address context model to strengthen its ability to recognize address information. For example: real-world address sentence data is collected; a sentence separation position idx is sampled according to a Gaussian distribution; the number of random separation samplings n, between 1 and 5, is determined according to a Bernoulli distribution; the address sentence is split at n different idx positions; and two consecutive segments are randomly selected as model input.
A negative sample includes parts of the consecutive two-segment sentences together with negative sample data; that is, it contains some of the required address information plus useless address information and/or other information, for example company names, legal-person names, and unrelated addresses. Negative samples are input into the address context model to strengthen its resistance to interference from other information.
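For illustration, a minimal Python sketch of this sample construction follows; the sampling parameters and the distractor strings are illustrative assumptions:

```python
import random

def make_positive_pairs(address, n_splits=None):
    """Split one address string and return adjacent (left, right) fragments."""
    if n_splits is None:
        n_splits = random.randint(1, 5)          # number of separations in 1-5
    cuts = set()
    while len(cuts) < min(n_splits, len(address) - 1):
        idx = int(random.gauss(len(address) / 2, len(address) / 4))
        if 0 < idx < len(address):
            cuts.add(idx)                        # separation position idx
    bounds = [0] + sorted(cuts) + [len(address)]
    parts = [address[a:b] for a, b in zip(bounds, bounds[1:])]
    return list(zip(parts, parts[1:]))           # consecutive two-segment pairs

def make_negative_pairs(fragments, distractors):
    """Pair address fragments with unrelated text (not a continuation)."""
    return [(frag, random.choice(distractors)) for frag in fragments]

pairs = make_positive_pairs("广东省深圳市南山区科技园高新南一道88号")
negatives = make_negative_pairs([p[0] for p in pairs],
                                ["平安银行股份有限公司", "法定代表人"])
```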
Referring to fig. 3, fig. 3 is a schematic block diagram of an embodiment of an electronic device 30 of the present application. The electronic device 30 comprises a memory 31 and a processor 32 coupled to each other, and the processor 32 is configured to execute program instructions stored in the memory 31 to implement the steps of any of the above address recognition method embodiments. In one particular implementation scenario, the electronic device 30 may include, but is not limited to, a microcomputer or a server; the electronic device 30 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited here.
Specifically, the processor 32 is configured to control itself and the memory 31 to implement the steps of any of the above address recognition method embodiments. The processor 32 may also be referred to as a CPU (central processing unit). The processor 32 may be an integrated circuit chip having signal processing capabilities. The processor 32 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or any conventional processor. In addition, the processor 32 may be implemented jointly by multiple integrated circuit chips.
Referring to fig. 4, fig. 4 is a block diagram of an embodiment of a non-transitory computer-readable storage medium 40 of the present application. The non-transitory computer-readable storage medium 40 stores program instructions 401 executable by a processor, the program instructions 401 being for implementing the steps of any of the address recognition method embodiments described above.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and for specific implementation, reference may be made to the description of the above method embodiments, and for brevity, details are not described here again.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely one type of logical division, and an actual implementation may have another division, for example, a unit or a component may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It will be apparent to those skilled in the art that many modifications and variations can be made in the devices and methods while maintaining the teachings of the present application. Accordingly, the above disclosure should be considered limited only by the scope of the following claims.

Claims (10)

1. An address recognition method, comprising:
acquiring an image to be recognized, wherein the image to be recognized comprises address information, and the address information comprises at least one line;
acquiring an address first-line box in the address information and a key-value box corresponding to the address first-line box;
translating the address first-line box and the key-value box to obtain an address second-line box located below the address first-line box in the address information;
extracting text from the address first-line box and the address second-line box, and inputting the extracted text into an address context recognition model to judge whether the text of the address second-line box is a continuation of the text of the address first-line box; and
in response to the text of the address second-line box being a continuation of the text of the address first-line box, determining that the text of the address second-line box is part of the address information.
2. The address recognition method according to claim 1, wherein acquiring the address first-line box in the address information and the key-value box corresponding to the address first-line box comprises:
acquiring an identified first rectangular box, wherein the first rectangular box is identified according to starting content of the address information;
acquiring the identified key-value box, wherein the key-value box is identified according to a symbol located before and adjacent to the starting content; and
in response to position information of the first rectangular box relative to position information of the key-value box meeting a preset condition, determining that the first rectangular box is the address first-line box.
3. The address recognition method according to claim 2, wherein the position information of the first rectangular box comprises lower-right corner coordinates (x1i, y1i) and upper-left corner coordinates (x2i, y2i) of the first rectangular box, and the position information of the key-value box comprises lower-right corner coordinates (x1, y1) and upper-left corner coordinates (x2, y2) of the key-value box;
the preset condition comprises one of:
y1i <= y2 <= y2i;
y1i <= y1 <= y2i; and
y1 <= y1i <= y2i <= y2.
4. The address recognition method according to claim 1, wherein translating the address first-line box and the key-value box to obtain the address second-line box located below the address first-line box in the address information comprises:
translating the key-value box to obtain a translated key-value box;
translating the address first-line box according to the translation of the key-value box to obtain a translated address first-line box;
acquiring an identified second rectangular box, and acquiring a first intersection-over-union (IoU) between the second rectangular box and the translated key-value box and a second IoU between the second rectangular box and the translated address first-line box; and
in response to the first IoU being smaller than a first threshold and the second IoU being larger than a second threshold, determining that the second rectangular box is the address second-line box.
5. The address recognition method according to claim 4, wherein
the first IoU is:
iouO = innerArea1 / (areaN + areaNewO − innerArea1)
and the second IoU is:
iouI = innerArea2 / (areaN + areaNewI − innerArea2)
wherein innerArea1 represents the intersection area between the second rectangular box and the translated key-value box, areaNewO represents the area of the translated key-value box, innerArea2 represents the intersection area between the second rectangular box and the translated address first-line box, areaNewI represents the area of the translated address first-line box, and areaN represents the area of the second rectangular box.
6. The address recognition method according to claim 1, further comprising:
translating the key-value box again, and translating the address second-line box according to the renewed translation of the key-value box, to obtain an address third-line box located below the address second-line box in the address information;
extracting text from the address second-line box and the address third-line box, and inputting the extracted text into the address context recognition model to judge whether the text of the address third-line box is a continuation of the text of the address second-line box; and
in response to the text of the address third-line box being a continuation of the text of the address second-line box, determining that the text of the address third-line box is another part of the address information.
7. The address recognition method of any one of claims 1-6, wherein obtaining the address context recognition model comprises:
initializing parameters of the address context identification model;
inputting a training data set into the address context recognition model for training, and learning parameters of the address context recognition model based on a loss function, thereby obtaining the address context recognition model.
8. The address recognition method according to claim 7, wherein the training data set includes positive samples and negative samples, wherein the positive samples comprise pairs of consecutive two-segment sentences obtained by splitting address sentences according to separation positions and separation counts, and the negative samples comprise parts of such pairs together with negative sample data.
9. An electronic device, comprising a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the address recognition method of any one of claims 1 to 8.
10. A non-transitory computer readable storage medium having stored thereon program instructions, wherein the program instructions, when executed by a processor, implement the address recognition method of any one of claims 1 to 8.
CN202210864968.3A 2022-07-22 2022-07-22 Address recognition method, electronic device and storage medium Active CN115082919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210864968.3A CN115082919B (en) 2022-07-22 2022-07-22 Address recognition method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210864968.3A CN115082919B (en) 2022-07-22 2022-07-22 Address recognition method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN115082919A (en) 2022-09-20
CN115082919B CN115082919B (en) 2022-11-29

Family

ID=83243343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210864968.3A Active CN115082919B (en) 2022-07-22 2022-07-22 Address recognition method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115082919B (en)

Citations

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124412A1 (en) * 2015-10-30 2017-05-04 Xiaomi Inc. Method, apparatus, and computer-readable medium for area recognition
US20170124718A1 (en) * 2015-10-30 2017-05-04 Xiaomi Inc. Method, device, and computer-readable storage medium for area extraction
CN106980858A (en) * 2017-02-28 2017-07-25 中国科学院信息工程研究所 The language text detection of a kind of language text detection with alignment system and the application system and localization method
CN107832756A (en) * 2017-10-24 2018-03-23 讯飞智元信息科技有限公司 Express delivery list information extracting method and device, storage medium, electronic equipment
CN113761137A (en) * 2020-06-02 2021-12-07 阿里巴巴集团控股有限公司 Method and device for extracting address information
CN112633193A (en) * 2020-12-28 2021-04-09 深圳壹账通智能科技有限公司 Method, device, equipment and medium for extracting address information
CN113378710A (en) * 2021-06-10 2021-09-10 平安科技(深圳)有限公司 Layout analysis method and device for image file, computer equipment and storage medium
CN113723347A (en) * 2021-09-09 2021-11-30 京东科技控股股份有限公司 Information extraction method and device, electronic equipment and storage medium
CN113887229A (en) * 2021-09-29 2022-01-04 平安普惠企业管理有限公司 Address information identification method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HU JINGFENG: "Recognition and Information Extraction of Business Card Images Captured by Mobile Phones", China Master's Theses Full-Text Database (Information Science and Technology) *

Also Published As

Publication number Publication date
CN115082919B (en) 2022-11-29

Similar Documents

Publication Publication Date Title
CN109726643B (en) Method and device for identifying table information in image, electronic equipment and storage medium
WO2020221298A1 (en) Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus
US10853638B2 (en) System and method for extracting structured information from image documents
US11003941B2 (en) Character identification method and device
WO2018103608A1 (en) Text detection method, device and storage medium
KR101896357B1 (en) Method, device and program for detecting an object
CN109241861B (en) Mathematical formula identification method, device, equipment and storage medium
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
CN106372624B (en) Face recognition method and system
CN109919002B (en) Yellow stop line identification method and device, computer equipment and storage medium
Chiang et al. Recognition of multi-oriented, multi-sized, and curved text
WO2022105521A1 (en) Character recognition method and apparatus for curved text image, and computer device
WO2022134771A1 (en) Table processing method and apparatus, and electronic device and storage medium
JP2019102061A (en) Text line segmentation method
TW200529093A (en) Face image detection method, face image detection system, and face image detection program
CN112861842A (en) Case text recognition method based on OCR and electronic equipment
CN111061887A (en) News character photo extraction method, device, equipment and storage medium
JP2010102709A (en) Character string recognition method, character string system, and character string recognition program recording medium
CN115171125A (en) Data anomaly detection method
CN114511857A (en) OCR recognition result processing method, device, equipment and storage medium
CN112396047B (en) Training sample generation method and device, computer equipment and storage medium
US10217020B1 (en) Method and system for identifying multiple strings in an image based upon positions of model strings relative to one another
Lue et al. A novel character segmentation method for text images captured by cameras
US11087122B1 (en) Method and system for processing candidate strings detected in an image to identify a match of a model string in the image
CN112559688A (en) Financial newspaper reading difficulty calculation method, device and equipment and readable storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant