CN114359913A - Text label determination method and related device - Google Patents

Text label determination method and related device

Info

Publication number
CN114359913A
Authority
CN
China
Prior art keywords: training, matrix, text, label, target
Prior art date
Legal status
Pending
Application number
CN202210004883.8A
Other languages
Chinese (zh)
Inventor
刘镇熙
Current Assignee
Shenzhen Ideamake Software Technology Co Ltd
Original Assignee
Shenzhen Ideamake Software Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Ideamake Software Technology Co Ltd filed Critical Shenzhen Ideamake Software Technology Co Ltd
Priority to CN202210004883.8A
Publication of CN114359913A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a text label determination method and a related apparatus. The method obtains a target text in a target picture and a target text box feature vector corresponding to the target text, inputs the target text and the target text box feature vector into a pre-trained label determination model, and determines a first label corresponding to the target text. The label determination model is obtained by training on a plurality of training texts, on the data obtained by splicing each training text with its corresponding training text box feature vector, and on a second label corresponding to each training text, where the second label is a preset label. Because the training text box feature vectors are spliced with the training texts and the model is trained on the spliced data so that its output is the second label corresponding to each training text, the accuracy of text label determination is improved.

Description

Text label determination method and related device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method for determining a text label and a related apparatus.
Background
With the development of science and technology, multimedia resources containing text pictures are growing rapidly. Text retrieval has become a research focus in the field of natural language processing, and many text retrieval methods based on Optical Character Recognition (OCR) technology have emerged: they recognize text content in pictures and then build text picture retrieval systems on top of text retrieval technology. With existing picture extraction technology, text is first extracted from a picture and then information about the text is extracted, so that the text can be labeled according to the recognition and extraction results. Because extraction and recognition of the picture text are two independent processes, labeling the text in a picture with a corresponding label is both inefficient and inaccurate.
Disclosure of Invention
Embodiments of the present application provide a text label determination method and a related apparatus, which determine the label of a target text in a target picture after splicing the target text with a target text box feature vector, thereby improving the accuracy of text label determination.
In a first aspect, an embodiment of the present application provides a method for determining a text label, where the method includes:
acquiring a target picture;
extracting a target text in the target picture and a target text box feature vector corresponding to the target text;
inputting the target text and the target text box feature vector into a pre-trained label determination model, and determining a first label corresponding to the target text, where the label determination model is obtained by training on a plurality of training texts, on the data obtained by splicing each training text with its corresponding training text box feature vector, and on a second label corresponding to each training text; the training text box feature vector comprises the vertex coordinates of the region where the training text is located in a training picture and the ratio of the diagonal length of that region to the diagonal length of the training picture, and the second label is a preset label.
In a second aspect, an apparatus for determining a text label provided in an embodiment of the present application includes:
a first acquisition unit for acquiring a target picture;
the extraction unit is used for extracting a target text in the target picture and a target text box feature vector corresponding to the target text;
a first input unit, configured to input the target text and the target text box feature vector into a pre-trained label determination model and determine a first label corresponding to the target text, where the label determination model is obtained by training on a plurality of training texts, on the data obtained by splicing each training text with its corresponding training text box feature vector, and on a second label corresponding to each training text; the training text box feature vector comprises the vertex coordinates of the region where the training text is located in a training picture and the ratio of the diagonal length of that region to the diagonal length of the training picture, and the second label is a preset label.
In a third aspect, an embodiment of the present application provides a terminal device, where the terminal device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for performing some or all of the steps described in the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program for electronic data exchange, where the computer program causes a computer to perform some or all of the steps in the first aspect of the present embodiment.
In the embodiments, according to the technical scheme provided by the present application, after the target picture is obtained, the target text in the target picture and the target text box feature vector corresponding to the target text are extracted; the target text and the target text box feature vector are then input into a pre-trained label determination model, and a first label corresponding to the target text is determined. The label determination model is obtained by training on a plurality of training texts, on the data obtained by splicing each training text with its corresponding training text box feature vector, and on the second label corresponding to each training text, where the training text box feature vector comprises the vertex coordinates of the region where the training text is located in the training picture and the ratio of the diagonal length of that region to the diagonal length of the training picture, and the second label is a preset label. The training text, the vertex coordinates of its region, and the diagonal-length ratio together ensure that the training text is the text corresponding to the second label, which improves the recognition accuracy of the label determination model.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of an application architecture provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a text label determination method provided in an embodiment of the present application;
fig. 3 is a schematic diagram of a text label determination method provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a training process of a label determination model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a label determination model training process provided by an embodiment of the present application;
fig. 6 is a block diagram illustrating functional units of an apparatus for determining text labels according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to facilitate understanding of the technical solutions provided in the present application, first, related concepts related to the present application are explained.
A first label: the first label is used to identify the target text information of the target text. For example, if the content of the first label is "house area" and the recognized target text is "80 square meters", then after the target text is recognized and the label corresponding to it is determined to be "house area", the first label is attached to the target text to indicate that the house area is 80 square meters.
A second label: the second label is used to identify the training text information of the training text. For example, if the second label is "house layout name" and the recognized training text is "XX layout", then after the training text is recognized and the label corresponding to it is determined to be "house layout name", the second label is attached to the training text to indicate that the layout name is "XX layout".
It is understood that the first label and the second label are preset labels and may be set according to actual requirements; for example, they may also be project names, house sizes, and the like, which are not limited herein.
Training text box feature vector: the training text box feature vector comprises the vertex coordinates of the region where the training text is located in the training picture and the ratio of the diagonal length of that region to the diagonal length of the training picture. It may further comprise other information obtained when the training picture is processed, such as the recognition confidence of the training text relative to the area of the training text box, and the like.
Target text box feature vector: the target text box feature vector comprises the vertex coordinates of the region where the target text is located in the target picture and the ratio of the diagonal length of that region to the diagonal length of the target picture. It may further comprise other information obtained when the target picture is processed, such as the recognition confidence of the target text relative to the area of the target text box, and the like.
Optical Character Recognition (OCR) refers to the process by which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text by a character recognition method. For printed characters, the characters in a paper document are converted optically into an image file with a black-and-white dot matrix, and the characters in the image are then converted into a text format by recognition software for further editing and processing by word processing software.
The embodiment of the application provides a method for determining a text label, which comprises the following steps:
acquiring a target picture; extracting a target text in the target picture and a target text box feature vector corresponding to the target text; and inputting the target text and the target text box feature vector into a pre-trained label determination model, and determining a first label corresponding to the target text, where the label determination model is obtained by training on a plurality of training texts, on the data obtained by splicing each training text with its corresponding training text box feature vector, and on a second label corresponding to each training text; the training text box feature vector comprises the vertex coordinates of the region where the training text is located in a training picture and the ratio of the diagonal length of that region to the diagonal length of the training picture, and the second label is a preset label.
By means of the training text, the vertex coordinates of the region where the training text is located, and the ratio of the diagonal length of that region to the diagonal length of the training picture, the training text is guaranteed to be the text corresponding to the second label, which improves the recognition accuracy of the label determination model.
Referring to fig. 1, fig. 1 is a schematic diagram of an application architecture provided in the embodiment of the present application, including a server 110 and a terminal device 120. The terminal device 120 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., but is not limited thereto, and various Applications (APPs), such as an OCR recognition program, may be installed on the terminal device 120.
The server 110 may provide various network services for the terminal device 120, and the server 110 may be a corresponding background server for different applications. The server 110 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.
The terminal device 120 and the server 110 may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application. For example, the terminal device 120 and the server 110 are connected via a network to communicate with each other. Optionally, the network uses standard communication techniques and/or protocols. The network is typically the Internet, but can be any network including, but not limited to, a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired, or wireless network, or any combination of private or virtual private networks.
It should be noted that the text label determination method in the embodiments of the present application is mainly executed by the terminal device 120. For example, the label determination model is located on the terminal device 120: the user inputs the target picture at the terminal device 120, the target text of the target picture and the target text box feature vector are extracted, and the terminal device 120 recognizes the target text and the target text box feature vector through the label determination model, determines the label of the target text, and outputs the target text together with its corresponding label. Specifically: the terminal device 120 obtains a target picture, and the target text and the target text box feature vector in the target picture are input into the label determination model to obtain the first label corresponding to the target text. The target text and the target text box feature vector can be obtained through models such as an OCR model or a two-tower model. The target text box feature vector comprises the vertex coordinates of the region where the target text is located in the target picture and the ratio of the diagonal length of that region to the diagonal length of the target picture, where the vertex coordinates are the coordinates corresponding to the target text and differ between different target texts. Other information, such as the vertex coordinates and the ratio, is spliced onto the target text and then input into the label determination model to obtain the first label, which improves the accuracy of the determination.
The application architecture shown in fig. 1 is described by way of example as being applied on the terminal device 120 side, but the text label determination method in the embodiments of the present application may of course also be executed by the server 110. For example, the label determination model is located on the server 110: the user inputs a target picture at the terminal device 120, the target text and the target text box feature vector in the target picture are extracted, and the terminal device 120 sends them to the server 110 together with a determination request. After receiving the determination request, the server 110 determines the label of the target text from the target text and the target text box feature vector through the label determination model, and after determining the first label of the target text, outputs the first label and the target text. Specifically: the server 110 obtains the target text and the target text box feature vector, and inputs them into the label determination model to obtain the first label.
The application architecture diagram in the embodiment of the present application is provided to more clearly illustrate the technical solutions in the embodiment of the present application, and does not limit the technical solutions provided in the embodiment of the present application, and the technical solutions provided in the embodiment of the present application are also applicable to similar problems for other application architectures and applications.
Based on the above embodiment, please refer to fig. 2, and fig. 2 is a schematic flowchart of a method for determining a text label in the embodiment of the present application, and the method includes the following steps.
S210: Acquiring a target picture.
It can be understood that the target picture in the embodiment of the present application may be different types of pictures, for example, a house type diagram, an engineering diagram, and the like, and the embodiment of the present application is mainly described by taking the house type diagram as an example.
S220: Extracting a target text in the target picture and a target text box feature vector corresponding to the target text.
The target picture comprises a plurality of target texts, each target text in the plurality of target texts is provided with a corresponding target text box characteristic vector, each target text in the target picture and the target text box characteristic vector corresponding to the target text are extracted, and preparation is made for a subsequent input label determination model.
The target text box feature vector comprises the vertex coordinates of the region where the target text is located in the target picture and the ratio of the diagonal length of that region to the diagonal length of the target picture. Each target text and its corresponding target text box feature vector can be extracted by an OCR model. The OCR model comprises a text extraction model and a text recognition model: the text extraction model can detect text lines in the target picture through methods such as DB and PSENet, and the text recognition model recognizes the text lines using algorithms such as CRNN and RARE to recognize the specific characters.
Specifically, after the terminal device 120 acquires the target picture, the OCR model acquires the target texts in the target picture. When a plurality of target texts are acquired, they are collected in the same list, and the information of the region where each target text is located, for example, the four vertex coordinates of the region, the spacing distance between adjacent regions, the line information of the region, the confidence of text recognition, and the like, is acquired to form a vector table. Each vector in the vector table is then processed to obtain a target text box feature vector. The specific processing comprises: after the vertex coordinates of the target picture are obtained, the length and width of the target picture are calculated from its vertex coordinates, and the diagonal length of the target picture is obtained; likewise, the length and width of the region where the target text is located are calculated from the vertex coordinates of that region, and the diagonal length of the region is obtained. The diagonal length of the region where the target text is located is divided by the diagonal length of the target picture to obtain their ratio, and the target text box feature vector comprises this ratio and the vertex coordinates. Obtaining the line information of the region where the target text is located comprises obtaining information such as the thickness and color of the lines in that region.
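The diagonal-ratio computation described above can be sketched as follows; the function name, the vertex ordering, and the flat feature layout are illustrative assumptions rather than the patented implementation.

```python
import math

def box_feature_vector(box_vertices, picture_vertices):
    """Build a text-box feature vector: the four vertex coordinates of
    the text region, plus the ratio of the region's diagonal length to
    the picture's diagonal length (hypothetical layout)."""
    def diagonal(vertices):
        xs = [x for x, _ in vertices]
        ys = [y for _, y in vertices]
        # diagonal of the axis-aligned box spanned by the vertices
        return math.hypot(max(xs) - min(xs), max(ys) - min(ys))

    ratio = diagonal(box_vertices) / diagonal(picture_vertices)
    features = [coord for vertex in box_vertices for coord in vertex]
    features.append(ratio)
    return features

# a text region inside a 400x300 picture (diagonal 500); the region is
# 80x60, so its diagonal is 100 and the ratio is 0.2
picture = [(0, 0), (400, 0), (400, 300), (0, 300)]
box = [(40, 30), (120, 30), (120, 90), (40, 90)]
vec = box_feature_vector(box, picture)
```

The same computation applies unchanged to training pictures in the training stage.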
Further, symbol information in the region where the target text is located, or within a set range around the target text, may be obtained. After a symbol is obtained, the corresponding symbol information is searched for in a symbol set: for example, when a gate (door) symbol is obtained, the symbol is enlarged or reduced and the same symbol is searched for in the symbol set, so that the information corresponding to the symbol is determined to be a gate, and the gate symbol information is output to assist in determining the first label. The symbol set is a preset set of different symbols together with the information corresponding to each symbol.
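A minimal sketch of the scale-invariant symbol lookup just described: the symbol set here maps size-normalized point tuples to meanings, so an enlarged or reduced copy of a stored symbol still matches. The shapes, the names, and the exact-match rule are simplifying assumptions; a real matcher would need tolerant shape comparison.

```python
def normalize(points):
    """Scale a symbol's points into the unit square so that enlarged or
    reduced copies of the same symbol produce the same lookup key."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    w = (max(xs) - min(xs)) or 1
    h = (max(ys) - min(ys)) or 1
    return tuple(((x - min(xs)) / w, (y - min(ys)) / h) for x, y in points)

# hypothetical preset symbol set: a simple stroke standing in for a door symbol
DOOR = ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0))
SYMBOL_SET = {DOOR: "door"}

def lookup(points):
    """Return the meaning of a symbol after size normalization."""
    return SYMBOL_SET.get(normalize(points), "unknown")
```

For example, `lookup([(0, 0), (2, 0), (2, 2)])` recognizes the door stroke drawn at twice the stored size.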
Further, the target text and the target text box feature vectors may be preprocessed to remove repeated target text box feature vectors.
S230: Inputting the target text and the target text box feature vector into a pre-trained label determination model, and determining a first label corresponding to the target text, where the label determination model is obtained by training on a plurality of training texts, on the data obtained by splicing each training text with its corresponding training text box feature vector, and on a second label corresponding to each training text; the training text box feature vector comprises the vertex coordinates of the region where the training text is located in a training picture and the ratio of the diagonal length of that region to the diagonal length of the training picture, and the second label is a preset label.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a text label determination method according to an embodiment of the present application. The obtained target text and target text box feature vector are input into a pre-trained label determination model, and a first label corresponding to the target text is determined, where the first label is a label identifying the target text. For example, the target picture is a house type graph, and a target text is extracted from it: 30 square meters. Through the target text and the target text box feature vector, it is determined that the target text is the house type area, and a first label whose content is the house type area is determined. Because different texts have specified format and position requirements in the house type graph, the first label corresponding to the target text is determined from the spliced information of the target text box feature vector and the target text, which improves the accuracy of the determination.
The label determination model is obtained by training on a plurality of training texts, on the data obtained by splicing each training text with its corresponding training text box feature vector, and on the second label corresponding to each training text, where the training text box feature vector comprises the vertex coordinates of the region where the training text is located in the training picture and the ratio of the diagonal length of that region to the diagonal length of the training picture, and the second label is a preset label. Because different texts have specified format and position requirements, the label determination model is trained on the spliced training text box feature vectors and training texts, which improves the accuracy of the label determination model.
Referring to fig. 4 and fig. 5, fig. 4 is a schematic diagram of a training process of a label determination model according to an embodiment of the present application, and fig. 5 is a schematic diagram of a training process of a label determination model according to an embodiment of the present application.
S410: Acquiring a plurality of training texts, the training text box feature vectors corresponding to the training texts, and the second labels corresponding to the training texts.
Before obtaining a plurality of training texts, obtaining a training picture, wherein the training picture comprises at least one training text and a training text box feature vector corresponding to the training text, and setting a second label for each training text in advance.
All training texts in the training picture are extracted: text1, text2, …, textn constitute the training text list [text1, text2, …, textn], and the information of the region where each training text is located, such as the coordinates of the four vertices of the region and the confidence of text recognition, constitutes the vector table [info1, info2, …, infon]. The information of the region where each training text is located is processed into a training text box feature vector, and the plurality of training text box feature vectors form [info_feat1, info_feat2, …, info_featm]. The specific processing is as follows: after the vertex coordinates of the training picture are obtained, the length and width of the training picture are calculated from its vertex coordinates, and the diagonal length of the training picture is obtained; likewise, the length and width of the region where the training text is located are calculated from the vertex coordinates of that region, and the diagonal length of the region is obtained. The diagonal length of the region where the training text is located is divided by the diagonal length of the training picture to obtain their ratio, and the training text box feature vector comprises the vertex coordinates and this ratio.
Before splicing the training text and the training text box feature vector, the method further comprises: preprocessing the training texts, the training text box feature vectors, and the second labels, and removing repeated training texts, training text box feature vectors, and second labels.
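The de-duplication step above can be sketched as a stable filter over (training text, feature vector, second label) triples; the triple layout is an assumption for illustration.

```python
def deduplicate(samples):
    """Remove repeated (training text, text-box feature vector, second
    label) triples, keeping first occurrences in order."""
    seen = set()
    unique = []
    for text, features, label in samples:
        key = (text, tuple(features), label)  # lists are unhashable, so freeze
        if key not in seen:
            seen.add(key)
            unique.append((text, features, label))
    return unique

# hypothetical training samples containing one exact duplicate
unique_samples = deduplicate([
    ("80 square meters", [40, 30, 0.2], "house area"),
    ("80 square meters", [40, 30, 0.2], "house area"),
    ("XX layout",        [10, 20, 0.5], "layout name"),
])
```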
S420: Splicing the training text with the training text box feature vector to obtain a first matrix.
Wherein the splicing the training text and the training text box feature vector to obtain a first matrix comprises: converting the training text into text numbers through a text dictionary; converting the text number into a text vector through a semantic representation model to obtain a second matrix, wherein the second matrix comprises the text vector; converting the training text box feature vector into a third matrix with dimension 1 and a fourth matrix with dimension k through a fully connected neural network, wherein k is a hyper-parameter; determining a fifth matrix according to the third matrix and the second matrix, and determining a sixth matrix according to the fourth matrix and the second matrix; and determining the first matrix according to the fifth matrix and the sixth matrix.
Specifically, each training text in the list [text1, text2 … textn] is converted into a text number through a text dictionary.
The text numbers are converted into text vectors through a semantic representation model, obtaining a second matrix [seq1, seq2 … seqn] that comprises the text vectors. The training text box feature vectors are converted, through a fully connected neural network, into a third matrix with dimension 1 and a fourth matrix with dimension k, where k is a hyper-parameter; a fifth matrix is determined according to the third matrix and the second matrix, a sixth matrix is determined according to the fourth matrix and the second matrix, and the first matrix is determined according to the fifth matrix and the sixth matrix.
Wherein the text box feature vectors [info_feat1, info_feat2 … info_featn] are converted, through a fully connected neural network, into a third matrix m1 with dimension 1 and a fourth matrix m2 with dimension k, where k is a hyper-parameter. A hyper-parameter is an unknown variable that differs from the parameters learned during training: it influences the parameters obtained from training, must be set manually by the trainer, and is adjusted to optimize the effectiveness of the trained model. A fifth matrix is determined according to the third matrix and the second matrix, and a sixth matrix is determined according to the fourth matrix and the second matrix. Optionally, determining the fifth matrix according to the third matrix and the second matrix, and determining the sixth matrix according to the fourth matrix and the second matrix, comprises: splicing the third matrix before the second matrix through a CONCATENATE function to obtain the fifth matrix, and splicing the fourth matrix after the second matrix to obtain the sixth matrix. That is, m1 can be spliced before [seq1, seq2 … seqn] by the CONCATENATE method to form the fifth matrix, and m2 can be spliced after [seq1, seq2 … seqn] to form the sixth matrix; the first matrix [s1, s2 … sn] is then determined from the fifth and sixth matrices.
In other embodiments, the fifth matrix may be obtained by splicing the third matrix before the second matrix, and the sixth matrix by splicing the fourth matrix after the second matrix, through a method of expanding dimensions and then adding elements.
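Under the CONCATENATE embodiment above, the splicing can be sketched with NumPy. The shapes n, d and k, and the final way the fifth and sixth matrices are combined into the first matrix, are illustrative assumptions, since the patent does not fix that last step:

```python
import numpy as np

# Hypothetical shapes: n tokens, embedding size d, hyper-parameter k.
n, d, k = 4, 8, 3
second = np.random.rand(n, d)   # text vectors [seq1 … seqn]
third = np.random.rand(n, 1)    # box features projected to dimension 1 (m1)
fourth = np.random.rand(n, k)   # box features projected to dimension k (m2)

# Splice the third matrix before, and the fourth matrix after,
# the second matrix, mirroring the CONCATENATE step.
fifth = np.concatenate([third, second], axis=1)   # shape (n, 1 + d)
sixth = np.concatenate([second, fourth], axis=1)  # shape (n, d + k)

# One possible combination into the first matrix: keep the shared text
# vectors once and append the k box-feature columns from the sixth matrix.
first = np.concatenate([fifth, sixth[:, d:]], axis=1)  # shape (n, 1 + d + k)
```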
Specifically, before determining the fifth matrix according to the third matrix and the second matrix, the method further includes: processing the third matrix by applying an activation function to obtain a seventh matrix; the determining a fifth matrix from the third matrix and the second matrix comprises: and determining a fifth matrix according to the seventh matrix and the second matrix.
The third matrix m1 is processed with an activation function to obtain the seventh matrix. The activation function introduces non-linear factors into the neurons, so that the neural network can approximate arbitrary non-linear functions and can therefore be applied to many non-linear models. The activation function may be, for example, a Sigmoid function or a Tanh function.
The seventh matrix is then spliced before [seq1, seq2 … seqn] by the CONCATENATE method to form the fifth matrix.
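The activation step can be sketched as follows, using the Sigmoid and Tanh functions the description names as examples (the matrix values are illustrative):

```python
import numpy as np

def sigmoid(x):
    # Element-wise logistic function; maps any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

# m1 is the third matrix (dimension 1); applying an activation function
# element-wise yields the seventh matrix.
m1 = np.array([[0.0], [1.0], [-1.0]])
seventh_sigmoid = sigmoid(m1)
seventh_tanh = np.tanh(m1)
```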
S430: and inputting the first matrix into the label determination model to obtain a third label, and adjusting the label determination model according to the difference between the third label and the second label until a training end condition is reached to obtain the label determination model.
Specifically, after the first matrix [s1, s2 … sn] is obtained, it is input into the label determination model, and a third label for each training text is obtained through a CRF layer. The third label is compared with the preset second label; if they differ, the label determination model is adjusted according to the difference, until the third label output for each training text is the same as the preset second label. The CRF layer may add constraints to the final predicted labels to ensure that the predicted label sequence is legitimate.
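The training loop described above can be sketched as a skeleton; `model_step` stands in for one predict-and-adjust pass of the label determination model and is a hypothetical simplification — a real implementation would compute a CRF loss and update weights by gradient descent rather than loop until exact label equality:

```python
def train_until_match(model_step, first_matrix, second_labels, max_iters=100):
    """Feed the first matrix to the model, compare the predicted (third)
    labels with the preset (second) labels, and keep adjusting until they
    match (the training end condition) or an iteration cap is reached."""
    for _ in range(max_iters):
        third_labels = model_step(first_matrix)  # predict, then self-adjust
        if third_labels == second_labels:
            return True  # training end condition reached
    return False
```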
After the first label corresponding to the target text is determined, the target text corresponding to the first label is obtained; splicing the target text and the first label; and outputting the spliced first label and the target text.
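Splicing the first label back onto its target text for output can be as simple as the sketch below; the "text: label" join format is an assumption, since the patent does not fix the output layout:

```python
def splice_output(target_texts, first_labels):
    # Pair each target text with the first label determined for it.
    return ["{}: {}".format(text, label)
            for text, label in zip(target_texts, first_labels)]
```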
Specifically, the above description mainly introduces the solution of the embodiments of the present application from the perspective of the method-side implementation process. It is understood that, in order to implement the above functions, the terminal device includes corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments provided herein can be implemented by hardware, or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the terminal device may be divided into the functional units according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Referring to fig. 6, fig. 6 is a block diagram illustrating functional units of an apparatus for determining text labels according to an embodiment of the present application, where the apparatus includes: a first acquisition unit 610, an extraction unit 620, and a first input unit 630, wherein:
the first obtaining unit 610 is configured to obtain a target picture;
the extracting unit 620 is configured to extract a target text in the target picture and a target text box feature vector corresponding to the target text;
the first input unit 630 is configured to input the target text and the target text box feature vector into a pre-trained label determination model, and determine a first label corresponding to the target text, where the label determination model is obtained by training a plurality of training texts, data obtained by splicing the training text box feature vectors corresponding to the training texts, and a second label corresponding to the training text, where the training text box feature vector includes vertex coordinates of an area where the training text is located in a training picture, and a ratio of a length of an oblique edge of the area to a length of an oblique edge of the training picture, and the second label is a preset label.
Further, the apparatus further comprises:
a second obtaining unit, configured to obtain a plurality of training texts, the training text box feature vectors corresponding to the training texts, and the second labels corresponding to the training texts;
the first splicing unit is used for splicing the training text and the feature vector of the training text box to obtain a first matrix;
and the second input unit is used for inputting the first matrix into the label determination model to obtain a third label, and adjusting the label determination model according to the difference between the third label and the second label until a training end condition is reached to obtain the label determination model.
Further, the apparatus further comprises:
and the preprocessing unit is used for preprocessing the training text, the training text box feature vector and the second label and removing the repeated training text, the training text box feature vector and the second label.
Further, the first splicing unit is further configured to:
converting the training text into text numbers through a text dictionary;
converting the text number into a text vector through a semantic representation model to obtain a second matrix, wherein the second matrix comprises the text vector;
converting the training text box feature vector into a third matrix with dimension 1 and a fourth matrix with dimension k through a fully connected neural network, wherein k is a hyper-parameter;
determining a fifth matrix according to the third matrix and the second matrix, and determining a sixth matrix according to the fourth matrix and the second matrix;
and determining the first matrix according to the fifth matrix and the sixth matrix.
The first splicing unit is further configured to:
splicing the third matrix before the second matrix through a CONCATENATE function to obtain a fifth matrix, and splicing the fourth matrix after the second matrix to obtain a sixth matrix.
Further, the apparatus further comprises:
the activation function processing unit is used for processing the third matrix by applying an activation function to obtain a seventh matrix;
and the determining unit is used for determining a fifth matrix according to the seventh matrix and the second matrix.
Further, the apparatus further comprises:
a third obtaining unit, configured to obtain the target text corresponding to the first tag;
the second splicing unit is used for splicing the target text and the first label;
and the output unit is used for outputting the spliced first label and the target text.
Referring to fig. 7, fig. 7 is a terminal device according to an embodiment of the present application, where the terminal device includes: a processor, a memory, a transceiver, and one or more programs. The processor, memory and transceiver are interconnected by a communication bus.
The processor may be one or more Central Processing Units (CPUs), and in the case of one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The one or more programs are stored in the memory and configured to be executed by the processor; the program includes instructions for performing the steps of:
acquiring a target picture;
extracting a target text in the target picture and a target text box feature vector corresponding to the target text;
inputting the target text and the target text box feature vector into a pre-trained label determination model, and determining a first label corresponding to the target text, wherein the label determination model is obtained by training a plurality of training texts, spliced data of training text box feature vectors corresponding to the training texts and a second label corresponding to the training texts, the training text box feature vector comprises vertex coordinates of an area where the training texts are located in a training picture and a ratio of the length of the oblique edge of the area to the length of the oblique edge of the training picture, and the second label is a preset label.
It should be noted that, for a specific implementation process in the embodiment of the present application, reference may be made to the specific implementation process described in the foregoing method embodiment, and details are not described herein again.
Embodiments of the present application also provide a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the methods as described in the above method embodiments.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present application.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a ROM, a RAM, a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash disk, ROM, RAM, magnetic or optical disk, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A method for determining a text label, the method comprising:
acquiring a target picture;
extracting a target text in the target picture and a target text box feature vector corresponding to the target text;
inputting the target text and the target text box feature vector into a pre-trained label determination model, and determining a first label corresponding to the target text, wherein the label determination model is obtained by training a plurality of training texts, spliced data of training text box feature vectors corresponding to the training texts and a second label corresponding to the training texts, the training text box feature vector comprises vertex coordinates of an area where the training texts are located in a training picture and a ratio of the length of the oblique edge of the area to the length of the oblique edge of the training picture, and the second label is a preset label.
2. The method of claim 1, wherein the label determination model is trained by:
acquiring a plurality of training texts, the training text box feature vectors corresponding to the training texts and the second labels corresponding to the training texts;
splicing the training text and the feature vector of the training text box to obtain a first matrix;
and inputting the first matrix into the label determination model to obtain a third label, and adjusting the label determination model according to the difference between the third label and the second label until a training end condition is reached to obtain the label determination model.
3. The method of claim 2, wherein prior to the concatenating the training text with the training text box feature vector, the method further comprises:
and preprocessing the training text, the training text box feature vector and the second label, and removing the repeated training text, the training text box feature vector and the second label.
4. The method of claim 2, wherein the concatenating the training text with the training text box feature vector to obtain a first matrix comprises:
converting the training text into text numbers through a text dictionary;
converting the text number into a text vector through a semantic representation model to obtain a second matrix, wherein the second matrix comprises the text vector;
converting the training text box feature vector into a third matrix with dimension 1 and a fourth matrix with dimension k through a fully connected neural network, wherein k is a hyper-parameter;
determining a fifth matrix according to the third matrix and the second matrix, and determining a sixth matrix according to the fourth matrix and the second matrix;
and determining the first matrix according to the fifth matrix and the sixth matrix.
5. The method of claim 4, wherein determining a fifth matrix from the third matrix and the second matrix, and wherein determining a sixth matrix from the fourth matrix and the second matrix comprises:
splicing the third matrix before the second matrix through a CONCATENATE function to obtain a fifth matrix, and splicing the fourth matrix after the second matrix to obtain a sixth matrix.
6. The method of claim 4, wherein prior to determining a fifth matrix from the third matrix and the second matrix, the method further comprises:
processing the third matrix by applying an activation function to obtain a seventh matrix;
the determining a fifth matrix from the third matrix and the second matrix comprises:
and determining a fifth matrix according to the seventh matrix and the second matrix.
7. The method of claim 1, wherein after determining the first label corresponding to the target text, the method further comprises:
acquiring the target text corresponding to the first label;
splicing the target text and the first label;
and outputting the spliced first label and the target text.
8. An apparatus for text label determination, the apparatus comprising:
a first acquisition unit for acquiring a target picture;
the extraction unit is used for extracting a target text in the target picture and a target text box feature vector corresponding to the target text;
the first input unit is used for inputting the target text and the target text box feature vector into a pre-trained label determination model, and determining a first label corresponding to the target text, wherein the label determination model is obtained by training a plurality of training texts, data obtained by splicing the training text box feature vectors corresponding to the training texts, and a second label corresponding to the training text, the training text box feature vector comprises vertex coordinates of an area where the training text is located in a training picture, and a ratio of the length of the oblique side of the area to the length of the oblique side of the training picture, and the second label is a preset label.
9. A terminal device, characterized in that the terminal device comprises a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for carrying out the steps in the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that a computer program for electronic data exchange is stored, wherein the computer program causes a computer to perform the steps in the method according to any of claims 1-7.
CN202210004883.8A 2022-01-04 2022-01-04 Text label determination method and related device Pending CN114359913A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210004883.8A CN114359913A (en) 2022-01-04 2022-01-04 Text label determination method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210004883.8A CN114359913A (en) 2022-01-04 2022-01-04 Text label determination method and related device

Publications (1)

Publication Number Publication Date
CN114359913A true CN114359913A (en) 2022-04-15

Family

ID=81107443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210004883.8A Pending CN114359913A (en) 2022-01-04 2022-01-04 Text label determination method and related device

Country Status (1)

Country Link
CN (1) CN114359913A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115841677A (en) * 2022-12-21 2023-03-24 长扬科技(北京)股份有限公司 Text layout analysis method and device, electronic equipment and storage medium
CN115841677B (en) * 2022-12-21 2023-09-05 长扬科技(北京)股份有限公司 Text layout analysis method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109697291B (en) Text semantic paragraph recognition method and device
US20200004815A1 (en) Text entity detection and recognition from images
CN112396049A (en) Text error correction method and device, computer equipment and storage medium
CN108875727B (en) The detection method and device of graph-text identification, storage medium, processor
CN110765973B (en) Account type identification method and device
CN114140673B (en) Method, system and equipment for identifying violation image
CN112686243A (en) Method and device for intelligently identifying picture characters, computer equipment and storage medium
CN112330331A (en) Identity verification method, device and equipment based on face recognition and storage medium
CN114359913A (en) Text label determination method and related device
CN113204956B (en) Multi-model training method, abstract segmentation method, text segmentation method and text segmentation device
CN114708595A (en) Image document structured analysis method, system, electronic device, and storage medium
CN108090044B (en) Contact information identification method and device
CN111898544B (en) Text image matching method, device and equipment and computer storage medium
CN111881900B (en) Corpus generation method, corpus translation model training method, corpus translation model translation method, corpus translation device, corpus translation equipment and corpus translation medium
CN112669850A (en) Voice quality detection method and device, computer equipment and storage medium
CN116110066A (en) Information extraction method, device and equipment of bill text and storage medium
CN116774973A (en) Data rendering method, device, computer equipment and storage medium
CN116311276A (en) Document image correction method, device, electronic equipment and readable medium
CN115034177A (en) Presentation file conversion method, device, equipment and storage medium
CN114580413A (en) Model training and named entity recognition method and device, electronic equipment and storage medium
CN114818627A (en) Form information extraction method, device, equipment and medium
CN114625909A (en) Image text selection method and device, electronic equipment and storage medium
CN112765937A (en) Text regularization method and device, electronic equipment and storage medium
CN113111882A (en) Card identification method and device, electronic equipment and storage medium
CN112396111A (en) Text intention classification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination