CN114282543A - Text data processing method and device, computer equipment and storage medium


Info

Publication number
CN114282543A
CN114282543A (application CN202110917810.3A)
Authority
CN
China
Prior art keywords
feature
features
semantic
spatial
text data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110917810.3A
Other languages
Chinese (zh)
Inventor
卢东焕
何楠君
魏东
宁慕楠
马锴
郑冶枫
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority claimed from CN202110917810.3A
Published as CN114282543A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The embodiment of the application discloses a text data processing method and device, computer equipment and a storage medium, belonging to the technical field of computers. The method comprises the following steps: acquiring fusion semantic information and fusion spatial information based on first text data and second text data; performing cross processing on a first semantic feature corresponding to the fusion semantic information and a first spatial feature corresponding to the fusion spatial information to obtain a second semantic feature and a second spatial feature respectively; combining the second semantic feature and the second spatial feature to obtain a combined feature; and performing matching processing on the combined feature to obtain a matching result. Because the combined feature covers both the semantic features and the spatial features of the two text data, performing matching processing on the combined feature yields a matching result for any two text data, so the method can match any two text data and improves the adaptability and robustness of text data matching.

Description

Text data processing method and device, computer equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a text data processing method and device, computer equipment and a storage medium.
Background
At present, in many scenarios, OCR (Optical Character Recognition) technology is used for text recognition, after which the recognized text data needs to be structured to determine the keywords it contains and the description information corresponding to each keyword.
In the related art, after text data is recognized, it is matched against keywords preset in a dictionary to find the keywords in the text data, and the description information corresponding to each keyword is then located in the text data based on preset rules. However, the text data may contain keywords that are not included in the dictionary, or the structure of the text data may vary so that a keyword and its corresponding description information do not satisfy the preset rules; the adaptability and robustness of this technique are therefore poor.
Disclosure of Invention
The embodiment of the application provides a text data processing method and device, computer equipment and a storage medium, which improve the adaptability and robustness of text data matching. The technical scheme is as follows:
in one aspect, a text data processing method is provided, and the method includes:
acquiring fusion semantic information and fusion spatial information based on the first text data and the second text data;
performing cross processing on the first semantic features corresponding to the fusion semantic information and the first spatial features corresponding to the fusion spatial information to obtain second semantic features and second spatial features respectively; combining the second semantic features and the second spatial features to obtain combined features;
matching the combined features to obtain a matching result, wherein the matching result indicates whether the first text data and the second text data are matched;
the first text data and the second text data are any two text data recognized from the same object, the fusion semantic information represents the semantics of the first text data and the second text data, and the fusion spatial information represents the positions of the first text data and the second text data in the object.
In another aspect, there is provided a text data processing apparatus, the apparatus including:
the information acquisition module is used for acquiring fusion semantic information and fusion spatial information based on the first text data and the second text data;
the cross processing module is used for carrying out cross processing on the first semantic features corresponding to the fusion semantic information and the first spatial features corresponding to the fusion spatial information to respectively obtain second semantic features and second spatial features; combining the second semantic features and the second spatial features to obtain combined features;
the matching processing module is used for matching the combined features to obtain a matching result, and the matching result indicates whether the first text data and the second text data are matched or not;
the first text data and the second text data are any two text data recognized from the same object, the fusion semantic information represents the semantics of the first text data and the second text data, and the fusion spatial information represents the positions of the first text data and the second text data in the object.
Optionally, the information obtaining module includes:
an information acquisition unit configured to acquire first semantic information and first spatial information of the first text data, and second semantic information and second spatial information of the second text data;
the first fusion unit is used for fusing the first semantic information and the second semantic information to obtain fused semantic information;
and the second fusion unit is used for fusing the first spatial information and the second spatial information to obtain the fused spatial information.
Optionally, the information obtaining unit is configured to:
adding vectors corresponding to each character in the first text data to obtain the first semantic information, and acquiring the first spatial information based on the vertex coordinates of a text box in which the first text data is positioned in the object;
and adding vectors corresponding to each character in the second text data to obtain the second semantic information, and acquiring the second spatial information based on the vertex coordinates of the text box in which the second text data is positioned in the object.
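The two acquisition steps above can be sketched as follows. The function names, the embedding dimension, and the example values are illustrative assumptions, not part of the patent; the patent only specifies summing per-character vectors and using the text-box vertex coordinates.

```python
import numpy as np

def get_semantic_info(char_vectors):
    # Add the embedding vector of every character in the text data;
    # the sum carries the semantics of the whole text string.
    return np.sum(np.asarray(char_vectors, dtype=float), axis=0)

def get_spatial_info(text_box_vertices):
    # Flatten the four (x, y) vertex coordinates of the text box in the
    # object into a single spatial vector.
    return np.asarray(text_box_vertices, dtype=float).reshape(-1)

# Example: three characters with 4-dimensional embeddings, and a text box.
chars = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
box = [(10, 20), (110, 20), (110, 50), (10, 50)]
sem = get_semantic_info(chars)   # shape (4,)
spa = get_spatial_info(box)      # shape (8,)
```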
Optionally, the cross-processing module includes:
the first acquisition unit is used for acquiring a first query feature, a first key feature and a first value feature corresponding to the first semantic feature;
the first obtaining unit is further configured to obtain a second query feature, a second key feature, and a second value feature corresponding to the first spatial feature;
a second obtaining unit, configured to obtain the second semantic feature based on the second query feature, the first key feature, and the first value feature;
the second obtaining unit is further configured to obtain the second spatial feature based on the first query feature, the second key feature, and the second value feature.
Optionally, the first obtaining unit is configured to:
multiplying the first semantic feature by a parameter matrix to obtain a semantic matrix, and acquiring the first query feature, the first key feature and the first value feature based on the semantic matrix;
the first obtaining unit is further configured to:
and obtaining a space matrix by multiplying the first space characteristic and the parameter matrix, and acquiring the second query characteristic, the second key characteristic and the second value characteristic based on the space matrix.
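A minimal sketch of this projection step, under the assumption that "acquiring the query, key and value features based on the matrix" means slicing the product into three equal parts; the matrix shapes and the single shared parameter matrix per modality are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # feature dimension (illustrative)

first_semantic = rng.standard_normal((5, d))  # 5 positions, d dims
W_sem = rng.standard_normal((d, 3 * d))       # parameter matrix

# Multiply the first semantic feature by the parameter matrix to obtain the
# semantic matrix, then split it into query, key and value features.
semantic_matrix = first_semantic @ W_sem
q1, k1, v1 = np.split(semantic_matrix, 3, axis=-1)
```

The first spatial feature would be projected the same way with its own parameter matrix to yield the second query, key and value features.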
Optionally, the second obtaining unit is configured to:
normalizing the product of the second query feature, the first key feature and a scaling factor to obtain a first normalized feature; determining the product of the first normalized feature and the first value feature as the second semantic feature;
the second obtaining unit is further configured to:
normalizing the product of the first query feature, the second key feature and the scaling factor to obtain a second normalized feature; determining a product of the second normalized feature and the second value feature as the second spatial feature.
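The two normalization steps above describe a scaled dot-product attention in which the query of one modality attends over the key/value of the other. A hedged sketch, assuming softmax as the normalization and 1/sqrt(d) as the scaling factor (the patent does not name either):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attend(query, key, value):
    # Normalize the scaled product of query and key, then weight the values.
    scale = 1.0 / np.sqrt(key.shape[-1])
    weights = softmax(query @ key.T * scale)   # normalized feature
    return weights @ value                     # product with the value feature

rng = np.random.default_rng(0)
q_spatial, k_sem, v_sem = (rng.standard_normal((4, 8)) for _ in range(3))

# Second semantic feature: the spatial query attends over the semantic key/value.
second_semantic = cross_attend(q_spatial, k_sem, v_sem)
```

The second spatial feature is obtained symmetrically, with the semantic query attending over the spatial key and value.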
Optionally, the apparatus further comprises:
a feature segmentation module, configured to divide the first query feature, the first key feature, and the first value feature into a plurality of first query sub-features, a plurality of first key sub-features, and a plurality of first value sub-features, respectively; dividing the second query feature, the second key feature and the second value feature into a plurality of second query sub-features, a plurality of second key sub-features and a plurality of second value sub-features, respectively;
the second obtaining unit is configured to:
respectively acquiring a plurality of second semantic sub-features based on the plurality of second query sub-features, the plurality of first key sub-features and the plurality of first value sub-features, and splicing the plurality of second semantic sub-features to obtain the second semantic features;
the second obtaining unit is further configured to:
and respectively acquiring a plurality of second spatial sub-features based on the plurality of first query sub-features, the plurality of second key sub-features and the plurality of second value sub-features, and splicing the plurality of second spatial sub-features to obtain the second spatial features.
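The dividing and splicing above is the multi-head form of the same attention. A sketch with 2 heads (the head count, dimensions, and helper names are illustrative assumptions):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_cross(query, key, value, heads):
    # Divide each feature into `heads` sub-features along the last axis,
    # attend within each head, then splice (concatenate) the results.
    outs = []
    for q, k, v in zip(np.split(query, heads, axis=-1),
                       np.split(key, heads, axis=-1),
                       np.split(value, heads, axis=-1)):
        w = softmax(q @ k.T / np.sqrt(k.shape[-1]))
        outs.append(w @ v)
    return np.concatenate(outs, axis=-1)

rng = np.random.default_rng(1)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = multi_head_cross(q, k, v, heads=2)  # spliced sub-features, shape (4, 8)
```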
Optionally, the text matching model comprises: a feature extraction network, a cross processing network and a matching network; the device further comprises:
the feature extraction module is used for calling the feature extraction network, and respectively extracting features of the fusion semantic information and the fusion spatial information to obtain a third semantic feature and a third spatial feature;
the cross processing module is used for calling the cross processing network and respectively extracting the third semantic feature and the third spatial feature to obtain the first semantic feature and the first spatial feature; performing cross processing on the first semantic features and the first spatial features to obtain second semantic features and second spatial features respectively; combining the second semantic features and the second spatial features to obtain the combined features;
and the matching processing module is used for calling the matching network and matching the combined features to obtain the matching result.
Optionally, the cross-processing network is configured to:
acquiring the first query feature, the first key feature and the first value feature corresponding to the first semantic feature;
acquiring the second query feature, the second key feature and the second value feature corresponding to the first spatial feature;
obtaining the second semantic feature based on the second query feature, the first key feature and the first value feature;
and acquiring the second spatial feature based on the first query feature, the second key feature and the second value feature.
Optionally, the text matching model comprises a plurality of the cross-processing networks; the cross-processing module comprises:
the cross processing unit is used for respectively extracting the third semantic feature and the third spatial feature based on a first cross processing network to obtain the first semantic feature and the first spatial feature; performing cross processing on the first semantic feature and the first spatial feature to obtain a fourth semantic feature and a fourth spatial feature respectively;
the cross processing unit is further configured to perform feature extraction and cross processing on the fourth semantic feature and the fourth spatial feature respectively based on a second cross processing network until the second semantic feature and the second spatial feature output by the last cross processing network are obtained;
and the combining unit is used for combining the second semantic features and the second spatial features to obtain the combined features.
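Chaining several cross processing networks as described can be sketched as a simple loop. `make_cross_layer` below is a stand-in for the feature-extraction-plus-cross-processing step of one network, not the patent's actual architecture:

```python
import numpy as np

def make_cross_layer(rng, d):
    # Stand-in for one cross processing network: a linear map per modality
    # in which each output is computed from the *other* modality's input,
    # so the semantic and spatial features cross at every layer.
    Ws, Wp = rng.standard_normal((d, d)), rng.standard_normal((d, d))
    def layer(sem, spa):
        return np.tanh(spa @ Ws), np.tanh(sem @ Wp)
    return layer

rng = np.random.default_rng(2)
d = 8
layers = [make_cross_layer(rng, d) for _ in range(3)]  # plural networks

sem, spa = rng.standard_normal((4, d)), rng.standard_normal((4, d))
for layer in layers:  # first, second, ... last cross processing network
    sem, spa = layer(sem, spa)

# Combine the outputs of the last network into the combined feature.
combined = np.concatenate([sem, spa], axis=-1)
```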
Optionally, the apparatus further comprises:
the sample acquisition module is used for acquiring sample fusion semantic information and sample fusion spatial information based on the first sample text data and the second sample text data;
the characteristic extraction module is used for respectively extracting the characteristics of the sample fusion semantic information and the sample fusion spatial information based on the characteristic extraction network to obtain a first sample semantic characteristic and a first sample spatial characteristic;
the cross processing module is further configured to perform cross processing on the first sample semantic feature and the first sample spatial feature based on the cross processing network to obtain a second sample semantic feature and a second sample spatial feature, respectively; combining the semantic features of the second sample with the spatial features of the second sample to obtain sample combination features;
the matching processing module is further used for matching the sample combination characteristics based on the matching network to obtain a sample matching result;
and the model training module is used for training the text matching model based on the matching result of the first sample text data and the second sample text data and the sample matching result.
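The patent states only that the model is trained based on the ground-truth matching result and the sample matching result; a common hedged reading of such a binary objective is a binary cross-entropy loss between the predicted match probability and the label, sketched below (the loss choice is an assumption):

```python
import numpy as np

def bce_loss(predicted, label, eps=1e-7):
    # Binary cross-entropy between the predicted match probability and the
    # ground-truth label (1 if the sample text pair matches, else 0).
    p = np.clip(predicted, eps, 1 - eps)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

loss_match = bce_loss(0.9, 1)  # confident, correct prediction: small loss
loss_miss = bce_loss(0.9, 0)   # confident, wrong prediction: large loss
```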
In another aspect, a computer device is provided, which includes a processor and a memory, wherein at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to implement the operations performed in the text data processing method according to the above aspect.
In another aspect, there is provided a computer-readable storage medium having at least one computer program stored therein, the at least one computer program being loaded and executed by a processor to implement the operations performed in the text data processing method according to the above aspect.
In another aspect, a computer program product or a computer program is provided, the computer program product or the computer program comprising computer program code stored in a computer-readable storage medium, the computer program code being read by a processor of a computer device from the computer-readable storage medium, the computer program code being executed by the processor so that the computer device implements the operations performed in the text data processing method according to the above aspect.
In the technical scheme provided by the embodiment of the application, the first semantic feature represents the feature of the fusion semantic information of the first text data and the second text data, and the first spatial feature represents the feature of the fusion spatial information of the first text data and the second text data. The second semantic feature and the second spatial feature obtained by cross processing the first semantic feature and the first spatial feature are combined to obtain a combined feature covering the semantic features and the spatial features of the two text data, so that matching processing based on the combined feature can be performed for any two text data, improving the adaptability and robustness of text data matching.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
fig. 2 is a flowchart of a text data processing method provided in an embodiment of the present application;
FIG. 3 is a flow chart of another text data processing method provided in the embodiments of the present application;
FIG. 4 is a schematic structural diagram of a text matching model provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of another text matching model provided in an embodiment of the present application;
FIG. 6 is a flow chart of another text data processing method provided in the embodiments of the present application;
FIG. 7 is a schematic structural diagram of a feature extraction layer provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an intersection processing layer provided in an embodiment of the present application;
FIG. 9 is a flowchart of a text matching model training method according to an embodiment of the present application;
fig. 10 is a flowchart of a text data processing method according to an embodiment of the present application;
FIG. 11 is a schematic illustration of a medical text image provided by an embodiment of the present application;
fig. 12 is a schematic structural diagram of a text data processing apparatus according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of another text data processing apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.
It will be understood that the terms "first," "second," and the like as used herein may be used herein to describe various concepts, which are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, a first semantic feature may be referred to as a second semantic feature and a second semantic feature may be referred to as a first semantic feature without departing from the scope of the present application.
As used herein, the terms "at least one," "a plurality," "each," and "any" are understood as follows: "at least one" includes one, two, or more; "a plurality" includes two or more; "each" refers to every one of the corresponding plurality; and "any" refers to any one of the plurality. For example, if a plurality of cross processing networks includes 3 cross processing networks, "each" refers to every one of the 3 cross processing networks, and "any" refers to any one of the 3, which may be the first, the second, or the third.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a broad range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent transportation, and the like.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specifically studies how a computer can simulate or realize human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
Computer Vision (CV) technology is a science that studies how to make machines "see"; it uses cameras and computers instead of human eyes to identify, track, and measure targets, and further performs image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, automatic driving, and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
The text data processing method provided by the embodiment of the present application will be described below based on an artificial intelligence technique, a computer vision technique, and a natural language processing technique.
The text data processing method provided by the embodiment of the application can be used in computer equipment. Optionally, the computer device is a terminal or a server. Optionally, the server is an independent physical server, or a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like. Optionally, the terminal is a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., but is not limited thereto.
In one possible implementation, the computer program according to the embodiments of the present application may be deployed and executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices distributed at multiple sites and interconnected by a communication network; multiple computer devices distributed at multiple sites and interconnected by a communication network can form a blockchain system.
In one possible implementation manner, the computer device for training the text matching model in the embodiment of the present application is a node in the blockchain system, and the node is capable of storing the trained text matching model in the blockchain, and then the node or nodes corresponding to other devices in the blockchain may perform matching processing on any two text data based on the text matching model.
Fig. 1 is a schematic diagram of an implementation environment provided in an embodiment of the present application, and referring to fig. 1, the implementation environment includes: a terminal 101 and a server 102. The terminal 101 and the server 102 are connected via a wireless or wired network. Optionally, the server 102 is configured to train a text matching model using the method provided in the embodiment of the present application, where the text matching model is used to determine whether any two text data match. The server 102 sends the trained text matching model to the terminal 101, and the terminal 101 can call the text matching model to perform matching processing on any two text data to obtain a matching result.
In a possible implementation manner, an application client provided by the server runs in the terminal 101, and the server 102 stores the trained text matching model in the application client, and the application client has a text data processing function. The terminal 101 calls a text matching model based on the application client, and performs matching processing on any two text data to obtain a matching result.
It should be noted that, in fig. 1, only the server 102 trains the text matching model and sends the text matching model to the terminal 101 for illustration, in another embodiment, the terminal 101 may also directly train the text matching model.
Fig. 2 is a flowchart of a text data processing method according to an embodiment of the present application. The execution subject of the embodiment of the application is computer equipment. Referring to fig. 2, the method comprises the steps of:
201. the computer device acquires fusion semantic information and fusion spatial information based on the first text data and the second text data.
The computer device acquires first text data and second text data, wherein the first text data and the second text data are any two text data recognized from the same object. For example, the first text data and the second text data are any two text data recognized from the same table, the same image, or the same article. The first text data and the second text data are located at different positions in the object, and the first text data and the second text data comprise characters, numerical values and the like.
The computer equipment acquires fusion semantic information corresponding to the first text data and the second text data and fusion spatial information corresponding to the first text data and the second text data. The fusion semantic information represents the semantics of the first text data and the second text data, and the fusion spatial information represents the positions of the first text data and the second text data in the object.
202. And the computer equipment carries out cross processing on the first semantic features corresponding to the fusion semantic information and the first spatial features corresponding to the fusion spatial information to respectively obtain second semantic features and second spatial features.
The computer equipment acquires a first semantic feature corresponding to the fusion semantic information and a first spatial feature corresponding to the fusion spatial information. The first semantic features are features fusing semantic information, namely semantic features of the first text data and the second text data. The first spatial feature is a feature of fusing spatial information, that is, a spatial feature of the first text data and the second text data.
The computer equipment performs cross processing on the first semantic feature and the first spatial feature to obtain the second semantic feature and the second spatial feature. Cross processing means that the first semantic feature and the first spatial feature are crossed with each other to obtain the second semantic feature and the second spatial feature, so that the second semantic feature covers part of the features represented by both the first semantic feature and the first spatial feature, and the second spatial feature likewise covers part of the features represented by both.
203. And the computer equipment combines the second semantic features and the second spatial features to obtain combined features.
After the second semantic feature and the second spatial feature are obtained, the computer device combines the second semantic feature and the second spatial feature into a combined feature, so that the combined feature comprises the second semantic feature and the second spatial feature.
204. And the computer equipment performs matching processing on the combined features to obtain a matching result.
The combined feature obtained by the computer device comprises the semantic features of the first text data and the second text data and the spatial features of the first text data and the second text data. The computer device performs matching processing on the combined feature, that is, it matches the semantic features of the first text data against those of the second text data and matches the spatial features of the first text data against those of the second text data, to obtain a matching result indicating whether the first text data matches the second text data.
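Steps 203 and 204 can be sketched together. The concatenation in step 203 follows the claims; the linear-plus-sigmoid matching head in step 204 is an assumption, since the patent only states that matching processing yields a matching result:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
second_semantic = rng.standard_normal(8)
second_spatial = rng.standard_normal(8)

# Step 203: combine the second semantic and second spatial features.
combined = np.concatenate([second_semantic, second_spatial])

# Step 204: matching processing -> probability that the two text data match.
w, b = rng.standard_normal(combined.shape[0]), 0.0  # assumed matching head
match_prob = sigmoid(combined @ w + b)
is_match = match_prob > 0.5
```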
In the method provided by the embodiment of the application, the first semantic feature represents the feature of the fusion semantic information of the first text data and the second text data, and the first spatial feature represents the feature of the fusion spatial information of the first text data and the second text data. The second semantic feature and the second spatial feature obtained by cross processing the first semantic feature and the first spatial feature are combined to obtain a combined feature covering the semantic features and the spatial features of the two text data, so that matching processing based on the combined feature can be performed for any two text data, improving the adaptability and robustness of text data matching.
Fig. 3 is a flowchart of a text data processing method according to an embodiment of the present application. The execution subject of the embodiment of the application is computer equipment. Referring to fig. 3, the method comprises the steps of:
301. the computer device acquires first semantic information and first spatial information of the first text data, and second semantic information and second spatial information of the second text data.
The first text data and the second text data are any two text data recognized from the same object. The computer equipment identifies the same object to obtain a plurality of text data in the object, wherein each text data is positioned at a different position in the object. The computer device takes any two text data of the plurality of text data as the first text data and the second text data. The object may be a form, a photographed image, an article, or the like.
The computer device acquires first semantic information and first spatial information of the first text data, and second semantic information and second spatial information of the second text data. Wherein the first semantic information represents the semantics of the first text data, the first spatial information represents the position of the first text data in the object, the second semantic information represents the semantics of the second text data, and the second spatial information represents the position of the second text data in the object.
In one possible implementation, a process for acquiring, by a computer device, first semantic information and first spatial information includes: the computer equipment adds the vectors corresponding to each character in the first text data to obtain first semantic information, and obtains first spatial information based on the vertex coordinates of the text box where the first text data is located in the object.
The computer device performs semantic recognition on the first text data to obtain a vector corresponding to each character, where the vector represents the semantics of that character. The computer device then adds the vectors of all the characters to obtain the first semantic information. Since the first semantic information contains the semantics of every character in the first text data, it can represent the semantics of the first text data as a whole. Optionally, the vector corresponding to a character is a word vector (Word Embedding). A character in the first text data may be a written character or a numeral; written characters include Chinese characters, English letters, and the like.
The first text data is text data recognized in the object. Optionally, the computer device identifies the object to obtain a plurality of text boxes, where the text boxes indicate positions of the text data, and the computer device determines data in each text box as text data, so that each text data corresponds to a text box. The computer device determines a text box in which the first text data is located in the object, and determines vertex coordinates of the text box, wherein the vertex coordinates can represent the position of the first text data, so that the computer device acquires first spatial information corresponding to the first text data based on the vertex coordinates.
Optionally, the computer device obtains vertex coordinates corresponding to four vertices of the text box, and splices the obtained four vertex coordinates to obtain the first spatial information. Or the computer equipment adds the acquired four vertex coordinates to obtain the first spatial information. Since the first spatial information contains vertex coordinates of four vertices of a text box in which the first text data is located, the first spatial information can represent a position of the first text data.
Accordingly, a process for acquiring second semantic information and second spatial information by a computer device includes: and adding vectors corresponding to each character in the second text data to obtain second semantic information, and acquiring second spatial information based on the vertex coordinates of the text box in which the second text data is positioned in the object. The process of obtaining the second semantic information and the second spatial information is the same as the process of obtaining the first semantic information and the first spatial information, and is not repeated herein.
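The two acquisition steps above can be sketched as follows. This is a minimal illustration: the toy embedding table stands in for a trained word-embedding lookup, and the hard-coded box corners stand in for coordinates produced by a real text detector.

```python
import numpy as np

# Toy 4-dimensional word embeddings; a real system would look these up
# in a trained embedding table (Word Embedding), as described above.
EMBEDDINGS = {
    "date": np.array([0.1, 0.2, 0.0, 0.3]),
    "of": np.array([0.0, 0.1, 0.1, 0.0]),
    "admission": np.array([0.2, 0.0, 0.4, 0.1]),
}

def semantic_info(tokens):
    # Add the vector corresponding to each character/token.
    return np.sum([EMBEDDINGS[t] for t in tokens], axis=0)

def spatial_info(vertices):
    # Concatenate the four vertex coordinates (x, y) of the text box.
    return np.concatenate([np.asarray(v, dtype=float) for v in vertices])

first_semantic = semantic_info(["date", "of", "admission"])
first_spatial = spatial_info([(0, 0), (10, 0), (10, 2), (0, 2)])
```

The same two functions would be applied to the second text data to obtain the second semantic and spatial information.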
302. The computer equipment fuses the first semantic information and the second semantic information to obtain fused semantic information, and fuses the first spatial information and the second spatial information to obtain fused spatial information.
The fused semantic information includes first semantic information and second semantic information, and thus the fused semantic information represents semantics of the first text data and the second text data. The fused spatial information contains first spatial information and second spatial information, and thus represents the positions of the first text data and the second text data.
In one possible implementation, the first semantic information and the second semantic information are in the form of vectors, and the computer device adds the first semantic information and the second semantic information to obtain the fused semantic information. Or the computer equipment splices the first semantic information and the second semantic information to obtain the fused semantic information. In a possible implementation manner, the first spatial information and the second spatial information are also in the form of vectors, and the computer device adds the first spatial information and the second spatial information to obtain the fused spatial information. Or the computer equipment splices the first spatial information and the second spatial information to obtain the fusion spatial information.
For example, the first semantic information is obtained by adding vectors of each character in the first text data, the second semantic information is obtained by adding vectors of each character in the second text data, and the fused semantic information is obtained by adding the first semantic information and the second semantic information, which is equivalent to adding the vector of each character in the first text data and the vector of each character in the second text data by the computer device to obtain the fused semantic information.
For example, the first spatial information is obtained by adding four vertex coordinates of a text box in which the first text data is located, the second spatial information is obtained by adding four vertex coordinates of a text box in which the second text data is located, and the fused spatial information is obtained by adding the first spatial information and the second spatial information, which is equivalent to adding the four vertex coordinates of the text box in which the first text data is located and the four vertex coordinates of the text box in which the second text data is located by the computer device.
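A minimal sketch of the two fusion alternatives described above, element-wise addition or splicing (concatenation), assuming small example vectors:

```python
import numpy as np

def fuse(a, b, mode="add"):
    # Fuse two information vectors either by addition or by splicing
    # (concatenation), the two options described above.
    if mode == "add":
        return a + b
    return np.concatenate([a, b])

first_semantic = np.array([0.3, 0.3, 0.5, 0.4])
second_semantic = np.array([0.1, 0.0, 0.2, 0.1])
fused_add = fuse(first_semantic, second_semantic)            # same width as inputs
fused_cat = fuse(first_semantic, second_semantic, "concat")  # doubled width
```

Addition keeps the fused vector the same width as each input, while concatenation doubles it; the downstream feature extractor must be sized accordingly.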
Steps 301-302 above constitute one possible implementation of acquiring the fused semantic information and the fused spatial information based on the first text data and the second text data. In another embodiment, the computer device may also obtain the fused semantic information and the fused spatial information in other ways.
303. And the computer equipment respectively extracts the features of the fused semantic information and the fused spatial information to obtain a first semantic feature and a first spatial feature.
And the computer equipment extracts the features of the fused semantic information to obtain a first semantic feature corresponding to the fused semantic information, wherein the first semantic feature is the feature of the fused semantic information, namely the semantic features of the first text data and the second text data. The computer equipment extracts the features of the fusion spatial information to obtain a first spatial feature corresponding to the fusion spatial information, wherein the first spatial feature is the feature of the fusion spatial information, namely the spatial feature of the first text data and the spatial feature of the second text data.
304. The computer equipment acquires a first query feature, a first key feature and a first value feature corresponding to the first semantic feature, and acquires a second query feature, a second key feature and a second value feature corresponding to the first spatial feature.
The computer device performs different spatial transformations on the first semantic feature to obtain a corresponding first query feature, first key feature, and first value feature, and performs different linear transformations on the first spatial feature to obtain a corresponding second query feature, second key feature, and second value feature. The query, key, and value features each belong to a different feature space, and a query feature determines the degree of matching between key features and value features.
In one possible implementation, the process for the computer device to obtain the first query feature, the first key feature and the first value feature includes: and obtaining a semantic matrix by multiplying the first semantic features and the parameter matrix, and acquiring the first query features, the first key features and the first value features based on the semantic matrix.
The computer equipment obtains a parameter matrix, the parameter matrix is used for carrying out space transformation on the first semantic features, the first semantic features are multiplied by the parameter matrix to obtain a semantic matrix, and the semantic matrix is divided into first query features, first key features and first value features. For example, the parameter matrix is a 3-dimensional parameter matrix, the semantic matrix obtained by multiplying the first semantic feature by the parameter matrix is a 3-dimensional semantic matrix, and the computer device takes each dimension of the semantic matrix as the first query feature, the first key feature and the first value feature respectively.
Correspondingly, the computer equipment obtains a space matrix by multiplying the first space characteristic by the parameter matrix, and obtains a second query characteristic, a second key characteristic and a second value characteristic based on the space matrix. The process of obtaining the second query feature, the second key feature and the second value feature is the same as the process of obtaining the first query feature, the first key feature and the first value feature, and is not repeated herein.
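The parameter-matrix multiplication can be sketched as follows: a 3-slice parameter tensor plays the role of the "3-dimensional parameter matrix", and each dimension of the product is taken as the query, key, or value feature. The dimensions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4  # number of feature positions and feature width (assumed)

# "3-dimensional parameter matrix": one slice each for query, key, value.
W = rng.standard_normal((3, d, d))

def qkv(x, W):
    # Multiply the feature by the parameter tensor, then take each of the
    # three dimensions of the result as the query, key and value features.
    m = np.einsum("nd,kdh->knh", x, W)
    return m[0], m[1], m[2]

first_semantic_feature = rng.standard_normal((n, d))
q1, k1, v1 = qkv(first_semantic_feature, W)
```

The same call with the first spatial feature would yield the second query, key, and value features.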
305. The computer device obtains a second semantic feature based on the second query feature, the first key feature and the first value feature, and obtains a second spatial feature based on the first query feature, the second key feature and the second value feature.
The first query, key, and value features are derived from the first semantic feature, while the second query, key, and value features are derived from the first spatial feature. The computer device obtains the second semantic feature based on the second query feature, the first key feature, and the first value feature, so that the second semantic feature covers the key and value features of the semantic branch together with the query feature of the spatial branch used to match keys against values. Likewise, the computer device obtains the second spatial feature based on the first query feature, the second key feature, and the second value feature, so that the second spatial feature covers the key and value features of the spatial branch together with the query feature of the semantic branch. This is equivalent to exchanging the query features of the two branches, thereby cross-processing the first semantic feature and the first spatial feature.
In a possible implementation manner, the process of acquiring, by the computer device, the second semantic feature based on the second query feature, the first key feature, and the first value feature includes normalizing a product of the second query feature, the first key feature, and the scaling factor to obtain a first normalized feature, and determining a product of the first normalized feature and the first value feature as the second semantic feature.
The computer device obtains a scaling factor that represents a normalized scaling factor. In an embodiment of the present application, the computer device determines a product of the second query feature and the first key feature, the product being capable of representing a correlation between the second query feature and the first key feature. The computer device normalizes the product by using the scaling factor as a normalization parameter to obtain a first normalized feature, and the first normalized feature represents a correlation between the second query feature and the first key feature, so that the computer device can use the first normalized feature as a weight of the first value feature, and thus the computer device determines the product between the first normalized feature and the first value feature as the second semantic feature.
Correspondingly, the process of acquiring, by the computer device, the second spatial feature based on the first query feature, the second key feature and the second value feature includes: and normalizing the product of the first query feature, the second key feature and the scaling factor to obtain a second normalized feature, and determining the product of the second normalized feature and the second value feature as a second spatial feature. The process of obtaining the second spatial feature is the same as the process of obtaining the second semantic feature, and is not repeated here.
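The cross-attention steps above, with the queries swapped between the semantic and spatial branches, can be sketched as scaled dot-product attention. Shapes and random inputs are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(q_other, k, v):
    # Normalize the product of the swapped query, the key, and the
    # scaling factor 1/sqrt(d), then weight the value features.
    scale = 1.0 / np.sqrt(k.shape[-1])
    weights = softmax(q_other @ k.T * scale)
    return weights @ v

rng = np.random.default_rng(1)
n, d = 5, 4
q1, k1, v1 = [rng.standard_normal((n, d)) for _ in range(3)]  # semantic branch
q2, k2, v2 = [rng.standard_normal((n, d)) for _ in range(3)]  # spatial branch

second_semantic = cross_attend(q2, k1, v1)  # spatial query, semantic key/value
second_spatial = cross_attend(q1, k2, v2)   # semantic query, spatial key/value
```

Note that only the queries are exchanged; each branch keeps its own keys and values, which is what lets the output of each branch still represent that branch's content.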
In the embodiment of the application, during feature extraction the first semantic feature and the first spatial feature are each divided into query, key, and value features, and the query feature of the semantic branch is exchanged with the query feature of the spatial branch to obtain the second semantic feature and the second spatial feature. This increases the information content of the second semantic and spatial features, so that the subsequent matching processing performed on them is more accurate.
In one possible implementation, the computer device divides the first query feature, the first key feature, and the first value feature into a plurality of first query sub-features, a plurality of first key sub-features, and a plurality of first value sub-features, respectively, and divides the second query feature, the second key feature, and the second value feature into a plurality of second query sub-features, a plurality of second key sub-features, and a plurality of second value sub-features, respectively. The process of the computer device obtaining the second semantic features and the second spatial features comprises: and the computer equipment respectively acquires a plurality of second semantic sub-features based on the plurality of second query sub-features, the plurality of first key sub-features and the plurality of first value sub-features, and splices the plurality of second semantic sub-features to obtain second semantic features. And the computer equipment respectively acquires a plurality of second space sub-features based on the plurality of first inquiry sub-features, the plurality of second key sub-features and the plurality of second value sub-features, and splices the plurality of second space sub-features to obtain a second space feature.
The computer device divides the first query feature into a plurality of first query sub-features, the first key feature into a plurality of first key sub-features, and the first value feature into a plurality of first value sub-features. The numbers of first query, key, and value sub-features are equal, and the computer device groups them into a plurality of first sub-feature sets, each containing 1 first query sub-feature, 1 first key sub-feature, and 1 first value sub-feature. Taking the first query feature as an example: if it is 1 feature vector, the vector is uniformly divided into 8 feature sub-vectors, and the computer device uses each sub-vector as one first query sub-feature.
Accordingly, the computer device divides the second query feature, second key feature, and second value feature into a plurality of second query sub-features, second key sub-features, and second value sub-features, and groups them into a plurality of second sub-feature sets, each containing 1 second query sub-feature, 1 second key sub-feature, and 1 second value sub-feature.
And the number of the plurality of first sub-feature sets is equal to that of the plurality of second sub-feature sets. The computer device groups the first sub-feature set and the second sub-feature set pairwise, e.g., the computer device groups in a segmentation order. For each group of the first sub-feature set and the second sub-feature set, the computer device obtains a second semantic sub-feature based on the first key sub-feature and the first value sub-feature in the first sub-feature set and the second query sub-feature in the second sub-feature set, and obtains a second spatial sub-feature based on the first query sub-feature in the first sub-feature set and the second key sub-feature and the second value sub-feature in the second sub-feature set. For each group of the first sub-feature set and the second sub-feature set, the computer device can obtain one second semantic sub-feature and one second spatial sub-feature, so that the computer device can obtain a plurality of second semantic sub-features and a plurality of second spatial sub-features, the computer device splices the plurality of second semantic sub-features to obtain a second semantic feature, and splices the plurality of second spatial sub-features to obtain a second spatial feature.
Optionally, the computer device splices the plurality of second semantic sub-features to obtain spliced semantic sub-features, and determines the product of the spliced semantic sub-features and the parameter matrix as the second semantic features. And the computer equipment splices the plurality of second space sub-features to obtain spliced space sub-features, and determines the product of the spliced space sub-features and the parameter matrix as the second space features.
For example, taking the second semantic feature as an example, the computer device employs the following algorithm to obtain the second semantic feature.
MSA(y) = [SA_1(y); SA_2(y); ...; SA_k(y)] U_msa
where SA_1(y), SA_2(y), ..., SA_k(y) denote the second semantic sub-features, [;] denotes splicing (concatenation), U_msa denotes a parameter matrix, and MSA(y) denotes the second semantic feature.
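This multi-head computation can be sketched as follows: split the query, key, and value features into k sub-features, attend within each sub-feature set, splice the results, and multiply by U_msa. The head count and feature widths are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sa(q, k, v):
    # One head SA_i(y): scaled dot-product attention on one sub-feature set.
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def msa(q, k, v, heads, U_msa):
    # MSA(y) = [SA_1(y); SA_2(y); ...; SA_k(y)] U_msa
    parts = [sa(qi, ki, vi) for qi, ki, vi in
             zip(np.split(q, heads, axis=-1),
                 np.split(k, heads, axis=-1),
                 np.split(v, heads, axis=-1))]
    return np.concatenate(parts, axis=-1) @ U_msa

rng = np.random.default_rng(2)
n, d, heads = 5, 8, 2  # 2 heads of width d // heads = 4 (assumed)
q2, k1, v1 = [rng.standard_normal((n, d)) for _ in range(3)]
U_msa = rng.standard_normal((d, d))
second_semantic = msa(q2, k1, v1, heads, U_msa)
```

The second spatial feature would be produced the same way from q1, k2, and v2.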
By performing steps 304-305 above, the computer device cross-processes the first semantic feature corresponding to the fused semantic information and the first spatial feature corresponding to the fused spatial information, obtaining the second semantic feature and the second spatial feature respectively. In another embodiment, the computer device may also cross-process the first semantic feature and the first spatial feature in other ways to obtain the second semantic feature and the second spatial feature.
306. And the computer equipment combines the second semantic features and the second spatial features to obtain combined features.
After the second semantic feature and the second spatial feature are obtained, the computer device combines the second semantic feature and the second spatial feature into a combined feature, so that the combined feature comprises the second semantic feature and the second spatial feature.
In one possible implementation, the computer device concatenates the second semantic feature and the second spatial feature to obtain the combined feature. Or the computer device adds the second semantic feature and the second spatial feature to obtain the combined feature.
307. And the computer equipment performs matching processing on the combined features to obtain a matching result.
The combined feature obtained by the computer device includes the semantic features of the first and second text data as well as their spatial features. Performing matching processing on the combined feature therefore amounts to matching the semantic features of the two text data against each other and matching their spatial features against each other, yielding a matching result that indicates whether the first text data matches the second text data.
In one possible implementation, the matching result is a first matching result or a second matching result, the first matching result indicates that the first text data and the second text data match, and the second matching result indicates that the first text data and the second text data do not match. Optionally, the first matching result is a first value, and the second matching result is a second value, for example, the first value is 1, and the second value is 0.
In the embodiment of the present application, the first text data matching the second text data means that their contents are associated. Optionally, the content of the first text data is a keyword (key), and the second text data matching it is the value (value) corresponding to that keyword, i.e. the description information for the keyword. For example, if the first text data is "date of admission" and the second text data is "2021-8-8", the two match; if the first text data is "date of admission" and the second text data is "622 yuan", they do not match. The above examples treat the first text data as the keyword and the second text data as the value, but the first text data may also be a value and the second text data a keyword.
In one possible implementation, when the first text data and the second text data match, the computer device determines which is the keyword and which is the value based on their character types. For example, the text data whose character type is numeric is determined to be the value, and the text data of any other character type is determined to be the keyword. The embodiment of the present application thus provides a method for matching text data of different character types: for a plurality of text data in one object, their matching relations can be determined with this method, and the object can then be text-structured according to those relations, reducing the complexity of text structuring.
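As an illustrative sketch of the matching step, the combined feature can be scored and thresholded into the first value 1 (match) or the second value 0 (no match). The sigmoid-scored linear layer here is a stand-in assumption, not the patent's specified matching network.

```python
import numpy as np

def match(second_semantic, second_spatial, w, b):
    # Combine by concatenation, score with a linear layer + sigmoid,
    # then threshold: 1 = first matching result, 0 = second.
    combined = np.concatenate([second_semantic, second_spatial])
    score = 1.0 / (1.0 + np.exp(-(combined @ w + b)))
    return 1 if score >= 0.5 else 0

rng = np.random.default_rng(3)
d = 8  # width of each feature (assumed)
w = rng.standard_normal(2 * d)
result = match(rng.standard_normal(d), rng.standard_normal(d), w, 0.0)
```

In training, the threshold step would be deferred and the raw sigmoid score compared against the 0/1 labels via a loss such as binary cross-entropy.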
According to the method provided by the embodiment of the application, the first semantic feature represents the fused semantic information of the first and second text data, and the first spatial feature represents their fused spatial information. Combining the second semantic feature and second spatial feature obtained by cross-processing the first semantic feature and the first spatial feature yields a combined feature that covers both the semantic and spatial features of the two text data, so that any two text data can be matched, improving the adaptability and robustness of text data matching.
In addition, the matching condition between the semantic features and the matching condition between the spatial features of the text data can be considered at the same time, so that the information dimension referred to by the matching processing of the text data is expanded, and the accuracy of the matching of the text data is improved.
In addition, during feature extraction the first semantic feature and the first spatial feature are each divided into query, key, and value features, and the query features of the semantic and spatial branches are exchanged to obtain the second semantic feature and second spatial feature. This increases their information content, and the subsequent matching processing based on them is therefore more accurate.
In another embodiment, a text matching model is stored in the computer device and used for matching text data. Fig. 4 is a schematic structural diagram of a text matching model provided in an embodiment of the present application. As shown in fig. 4, the text matching model includes: a feature extraction network 401, a cross-processing network 402, and a matching network 403. The feature extraction network 401 is connected to the cross-processing network 402, and the cross-processing network 402 is connected to the matching network 403. The feature extraction network 401 is configured to extract semantic features and spatial features of the text data, the cross-processing network 402 is configured to cross-process those features, and the matching network 403 is configured to determine, based on the features, whether the text data match.
In one possible implementation, the cross-processing network 402 includes a semantic feature extraction layer, a spatial feature extraction layer, and a cross-processing layer. As shown in fig. 5, the semantic feature extraction layer and the spatial feature extraction layer of a cross-processing network 402 are each connected to its cross-processing layer; the semantic feature extraction layer extracts semantic features, the spatial feature extraction layer extracts spatial features, and the cross-processing layer cross-processes the two. The semantic and spatial feature extraction layers share the same network parameters and can run in parallel.
Optionally, the text matching model includes a plurality of cross-processing networks 402 connected in sequence, as shown in fig. 5. The semantic and spatial feature extraction layers of the first cross-processing network 402 are connected to the feature extraction network 401; in each cross-processing network 402 after the first, these two layers are connected to the cross-processing layer of the preceding cross-processing network 402. Within every cross-processing network 402, the semantic and spatial feature extraction layers are also connected to that network's own cross-processing layer, and the cross-processing layer of the last cross-processing network 402 is connected to the matching network.
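The wiring described above (feature extraction network, sequentially connected cross-processing networks, matching network) can be sketched as a pipeline. The block internals below are simplified placeholders for the QKV cross-attention described earlier, used only to show the data flow.

```python
import numpy as np

def cross_block(sem, spa, W):
    # Placeholder cross-processing network 402: each branch is mixed with
    # the other, standing in for the query-swapping cross-attention.
    return np.tanh(spa @ W), np.tanh(sem @ W)

def text_matching_model(fused_sem, fused_spa, block_weights, w_match):
    # Feature extraction network 401 (placeholder: identity), then the
    # sequentially connected cross-processing networks 402, then the
    # matching network 403 (placeholder: linear score on the combination).
    sem, spa = fused_sem, fused_spa
    for W in block_weights:
        sem, spa = cross_block(sem, spa, W)
    combined = np.concatenate([sem, spa])
    return float(combined @ w_match)

rng = np.random.default_rng(4)
d = 8  # feature width (assumed)
block_weights = [rng.standard_normal((d, d)) for _ in range(3)]
w_match = rng.standard_normal(2 * d)
score = text_matching_model(rng.standard_normal(d), rng.standard_normal(d),
                            block_weights, w_match)
```

The loop mirrors the sequential connection in fig. 5: each block's output feeds the next block's semantic and spatial feature extraction layers.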
Fig. 6 is a flowchart of a text data processing method according to an embodiment of the present application. The execution subject of the embodiment of the present application is a computer device, and the computer device invokes the text matching model shown in fig. 4 or fig. 5 to perform matching processing on text data. Wherein the text matching model comprises: feature extraction network, cross-processing network and matching network, referring to fig. 6, the method comprises the following steps:
601. the computer device acquires first semantic information and first spatial information of the first text data, and second semantic information and second spatial information of the second text data.
602. The computer equipment fuses the first semantic information and the second semantic information to obtain fused semantic information, and fuses the first spatial information and the second spatial information to obtain fused spatial information.
The processes of step 601-step 602 are the same as the processes of step 301-step 302, and are not described herein again.
603. And calling a feature extraction network in the text matching model by the computer equipment, and respectively extracting features of the fusion semantic information and the fusion spatial information to obtain a third semantic feature and a third spatial feature.
And the computer equipment inputs the fused semantic information into a feature extraction network, and the feature extraction network performs feature extraction on the fused semantic information and outputs a third semantic feature corresponding to the fused semantic information. And the computer equipment inputs the fusion space information into a feature extraction network, and the feature extraction network performs feature extraction on the fusion space information and outputs a third space feature corresponding to the fusion space information.
In one possible implementation, the fused semantic information and the fused spatial information exist in vector form, and the third semantic feature and the third spatial feature output by the feature extraction network are each an (N+1) × M matrix, where N is the number of characters in the first text data and M is the feature dimension.
In one possible implementation manner, the text matching model includes 1 feature extraction network, and the computer device inputs the fused semantic information to the feature extraction network to obtain a third semantic feature, and inputs the fused spatial information to the feature extraction network to obtain a third spatial feature. Or the computer equipment firstly inputs the fusion space information into the feature extraction network to obtain a third space feature, and then inputs the fusion semantic information into the feature extraction network to obtain a third semantic feature. In another possible implementation manner, the text matching model includes two feature extraction networks, the computer device inputs the fused semantic information into one feature extraction network, and simultaneously inputs the fused spatial information into the other feature extraction network, and the two feature extraction networks perform parallel processing to obtain a third semantic feature and a third spatial feature respectively. Optionally, the network parameters of the two feature extraction networks are the same.
604. The computer equipment calls a cross processing network in the text matching model, respectively extracts the third semantic features and the third spatial features to obtain first semantic features and first spatial features, respectively obtains second semantic features and second spatial features by cross processing the first semantic features and the first spatial features, and combines the second semantic features and the second spatial features to obtain combined features.
The computer equipment inputs the third semantic feature and the third spatial feature into a cross processing network. The cross processing network respectively performs feature extraction on the third semantic feature and the third spatial feature to obtain a first semantic feature and a first spatial feature, then performs cross processing on the first semantic feature and the first spatial feature to obtain a second semantic feature and a second spatial feature, and then combines the second semantic feature and the second spatial feature to obtain a combined feature, where the combined feature comprises the second semantic feature and the second spatial feature.
In one possible implementation, the cross-processing, by the computer device, the first semantic feature and the first spatial feature to obtain a second semantic feature and a second spatial feature, respectively, includes: calling a cross processing network, acquiring a first query feature, a first key feature and a first value feature corresponding to the first semantic feature, acquiring a second query feature, a second key feature and a second value feature corresponding to the first spatial feature, acquiring a second semantic feature based on the second query feature, the first key feature and the first value feature, and acquiring a second spatial feature based on the first query feature, the second key feature and the second value feature. The process is the same as the process of step 304-step 305, and will not be described herein again, except that the process of obtaining the second semantic feature and the second spatial feature is executed by invoking the cross-processing network in step 604.
In another possible implementation, as shown in fig. 5, the cross-processing network includes a semantic feature extraction layer, a spatial feature extraction layer, and a cross-processing layer. And the computer equipment calls a cross processing network to process the third semantic feature and the third spatial feature and acquire the second semantic feature and the second spatial feature.
The computer equipment calls a semantic feature extraction layer in the cross processing network to extract the third semantic feature to obtain the first semantic feature, and calls a spatial feature extraction layer in the cross processing network to extract the third spatial feature to obtain the first spatial feature.
And the computer equipment inputs the third semantic features into the semantic feature extraction layer, and the semantic feature extraction layer performs feature extraction on the third semantic features to obtain the first semantic features. And the computer equipment inputs the third spatial feature into the spatial feature extraction layer, and the spatial feature extraction layer performs feature extraction on the third spatial feature to obtain the first spatial feature.
Optionally, the semantic feature extraction layer is a Transformer encoder (an attention-based encoder originally used for machine translation), and the computer device performs feature extraction on the third semantic feature and the third spatial feature based on a self-attention mechanism. The computer device calls the semantic feature extraction layer, determines the product of the third semantic feature and the parameter matrix as a first semantic matrix, and acquires a third query feature, a third key feature and a third value feature based on the first semantic matrix. It normalizes the product of the third query feature, the third key feature and the scaling factor to obtain a third normalized feature, and determines the product of the third normalized feature and the third value feature as the first semantic feature. The computer device calls the spatial feature extraction layer, determines the product of the third spatial feature and the parameter matrix as a first spatial matrix, and acquires a fourth query feature, a fourth key feature and a fourth value feature based on the first spatial matrix. It normalizes the product of the fourth query feature, the fourth key feature and the scaling factor to obtain a fourth normalized feature, and determines the product of the fourth normalized feature and the fourth value feature as the first spatial feature.
Optionally, taking the first semantic feature as an example, the computer device invokes a semantic feature extraction layer, and performs feature extraction on the third semantic feature by using the following algorithm to obtain the first semantic feature.
[q, k, v] = yU_qkv;

A = softmax(qk^T/√d);

SA(y) = Av;

Wherein y represents the third semantic feature, U_qkv represents the parameter matrix, [q, k, v] represents the first semantic matrix, q represents the third query feature, k represents the third key feature, and v represents the third value feature. √d represents the scaling factor, softmax(·) represents the normalization function, A represents the third normalized feature, and SA(y) represents the first semantic feature.
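As an illustration only (not part of the patented embodiment), the self-attention computation described by these formulas can be sketched in numpy; the shapes and the random parameter matrix in the usage are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # Normalization function softmax(.) from the formulas above.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(y, U_qkv):
    # [q, k, v] = y U_qkv: multiply the feature by the parameter matrix,
    # then split the result into query, key and value features.
    d = y.shape[-1]
    q, k, v = np.split(y @ U_qkv, 3, axis=-1)
    # A = softmax(q k^T / sqrt(d)): the normalized feature.
    A = softmax(q @ k.T / np.sqrt(d))
    # SA(y) = A v: the extracted feature.
    return A @ v
```

Here y plays the role of the third semantic (or, analogously, spatial) feature with one row per character position, and U_qkv maps the feature dimension d to 3d so the result can be split into q, k and v.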
Optionally, the computer device performs feature extraction on the third semantic features and the third spatial features based on a multi-head self-attention mechanism. The computer device divides the third query feature, the third key feature and the third value feature into a plurality of third query sub-features, a plurality of third key sub-features and a plurality of third value sub-features, respectively, and obtains a plurality of first semantic sub-features based on the plurality of third query sub-features, the plurality of third key sub-features and the plurality of third value sub-features, and splices the plurality of first semantic sub-features to obtain the first semantic feature. The computer device divides the fourth query feature, the fourth key feature and the fourth value feature into a plurality of fourth query sub-features, a plurality of fourth key sub-features and a plurality of fourth value sub-features, respectively, and obtains a plurality of first spatial sub-features based on the plurality of fourth query sub-features, the plurality of fourth key sub-features and the plurality of fourth value sub-features, and splices the plurality of first spatial sub-features to obtain the first spatial feature.
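A hedged sketch of the multi-head variant just described: the query, key and value features are divided into per-head sub-features along the feature dimension, attended to separately, and the sub-results are spliced back together (head count and shapes are illustrative assumptions):

```python
import numpy as np

def multi_head_attention(q, k, v, num_heads):
    d_head = q.shape[-1] // num_heads
    outs = []
    for h in range(num_heads):
        # Sub-features for head h (the "division" step described above).
        sl = slice(h * d_head, (h + 1) * d_head)
        qh, kh, vh = q[:, sl], k[:, sl], v[:, sl]
        # Scaled dot-product attention within this head.
        scores = qh @ kh.T / np.sqrt(d_head)
        e = np.exp(scores - scores.max(axis=-1, keepdims=True))
        A = e / e.sum(axis=-1, keepdims=True)
        outs.append(A @ vh)
    # Splice the per-head sub-features to obtain the final feature.
    return np.concatenate(outs, axis=-1)
```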
Optionally, the network structures of the semantic feature extraction layer and the spatial feature extraction layer are the same; the semantic feature extraction layer is taken as an example for illustration. Fig. 7 is a schematic structural diagram of a semantic feature extraction layer provided in an embodiment of the present application. As shown in fig. 7, the semantic feature extraction layer includes a multi-head self-attention layer (Multi-Head Attention) 701, a residual and normalization layer (Add and Norm) 702, a feed-forward layer (Feed Forward) 703, and a residual and normalization layer 704.
And (II) calling a cross processing layer in the cross processing network by the computer equipment, and performing cross processing on the first semantic features and the first spatial features to respectively obtain second semantic features and second spatial features.
The computer equipment calls a cross processing layer in a cross processing network, obtains a first query feature, a first key feature and a first value feature corresponding to the first semantic feature, obtains a second query feature, a second key feature and a second value feature corresponding to the first spatial feature, obtains the second semantic feature based on the second query feature, the first key feature and the first value feature, and obtains the second spatial feature based on the first query feature, the second key feature and the second value feature.
Optionally, the computer device normalizes the product of the second query feature, the first key feature and the scaling factor to obtain a first normalized feature, and determines the product of the first normalized feature and the first value feature as the second semantic feature. And normalizing the product of the first query feature, the second key feature and the scaling factor to obtain a second normalized feature, and determining the product of the second normalized feature and the second value feature as a second spatial feature.
For example, the computer device invokes the cross-processing layer, employing the following algorithm, to determine the second semantic features and the second spatial features.
[q1, k1, v1] = y1U_qkv, [q2, k2, v2] = y2U_qkv;

A1 = softmax(q2k1^T/√d), A2 = softmax(q1k2^T/√d);

SA1 = A1v1, SA2 = A2v2;

Wherein y1 represents the first semantic feature, y2 represents the first spatial feature, and U_qkv represents the parameter matrix. [q1, k1, v1] represents the semantic matrix, in which q1 represents the first query feature, k1 represents the first key feature, and v1 represents the first value feature. [q2, k2, v2] represents the spatial matrix, in which q2 represents the second query feature, k2 represents the second key feature, and v2 represents the second value feature. A1 represents the first normalized feature, A2 represents the second normalized feature, √d represents the scaling factor, and softmax(·) represents the normalization function. SA1 represents the second semantic feature, and SA2 represents the second spatial feature.
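Using this notation, the query-swapping cross processing can be sketched as follows (numpy, illustrative shapes; a shared parameter matrix U_qkv for both branches is assumed, as in the formulas):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_process(y1, y2, U_qkv):
    d = y1.shape[-1]
    # [q1, k1, v1] = y1 U_qkv and [q2, k2, v2] = y2 U_qkv.
    q1, k1, v1 = np.split(y1 @ U_qkv, 3, axis=-1)
    q2, k2, v2 = np.split(y2 @ U_qkv, 3, axis=-1)
    # The query features are exchanged between the two branches.
    A1 = softmax(q2 @ k1.T / np.sqrt(d))  # first normalized feature
    A2 = softmax(q1 @ k2.T / np.sqrt(d))  # second normalized feature
    return A1 @ v1, A2 @ v2               # SA1 (semantic), SA2 (spatial)
```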
Fig. 8 is a schematic structural diagram of a cross processing layer provided in an embodiment of the present application. Referring to fig. 8, the cross processing layer includes a multi-layer perceptron layer 801, a matrix multiplication layer 802, a scaling and normalization layer 803, a matrix multiplication layer 804, a multi-layer perceptron layer 805, a matrix multiplication layer 806, a scaling and normalization layer 807, and a matrix multiplication layer 808. The multi-layer perceptron layer 801 is used for obtaining the first query feature q1, the first key feature k1 and the first value feature v1 from the first semantic feature y1, and the multi-layer perceptron layer 805 is used for obtaining the second query feature q2, the second key feature k2 and the second value feature v2 from the first spatial feature y2.

The matrix multiplication layer 802 is used for multiplying the first key feature k1 and the second query feature q2, the scaling and normalization layer 803 is used for normalizing the product of the first key feature k1 and the second query feature q2 to obtain the first normalized feature, and the matrix multiplication layer 804 is used for multiplying the first normalized feature and the first value feature v1 to obtain the second semantic feature SA1. The matrix multiplication layer 806 is used for multiplying the second key feature k2 and the first query feature q1, the scaling and normalization layer 807 is used for normalizing the product of the second key feature k2 and the first query feature q1 to obtain the second normalized feature, and the matrix multiplication layer 808 is used for multiplying the second normalized feature and the second value feature v2 to obtain the second spatial feature SA2.
In another possible implementation, the text matching model includes a plurality of cross-processing networks. The computer equipment respectively extracts the third semantic features and the third spatial features to obtain first semantic features and first spatial features based on a first cross processing network, respectively performs cross processing on the first semantic features and the first spatial features to obtain fourth semantic features and fourth spatial features, respectively performs feature extraction and cross processing on the fourth semantic features and the fourth spatial features based on a second cross processing network until second semantic features and second spatial features output by the last cross processing network are obtained, and combines the second semantic features and the second spatial features to obtain combined features.
The network structure of each cross processing network is the same but the network parameters are different, the processing process of each cross processing network is the same, the output of the last cross processing network is the second semantic feature and the second spatial feature, and the computer equipment combines the second semantic feature and the second spatial feature to obtain the combined feature.
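The stacking described above can be sketched as a simple loop; each element of `networks` stands for one cross processing network (same structure, different parameters), and the final concatenation stands for the combining step (all names are illustrative):

```python
import numpy as np

def stacked_cross_processing(semantic, spatial, networks):
    # Feed the outputs of each cross processing network into the next one;
    # the last network yields the second semantic and spatial features.
    for net in networks:
        semantic, spatial = net(semantic, spatial)
    # Combine the second semantic and second spatial features.
    return np.concatenate([semantic, spatial], axis=-1)
```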
605. And the computer equipment calls a matching network in the text matching model to perform matching processing on the combined characteristics to obtain a matching result.
The computer device inputs the combined features into the matching network, the matching network performs matching processing on the combined features, and outputs a corresponding matching result, wherein the matching result indicates whether the first text data and the second text data are matched or not. Optionally, the matching network is an MLP (Multi-Layer Perceptron) classification network.
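A minimal sketch of an MLP-style matching head (layer sizes, activations and the sigmoid output are assumptions; the embodiment only states that the matching network may be an MLP classification network):

```python
import numpy as np

def mlp_match(combined, W1, b1, W2, b2):
    # Hidden layer with ReLU, then a sigmoid over a single logit so the
    # output can be read as the probability that the two texts match.
    h = np.maximum(0.0, combined @ W1 + b1)
    logit = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logit))
```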
According to the method provided by the embodiment of the application, the first semantic features represent the features of the fusion semantic information of the first text data and the second text data, the first spatial features represent the features of the fusion spatial information of the first text data and the second text data, and the second semantic features and the second spatial features after the first semantic features and the first spatial features are subjected to cross processing are combined to obtain the combined features covering the semantic features and the spatial features of the two text data.
In addition, the text matching model is called to perform matching processing on the first text data and the second text data, so that the text data matching process is simplified, and convenience in performing matching processing on the text data is improved.
In addition, the matching condition between the semantic features and the matching condition between the spatial features of the text data can be considered at the same time, so that the information dimension referred to by the matching processing of the text data is expanded, and the accuracy of the matching of the text data is improved.
In addition, the multi-head self-attention mechanism and the multi-head cross attention mechanism are adopted in the embodiment of the application, the query feature, the key feature and the value feature are respectively divided into the corresponding sub-features to be processed, so that over-fitting in the processes of feature extraction and cross processing can be avoided, and the accuracy of the second semantic feature and the second spatial feature is further improved.
In addition, in the process of feature extraction, the first semantic features and the first spatial features are respectively divided into query features, key features and value features, the first query features corresponding to the first text data and the second query features corresponding to the second text data are exchanged to obtain the second semantic features and the second spatial features, so that the information content of the second semantic features and the second spatial features can be improved, and the matching processing is subsequently performed according to the second semantic features and the second spatial features, thereby being beneficial to improving the matching accuracy.
Fig. 9 is a flowchart of a text matching model training method provided in an embodiment of the present application, where an execution subject of the embodiment of the present application is a computer device, and the text matching model trained in the embodiment of the present application can be applied to the above embodiment of fig. 6, and referring to fig. 9, the method includes the following steps:
901. the computer device obtains sample fusion semantic information and sample fusion spatial information based on the first sample text data and the second sample text data.
In order to train the text matching model, the computer device first acquires first sample text data and second sample text data, and acquires sample fusion semantic information and sample fusion spatial information based on the first sample text data and the second sample text data. The process of acquiring the sample fusion semantic information and the sample fusion spatial information by the computer device is the same as the process of steps 301 to 302, and is not described herein again.
It should be noted that the process of training the text matching model based on the first sample text data and the second sample text data includes a plurality of iterative processes, and in each iterative process, training is performed based on a pair of the first sample text data and the second sample text data. The steps 901-905 in the embodiment of the present application are only described by taking one iteration process as an example.
902. And the computer equipment respectively extracts the features of the sample fusion semantic information and the sample fusion spatial information based on a feature extraction network to obtain a first sample semantic feature and a first sample spatial feature.
903. The computer equipment carries out cross processing on the first sample semantic feature and the first sample spatial feature based on a cross processing network to respectively obtain a second sample semantic feature and a second sample spatial feature; and combining the semantic features of the second sample with the spatial features of the second sample to obtain sample combination features.
904. And the computer equipment performs matching processing on the sample combination characteristics based on the matching network to obtain a sample matching result.
The processes of step 902 to step 904 are the same as the processes of step 603 to step 605, and are not described herein again.
905. The computer device trains the text matching model based on the matching result of the first sample text data and the second sample text data and the sample matching result.
The computer device obtains a matching result of the first sample text data and the second sample text data, the matching result being a true matching result of the first sample text data and the second sample text data, the sample matching result obtained by the computer device being a matching result predicted by the text matching model, and the computer device trains the text matching model based on a difference between the sample matching result and the true matching result.
The text matching model aims to match the first sample text data with the second sample text data, so that a sample matching result of the first sample text data and the second sample text data is obtained. The more similar the sample match result is to the true match result, the more accurate the text matching model is. The computer device trains the text matching model according to the difference between the sample matching result and the real matching result to improve the matching capability of the text matching model, thereby improving the accuracy of the text matching model.
In a possible implementation manner, the computer device repeats the steps 901 and 905, performs iterative training on the text matching model, and stops training the text matching model in response to the iteration round reaching the first threshold; or stopping training the text matching model in response to the loss value obtained in the current iteration turn being not greater than the second threshold value. The first threshold and the second threshold are both arbitrary values, for example, the first threshold is 1000 or 1500, and the second threshold is 0.004 or 0.003.
In one possible implementation, the computer device updates network parameters of the text matching model using an Adam (Adaptive Moment Estimation) based gradient descent method. Optionally, the parameters in Adam are set to (0.95, 0.9995). Optionally, the initial learning rate for training the text matching model is 0.001, and the learning rate is reduced by one fifth every 100 times of iterative training.
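For illustration, one Adam update step and the stated schedule can be sketched as follows; mapping (0.95, 0.9995) to (beta1, beta2) is an assumption, and so is reading the schedule as multiplying the rate by 1/5 every 100 iterations:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr, beta1=0.95, beta2=0.9995, eps=1e-8):
    # Exponential moving averages of the gradient and its square,
    # bias-corrected, then the parameter update.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def learning_rate(step, init_lr=0.001, decay_every=100):
    # Initial rate 0.001, multiplied by 1/5 every 100 iterations (assumed).
    return init_lr * (0.2 ** (step // decay_every))
```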
In the method provided by the embodiment of the application, the first sample semantic feature represents the feature of the sample fusion semantic information of the first sample text data and the second sample text data, and the first sample spatial feature represents the feature of the sample fusion spatial information of the first sample text data and the second sample text data. By combining the second sample semantic features and the second sample spatial features obtained after cross processing of the first sample semantic features and the first sample spatial features, sample combination features covering the semantic features and spatial features of the two sample text data can be obtained. Matching processing is then performed on the sample combination features to obtain the sample matching result of the two sample text data, and the text matching model is trained based on the matching result of the first sample text data and the second sample text data and the sample matching result. The trained text matching model can match any two text data, which improves the adaptability and robustness of the text matching model.
The embodiment can be applied to any scene needing text data matching. For example, in a scene where a medical text image is structured, the medical text image is described with information such as examination types, examination times, and amounts of money for each medical examination performed by a patient. Fig. 10 is a flowchart of an image clustering method provided in an embodiment of the present application, and referring to fig. 10, the method includes:
1001. the computer equipment acquires the medical text image, performs text recognition on the medical text image, and obtains a plurality of text data in the medical text image. The medical text image is shown in fig. 11, and the content in the black text box in fig. 11 is the recognized text data.
1002. For each two text data in the plurality of text data, the computer device performs text matching processing by using the text data processing method provided by the embodiment of fig. 4 or fig. 6, so as to obtain a matching result of each two text data, where the matching result includes a first matching result and a second matching result, the first matching result indicates that the two text data are matched, and the second matching result indicates that the two text data are not matched.
1003. The computer equipment determines the two text data corresponding to the first matching result as a pair of text data matched with each other, so that a plurality of pairs of text data matched with each other are obtained, and the medical text image is structured.
For example, in the medical text image shown in fig. 11, the identified text data includes "date of admission", "2020-05-18", "assay fee", "16179.5", and the like. Wherein the "date of admission" and "2020-05-18" match each other, and the "assay fee" and "16179.5" match each other.
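The pairing logic of steps 1002-1003 can be sketched as a loop over every two recognized texts; `is_match` stands in for the trained text matching model, and the toy predicate in the test below is an assumption for illustration:

```python
def structure_pairs(texts, is_match):
    # Keep every pair of texts whose matching result is the first
    # (positive) matching result, structuring the recognized content.
    pairs = []
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if is_match(texts[i], texts[j]):
                pairs.append((texts[i], texts[j]))
    return pairs
```

Running this over the texts recognized from a medical text image such as fig. 11 would yield key-value pairs like ("date of admission", "2020-05-18") and ("assay fee", "16179.5").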
In addition to the above, the computer device may apply the method provided by the above embodiments of fig. 4 or fig. 6 to any scene, such as the insurance field, the financial field, or the shopping field, that has a requirement for structuring the text image.
Fig. 12 is a schematic structural diagram of a text data processing apparatus according to an embodiment of the present application. Referring to fig. 12, the apparatus includes:
an information obtaining module 1201, configured to obtain fusion semantic information and fusion spatial information based on the first text data and the second text data;
the cross processing module 1202 is configured to perform cross processing on the first semantic feature corresponding to the fused semantic information and the first spatial feature corresponding to the fused spatial information to obtain a second semantic feature and a second spatial feature, respectively; combining the second semantic features and the second spatial features to obtain combined features;
a matching processing module 1203, configured to perform matching processing on the combined features to obtain a matching result, where the matching result indicates whether the first text data and the second text data are matched;
the first text data and the second text data are any two text data recognized from the same object, the fusion semantic information represents the semantics of the first text data and the second text data, and the fusion spatial information represents the positions of the first text data and the second text data in the object.
According to the text data processing device provided by the embodiment of the application, the first semantic features represent the features of the fusion semantic information of the first text data and the second text data, the first spatial features represent the features of the fusion spatial information of the first text data and the second text data, and the second semantic features and the second spatial features after the first semantic features and the first spatial features are subjected to cross processing are combined to obtain the combined features covering the semantic features and the spatial features of the two text data.
Optionally, referring to fig. 13, the information obtaining module 1201 includes:
an information acquisition unit 1211 for acquiring first semantic information and first spatial information of the first text data, and second semantic information and second spatial information of the second text data;
the first fusion unit 1221 is configured to fuse the first semantic information and the second semantic information to obtain fused semantic information;
the second fusing unit 1231 is configured to fuse the first spatial information and the second spatial information to obtain fused spatial information.
Alternatively, referring to fig. 13, the information acquisition unit 1211 is configured to:
splicing vectors corresponding to each character in the first text data to obtain first semantic information, and acquiring first spatial information based on vertex coordinates of a text box in which the first text data is positioned in an object;
and splicing vectors corresponding to each character in the second text data to obtain second semantic information, and acquiring second spatial information based on the vertex coordinates of the text box in which the second text data is positioned in the object.
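A sketch of these two acquisition steps (normalizing the vertex coordinates by the image size is an assumption; the embodiment only states that spatial information is obtained from the text box's vertex coordinates):

```python
import numpy as np

def spatial_info(box, img_w, img_h):
    # box: four (x, y) vertex coordinates of the text box in the object,
    # here normalized by the image width and height (assumed).
    coords = []
    for x, y in box:
        coords.extend([x / img_w, y / img_h])
    return np.array(coords)

def semantic_info(char_vectors):
    # Splice the vector corresponding to each character.
    return np.concatenate(char_vectors)
```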
Optionally, referring to fig. 13, the crossover processing module 1202 includes:
a first obtaining unit 1212, configured to obtain a first query feature, a first key feature, and a first value feature corresponding to the first semantic feature;
the first obtaining unit 1212 is further configured to obtain a second query feature, a second key feature, and a second value feature corresponding to the first spatial feature;
a second obtaining unit 1222, configured to obtain a second semantic feature based on the second query feature, the first key feature, and the first value feature;
the second obtaining unit 1222 is further configured to obtain a second spatial feature based on the first query feature, the second key feature and the second value feature.
Optionally, referring to fig. 13, the first obtaining unit 1212 is configured to:
multiplying the first semantic features by the parameter matrix to obtain a semantic matrix, and acquiring first query features, first key features and first value features based on the semantic matrix;
a first obtaining unit 1212, further configured to:
and obtaining a space matrix by multiplying the first space characteristic and the parameter matrix, and acquiring a second query characteristic, a second key characteristic and a second value characteristic based on the space matrix.
Optionally, referring to fig. 13, the second obtaining unit 1222 is configured to:
normalizing the product of the second query feature, the first key feature and the scaling factor to obtain a first normalized feature; determining a product of the first normalized feature and the first value feature as a second semantic feature;
a second obtaining unit 1222, further configured to:
normalizing the product of the first query feature, the second key feature and the scaling factor to obtain a second normalized feature; and determining the product of the second normalized feature and the second value feature as a second spatial feature.
Optionally, referring to fig. 13, the apparatus further comprises:
a feature segmentation module 1204, configured to divide the first query feature, the first key feature, and the first value feature into a plurality of first query sub-features, a plurality of first key sub-features, and a plurality of first value sub-features, respectively; dividing the second query feature, the second key feature and the second value feature into a plurality of second query sub-features, a plurality of second key sub-features and a plurality of second value sub-features;
a second obtaining unit 1222, configured to:
respectively acquiring a plurality of second semantic sub-features based on the plurality of second query sub-features, the plurality of first key sub-features and the plurality of first value sub-features, and splicing the plurality of second semantic sub-features to obtain a second semantic feature;
a second obtaining unit 1222, further configured to:
and respectively acquiring a plurality of second space sub-features based on the plurality of first inquiry sub-features, the plurality of second key sub-features and the plurality of second value sub-features, and splicing the plurality of second space sub-features to obtain a second space feature.
Alternatively, referring to fig. 13, the text matching model includes: a feature extraction network, a cross processing network and a matching network; the device still includes:
the feature extraction module 1205 is configured to invoke a feature extraction network, and perform feature extraction on the fused semantic information and the fused spatial information respectively to obtain a third semantic feature and a third spatial feature;
the cross processing module 1202 is configured to invoke a cross processing network, and perform feature extraction on the third semantic feature and the third spatial feature respectively to obtain a first semantic feature and a first spatial feature; performing cross processing on the first semantic features and the first spatial features to respectively obtain second semantic features and second spatial features; combining the second semantic features and the second spatial features to obtain combined features;
and the matching processing module 1203 is configured to invoke a matching network, and perform matching processing on the combined features to obtain a matching result.
Optionally, a cross-processing network, configured to:
acquiring a first query feature, a first key feature and a first value feature corresponding to the first semantic feature;
acquiring a second query feature, a second key feature and a second value feature corresponding to the first spatial feature;
acquiring a second semantic feature based on the second query feature, the first key feature and the first value feature;
and acquiring a second spatial feature based on the first query feature, the second key feature and the second value feature.
Alternatively, referring to fig. 13, the text matching model includes a plurality of cross-processing networks; a crossover processing module 1202, comprising:
the cross processing unit 1232 is configured to perform feature extraction on the third semantic feature and the third spatial feature respectively based on the first cross processing network to obtain a first semantic feature and a first spatial feature; performing cross processing on the first semantic feature and the first spatial feature to respectively obtain a fourth semantic feature and a fourth spatial feature;
the cross processing unit 1232 is further configured to perform feature extraction and cross processing on the fourth semantic feature and the fourth spatial feature, respectively, based on the second cross processing network until obtaining a second semantic feature and a second spatial feature output by the last cross processing network;
a combining unit 1242, configured to combine the second semantic feature and the second spatial feature to obtain a combined feature.
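A minimal sketch of how the stacked cross-processing networks described above could be chained, assuming each network exposes a feature-extraction step and a cross step, and assuming the final combination is concatenation (the patent does not fix the combination operation):

```python
import numpy as np

def run_cross_stack(sem, spa, layers):
    """Pass both feature branches through a stack of cross-processing
    networks; each layer first refines the two branches, then exchanges
    information between them. The last layer's outputs are the second
    semantic/spatial features, combined here by concatenation."""
    for extract, cross in layers:
        sem, spa = extract(sem), extract(spa)
        sem, spa = cross(sem, spa)
    return np.concatenate([sem, spa], axis=-1)
```

With two layers the intermediate outputs correspond to the fourth semantic/spatial features above, and the final outputs to the second ones.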
Optionally, referring to fig. 13, the apparatus further comprises:
a sample obtaining module 1206, configured to obtain sample fusion semantic information and sample fusion spatial information based on the first sample text data and the second sample text data;
a feature extraction module 1205, configured to perform feature extraction on the sample fusion semantic information and the sample fusion spatial information respectively based on a feature extraction network, so as to obtain a first sample semantic feature and a first sample spatial feature;
the cross processing module 1202 is further configured to perform cross processing on the first sample semantic feature and the first sample spatial feature based on a cross processing network to obtain a second sample semantic feature and a second sample spatial feature, respectively; and combine the second sample semantic feature and the second sample spatial feature to obtain sample combination features;
the matching processing module 1203 is further configured to perform matching processing on the sample combination features based on a matching network to obtain a sample matching result;
and the model training module 1207 is configured to train the text matching model based on the matching result of the first sample text data and the second sample text data and the sample matching result.
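The patent leaves the training loss unspecified; as an illustrative assumption, training the matching stage can be sketched as fitting a linear matching head on the sample combination features with binary cross-entropy, where the label marks whether the two sample texts actually match.

```python
import numpy as np

def sigmoid(x):
    # clipped for numerical stability
    return 1.0 / (1.0 + np.exp(-np.clip(x, -60.0, 60.0)))

def train_matching_head(features, labels, epochs=200, lr=0.5):
    """Fit a minimal linear matching head on sample combination features
    by gradient descent on binary cross-entropy."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=features.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(features @ w + b)   # predicted match probability
        grad = p - labels               # gradient of BCE w.r.t. the logit
        w -= lr * features.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b
```

In the full model the gradient would of course also flow back through the cross-processing and feature-extraction networks; only the final head is shown here.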
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
It should be noted that the division into the functional modules described above is merely illustrative of how the text data processing apparatus of the above embodiment processes text data. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the computer device may be divided into different functional modules to complete all or part of the functions described above. In addition, the text data processing apparatus provided in the above embodiment belongs to the same concept as the text data processing method embodiments; its specific implementation process is detailed in the method embodiments and is not repeated here.
The embodiment of the present application further provides a computer device, where the computer device includes a processor and a memory, and the memory stores at least one computer program, and the at least one computer program is loaded and executed by the processor, so as to implement the operations executed in the text data processing method of the foregoing embodiment.
Optionally, the computer device is provided as a terminal. Fig. 14 is a schematic structural diagram of a terminal 1400 according to an embodiment of the present application. The terminal 1400 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1400 may also be referred to as user equipment, a portable terminal, a laptop terminal, a desktop terminal, or by other names.
Terminal 1400 includes: a processor 1401, and a memory 1402.
The processor 1401 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 1401 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 1401 may also include a main processor and a coprocessor: the main processor, also referred to as a CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 1401 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1401 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 1402 may include one or more computer-readable storage media, which may be non-transitory. Memory 1402 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1402 is used to store at least one computer program for execution by processor 1401 to implement the text data processing method provided by the method embodiments herein.
In some embodiments, the terminal 1400 may optionally further include a peripheral device interface 1403 and at least one peripheral device. The processor 1401, the memory 1402, and the peripheral device interface 1403 may be connected by buses or signal lines. Each peripheral device may be connected to the peripheral device interface 1403 via a bus, a signal line, or a circuit board. Optionally, the peripheral device includes at least one of a radio frequency circuit 1404, a display screen 1405, and a camera assembly 1406.
The peripheral device interface 1403 can be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1401 and the memory 1402. In some embodiments, the processor 1401, memory 1402, and peripheral interface 1403 are integrated on the same chip or circuit board; in some other embodiments, any one or both of the processor 1401, the memory 1402, and the peripheral device interface 1403 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1404 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1404 communicates with communication networks and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1404 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 1404 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: the World Wide Web, metropolitan area networks, intranets, the generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1404 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 1405 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 1405 is a touch display screen, it can also capture touch signals at or above its surface; such a touch signal may be input to the processor 1401 as a control signal for processing. In this case, the display screen 1405 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1405, disposed on the front panel of the terminal 1400; in other embodiments, there may be at least two display screens 1405, respectively disposed on different surfaces of the terminal 1400 or in a folded design; in still other embodiments, the display screen 1405 may be a flexible display disposed on a curved or folded surface of the terminal 1400. The display screen 1405 may even be arranged in a non-rectangular irregular pattern, i.e., an irregularly-shaped screen. The display screen 1405 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 1406 is used to capture images or video. Optionally, the camera assembly 1406 includes a front camera and a rear camera. The front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each of which is any one of a main camera, a depth-of-field camera, a wide-angle camera, or a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1406 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
Those skilled in the art will appreciate that the configuration shown in fig. 14 is not intended to be limiting with respect to terminal 1400 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be employed.
Optionally, the computer device is provided as a server. Fig. 15 is a schematic structural diagram of a server 1500 according to an embodiment of the present application. The server 1500 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 1501 and one or more memories 1502, where the memory 1502 stores at least one computer program that is loaded and executed by the processor 1501 to implement the methods provided by the above method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described herein again.
The embodiment of the present application further provides a computer-readable storage medium, where at least one computer program is stored in the computer-readable storage medium, and the at least one computer program is loaded and executed by a processor to implement the operations executed in the text data processing method of the foregoing embodiment.
Embodiments of the present application also provide a computer program product or computer program comprising computer program code stored in a computer-readable storage medium. The processor of the computer device reads the computer program code from the computer-readable storage medium and executes it, so that the computer device performs the operations executed in the text data processing method of the above embodiment. In some embodiments, the computer program according to the embodiments of the present application may be deployed and executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network; the multiple computer devices distributed across multiple sites and interconnected by a communication network may constitute a blockchain system.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above description is only an alternative embodiment of the present application and is not intended to limit the present application, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method of processing text data, the method comprising:
acquiring fusion semantic information and fusion spatial information based on the first text data and the second text data;
performing cross processing on the first semantic features corresponding to the fusion semantic information and the first spatial features corresponding to the fusion spatial information to obtain second semantic features and second spatial features respectively; combining the second semantic features and the second spatial features to obtain combined features;
matching the combined features to obtain a matching result, wherein the matching result indicates whether the first text data and the second text data are matched;
the first text data and the second text data are any two pieces of text data recognized from the same object, the fusion semantic information represents the semantics of the first text data and the second text data, and the fusion spatial information represents the positions of the first text data and the second text data in the object.
2. The method according to claim 1, wherein the obtaining fused semantic information and fused spatial information based on the first text data and the second text data comprises:
acquiring first semantic information and first spatial information of the first text data, and second semantic information and second spatial information of the second text data;
fusing the first semantic information and the second semantic information to obtain fused semantic information;
and fusing the first spatial information and the second spatial information to obtain the fused spatial information.
3. The method of claim 2, wherein the obtaining the first semantic information and the first spatial information of the first text data and the second semantic information and the second spatial information of the second text data comprises:
summing the vectors corresponding to the characters in the first text data to obtain the first semantic information, and acquiring the first spatial information based on the vertex coordinates of the text box in which the first text data is located in the object;
and summing the vectors corresponding to the characters in the second text data to obtain the second semantic information, and acquiring the second spatial information based on the vertex coordinates of the text box in which the second text data is located in the object.
4. The method according to claim 1, wherein the cross-processing the first semantic features corresponding to the fused semantic information and the first spatial features corresponding to the fused spatial information to obtain second semantic features and second spatial features, respectively, includes:
acquiring a first query feature, a first key feature and a first value feature corresponding to the first semantic feature;
acquiring a second query feature, a second key feature and a second value feature corresponding to the first spatial feature;
obtaining the second semantic feature based on the second query feature, the first key feature and the first value feature;
and acquiring the second spatial feature based on the first query feature, the second key feature and the second value feature.
5. The method according to claim 4, wherein the obtaining of the first query feature, the first key feature and the first value feature corresponding to the first semantic feature comprises:
multiplying the first semantic feature by a parameter matrix to obtain a semantic matrix, and acquiring the first query feature, the first key feature and the first value feature based on the semantic matrix;
the obtaining of the second query feature, the second key feature, and the second value feature corresponding to the first spatial feature includes:
and multiplying the first spatial feature by the parameter matrix to obtain a spatial matrix, and acquiring the second query feature, the second key feature and the second value feature based on the spatial matrix.
6. The method of claim 4, wherein the obtaining the second semantic features based on the second query features, the first key features, and the first value features comprises:
normalizing the product of the second query feature, the first key feature and a scaling factor to obtain a first normalized feature; determining the product of the first normalized feature and the first value feature as the second semantic feature;
the obtaining the second spatial feature based on the first query feature, the second key feature, and the second value feature includes:
normalizing the product of the first query feature, the second key feature and the scaling factor to obtain a second normalized feature; determining a product of the second normalized feature and the second value feature as the second spatial feature.
7. The method of claim 4, further comprising:
dividing the first query feature, the first key feature, and the first value feature into a plurality of first query sub-features, a plurality of first key sub-features, and a plurality of first value sub-features, respectively; dividing the second query feature, the second key feature and the second value feature into a plurality of second query sub-features, a plurality of second key sub-features and a plurality of second value sub-features, respectively;
the obtaining the second semantic features based on the second query features, the first key features, and the first value features includes:
respectively acquiring a plurality of second semantic sub-features based on the plurality of second query sub-features, the plurality of first key sub-features and the plurality of first value sub-features, and concatenating the plurality of second semantic sub-features to obtain the second semantic features;
the obtaining the second spatial feature based on the first query feature, the second key feature, and the second value feature includes:
and respectively acquiring a plurality of second spatial sub-features based on the plurality of first query sub-features, the plurality of second key sub-features and the plurality of second value sub-features, and concatenating the plurality of second spatial sub-features to obtain the second spatial features.
8. The method of claim 1, wherein the text matching model comprises: a feature extraction network, a cross processing network and a matching network;
the feature extraction network is used for respectively extracting features of the fusion semantic information and the fusion spatial information to obtain a third semantic feature and a third spatial feature;
the cross processing network is used for respectively extracting the third semantic feature and the third spatial feature to obtain the first semantic feature and the first spatial feature; performing cross processing on the first semantic features and the first spatial features to obtain second semantic features and second spatial features respectively; combining the second semantic features and the second spatial features to obtain the combined features;
and the matching network is used for matching the combined features to obtain the matching result.
9. The method of claim 8, wherein the cross-processing network is configured to:
acquiring the first query feature, the first key feature and the first value feature corresponding to the first semantic feature;
acquiring the second query feature, the second key feature and the second value feature corresponding to the first spatial feature;
obtaining the second semantic feature based on the second query feature, the first key feature and the first value feature;
and acquiring the second spatial feature based on the first query feature, the second key feature and the second value feature.
10. The method of claim 8, wherein the text matching model comprises a plurality of the cross-processing networks;
respectively extracting the third semantic feature and the third spatial feature to obtain the first semantic feature and the first spatial feature; performing cross processing on the first semantic features and the first spatial features to obtain second semantic features and second spatial features respectively; combining the second semantic features and the second spatial features to obtain combined features, including:
respectively extracting the third semantic feature and the third spatial feature based on the first cross processing network to obtain the first semantic feature and the first spatial feature; performing cross processing on the first semantic feature and the first spatial feature to obtain a fourth semantic feature and a fourth spatial feature respectively;
based on a second cross processing network, respectively performing feature extraction and cross processing on the fourth semantic feature and the fourth spatial feature until the second semantic feature and the second spatial feature output by the last cross processing network are obtained;
and combining the second semantic features and the second spatial features to obtain the combined features.
11. The method according to any one of claims 8-10, further comprising:
acquiring sample fusion semantic information and sample fusion spatial information based on the first sample text data and the second sample text data;
respectively extracting the features of the sample fusion semantic information and the sample fusion spatial information based on the feature extraction network to obtain a first sample semantic feature and a first sample spatial feature;
based on the cross processing network, performing cross processing on the first sample semantic feature and the first sample spatial feature to respectively obtain a second sample semantic feature and a second sample spatial feature; and combining the second sample semantic feature and the second sample spatial feature to obtain sample combination features;
matching the sample combination features based on the matching network to obtain a sample matching result;
and training the text matching model based on the matching result of the first sample text data and the second sample text data and the sample matching result.
12. A text data processing apparatus, characterized in that the apparatus comprises:
the information acquisition module is used for acquiring fusion semantic information and fusion spatial information based on the first text data and the second text data;
the cross processing module is used for carrying out cross processing on the first semantic features corresponding to the fusion semantic information and the first spatial features corresponding to the fusion spatial information to respectively obtain second semantic features and second spatial features; combining the second semantic features and the second spatial features to obtain combined features;
the matching processing module is used for matching the combined features to obtain a matching result, and the matching result indicates whether the first text data and the second text data are matched or not;
the first text data and the second text data are any two text data recognized from the same object, the fusion semantic information represents the semantics of the first text data and the second text data, and the fusion spatial information represents the positions of the first text data and the second text data in the object.
13. The apparatus of claim 12, wherein the information obtaining module comprises:
an information acquisition unit configured to acquire first semantic information and first spatial information of the first text data, and second semantic information and second spatial information of the second text data;
the first fusion unit is used for fusing the first semantic information and the second semantic information to obtain fused semantic information;
and the second fusion unit is used for fusing the first spatial information and the second spatial information to obtain the fused spatial information.
14. A computer device, characterized in that the computer device comprises a processor and a memory, wherein at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to implement the operations performed in the text data processing method according to any one of claims 1 to 11.
15. A computer-readable storage medium, having stored therein at least one computer program, which is loaded and executed by a processor, to implement the operations performed in the text data processing method according to any one of claims 1 to 11.
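As a worked illustration of the splitting-and-concatenation scheme of claim 7 above: the query, key, and value features are divided along the feature axis into sub-features, each head attends independently, and the per-head outputs are spliced back into one feature. The NumPy form and the assumption that the feature dimension divides evenly by the head count are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over attention scores
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross(q_other, k, v, num_heads):
    """Split query/key/value into per-head sub-features, attend per
    head, and concatenate the head outputs into one feature."""
    qs = np.split(q_other, num_heads, axis=-1)
    ks = np.split(k, num_heads, axis=-1)
    vs = np.split(v, num_heads, axis=-1)
    d = qs[0].shape[-1]
    heads = [softmax(qh @ kh.T / np.sqrt(d)) @ vh
             for qh, kh, vh in zip(qs, ks, vs)]
    return np.concatenate(heads, axis=-1)
```

Passing the other branch's query sub-features (`q_other`) reproduces the cross direction of claims 4 and 7: spatial queries against semantic keys/values yield the second semantic feature, and vice versa.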
CN202110917810.3A 2021-08-11 2021-08-11 Text data processing method and device, computer equipment and storage medium Pending CN114282543A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110917810.3A CN114282543A (en) 2021-08-11 2021-08-11 Text data processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110917810.3A CN114282543A (en) 2021-08-11 2021-08-11 Text data processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114282543A 2022-04-05

Family

ID=80868425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110917810.3A Pending CN114282543A (en) 2021-08-11 2021-08-11 Text data processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114282543A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546488A (en) * 2022-11-07 2022-12-30 北京百度网讯科技有限公司 Information segmentation method, information extraction method and training method of information segmentation model

Similar Documents

Publication Publication Date Title
CN111243668B (en) Method and device for detecting molecule binding site, electronic device and storage medium
CN111985240A (en) Training method of named entity recognition model, named entity recognition method and device
CN111598168B (en) Image classification method, device, computer equipment and medium
CN112163428A (en) Semantic tag acquisition method and device, node equipment and storage medium
CN113515942A (en) Text processing method and device, computer equipment and storage medium
CN111930964B (en) Content processing method, device, equipment and storage medium
CN112989767B (en) Medical term labeling method, medical term mapping device and medical term mapping equipment
CN114332530A (en) Image classification method and device, computer equipment and storage medium
CN112990053B (en) Image processing method, device, equipment and storage medium
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN114282013A (en) Data processing method, device and storage medium
CN111753498A (en) Text processing method, device, equipment and storage medium
CN114495916B (en) Method, device, equipment and storage medium for determining insertion time point of background music
CN115858826A (en) Data processing method and device, computer equipment and storage medium
CN113569607A (en) Motion recognition method, motion recognition device, motion recognition equipment and storage medium
CN112085120A (en) Multimedia data processing method and device, electronic equipment and storage medium
CN112037305B (en) Method, device and storage medium for reconstructing tree-like organization in image
CN117094362B (en) Task processing method and related device
CN114281936A (en) Classification method and device, computer equipment and storage medium
CN113569042A (en) Text information classification method and device, computer equipment and storage medium
CN114282543A (en) Text data processing method and device, computer equipment and storage medium
CN112287070A (en) Method and device for determining upper and lower position relation of words, computer equipment and medium
CN113822084A (en) Statement translation method and device, computer equipment and storage medium
CN113569002A (en) Text search method, device, equipment and storage medium
CN113762237A (en) Text image processing method, device and equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination