CN114970446B - Text conversion display method and device, equipment, medium and product thereof - Google Patents


Info

Publication number
CN114970446B
CN114970446B
Authority
CN
China
Prior art keywords
distribution
graph
content
diagram
neural network
Prior art date
Legal status
Active
Application number
CN202210823578.1A
Other languages
Chinese (zh)
Other versions
CN114970446A (en)
Inventor
陈柯树
罗伟杰
陈永红
彭勇
何锦源
王天星
Current Assignee
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Original Assignee
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd filed Critical Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority to CN202210823578.1A priority Critical patent/CN114970446B/en
Publication of CN114970446A publication Critical patent/CN114970446A/en
Application granted granted Critical
Publication of CN114970446B publication Critical patent/CN114970446B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/106Display of layout of documents; Previewing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a text conversion display method, a text conversion display device, a computer device and a storage medium, wherein the method comprises the following steps: reading content information in a graphic file; performing spatial clustering on the content information based on a preset density clustering algorithm to generate a first distribution graph of the content information, wherein the first distribution graph comprises a plurality of content groups; inputting the first distribution graph into a preset graph neural network and performing layout update on the first distribution graph to generate a second distribution graph; and screening, according to a preset attention mechanism, a target distribution graph corresponding to the content information from the first distribution graph and the second distribution graph. Entities with similar semantics are merged and partitioned by the density clustering method and the graph neural network, finally achieving adaptive dynamic partitioning of the whole document, so that the displayed text is neat and regular and its readability is enhanced.

Description

Text conversion display method and device, equipment, medium and product thereof
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a text conversion display method, a text conversion display apparatus, a computer device, and a storage medium.
Background
Under the modern wave of digitalization, the storage form of the main business data of financial technology, and indeed of various industries, has generally shifted from offline paper files to unformatted files represented by PDFs, scanned copies, pictures and the like, so as to save the corresponding costs and improve daily work efficiency.
Most existing document information extraction technologies process documents of a specific format in a targeted manner. They adopt a manual + statistical approach: prior experience is provided manually for data pre-labeling, format features are extracted, and the final element information extraction is completed through a machine learning algorithm or statistical learning on small samples.
The inventors found in research that the technical characteristics of the prior art mean that only the text content can be extracted; the extracted characters cannot be divided according to the display form of the text content, so the extracted content is disordered and its readability is poor.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present application provide a method, an apparatus, a device, and a storage medium for content-divided display of converted text information.
In order to achieve the above object, the present application provides a text conversion display method, including:
reading content information in a graphic file, wherein the content information is generated by splicing text information and spatial position information of each element in the graphic file, and the text information and the spatial position are extracted by a preset graphic-text conversion model;
based on a preset density clustering algorithm, carrying out spatial clustering on the content information to generate a first distribution graph of the content information, wherein the first distribution graph comprises a plurality of content groups;
inputting the first distribution diagram into a preset diagram neural network, and performing layout updating on the first distribution diagram to generate a second distribution diagram;
screening a target distribution graph corresponding to the content information in the first distribution graph and the second distribution graph according to a preset attention mechanism, wherein the attention mechanism generates layout scores of the first distribution graph and the second distribution graph, and the target distribution graph is the layout graph with the highest layout score in the first distribution graph and the second distribution graph.
Optionally, the reading the content information in the graphic file includes:
extracting text information of characters in the graphic file and spatial position information corresponding to the text information;
and splicing the text information and the spatial position information to generate the content information.
Optionally, the spatially clustering the content information based on a preset density clustering algorithm, and generating the first distribution graph of the content information includes:
performing spatial clustering on the content information based on the density clustering algorithm to generate a plurality of clustering clusters;
respectively displaying the plurality of clustering clusters in a distinguishing manner to generate a plurality of content groups;
generating the first profile from the plurality of content packets.
Optionally, the graph neural network comprises a first graph neural network, and the inputting the first distribution graph into a preset graph neural network and performing layout update on the first distribution graph to generate a second distribution graph comprises:
inputting the first distribution graph into the first graph neural network, and respectively performing layout update on the content among the content groups in the first distribution graph to generate the second distribution graph.
Optionally, the graph neural network comprises a second graph neural network, and after the inputting the first distribution graph into the first graph neural network and respectively performing layout update on the content among the content groups in the first distribution graph to generate the second distribution graph, the method further comprises:
inputting the second distribution diagram into the second diagram neural network, and respectively performing layout updating on elements in each content group in the second distribution diagram to generate a third distribution diagram.
Optionally, the inputting the first distribution graph into the first graph neural network and respectively performing layout update on the content among the content groups in the first distribution graph to generate the second distribution graph comprises:
inputting the first distribution graph into the first graph neural network, and sequentially reading the vertex groups of each content group;
calculating a first attention coefficient and a first feature distance between the vertex packet and the content packet adjacent to the vertex packet;
performing weighted calculation on the first attention coefficient and the first feature distance to generate a first similar feature of the vertex group;
and performing layout updating on the content among the content groups according to the first similar characteristics to generate the second distribution diagram.
Optionally, the inputting the second distribution graph into the second graph neural network, and performing layout update on elements in each content group in the second distribution graph respectively, and the generating a third distribution graph includes:
inputting the second distribution graph into the second graph neural network, and sequentially reading vertex elements in each content packet;
calculating a second attention coefficient and a second feature distance between the vertex element and its neighboring elements;
performing weighted calculation on the second attention coefficient and the second feature distance to generate a second similar feature of the vertex element;
and carrying out layout updating on elements in each content group according to the second similar characteristics to generate the third distribution diagram.
Optionally, the attention mechanism includes an attention scoring model, and the screening the target distribution map corresponding to the content information in the first distribution map and the second distribution map according to a preset attention mechanism includes:
inputting the first distribution map and the second distribution map into the attention scoring model respectively, and generating layout scores of the first distribution map and the second distribution map;
and selecting the distribution graph with the highest layout score as the target distribution graph.
Optionally, the attention mechanism includes an attention scoring model, and the screening the target distribution map corresponding to the content information in the first distribution map and the second distribution map according to a preset attention mechanism includes:
inputting the first distribution diagram, the second distribution diagram and the third distribution diagram into the attention scoring model respectively, and generating layout scores of the first distribution diagram, the second distribution diagram and the third distribution diagram;
and selecting the distribution graph with the highest score of the layout as the target distribution graph.
To achieve the above object, the present application also provides a text conversion display device including:
the reading module is used for reading content information in a graphic file, wherein the content information is generated by splicing text information and spatial position information of each element in the graphic file, and the text information and the spatial position are extracted by a preset graphic conversion model;
the clustering module is used for carrying out spatial clustering on the content information based on a preset density clustering algorithm to generate a first distribution graph of the content information, wherein the first distribution graph comprises a plurality of content groups;
the processing module is used for inputting the first distribution diagram into a preset diagram neural network and performing layout updating on the first distribution diagram to generate a second distribution diagram;
and the execution module is used for screening a target distribution map corresponding to the content information in the first distribution map and the second distribution map according to a preset attention mechanism, wherein the attention mechanism generates the layout scores of the first distribution map and the second distribution map, and the target distribution map is the layout map with the highest layout score in the first distribution map and the second distribution map.
Optionally, the text conversion display device further includes:
the first extraction submodule is used for extracting text information of characters in the graphic file and spatial position information corresponding to the text information;
and the first generation submodule is used for splicing the text information and the spatial position information to generate the content information.
Optionally, the text conversion display device further includes:
the first clustering submodule is used for carrying out spatial clustering on the content information based on the density clustering algorithm to generate a plurality of clustering clusters;
the second generation submodule is used for respectively carrying out differentiated display on the plurality of clustering clusters to generate a plurality of content groups;
a first execution submodule to generate the first distribution graph from the plurality of content packets.
Optionally, the graph neural network comprises a first graph neural network, and the text conversion display device further includes:
and the first processing submodule is used for inputting the first distribution graph into the first graph neural network, respectively performing layout update on the content among the content groups in the first distribution graph, and generating the second distribution graph.
Optionally, the graph neural network comprises: a second graph neural network, the text conversion display device further comprising:
and the second processing submodule is used for inputting the second distribution diagram into the second diagram neural network, and respectively performing layout updating on elements in each content group in the second distribution diagram to generate a third distribution diagram.
Optionally, the text conversion display device further includes:
a first input submodule, configured to input the first distribution graph into the first graph neural network, and sequentially read vertex groups of each content group;
a first calculation submodule for calculating a first attention coefficient and a first feature distance between the vertex packet and the content packet adjacent thereto;
the third processing submodule is used for performing weighted calculation on the first attention coefficient and the first feature distance to generate a first similar feature of the vertex grouping;
and the second execution submodule is used for carrying out layout updating on the content among the content groups according to the first similar characteristic to generate the second distribution diagram.
Optionally, the text conversion display device further comprises:
a second input submodule, configured to input the second distribution graph into the second graph neural network, and sequentially read vertex elements in each content packet;
a second calculation submodule for calculating a second attention coefficient and a second feature distance between the vertex element and its neighboring elements;
the fourth processing submodule is used for performing weighted calculation on the second attention coefficient and the second feature distance to generate a second similar feature of the vertex element;
and the third execution submodule is used for carrying out layout updating on elements in each content group according to the second similar characteristics to generate the third distribution diagram.
Optionally, the attention mechanism includes an attention scoring model, and the text conversion display device further includes:
a fifth processing sub-module, configured to input the first distribution map and the second distribution map into the attention scoring model, respectively, and generate layout scores of the first distribution map and the second distribution map;
and the fourth execution submodule is used for selecting the distribution diagram with the highest layout scoring score as the target distribution diagram.
Optionally, the attention mechanism includes an attention scoring model, and the text conversion display device further includes:
a sixth processing submodule, configured to input the first distribution map, the second distribution map, and the third distribution map into the attention scoring model, respectively, and generate layout scores of the first distribution map, the second distribution map, and the third distribution map;
and the fifth execution sub-module is used for selecting the distribution map with the highest layout scoring score as the target distribution map.
To achieve the above object, the present application also provides a computer device, which includes a memory and a processor; the memory for storing a computer program; the processor is configured to execute the computer program and implement the text conversion display method according to any one of the embodiments of the present application when executing the computer program.
To achieve the above object, the present application further provides a computer-readable storage medium storing a computer program, which when executed by a processor, causes the processor to implement any one of the text conversion display methods provided in the embodiments of the present application.
The beneficial effects of the embodiments of the application are as follows: when text conversion is performed, the extracted content information is spatially clustered through a density clustering algorithm, and a first distribution graph of different content groups is generated. Then, each content group in the first distribution graph and the elements in the content groups are updated through the graph neural network to generate an updated second distribution graph. Finally, the distribution graph with the optimal layout is selected from the first distribution graph and the second distribution graph through an attention mechanism to serve as the target distribution graph. The density clustering method and the graph neural network merge and partition entities with similar semantics, finally achieving adaptive dynamic partitioning of the whole document, so that the displayed text is neat and regular and its readability is enhanced.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic basic flow chart of a text conversion display method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a basic structure of a text conversion display device according to an embodiment of the present application;
fig. 3 is a block diagram of a basic structure of a computer device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, a "terminal" includes both wireless signal receiver devices, which include only wireless signal receiver devices without transmit capability, and receiving and transmitting hardware devices, which include receiving and transmitting hardware devices capable of performing two-way communication over a two-way communication link, as will be understood by those skilled in the art. Such a device may include: a cellular or other communications device having a single line display or a multi-line display or a cellular or other communications device without a multi-line display; PCS (Personal Communications Service), which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant) which may include a radio frequency receiver, a pager, internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System) receiver; a conventional laptop and/or palmtop computer or other device having and/or including a radio frequency receiver. As used herein, a "terminal" may be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. The "terminal" used herein may also be a communication terminal, a web-enabled terminal, a music/video playing terminal, such as a PDA, an MID (Mobile Internet Device) and/or a Mobile phone with music/video playing function, and may also be a smart tv, a set-top box, etc.
The hardware referred to by the names "server", "client", "service node", etc. in the present application is essentially an electronic device with the performance of a personal computer: a hardware device having the necessary components described by the von Neumann principles, such as a central processing unit (including an arithmetic unit and a controller), a memory, an input device, and an output device. A computer program is stored in the memory, and the central processing unit loads a program stored in an external memory into the internal memory to run it, executes the instructions in the program, and interacts with the input and output devices, thereby accomplishing specific functions.
It should be noted that the concept of "server" as referred to in this application can be extended to the case of a server cluster. According to the network deployment principle understood by those skilled in the art, the servers should be logically divided, and in physical space, the servers may be independent from each other but can be called through an interface, or may be integrated into one physical computer or a set of computer clusters. Those skilled in the art should understand this variation and should not be so constrained as to implement the network deployment of the present application.
One or more technical features of the present application, unless expressly specified otherwise, may be deployed to a server for implementation by a client remotely invoking an online service interface provided by a capture server for access, or may be deployed directly and run on the client for access.
Unless specified in clear text, the neural network model referred to or possibly referred to in the application can be deployed in a remote server and used for remote call at a client, and can also be deployed in a client with qualified equipment capability for direct call.
Various data referred to in the present application may be stored in a server remotely or in a local terminal device unless specified in the clear text, as long as the data is suitable for being called by the technical solution of the present application.
The person skilled in the art will know this: although the various methods of the present application are described based on the same concept so as to be common to each other, they may be independently performed unless otherwise specified. In the same way, for each embodiment disclosed in the present application, it is proposed based on the same inventive concept, and therefore, concepts of the same expression and concepts of which expressions are different but are appropriately changed only for convenience should be equally understood.
Unless expressly stated otherwise, the technical features of the embodiments disclosed in the present application may be cross-linked to form a new embodiment, so long as the combination does not depart from the spirit of the present application and can satisfy the requirements of the prior art or solve the disadvantages of the prior art. Those skilled in the art will appreciate variations to this.
Referring to fig. 1, fig. 1 is a basic flowchart of the text conversion display method according to this embodiment.
As shown in fig. 1:
S1100, reading content information in the graphic file;
in this embodiment, when converting a paper document into an electronic document, the paper document needs to be converted into a graphic document. When the paper file is converted into the graphic file, the paper file can be converted in a scanning or photographing mode.
After the generated graphic file is read, the text information in the graphic file is extracted through a graphic-text conversion model. In some embodiments, the graphic-text conversion model is a spatial graph neural network model. When the text information is read, the spatial position information of each character element in the text information is also read; that is, each element output by the graphic-text conversion model has two vectors: a text vector representing the visual information of the element and a spatial position vector representing the spatial position information of the element.
After the text information and the spatial position information of each element are obtained through the image-text conversion model, the text information and the spatial position information of each element are spliced, and the text information and the spatial position information of each element are connected together. The content information of the graphic file is generated by connecting the text information and the spatial position information, and thus, the content information of the graphic file includes information contents of two dimensions, constituting a 2-dimensional spatial vector. In the subsequent processing process through the 2-dimensional space vector, not only the text vector of each element can be read, but also the spatial relationship between each element can be calculated through the spatial position information.
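As a concrete illustration of this splicing step, the following sketch concatenates a text vector and a spatial position vector per element. The element structure and the helper name are assumptions for illustration only; the patent does not specify the graphic-text conversion model's output format.

import numpy as np

def splice_content_info(elements):
    # Each element is assumed to be a dict produced by the graphic-text
    # conversion model, e.g. {"text_vec": np.ndarray, "bbox": (x0, y0, x1, y1)}.
    spliced = []
    for el in elements:
        text_vec = np.asarray(el["text_vec"], dtype=np.float32)
        # Spatial position vector: here simply the bounding-box coordinates.
        pos_vec = np.asarray(el["bbox"], dtype=np.float32)
        spliced.append(np.concatenate([text_vec, pos_vec]))
    return np.stack(spliced)  # shape: (num_elements, text_dim + 4)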
S1200, based on a preset density clustering algorithm, carrying out spatial clustering on the content information to generate a first distribution graph of the content information, wherein the first distribution graph comprises a plurality of content groups;
and after the content information of the graph file is read, clustering the content information in a spatial clustering mode through a density clustering algorithm. The density clustering algorithm can be (without limitation): DBSCAN (sensitivity Based Spatial Clustering of Application with Noise) algorithm, OPTICS (Ordering Points To identity the Clustering Structure) algorithm, or DENCLUE (Density Based CLUstEring) algorithm.
The content information is clustered through a density clustering algorithm to generate a plurality of clustering clusters, and the number of the clustering clusters can be 2, 3, 4 or more. Each cluster characterizes a content grouping in the content information, each content grouping comprising a plurality of elements.
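The following sketch illustrates this clustering step with scikit-learn's DBSCAN applied to the elements' spatial positions; the eps and min_samples values and the use of box centers are assumptions, since the patent does not fix these details.

import numpy as np
from sklearn.cluster import DBSCAN

def cluster_content(content_info, pos_dims=4, eps=30.0, min_samples=2):
    # The trailing `pos_dims` columns of content_info are assumed to be the
    # spatial position vector (a bounding box x0, y0, x1, y1).
    boxes = content_info[:, -pos_dims:]
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2.0,
                        (boxes[:, 1] + boxes[:, 3]) / 2.0], axis=1)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(centers)
    return labels  # one cluster id per element; -1 marks noise points

Each distinct label corresponds to one content group of the first distribution graph.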
After the cluster clusters are generated, each cluster is displayed in a distinguishing manner: a color block mask is overlaid on the content group represented by each cluster, so that different content groups have different colors. The differentiated display is not limited to this; depending on the specific application scenario, in some embodiments the differentiated display can also be: differentiated display of character colors, of font shapes, of font sizes, and the like.
After each content group is rendered and displayed in a differentiated manner, a first distribution graph is formed by the plurality of content groups. The first distribution graph includes a plurality of content groups distributed over different location areas of the background page. In some embodiments, in order to make the content of the first distribution graph more readable, the distribution positions of content groups of different lengths are set when the content groups are laid out, so that the first distribution graph is constructed more reasonably. In some embodiments, when the content groups are laid out, the textual meaning represented by each content group is identified, for example, the content of a group is identified as a title, an abstract, body text or another type of text paragraph; the different content groups are then arranged by paragraph according to common reading habits, as illustrated by the sketch below.
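A minimal sketch of such a reading-order arrangement, assuming a simple top-to-bottom, left-to-right rule (the patent only refers to common reading habits and does not fix the ordering rule):

def arrange_groups_reading_order(group_bboxes):
    # group_bboxes: dict {group_id: (x0, y0, x1, y1)} in page coordinates.
    # Sort by top edge first, then by left edge.
    return sorted(group_bboxes,
                  key=lambda g: (group_bboxes[g][1], group_bboxes[g][0]))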
S1300, inputting the first distribution diagram into a preset diagram neural network, and performing layout updating on the first distribution diagram to generate a second distribution diagram;
after the content information is clustered through a density clustering algorithm to generate a first distribution graph, the first distribution graph is input into a preset graph neural network, and the graph neural network in the embodiment is trained to a convergence state in advance and used for updating content division of the first distribution graph.
In this embodiment, the graph neural network can be a first graph neural network; specifically, the first graph neural network is Ex-GAT. The first graph neural network performs grouping updates on the content groups in the first distribution graph: it only splits or merges the content groups and does not adjust the details of the elements inside each group. Specifically, after reading the content groups of the first distribution graph, the first graph neural network takes each content group in turn as a vertex group and takes the positional relation between each vertex group and its adjacent groups as edges. It then calculates a first attention coefficient between the vertex group and its adjacent content groups: first it calculates the first feature distance between the vertex group and each adjacent content group; then the ratio of the feature distance between the vertex group and any adjacent content group to the sum of all the first feature distances gives a correlation coefficient; after activation, the correlation coefficient is mapped into the numerical interval 0-1, thereby generating the first attention coefficient between the vertex group and that adjacent content group.
After the first attention coefficient and the first feature distance between a vertex group and its adjacent content groups are generated, the first attention coefficient and the first feature distance are weighted to generate the first similarity feature of the vertex group. Each content group is taken in turn as a vertex group, and the first similarity feature of each vertex group is calculated.
And after the first similarity characteristic of each vertex group is obtained through calculation, the first distribution diagram is subjected to layout updating according to the first similarity characteristic, and the updating mode is that according to the first similarity characteristic of each vertex group, content groups with the first similarity characteristic difference value smaller than the group threshold value are combined to generate a second distribution diagram of the content information. The setting of the grouping threshold can be arbitrarily set according to the actual needs of the user, and can be adaptively adjusted according to different application scenarios.
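A schematic sketch of this group-level update, under simplifying assumptions that the patent does not fix: the feature distance is a Euclidean distance between mean group features, the activation is a sigmoid, the weighting is a plain product, and merging is done greedily over adjacent pairs.

import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def group_level_update(group_feats, adjacency, merge_threshold=0.1):
    # group_feats: (G, D) mean feature vector per content group.
    # adjacency: (G, G) boolean matrix marking spatially adjacent groups.
    num_groups = len(group_feats)
    similar_feats = np.zeros(num_groups)
    for v in range(num_groups):
        neighbors = np.where(adjacency[v])[0]
        if len(neighbors) == 0:
            continue
        # First feature distance between the vertex group and each neighbor.
        dists = np.linalg.norm(group_feats[neighbors] - group_feats[v], axis=1)
        # Correlation coefficient = distance / sum of all distances, activated into (0, 1).
        attn = _sigmoid(dists / (dists.sum() + 1e-8))
        # First similarity feature: weighted combination of attention coefficients and distances.
        similar_feats[v] = float((attn * dists).sum())
    # Merge adjacent groups whose similarity-feature difference is below the grouping threshold.
    merged_into = {v: v for v in range(num_groups)}
    for v in range(num_groups):
        for u in np.where(adjacency[v])[0]:
            if abs(similar_feats[v] - similar_feats[int(u)]) < merge_threshold:
                merged_into[int(u)] = merged_into[v]
    return similar_feats, merged_into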
In some embodiments, the graph neural network can further comprise a second graph neural network; specifically, the second graph neural network is In-GAT. The second graph neural network performs layout updates on the elements within each content group of the second distribution graph. The second graph neural network is trained to a convergence state in advance and is used to adjust and update the elements inside the content groups of the second distribution graph.
Specifically, the second graph neural network reads the content groups of the second distribution graph and then processes each single content group in turn as a calculation unit. Within a group, each element is read in turn as a vertex element, and the positional relation between each vertex element and its adjacent elements is taken as an edge. A second attention coefficient between vertex elements is then calculated: first the second feature distance between the vertex element and each adjacent element is calculated; then the ratio of the feature distance between the vertex element and any adjacent element to the sum of all the second feature distances gives a correlation coefficient; after activation, the correlation coefficient is mapped into the numerical interval 0-1, generating the second attention coefficient between the vertex element and that adjacent element.
And after generating a second attention coefficient and a second feature distance between the vertex element and the adjacent element, performing weighted calculation on the second attention coefficient and the second feature distance to generate a second similarity feature of the vertex element. And taking each element as a vertex element in turn, and calculating a second similarity characteristic of each element.
After the second similarity feature of each vertex element is obtained through calculation, the second distribution graph is subjected to a layout update according to the second similarity features: elements whose second similarity features differ by less than the element threshold are combined to generate a third distribution graph of the content information. The element threshold can be set arbitrarily according to the actual needs of the user, and can also be adaptively adjusted for different application scenarios.
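For completeness, a sketch of this element-level pass, reusing the group_level_update sketch above inside each content group; the composite relabeling scheme and the threshold value are assumptions for illustration only.

import numpy as np

def element_level_update(element_feats, labels, adjacency, element_threshold=0.05):
    # element_feats: (E, D) features per element; labels: (E,) content-group id
    # per element; adjacency: (E, E) boolean adjacency between elements.
    new_labels = labels.copy()
    for g in np.unique(labels):
        idx = np.where(labels == g)[0]
        sub_feats = element_feats[idx]
        sub_adj = adjacency[np.ix_(idx, idx)]
        _, merged_into = group_level_update(sub_feats, sub_adj,
                                            merge_threshold=element_threshold)
        # merged_into maps local element indices to representative local indices;
        # relabel the elements of this group into finer sub-groups accordingly.
        for local, rep in merged_into.items():
            new_labels[idx[local]] = int(g) * 1000 + int(rep)  # simple composite id
    return new_labels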
In some embodiments, the graph neural network is not limited thereto; according to different application scenarios, and to meet the adaptive requirements of the scenario, the graph neural network can also be (without limitation): Graph Convolutional Networks, Graph Attention Networks, Graph Auto-encoders, Graph Generative Networks, or Graph Spatial-Temporal Networks. The number of distribution graphs is likewise not limited to the first or second distribution graph; distribution graphs can be added correspondingly according to the number of graph neural networks.
S1400, screening a target distribution graph corresponding to the content information in the first distribution graph and the second distribution graph according to a preset attention mechanism.
After the first distribution diagram and the second distribution diagram are generated, the first distribution diagram and the second distribution diagram need to be subjected to image scoring, and the scoring mode is to calculate the layout scoring of the first distribution diagram and the second distribution diagram through an attention mechanism.
The attention mechanism includes an attention scoring model, which is an attention model trained to a convergent state in advance and used for scoring the layout of an image. The attention mechanism can also be (without limitation): a Multi-head Self-Attention mechanism, SNAIL, a Self-Attention Generative Adversarial Network (Self-Attention GAN), or Neural Turing Machines.
The first distribution diagram and the second distribution diagram are respectively input into an attention scoring model, and the attention scoring model respectively calculates the layout scores of the first distribution diagram and the second distribution diagram.
After the layout scores of the first distribution graph and the second distribution graph are calculated, the distribution graph with the highest layout score is determined to be the target distribution graph. Since the second distribution graph is adjusted on the basis of the first distribution graph, there is a certain probability that an overfitting problem occurs during this process, which would make the distribution of the second distribution graph less reasonable than that of the first distribution graph. Screening the target distribution graph from both the first distribution graph and the second distribution graph avoids the overfitting problem that would exist if the second distribution graph were used directly as the target distribution graph.
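A minimal sketch of this selection step, assuming a hypothetical attention_scoring_model callable that maps a distribution graph to a scalar layout score (the patent does not specify the scoring model's interface or architecture):

def select_target_distribution(distribution_graphs, attention_scoring_model):
    # Score every candidate layout and return the highest-scoring one.
    scores = [attention_scoring_model(g) for g in distribution_graphs]
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return distribution_graphs[best_index], scores

# Usage with two candidates:
# target_graph, scores = select_target_distribution(
#     [first_graph, second_graph], attention_scoring_model)

The same call works unchanged when a third (or further) distribution graph is added to the candidate list.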
In some embodiments, the first distribution graph, the second distribution graph and the third distribution graph are input into the attention scoring model respectively, and the attention scoring model calculates a layout score for each of them.
After the layout scores of the first, second and third distribution graphs are obtained through calculation, the distribution graph with the highest layout score is determined to be the target distribution graph. Since the second distribution graph is adjusted on the basis of the first distribution graph, and the third distribution graph is updated on the basis of the second distribution graph, there is a certain probability of overfitting during these two processes, which would make the second distribution graph less reasonable than the first, and the third less reasonable than the first or the second. Screening the target distribution graph from the first, second and third distribution graphs avoids the overfitting problem that would exist if the third distribution graph were used directly as the target distribution graph.
When the number of distribution graphs is not limited to the first, second and third distribution graphs, the other distribution graphs are likewise input into the attention scoring model in turn, their layout scores are calculated, and the distribution graph with the highest of all the layout scores is then selected as the target distribution graph.
In the present embodiment, following the density clustering algorithm, a cross-entropy loss function is designed as the loss function. The loss function of the first graph neural network is:
[loss-function formula of the first graph neural network; in the source text it appears only as an image placeholder]
where N is the number of vertex groups, A represents the adjacency matrix, Â represents the predicted adjacency matrix, and ξ is a constant of size 1e-7.
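Since the formula itself is published only as an image, the following reconstruction is an assumption rather than the patent's verbatim equation: a standard binary cross-entropy over the adjacency matrix, smoothed by ξ, consistent with the variables defined above.

\mathcal{L}_{1} \;=\; -\frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}
\Big[\, A_{ij}\,\log\!\big(\hat{A}_{ij}+\xi\big)
\;+\; \big(1-A_{ij}\big)\,\log\!\big(1-\hat{A}_{ij}+\xi\big) \Big]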
The loss function for the second graph neural network is:
[loss-function formula of the second graph neural network; in the source text it appears only as an image placeholder]
where assign represents the grouping matrix where the vertex element is located.
The loss functions of the first graph neural network and the second graph neural network are regularized loss functions, which enlarges the difference between clustering results and prevents insufficient clustering of entities.
In the above embodiment, when text conversion is performed, the extracted content information is spatially clustered by the density clustering algorithm, and a first distribution graph of different content groups is generated. Then, each content group in the first distribution graph and the elements in the content groups are updated through the graph neural network to generate an updated second distribution graph. Finally, the distribution graph with the optimal layout is selected from the first distribution graph and the second distribution graph through an attention mechanism to serve as the target distribution graph. The density clustering method and the graph neural network merge and partition entities with similar semantics, finally achieving adaptive dynamic partitioning of the whole document, so that the displayed text is neat and regular and its readability is enhanced.
In some implementations, the content information includes textual information and spatial location information. Specifically, S1100 includes:
S1111, extracting text information of characters in the graphic file and spatial position information corresponding to the text information;
After the generated graphic file is read, the text information in the graphic file is extracted through the graphic-text conversion model. In some embodiments, the graphic-text conversion model is a spatial graph neural network model. When the text information is read, the spatial position information of each character element is also read; that is, each element has two vectors: a text vector representing the visual information of the element and a spatial position vector representing the spatial position information of the element.
S1112, the text information and the spatial position information are spliced to generate the content information.
After the text information and the space position information of each element are obtained through the image-text conversion model, the text information and the space position information of each element are spliced, and the text information and the space position information of each element are connected together. The content information of the graphic file is generated by connecting the text information and the spatial position information, and therefore, the content information of the graphic file includes information contents of two dimensions, which constitute a 2-dimensional space vector. In the subsequent processing process through the 2-dimensional space vector, not only the text vector of each element can be read, but also the space relation between each element can be calculated through the space position information.
In some implementations, the first distribution graph includes a plurality of content packets. Specifically, S1200 includes:
S1211, performing spatial clustering on the content information based on the density clustering algorithm to generate a plurality of clustering clusters;
the content information is clustered through a density clustering algorithm to generate a plurality of clustering clusters, and the number of the clustering clusters can be 2, 3, 4 or more. Each cluster characterizes a content grouping in the content information, each content grouping including a plurality of elements.
S1212, respectively displaying the plurality of clustering clusters in a differentiated manner to generate a plurality of content groups;
After the cluster clusters are generated, each cluster is displayed in a differentiated manner, as follows: a color block mask is overlaid on the content group represented by each cluster, so that different content groups have different colors. The differentiated display is not limited to this; depending on the specific application scenario, in some embodiments the differentiated display can also be: a character-color differentiated display, a font-shape differentiated display, a font-size differentiated display, and the like.
S1213, generating the first distribution graph from the plurality of content packets.
After each content group is rendered and displayed in a differentiated manner, the first distribution graph is composed of the plurality of content groups. The first distribution graph includes a plurality of content groups distributed over different location areas of the background page. In some embodiments, in order to make the content of the first distribution graph more readable, the distribution positions of content groups of different lengths are set when the content groups are laid out, so that the first distribution graph is constructed more reasonably. In some embodiments, when the content groups are laid out, the textual meaning represented by each content group is identified, for example, the content of a group is identified as a title, an abstract, body text or another type of text paragraph; the different content groups are then arranged by paragraph according to common reading habits.
In some embodiments the graph neural network comprises: a first graph neural network. Specifically, S1300 includes:
S1310, inputting the first distribution graph into the first graph neural network, and respectively performing layout update on the content among the content groups in the first distribution graph to generate the second distribution graph.
Further, S1310 includes:
S1311, inputting the first distribution graph into the first graph neural network, and sequentially reading the vertex groups of each content group;
in this embodiment, the graph neural network can be a first graph neural network, and specifically, the first graph neural network is Ex-GAT. The first graph neural network is used for performing grouping updating on the content groups in the first distribution graph. The first graph neural network only splits or merges the content groups in the first distribution graph and does not perform detail adjustment on elements in the content of each group. After the first graph neural network reads the content groupings of the first profile, each content grouping is taken as a vertex grouping in turn.
S1312, calculating a first attention coefficient and a first characteristic distance between the vertex packet and the adjacent content packet;
the method comprises the steps of taking the position relation between each vertex group and the adjacent group thereof as an edge, calculating a first attention coefficient between the vertex group and the adjacent content group thereof, namely calculating a first characteristic distance between the vertex group and the adjacent content group thereof, then generating a correlation coefficient between the vertex group and any adjacent content group thereof by taking the sum of the characteristic distance between the vertex group and any adjacent content group thereof and all the first characteristic distances as a ratio, mapping the correlation coefficient in a numerical value interval of 0-1 after activating the correlation coefficient, and generating the first attention coefficient between the vertex group and any adjacent content group thereof.
S1313, performing weighted calculation on the first attention coefficient and the first feature distance to generate a first similar feature of the vertex group;
after generating a first attention coefficient and a first feature distance between a vertex packet and its adjacent content packet, performing weighted calculation on the first attention coefficient and the first feature distance to generate a first similarity feature of the vertex packet. And taking each content group as a vertex group in turn, and calculating a first similarity characteristic of each vertex group.
S1314, performing layout update on the content among the content groups according to the first similar features to generate the second distribution graph.
And after the first similarity characteristic of each vertex group is obtained through calculation, the first distribution diagram is subjected to layout updating according to the first similarity characteristic, and the updating mode is that according to the first similarity characteristic of each vertex group, content groups with the first similarity characteristic difference value smaller than the group threshold value are combined to generate a second distribution diagram of the content information. The setting of the grouping threshold can be arbitrarily set according to the actual needs of the user, and can be adaptively adjusted according to different application scenarios.
In some embodiments, the graph neural network can further include a second graph neural network. Specifically, S1300 includes:
S1320, inputting the second distribution graph into the second graph neural network, and respectively performing layout update on elements in each content group in the second distribution graph to generate a third distribution graph.
Further, S1320 further includes:
S1321, inputting the second distribution graph into the second graph neural network, and sequentially reading vertex elements in each content group;
the graph neural network can be a network that further comprises a second graph neural network, in particular the first graph neural network is In-GAT. The second graph neural network is used for performing layout updating on elements in each content group in the second distribution graph. The second graph neural network is trained to a convergence state in advance and used for adjusting and updating the elements in the second distribution graph content grouping in the content grouping component.
S1322, calculating a second attention coefficient and a second characteristic distance between the vertex element and the adjacent element;
the second graph neural network reads the content packets of the second distribution graph and then processes the content packets sequentially with the single content packets as computing units. And sequentially reading content, grouping each element as a vertex element, taking the position relation between each vertex element and adjacent elements thereof as an edge, calculating a second attention coefficient between the vertex elements, namely calculating a second characteristic distance between the vertex element and the adjacent elements thereof, then generating a correlation coefficient between the vertex element and any adjacent element thereof by taking the ratio of the characteristic distance between the vertex element and any adjacent element thereof to the sum of all the second characteristic distances, and mapping the correlation coefficient in a numerical value interval of o-1 after activating the correlation coefficient to generate the second attention coefficient between the vertex element and any adjacent element thereof.
S1323, performing weighted calculation on the second attention coefficient and the second feature distance to generate a second similar feature of the vertex element;
and after generating a second attention coefficient and a second feature distance between the vertex element and the adjacent element, performing weighted calculation on the second attention coefficient and the second feature distance to generate a second similarity feature of the vertex element. And taking each element as a vertex element in turn, and calculating a second similarity characteristic of each element.
S1324, carrying out layout updating on elements in each content group according to the second similar characteristics to generate the third distribution diagram.
And after the second similarity characteristic of each vertex element is obtained through calculation, the second distribution diagram is subjected to layout updating according to the second similarity characteristic, and the updating mode is that elements with the second similarity characteristic difference value smaller than the element threshold value are combined according to the second similarity characteristic of each vertex element to generate a third distribution diagram of the content information. The setting of the element threshold can be arbitrarily set according to the actual needs of the user, and can be adaptively adjusted according to different application scenes.
In some embodiments, the target distribution graph is screened from the first distribution graph and the second distribution graph. Specifically, S1400 includes:
S1411, inputting the first distribution graph and the second distribution graph into the attention scoring model respectively, and generating layout scores of the first distribution graph and the second distribution graph;
the attention mechanism includes an attention scoring model, which is an attention model pre-trained to a converged state for scoring the layout of the image.
The first distribution diagram and the second distribution diagram are respectively input into an attention scoring model, and the attention scoring model respectively calculates the layout scores of the first distribution diagram and the second distribution diagram.
S1412, selecting the distribution graph with the highest layout score as the target distribution graph.
After the layout scores of the first distribution diagram and the second distribution diagram are obtained, the distribution diagram with the highest layout score is determined as the target distribution diagram. Since the second distribution diagram is obtained by adjusting the first distribution diagram, there is a certain probability that overfitting occurs during that adjustment, which would make the layout of the second distribution diagram less reasonable than that of the first. Screening the target distribution diagram from both the first and the second distribution diagrams therefore avoids the overfitting problem that could arise if the second distribution diagram were used directly as the target distribution diagram.
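The screening step can be pictured with the small helper below, where `scoring_model` stands in for the pre-trained attention scoring model (its real interface is not described here) and each candidate is a distribution diagram; this is a sketch of the control flow only.

```python
def select_target_distribution(scoring_model, candidates):
    """Score every candidate distribution diagram and return the highest-scoring one."""
    scores = [scoring_model(candidate) for candidate in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], scores[best]
```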
In some embodiments, the target distribution diagram is screened from the first distribution diagram, the second distribution diagram, and the third distribution diagram. Specifically, S1400 further includes:
S1421, inputting the first distribution diagram, the second distribution diagram and the third distribution diagram into the attention scoring model respectively, and generating layout scores of the first distribution diagram, the second distribution diagram and the third distribution diagram;
The attention mechanism includes an attention scoring model, which is an attention model trained to a convergent state in advance for scoring the layout of the image.
The first distribution diagram, the second distribution diagram and the third distribution diagram are respectively input into the attention scoring model, which calculates their layout scores respectively.
S1422, selecting the distribution graph with the highest layout score as the target distribution graph.
After the layout scores of the first, second and third distribution diagrams are obtained, the distribution diagram with the highest layout score is determined as the target distribution diagram. Since the second distribution diagram is adjusted on the basis of the first, and the third is updated on the basis of the second, there is a certain probability of overfitting in either of these two processes, which would make the second distribution diagram less reasonable than the first, or the third less reasonable than either the first or the second. Screening the target distribution diagram from the first, second and third distribution diagrams therefore avoids the overfitting problem that could arise if the third distribution diagram were used directly as the target distribution diagram.
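Under the same assumptions as the helper sketched above, screening among three candidates only changes the list that is passed in; the scoring model and the three diagram variables here are hypothetical placeholders for the objects produced in the preceding steps.

```python
# Hypothetical usage: scoring_model and the three diagram variables are placeholders.
target_diagram, target_score = select_target_distribution(
    scoring_model, [first_distribution, second_distribution, third_distribution])
```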
Referring to fig. 2, fig. 2 is a schematic diagram of a basic structure of the text conversion display device according to the embodiment.
As shown in fig. 2, a text conversion display device includes: a reading module 1100, a clustering module 1200, a processing module 1300, and an execution module 1400. The reading module 1100 is used for reading content information in the graphic file; the clustering module 1200 is configured to perform spatial clustering on the content information based on a preset density clustering algorithm to generate a first distribution graph of the content information, where the first distribution graph includes a plurality of content groups; the processing module 1300 is configured to input the first distribution graph into a preset graph neural network, and perform layout updating on the first distribution graph to generate a second distribution graph; the execution module 1400 is configured to screen a target distribution graph corresponding to the content information from the first distribution graph and the second distribution graph according to a preset attention mechanism.
When the text conversion display device performs text conversion, the extracted content information is spatially clustered by a density clustering algorithm to generate a first distribution graph organized into different content groups. Then, each content group in the first distribution diagram and the elements within it are updated through the graph neural network, generating an updated second distribution diagram. Finally, the distribution diagram with the optimal layout is selected from the first distribution diagram and the second distribution diagram through an attention mechanism to serve as the target distribution diagram. Because the density clustering method and the graph neural network merge and partition entities with similar semantics, the whole document can ultimately be divided adaptively and dynamically, so that the displayed text is neat and regular and readability is enhanced.
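A minimal sketch of how the four modules of fig. 2 could be chained is given below; `reader`, `clusterer`, `gnn`, and `scorer` are placeholder callables standing in for the trained components, the selection helper from the earlier sketch is reused, and only the control flow mirrors the description above.

```python
def text_conversion_pipeline(graphic_file, reader, clusterer, gnn, scorer):
    content_info = reader(graphic_file)              # reading module 1100
    first_diagram = clusterer(content_info)          # clustering module 1200
    second_diagram = gnn(first_diagram)              # processing module 1300
    target_diagram, _ = select_target_distribution(  # execution module 1400
        scorer, [first_diagram, second_diagram])
    return target_diagram
```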
Optionally, the text conversion display device further includes:
the first extraction submodule is used for extracting text information of characters in the graphic file and spatial position information corresponding to the text information;
and the first generation submodule is used for splicing the text information and the spatial position information to generate the content information.
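As an illustration of the two submodules above, the sketch below splices OCR-style text fragments with their bounding boxes into content information; the fragment format and field names are assumptions, since the embodiment does not prescribe a data layout.

```python
def build_content_info(fragments):
    """fragments: iterable of (text, (x0, y0, x1, y1)) pairs produced by an
    image-text conversion model; the pair format is assumed for illustration."""
    return [{'text': text, 'position': bbox} for text, bbox in fragments]
```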
Optionally, the text conversion display device further includes:
the first clustering sub-module is used for carrying out spatial clustering on the content information based on the density clustering algorithm to generate a plurality of clustering clusters;
the second generation submodule is used for respectively carrying out differentiated display on the plurality of clustering clusters to generate a plurality of content groups;
a first execution submodule, configured to generate the first distribution graph from the plurality of content groups.
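One possible realization of the clustering submodules above, using scikit-learn's DBSCAN on the centre points of the spatial positions, is sketched below; DBSCAN, the eps value, and min_samples are illustrative choices of a density clustering algorithm, not requirements of the embodiment.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_content(content_info, eps=25.0, min_samples=2):
    """Group content information items by the spatial density of their centres."""
    centers = np.array([[(item['position'][0] + item['position'][2]) / 2.0,
                         (item['position'][1] + item['position'][3]) / 2.0]
                        for item in content_info])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(centers)
    groups = {}
    for item, label in zip(content_info, labels):
        groups.setdefault(int(label), []).append(item)  # label -1 marks noise
    return groups
```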
Optionally, the graph neural network comprises: a first graph neural network, and the text conversion display device further includes:
and the first processing submodule is used for inputting the first distribution diagram into the first graph neural network, and respectively performing layout updating on the content among the content groups in the first distribution diagram to generate the second distribution diagram.
Optionally, the graph neural network comprises: a second graph neural network, and the text conversion display device further includes:
and the second processing submodule is used for inputting the second distribution diagram into the second graph neural network, and respectively performing layout updating on the elements in each content group in the second distribution diagram to generate a third distribution diagram.
Optionally, the text conversion display device further includes:
a first input submodule, configured to input the first distribution graph into the first graph neural network, and sequentially read vertex groups of each content group;
a first calculation submodule for calculating a first attention coefficient and a first feature distance between the vertex packet and the content packet adjacent thereto;
the third processing submodule is used for performing weighted calculation on the first attention coefficient and the first feature distance to generate a first similar feature of the vertex grouping;
and the second execution submodule is used for carrying out layout updating on the content among the content groups according to the first similar characteristics to generate the second distribution diagram.
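These four submodules mirror, at the level of whole content groups, the element-level computation sketched for S1322-S1323; assuming each content group is summarized by a single feature vector (a hypothetical group embedding), the earlier helpers could be reused unchanged, as in the fragment below.

```python
# Hypothetical group-level reuse of the helpers sketched earlier; group_feat and
# neighbor_group_feats stand for assumed embeddings of one vertex group and its
# adjacent content groups.
first_attention, first_distances = attention_and_distances(group_feat,
                                                           neighbor_group_feats)
first_similar = similar_feature(first_attention, first_distances)
```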
Optionally, the text conversion display device further comprises:
a second input submodule, configured to input the second distribution graph into the second graph neural network, and sequentially read vertex elements in each content group;
a second calculation submodule for calculating a second attention coefficient and a second feature distance between the vertex element and its neighboring elements;
the fourth processing submodule is used for performing weighted calculation on the second attention coefficient and the second feature distance to generate a second similar feature of the vertex element;
and the third execution submodule is used for carrying out layout updating on elements in each content group according to the second similar characteristics to generate the third distribution diagram.
Optionally, the attention mechanism includes an attention scoring model, and the text conversion display device further includes:
a fifth processing submodule, configured to input the first distribution map and the second distribution map into the attention scoring model, respectively, and generate layout scores of the first distribution map and the second distribution map;
and the fourth execution submodule is used for selecting the distribution diagram with the highest layout scoring score as the target distribution diagram.
Optionally, the attention mechanism includes an attention scoring model, and the text conversion display device further includes:
a sixth processing submodule, configured to input the first distribution diagram, the second distribution diagram, and the third distribution diagram into the attention scoring model, respectively, and generate layout scores of the first distribution diagram, the second distribution diagram, and the third distribution diagram;
and the fifth execution sub-module is used for selecting the distribution map with the highest layout scoring score as the target distribution map.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 3 in detail, fig. 3 is a block diagram of a basic structure of a computer device according to the embodiment.
Fig. 3 is a schematic diagram of the internal structure of the computer device. The computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and computer readable instructions; the database may store control information sequences, and the computer readable instructions, when executed by the processor, may cause the processor to implement a text conversion display method. The processor of the computer device provides computing and control capability and supports the operation of the whole computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform the text conversion display method. The network interface of the computer device is used for connecting and communicating with a terminal. It will be appreciated by those skilled in the art that the configuration shown in fig. 3 is a block diagram of only part of the configuration associated with the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In this embodiment, the processor is configured to execute the specific functions of the reading module 1100, the clustering module 1200, the processing module 1300, and the execution module 1400 in fig. 2, and the memory stores the program codes and various data required for executing these modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores the program codes and data necessary for executing all the submodules of the text conversion display device, and the server can call these program codes and data to execute the functions of all the submodules.
When the computer device performs text conversion, the extracted content information is spatially clustered by a density clustering algorithm to generate a first distribution graph organized into different content groups. Then, each content group in the first distribution diagram and the elements within it are updated through the graph neural network, generating an updated second distribution diagram. Finally, the distribution diagram with the optimal layout is selected from the first distribution diagram and the second distribution diagram through an attention mechanism to serve as the target distribution diagram. Because the density clustering method and the graph neural network merge and partition entities with similar semantics, the whole document can ultimately be divided adaptively and dynamically, so that the displayed text is neat and regular and readability is enhanced.
The present application further provides a storage medium storing computer readable instructions, which when executed by one or more processors, cause the one or more processors to perform the steps of any of the above embodiments of the text conversion display method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
Those skilled in the art will appreciate that the various operations, methods, steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or eliminated. Further, the various operations, methods, steps, measures, and schemes in the processes, methods, and procedures that have been discussed in this application may be alternated, modified, rearranged, decomposed, combined, or eliminated. Further, steps, measures, and schemes in the prior art that include the various operations, methods, and procedures disclosed in the present application may also be alternated, modified, rearranged, decomposed, combined, or deleted.
The foregoing describes only some embodiments of the present application. It should be noted that, for those skilled in the art, several modifications and refinements can be made without departing from the principle of the present application, and these modifications and refinements should also be regarded as falling within the protection scope of the present application.

Claims (12)

1. A text conversion display method, comprising:
reading content information in a graphic file, wherein the content information is generated by splicing text information and spatial position information of each element in the graphic file, and the text information and the spatial position information are extracted by a preset graphic-text conversion model;
based on a preset density clustering algorithm, carrying out spatial clustering on the content information to generate a first distribution graph of the content information, wherein the first distribution graph comprises a plurality of content groups;
inputting the first distribution diagram into a preset graph neural network, and performing layout updating on the first distribution diagram to generate a second distribution diagram;
screening a target distribution graph corresponding to the content information in the first distribution graph and the second distribution graph according to a preset attention mechanism, wherein the attention mechanism generates layout scores of the first distribution graph and the second distribution graph, and the target distribution graph is the layout graph with the highest layout score in the first distribution graph and the second distribution graph.
2. The text conversion display method according to claim 1, wherein the reading of the content information in the graphic file includes:
extracting text information of characters in the graphic file and spatial position information corresponding to the text information;
and splicing the text information and the spatial position information to generate the content information.
3. The text conversion display method according to claim 1, wherein the spatially clustering the content information based on a preset density clustering algorithm, and the generating the first distribution graph of the content information includes:
performing spatial clustering on the content information based on the density clustering algorithm to generate a plurality of clustering clusters;
respectively displaying the plurality of clustering clusters in a distinguishing manner to generate a plurality of content groups;
generating the first distribution graph from the plurality of content groups.
4. The text conversion display method according to claim 1, wherein the graph neural network includes: a first graph neural network, and wherein the inputting the first distribution diagram into a preset graph neural network and performing layout updating on the first distribution diagram to generate a second distribution diagram comprises the following steps:
inputting the first distribution diagram into the first graph neural network, and performing layout updating on the content among the content groups in the first distribution diagram respectively to generate the second distribution diagram.
5. The text conversion display method according to claim 4, wherein the graph neural network includes: a second graph neural network, and wherein, after the inputting the first distribution diagram into the first graph neural network, respectively performing layout updating on the content among the content groups in the first distribution diagram, and generating the second distribution diagram, the method further includes:
inputting the second distribution diagram into the second graph neural network, and respectively performing layout updating on elements in each content group in the second distribution diagram to generate a third distribution diagram.
6. The text conversion display method according to claim 4 or 5, wherein the inputting the first distribution diagram into the first graph neural network, performing layout updating on the content among the content groups in the first distribution diagram, and generating the second distribution diagram comprises:
inputting the first distribution graph into the first graph neural network, and sequentially reading the vertex groups of each content group;
calculating a first attention coefficient and a first feature distance between the vertex packet and the content packet adjacent to the vertex packet;
performing weighted calculation on the first attention coefficient and the first feature distance to generate a first similar feature of the vertex group;
and performing layout updating on the content among the content groups according to the first similar characteristics to generate the second distribution diagram.
7. The text conversion display method according to claim 5, wherein the inputting the second distribution graph into the second graph neural network, and respectively performing layout updating on elements in each content group in the second distribution graph, to generate a third distribution graph comprises:
inputting the second distribution graph into the second graph neural network, and sequentially reading vertex elements in each content grouping;
calculating a second attention coefficient and a second feature distance between the vertex element and its neighboring elements;
performing weighted calculation on the second attention coefficient and the second feature distance to generate a second similar feature of the vertex element;
and carrying out layout updating on elements in each content group according to the second similar characteristics to generate the third distribution diagram.
8. The text conversion display method according to claim 1, wherein the attention mechanism includes an attention scoring model, and the screening the target distribution map corresponding to the content information in the first distribution map and the second distribution map according to a preset attention mechanism includes:
inputting the first distribution map and the second distribution map into the attention scoring model respectively, and generating layout scores of the first distribution map and the second distribution map;
and selecting the distribution graph with the highest layout score as the target distribution graph.
9. The text conversion display method according to claim 5, wherein the attention mechanism includes an attention scoring model, and the screening the target distribution map corresponding to the content information in the first distribution map and the second distribution map according to a preset attention mechanism includes:
inputting the first distribution diagram, the second distribution diagram and the third distribution diagram into the attention scoring model respectively to generate layout scores of the first distribution diagram, the second distribution diagram and the third distribution diagram;
and selecting the distribution graph with the highest score of the layout as the target distribution graph.
10. A text conversion display apparatus, comprising:
the reading module is used for reading content information in the graphic file, wherein the content information is generated by splicing text information and spatial position information of each element in the graphic file, and the text information and the spatial position are extracted by a preset image-text conversion model;
the clustering module is used for carrying out spatial clustering on the content information based on a preset density clustering algorithm to generate a first distribution graph of the content information, wherein the first distribution graph comprises a plurality of content groups;
the processing module is used for inputting the first distribution diagram into a preset graph neural network and performing layout updating on the first distribution diagram to generate a second distribution diagram;
and the execution module is used for screening a target distribution map corresponding to the content information in the first distribution map and the second distribution map according to a preset attention mechanism, wherein the attention mechanism generates the layout scores of the first distribution map and the second distribution map, and the target distribution map is the layout map with the highest layout score in the first distribution map and the second distribution map.
11. A computer device, wherein the computer device comprises a memory and a processor;
the memory for storing a computer program;
the processor is used for executing the computer program and realizing the following when the computer program is executed:
the text conversion display method according to any one of claims 1 to 9.
12. A computer-readable storage medium, characterized in that a computer program is stored, which, when executed by a processor, causes the processor to implement the text conversion display method according to any one of claims 1 to 9.
CN202210823578.1A 2022-07-14 2022-07-14 Text conversion display method and device, equipment, medium and product thereof Active CN114970446B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210823578.1A CN114970446B (en) 2022-07-14 2022-07-14 Text conversion display method and device, equipment, medium and product thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210823578.1A CN114970446B (en) 2022-07-14 2022-07-14 Text conversion display method and device, equipment, medium and product thereof

Publications (2)

Publication Number Publication Date
CN114970446A CN114970446A (en) 2022-08-30
CN114970446B true CN114970446B (en) 2022-11-01

Family

ID=82968905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210823578.1A Active CN114970446B (en) 2022-07-14 2022-07-14 Text conversion display method and device, equipment, medium and product thereof

Country Status (1)

Country Link
CN (1) CN114970446B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709488A (en) * 2016-12-20 2017-05-24 深圳市深信服电子科技有限公司 Business card identification method and device
CN109886330A (en) * 2019-02-18 2019-06-14 腾讯科技(深圳)有限公司 Method for text detection, device, computer readable storage medium and computer equipment
CN111191651A (en) * 2019-12-06 2020-05-22 中国平安财产保险股份有限公司 Document image identification method and device, computer equipment and storage medium
CN111245815A (en) * 2020-01-07 2020-06-05 同盾控股有限公司 Data processing method, data processing device, storage medium and electronic equipment
CN112241481A (en) * 2020-10-09 2021-01-19 中国人民解放军国防科技大学 Cross-modal news event classification method and system based on graph neural network
CN113761195A (en) * 2021-05-24 2021-12-07 腾讯科技(深圳)有限公司 Text classification method and device, computer equipment and computer readable storage medium
CN113822283A (en) * 2021-06-30 2021-12-21 腾讯科技(深圳)有限公司 Text content processing method and device, computer equipment and storage medium
CN113989298A (en) * 2021-10-26 2022-01-28 深圳前海环融联易信息科技服务有限公司 Method for correcting curved text line of contract document
CN114387650A (en) * 2022-01-11 2022-04-22 浙江商汤科技开发有限公司 Clustering and model training method and device, equipment and storage medium
CN114398881A (en) * 2022-01-04 2022-04-26 北京快确信息科技有限公司 Transaction information identification method, system and medium based on graph neural network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10515379B2 (en) * 2016-12-20 2019-12-24 Adobe Inc. Computerized detection and semantic characterization of trends in digital media content
CN111047130B (en) * 2019-06-11 2021-03-02 北京嘀嘀无限科技发展有限公司 Method and system for traffic analysis and management
CN110457472A (en) * 2019-07-16 2019-11-15 天津大学 The emotion association analysis method for electric business product review based on SOM clustering algorithm
US11720346B2 (en) * 2020-10-02 2023-08-08 International Business Machines Corporation Semantic code retrieval using graph matching


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Event Reasoning Methods for News Text; Zheng Xin; China Masters' Theses Full-text Database, Information Science and Technology; 2022-01-15; pp. I138-3600 *

Also Published As

Publication number Publication date
CN114970446A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN109254813B (en) Screen adaptation method, device, computer equipment and storage medium
US9454714B1 (en) Sequence transcription with deep neural networks
CN110443864B (en) Automatic artistic font generation method based on single-stage small-amount sample learning
CN109635805B (en) Image text positioning method and device and image text identification method and device
CN113012265B (en) Method, apparatus, computer device and medium for generating needle-type printed character image
US20230401833A1 (en) Method, computer device, and storage medium, for feature fusion model training and sample retrieval
CN111583274A (en) Image segmentation method and device, computer-readable storage medium and electronic equipment
CN113822383B (en) Unmanned aerial vehicle detection method and system based on multi-domain attention mechanism
CN113762050B (en) Image data processing method, device, equipment and medium
CN112950640A (en) Video portrait segmentation method and device, electronic equipment and storage medium
CN114241277A (en) Attention-guided multi-feature fusion disguised target detection method, device, equipment and medium
CN114140786A (en) Scene text recognition method based on HRNet coding and double-branch decoding
CN115018549A (en) Method for generating advertisement file, device, equipment, medium and product thereof
CN116630183A (en) Text image restoration method based on generated type countermeasure network
CN115099854A (en) Method for creating advertisement file, device, equipment, medium and product thereof
CN111400497A (en) Text recognition method and device, storage medium and electronic equipment
CN115115986A (en) Video quality evaluation model production method, device, equipment and medium
CN114863440A (en) Order data processing method and device, equipment, medium and product thereof
CN113743443B (en) Image evidence classification and recognition method and device
CN114970446B (en) Text conversion display method and device, equipment, medium and product thereof
CN112836694A (en) Image character extraction method and device, computer equipment and storage medium
CN113821663A (en) Image processing method, device, equipment and computer readable storage medium
CN113326701A (en) Nested entity recognition method and device, computer equipment and storage medium
CN113806536B (en) Text classification method and device, equipment, medium and product thereof
CN114565913A (en) Text recognition method and device, equipment, medium and product thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant