CN113901834A - Text display method and device, computer storage medium and electronic equipment - Google Patents
Text display method and device, computer storage medium and electronic equipment
- Publication number
- CN113901834A (application number CN202111198145.3A)
- Authority
- CN
- China
- Prior art keywords
- change
- text content
- text
- entity
- item
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Bioethics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the invention provide a text display method and device, a computer storage medium and electronic equipment. The text display method comprises: acquiring a text to be processed, wherein the text to be processed comprises pre-change text content and post-change text content corresponding to at least one change item; identifying entity feature words in the pre-change text content and the post-change text content corresponding to at least part of the change items; assigning change state labels to the entity feature words of the pre-change text content and the post-change text content; and, based on those entity feature words and the change state labels assigned to them, formatting the pre-change text content and the post-change text content respectively to obtain formatted pre-change and post-change text content for display. The embodiments thereby display the text clearly and intuitively.
Description
Technical Field
The invention relates to the technical field of text processing, in particular to a text display method and device, a computer storage medium and electronic equipment.
Background
Based on a big data solution, collected enterprise data undergoes a series of deep-mining steps such as cleaning, analysis and sorting, so as to provide comprehensive or classified data query services. For example, enterprise-related information such as the investment situation and the shareholder situation can be queried, and when that information changes, the information before and after the change can be queried. However, in the conventional technique, for a given change item, all text before the change is lumped together for display, and all text after the change is likewise lumped together, so the differences between the text before and after the change are neither obvious nor intuitive to the user.
Disclosure of Invention
Embodiments of the present invention provide a text display method and apparatus, a computer storage medium, and an electronic device, so as to overcome or alleviate the above technical problems in the prior art.
The technical scheme adopted by the invention is as follows:
in a first aspect of the embodiments of the present invention, a text display method is provided, which includes:
acquiring a text to be processed, wherein the text to be processed comprises pre-change text content and post-change text content corresponding to at least one change item;
identifying entity characteristic words in the text content before and after the change corresponding to at least part of the change items;
distributing change state labels for the entity characteristic words of the text content before the change and the text content after the change; and
formatting the text content before the change and the text content after the change respectively, based on the entity characteristic words of the text content before the change and the text content after the change and the change state labels distributed to the entity characteristic words, to obtain the formatted text content before the change and the text content after the change for displaying.
Optionally, in an embodiment of the present invention, the identifying, for the text content before the change and the text content after the change corresponding to at least part of the changed item, an entity feature word therein includes: and performing text processing on at least part of the text content before the change and the text content after the change corresponding to at least part of the changed items based on a set semantic regularization analysis model to identify entity feature words in the text content.
Optionally, in an embodiment of the present invention, the identifying, for the text content before the change and the text content after the change corresponding to at least part of the changed item, an entity feature word therein includes: and performing character matching on at least part of the text content before the change and the text content after the change corresponding to at least part of the changed items based on the constructed first regular expression so as to identify entity characteristic words in the text content.
Optionally, in an embodiment of the present invention, the identifying, for the text content before the change and the text content after the change corresponding to at least part of the changed item, entity feature words therein includes: performing text segmentation on at least part of text content before change and text content after change corresponding to the changed items to obtain at least one text block;
the identifying of the entity feature words in the text content before the change and the text content after the change corresponding to at least part of the changed items specifically includes: and identifying entity characteristic words in each text block of the text content before the change and the text content after the change corresponding to at least part of the change items.
Optionally, in an embodiment of the present invention, the identifying, for the text content before the change and the text content after the change corresponding to at least part of the changed item, an entity feature word therein includes: respectively storing the entity characteristic words identified from the text content before the change and the text content after the change into a pre-established entity characteristic word list;
the allocating of the change state labels to the entity feature words of the text content before the change and the text content after the change comprises: and distributing change state labels for the entity characteristic words of the text content before the change and the text content after the change based on the corresponding entity characteristic word list.
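As a minimal sketch of allocating change state labels from the two entity feature word lists, assuming the lists hold plain strings and that a simple membership comparison is sufficient, the following Python may help. The label names `removed`, `added` and `unchanged` are hypothetical and are not taken from the embodiments.

```python
from typing import Dict, List, Tuple


def assign_change_labels(
    before_entities: List[str], after_entities: List[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Compare the two entity feature word lists and label each word.

    A word found only in the pre-change list is labelled "removed", a word
    found only in the post-change list is labelled "added", and a word found
    in both is labelled "unchanged" (hypothetical label names).
    """
    before_set, after_set = set(before_entities), set(after_entities)
    before_labels = {
        e: ("unchanged" if e in after_set else "removed") for e in before_entities
    }
    after_labels = {
        e: ("unchanged" if e in before_set else "added") for e in after_entities
    }
    return before_labels, after_labels
```

Keeping a separate label mapping for each side mirrors the separate pre-change and post-change word lists, so each side can later be formatted on its own.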
Optionally, in an embodiment of the present invention, after acquiring the text to be processed, the method includes: desensitizing the text to be processed to obtain a desensitized text to be processed, wherein the desensitized text to be processed comprises desensitized text content before change and desensitized text content after change corresponding to the at least one change item;
correspondingly, the identifying of entity feature words in the text content before the change and the text content after the change corresponding to at least part of the changed items comprises: identifying entity characteristic words in the desensitized pre-change text content and the desensitized post-change text content corresponding to at least part of the change items.
Optionally, in an embodiment of the present invention, the desensitizing the text to be processed to obtain a desensitized text to be processed includes: performing desensitization processing on the text to be processed based on a constructed second regular expression to obtain the desensitized text to be processed.
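Regular-expression-based desensitization might be sketched as follows in Python. The embodiments do not disclose the actual second regular expression, so the two patterns below (an 11-digit mobile number and an 18-character identity number) and the masking scheme are assumptions chosen only for illustration.

```python
import re

# Hypothetical stand-ins for the "second regular expression".
# Mobile number: 11 digits starting with 1; keep the first 3 and last 4 digits.
PHONE_RE = re.compile(r"(?<!\d)(1\d{2})\d{4}(\d{4})(?!\d)")
# Identity number: 17 digits plus a digit or X; keep the first 6 and last 4.
ID_RE = re.compile(r"(?<!\d)(\d{6})\d{8}(\d{3}[\dXx])(?!\d)")


def desensitize(text: str) -> str:
    """Mask the middle part of phone and identity numbers with asterisks."""
    # The longer identity pattern is applied first so that its digits are
    # already masked before the phone pattern runs.
    text = ID_RE.sub(lambda m: m.group(1) + "*" * 8 + m.group(2), text)
    text = PHONE_RE.sub(lambda m: m.group(1) + "*" * 4 + m.group(2), text)
    return text
```

Because the masked output keeps its surrounding text intact, the later entity recognition step can run on the desensitized content unchanged.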
Optionally, in an embodiment of the present invention, the method further includes: performing internal link processing on the formatted text content before the change and the text content after the change, so that the linked content can be jumped to and accessed during display.
Optionally, in an embodiment of the present invention, the method further includes: and configuring display format labels for the formatted text content before the change and the text content after the change so as to adjust the display formats of the formatted text content before the change and the text content after the change.
Optionally, in an embodiment of the present invention, after acquiring the text to be processed, the method further includes: judging whether the changed item is valid data or not, and if the changed item is valid data, identifying entity characteristic words in the text content before and after the change corresponding to the changed item; otherwise, if the changed item is invalid data, determining a difference text between the text content before the change and the text content after the change for the changed item corresponding to the invalid data, and highlighting and marking the difference text, or analyzing the text content before the change and the text content after the change to determine the valid data of the corresponding changed item, so as to return to execute the step of identifying the entity characteristic words in the text content before the change and the text content after the change corresponding to the changed item.
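For the fallback branch in which a difference text is determined and highlighted, one possible sketch uses the Python standard library `difflib`. The embodiments do not name a diff algorithm, so the use of `SequenceMatcher` and the `[[` / `]]` markers is an assumption made for illustration only.

```python
from difflib import SequenceMatcher
from typing import Tuple


def highlight_difference(
    before: str, after: str, open_mark: str = "[[", close_mark: str = "]]"
) -> Tuple[str, str]:
    """Wrap the differing spans of both texts in (hypothetical) markers."""
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    out_before, out_after = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out_before.append(before[i1:i2])
            out_after.append(after[j1:j2])
            continue
        # "replace", "delete" and "insert" all denote a difference; mark
        # whichever side actually has characters in this span.
        if i1 < i2:
            out_before.append(open_mark + before[i1:i2] + close_mark)
        if j1 < j2:
            out_after.append(open_mark + after[j1:j2] + close_mark)
    return "".join(out_before), "".join(out_after)
```

In a real display page the markers would presumably be replaced by whatever highlighting the front end supports.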
Optionally, in an embodiment of the present invention, the determining whether the changed item is valid data includes: the change item is matched with a preset field library, and if the matching degree of the field of the change item and any field in the preset field library is greater than or equal to the set matching degree, the change item is judged to be valid data; otherwise, if the matching degree between the field of the changed item and any field in the preset field library is smaller than the set matching degree, the changed item is considered as invalid data, wherein the condition that the matching degree between the field of the changed item and any field in the preset field library is smaller than the set matching degree includes that the field of the changed item is empty.
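The validity check described above can be sketched as follows. The field library contents, the threshold of 0.8, and the use of `difflib.SequenceMatcher.ratio()` as the "matching degree" are all assumptions; the embodiments do not specify how the matching degree is computed.

```python
from difflib import SequenceMatcher
from typing import Sequence

# Hypothetical preset field library and set matching degree.
FIELD_LIBRARY = (
    "shareholder change",
    "legal representative change",
    "registered capital change",
)
MATCH_THRESHOLD = 0.8


def is_valid_change_item(
    field: str,
    library: Sequence[str] = FIELD_LIBRARY,
    threshold: float = MATCH_THRESHOLD,
) -> bool:
    """Return True if the field matches any library field closely enough.

    An empty field is treated as invalid data, as in the embodiment.
    """
    if not field or not field.strip():
        return False
    best = max(
        SequenceMatcher(None, field.strip(), known).ratio() for known in library
    )
    return best >= threshold
```

A change item judged invalid here would then go to the difference-highlighting or re-analysis branch rather than to entity recognition.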
In a second aspect of the embodiments of the present invention, there is provided a text display apparatus, including:
the text acquisition unit is used for acquiring a text to be processed, wherein the text to be processed comprises pre-change text content and post-change text content corresponding to at least one change item;
the entity identification unit is used for identifying entity characteristic words in the text content before the change and the text content after the change corresponding to at least part of the change items;
a label distribution unit, configured to distribute a change state label to the entity feature words of the text content before the change and the text content after the change; and
and the formatting unit is used for respectively formatting the text content before the change and the text content after the change based on the entity characteristic words of the text content before the change and the text content after the change and the change state labels distributed to the entity characteristic words to obtain the formatted text content before the change and the text content after the change for displaying.
In a third aspect of the embodiments of the present invention, a computer storage medium is provided, where a computer executable program is stored on the computer storage medium, and the computer executable program is executed to implement the text presentation method according to any embodiment of the present invention.
In a fourth aspect of the embodiments of the present invention, an electronic device is provided, where the electronic device includes a memory and a processor, the memory is used for storing a computer executable program, and the processor is used for running the computer executable program to implement the text presentation method according to any one of the embodiments of the present invention.
In the embodiments of the invention, a text to be processed is acquired, wherein the text to be processed comprises pre-change text content and post-change text content corresponding to at least one change item; entity characteristic words are identified in the text content before and after the change corresponding to at least part of the change items; change state labels are distributed to the entity characteristic words of the text content before the change and the text content after the change; and, based on those entity characteristic words and the change state labels distributed to them, the text content before the change and the text content after the change are formatted respectively to obtain the formatted text content before the change and the text content after the change for displaying, so that the text is displayed clearly and intuitively.
Drawings
FIG. 1 is a schematic view of an application scenario in a first embodiment of the present invention;
FIG. 2 is a flowchart illustrating a text display method according to a second embodiment of the present invention;
FIG. 3 is a flowchart illustrating a text display method according to a third embodiment of the present invention;
FIG. 4 is a flowchart illustrating a text display method according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a text display apparatus according to a fifth embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device in a sixth embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, which is a schematic view of an application scenario in the first embodiment of the present invention, the application scenario is directed to a text display system comprising a terminal device 101 and a text display server 102, with a text display device arranged on the text display server 102. The text display server 102 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal device 101 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal device 101 and the text display server 102 may be directly or indirectly connected through a wireless communication manner (such as a network), and the present invention is not limited thereto.
The text display server 102 is mainly configured to execute the following text display method: acquiring a text to be processed, wherein the text to be processed comprises pre-change text content and post-change text content corresponding to at least one change item; identifying entity characteristic words in the text content before and after the change corresponding to at least part of the change items; distributing change state labels for the entity characteristic words of the text content before the change and the text content after the change; and, based on those entity characteristic words and the change state labels distributed to them, formatting the text content before the change and the text content after the change respectively to obtain the formatted text content before the change and the text content after the change for displaying, so that the text is displayed clearly and intuitively.
The change item is data related to the enterprise business information, such as the investment situation and the shareholder situation. The division of the change items may be specifically determined according to the requirements of the application scenario.
Illustratively, a certain Application (APP) is installed on the terminal device 101, the APP includes a display page, and the text display server 102 can push formatted text content before change and text content after change to the display page for display, so as to improve the intuitiveness of data display and increase the depth of data display.
Illustratively, the above-mentioned pre-change text content and post-change text content are, for example, content related to the business of the enterprise, including but not limited to the addition and withdrawal of senior executives and shareholders. Of course, in other examples, the pre-change text content and the post-change text content may be determined according to the requirements of the application scenario.
As shown in fig. 2, fig. 2 is a schematic flow chart of a text display method according to a second embodiment of the present invention; the text presentation method may be specifically executed by the text presentation server in fig. 1, and the text presentation method includes:
S201, acquiring a text to be processed;
In this embodiment, the text to be processed includes pre-change text content and post-change text content corresponding to at least one change item.
Illustratively, if the solution of the embodiment of the present invention is applied to the deep mining scenario of enterprise data, the change items include, but are not limited to, enterprise contact persons, financial personnel change items, senior management personnel change items, legal representatives change items, investors (equities) change items, etc.
Specifically, the text to be processed can be acquired by a crawler from a third-party platform capable of ensuring authority and correctness of the data, and the pre-change text content and the post-change text content corresponding to the changed item can be obtained by analyzing the crawled text to be processed.
Of course, if the solution of the embodiment of the present invention is to be applied to other scenarios, it may be determined by a person of ordinary skill in the art according to the scenarios.
S202, identifying entity characteristic words in the text content before the change and the text content after the change corresponding to at least part of the changed items;
In this embodiment, the entity feature words include keywords representing entities with specific meanings in the text content before the change and the text content after the change.
Illustratively, such as in a deep mining scenario for enterprise data, the entity feature words include natural person names, legal person names, amounts, and so on.
Here, it should be noted that, in other examples, the entity feature words may be specifically divided according to application scenarios.
In this embodiment, for the text content before the change and the text content after the change corresponding to at least part of the change items in step S202, when identifying the entity feature words therein, the method specifically includes: and based on a set semantic regularization analysis model, performing text processing on at least part of text content before and after the change corresponding to the changed items to identify entity feature words therein, so as to improve the identification accuracy of the entity feature words.
Illustratively, the semantic regularization analysis model may be specifically obtained by pre-training using sample text content.
Illustratively, the semantic regularization analysis model may be a BERT + LSTM + CRF hybrid model, where BERT stands for Bidirectional Encoder Representations from Transformers, LSTM stands for Long Short-Term Memory (a long short-term memory artificial neural network), and CRF stands for conditional random field.
Exemplarily, performing text processing on at least part of the text content before the change and the text content after the change corresponding to the changed items, based on the set semantic regularization analysis model, to identify the entity feature words therein specifically includes: based on BERT, extracting keywords from the text content before the change and the text content after the change to form keyword vectors; based on LSTM, carrying out sequence labeling on the keyword vectors to obtain entity feature word labels; and based on CRF, decoding the entity feature word labels to identify the entity feature words of the text content before the change and the text content after the change. Text serialization and labeling are realized through the BERT + LSTM + CRF hybrid model, so that the accuracy and comprehensiveness of entity feature word identification are improved.
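The last stage of such a pipeline, turning a decoded label sequence into entity feature words, can be illustrated without any model. The sketch below is not a BERT, LSTM or CRF implementation; it only assumes that the decoded labels follow the common BIO tagging scheme (`B-` begins an entity, `I-` continues it, `O` is outside), which the embodiments do not state. The tag names `PER` and `ORG` are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def decode_bio(chars: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) spans from a BIO tag sequence."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type = None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the one in progress, if any.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(ch)
        else:
            # "O" or an inconsistent "I-" tag ends the entity in progress.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities
```

In a full system the `tags` argument would come from the CRF decoding step rather than being written by hand.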
For example, in a specific application scenario in which the changed item is the shareholder situation, the entity feature words recognized from the pre-change text content and the post-change text content are natural person names, legal person names, registered capital and investment amounts. Shareholder-added and shareholder-withdrawn status labels are allocated to the names, and increase-proportion or decrease-proportion labels are allocated to the registered capital and the investment amounts. For example, if the registered capital is changed from 100 to 200, a label (+100%) is added after 200; a change in a shareholder's investment amount before and after the change is added to the post-change content in the same manner.
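The proportion label in the example above follows from the relative change between the two amounts. A minimal sketch, assuming the label is rendered as a signed whole-number percentage in parentheses and that a zero pre-change amount yields no label (both assumptions beyond the single `(+100%)` example given):

```python
def ratio_label(before: float, after: float) -> str:
    """Return a signed percentage label such as "(+100%)" for a change."""
    if before == 0:
        # The relative change is undefined; emit no label (assumed behaviour).
        return ""
    percent = (after - before) / before * 100
    return f"({percent:+.0f}%)"


# Registered capital changed from 100 to 200, as in the example above.
print("200 " + ratio_label(100, 200))
```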
Or, alternatively, in addition to the above-mentioned manner of identifying the entity feature words based on the set semantic regularization analysis model, in another embodiment, when the entity feature words are identified for the text content before modification and the text content after modification corresponding to at least part of the modification items in step S202, the method may also include: based on the constructed first regular expression, character matching is carried out on at least part of text content before change and text content after change corresponding to at least part of the changed items so as to identify entity feature words in the text content, and therefore the identification speed of the entity feature words is improved.
Specifically, when the first regular expression is constructed, a series of character string patterns (which may include single characters, character sets, character ranges, selections among characters, or any combination of all these components) may be constructed by analyzing the contents of the sample before modification and the contents of the sample after modification, and the characters of the contents of the text before modification and the contents of the text after modification are character-matched, so as to identify the entity feature words therein.
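A sketch of such pattern-based recognition is given below. The actual first regular expression is not disclosed, so the two patterns (a name following "name:" and an amount followed by "yuan" or "RMB") are hypothetical examples of string patterns that might be derived from sample content.

```python
import re
from typing import List, Tuple

# Hypothetical patterns standing in for the "first regular expression".
ENTITY_PATTERNS = {
    "person_name": re.compile(r"name:\s*([^;,]+)"),
    "amount": re.compile(r"(\d+(?:\.\d+)?)\s*(?:yuan|RMB)"),
}


def find_entities(text: str) -> List[Tuple[str, str, int]]:
    """Return (entity_type, entity_text, start_offset) in order of appearance."""
    found = []
    for entity_type, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((entity_type, match.group(1).strip(), match.start(1)))
    return sorted(found, key=lambda item: item[2])
```

Returning the start offset alongside each word keeps enough information to attach labels at the right position during later formatting.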
Or, alternatively, in other embodiments, the entity feature words are identified using both the constructed first regular expression and the set semantic regularization analysis model. That is, identifying the entity feature words in the text content before the change and the text content after the change corresponding to at least part of the changed items includes:
based on the constructed first regular expression, performing character matching on part of the text content before the change and part of the text content after the change corresponding to at least part of the changed items to identify entity feature words therein; and
based on the set semantic regularization analysis model, performing text processing on the remaining part of the text content before the change and the remaining part of the text content after the change corresponding to at least part of the changed items to identify entity feature words therein.
In this way, some of the entity feature words are identified quickly by the first regular expression, and the entity feature words that the first regular expression cannot handle are identified by the set semantic regularization analysis model, so that both the identification speed and the identification accuracy are ensured.
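The regex-first, model-second arrangement can be sketched as a simple fallback. Here the model is any callable that maps a piece of text to a list of entity feature words; `stub_model` is a placeholder for illustration and is not the BERT + LSTM + CRF model, and `FIRST_REGEX` is a hypothetical pattern.

```python
import re
from typing import Callable, Dict, List

# Hypothetical stand-in for the "first regular expression".
FIRST_REGEX = re.compile(r"\d+(?:\.\d+)?\s*yuan")


def recognize_entities(
    blocks: List[str], model: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Use the fast regex where it matches; fall back to the model otherwise."""
    results: Dict[str, List[str]] = {}
    for block in blocks:
        hits = FIRST_REGEX.findall(block)
        results[block] = hits if hits else model(block)
    return results


def stub_model(block: str) -> List[str]:
    """Placeholder model: looks up a tiny fixed vocabulary (illustration only)."""
    return [word for word in ("AAAA", "BBBB") if word in block]
```

The slower model is only invoked for text the pattern could not handle, which is where the speed benefit described above would come from.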
S203, distributing change state labels for the entity characteristic words of the text content before the change and the text content after the change;
For example, in this embodiment, the change state label is used to mark the change of the specific entity to which the entity feature word points. If the change of the entity content relates to a dynamic attribute of the entity, the change state label may be a dynamic change label; if it relates to a static attribute, the change state label may be a static change label. For example, in a deep mining scenario for enterprise data, a change of an investment amount or a share involves a dynamic change of the entity, so the change state label may be the proportion by which the investment amount or the share increases or decreases. A change of the name of an enterprise contact person involves a static change of the entity, so the change state label may be a differentiated color label, which displays the names before and after the change in different colors.
And S204, based on the entity characteristic words of the text content before the change and the text content after the change and the change state labels distributed to the entity characteristic words, formatting the text content before the change and the text content after the change respectively to obtain the formatted text content before the change and the text content after the change for displaying.
For example, when formatting the text content before the change and the text content after the change respectively, specifically, the change status label is loaded onto the entity feature words of the text content before the change and the text content after the change, for example, an increased or decreased ratio or the like is displayed on the investment amount (for example, displayed next to the investment amount), and for example, different display colors are set for the names of people before the change and the names of people after the change (for example, display colors of names of people are directly changed).
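One way to load labels onto the entity feature words for display is to wrap each labelled word in markup. The sketch below assumes an HTML display page and hypothetical label names and colours; the embodiments do not specify the markup or the colours.

```python
import html
import re
from typing import Dict

# Hypothetical mapping from change state label to display style.
STYLE = {"removed": "color:red", "added": "color:green"}


def format_content(text: str, labels: Dict[str, str]) -> str:
    """Wrap each labelled entity feature word in a styled <span>."""
    styled = {e: lab for e, lab in labels.items() if e and lab in STYLE}
    if not styled:
        return html.escape(text)
    # Match longer entities first so a shorter one cannot split a longer one.
    pattern = re.compile(
        "|".join(re.escape(e) for e in sorted(styled, key=len, reverse=True))
    )
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        style = STYLE[styled[match.group(0)]]
        parts.append(f'<span style="{style}">{html.escape(match.group(0))}</span>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Scanning the original text once, rather than repeatedly replacing substrings, avoids accidentally rewriting markup that has already been inserted.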
As shown in fig. 3, fig. 3 is a schematic flow chart of a text presentation method in the third embodiment of the present invention; the text presentation method may be specifically executed by the text presentation server in fig. 1, and the text presentation method includes:
S301, acquiring a text to be processed;
The text to be processed comprises pre-change text content and post-change text content corresponding to at least one change item.
In this embodiment, step S301 is similar to step S201.
S302, performing text segmentation on at least part of text content before change and text content after change corresponding to the changed items to obtain at least one text block;
In this embodiment, step S302 may specifically include: performing text segmentation on the pre-change text content and post-change text content corresponding to at least part of the change items, based on the set semantics of sample keywords and on punctuation marks, to obtain at least one text block. Specifically, each text block obtained by the segmentation is a text line, which facilitates visual display of the data.
For example, suppose the pre-change text content is ".......; contact person name: AAAA; .......", and that it is actually displayed as a single natural paragraph. After segmentation, the following text is obtained:
.......;
contact person name: AAAA;
.......。
After segmentation, each line is equivalent to one text block.
The post-change text content is segmented in the same way as the pre-change text content, so the details are not repeated here.
After segmentation, the post-change text content becomes:
.......;
contact person name: BBBB;
.......。
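The segmentation shown in the example above can be sketched as follows. This is a simplified illustration that splits on punctuation only (half-width and full-width semicolons and the full-width full stop); the keyword semantics mentioned for step S302 are not modeled, and the function name is hypothetical.

```python
import re
from typing import List


def split_into_blocks(text: str) -> List[str]:
    """Split a natural paragraph into text blocks, one per line.

    The split happens right after each delimiter, so every block keeps
    its trailing punctuation mark.
    """
    parts = re.split(r"(?<=[;；。])", text)
    return [part.strip() for part in parts if part.strip()]


paragraph = "registered capital: 100; contact person name: AAAA; address: X。"
for block in split_into_blocks(paragraph):
    print(block)
```

Each printed line corresponds to one text block that can then be passed to entity feature word recognition.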
S303, identifying entity feature words in the pre-change text content and post-change text content corresponding to at least part of the change items;
Specifically, in this embodiment, step S303 includes: identifying entity feature words in each text block of the pre-change text content and post-change text content corresponding to at least part of the change items.
Continuing the example above, for the text block "contact person name: AAAA;" of the pre-change text content, the extracted entity feature word is "AAAA"; for the text block "contact person name: BBBB;" of the post-change text content, the extracted entity feature word is "BBBB".
In the embodiment, the entity characteristic words are identified based on the text blocks, so that the identification efficiency and accuracy are improved.
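As a rough illustration of block-level recognition, the sketch below extracts the entity feature word from a text block with a regular expression. The pattern and function name are assumptions chosen to fit the "contact person name" example; the source does not disclose the actual expression or model used.

```python
import re
from typing import Optional

# Hypothetical pattern: capture whatever follows "contact person name:"
# up to the next delimiter.
NAME_PATTERN = re.compile(r"contact person name\s*[:：]\s*(?P<name>[^;；。]+)")


def extract_entity(block: str) -> Optional[str]:
    """Return the entity feature word found in a text block, if any."""
    match = NAME_PATTERN.search(block)
    return match.group("name").strip() if match else None
```

Because each block is a single short line, one pattern search per block is enough, which is one way to read the efficiency benefit of working on text blocks.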
S304, distributing change state labels for the entity characteristic words of the text content before the change and the text content after the change;
Continuing the example above, different display colors are assigned to the entity feature word "AAAA" and the entity feature word "BBBB", and these display colors serve as the change state labels.
In this embodiment, if the entity feature word includes a natural person name, a legal person name, an amount of money, and a position, the change status label assigned to the natural person name and the legal person name includes at least one of an increase label, a quit label, and a display color, and the change status label assigned to the amount of money includes at least one of an increase proportion label and a decrease proportion label.
S305, based on the entity characteristic words of the text content before the change and the text content after the change and the change state labels distributed to the entity characteristic words, formatting the text content before the change and the text content after the change respectively to obtain the formatted text content before the change and the text content after the change for displaying.
In this embodiment, step S305 may specifically include: formatting the pre-change text content and the post-change text content respectively, based on their entity feature words and the change state labels assigned to those entity feature words, to obtain formatted pre-change text content and formatted post-change text content for display.
In this embodiment, when step S305 is executed, the increase labels, quit labels, display colors, increase proportion labels and decrease proportion labels are added to the entity feature words of the pre-change text content and the post-change text content.
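One possible way to load a change state label onto an entity feature word is to wrap the word in markup. The sketch below assumes HTML output, which the source does not specify; the function name and parameters are hypothetical.

```python
import html
from typing import Optional


def format_block(block: str, entity: str,
                 color: Optional[str] = None,
                 note: Optional[str] = None) -> str:
    """Wrap the first occurrence of `entity` in a span and append a note.

    `color` stands for a display-color label; `note` stands for a textual
    label such as an increase or decrease proportion.
    """
    safe_block = html.escape(block)
    safe_entity = html.escape(entity)
    style = f' style="color:{color}"' if color else ""
    marked = f"<span{style}>{safe_entity}</span>"
    if note:
        marked += f" ({html.escape(note)})"
    return safe_block.replace(safe_entity, marked, 1)
```

For example, `format_block("contact person name: AAAA;", "AAAA", color="red")` colors the name, and passing `note="increase 25.00%"` places the proportion next to an amount.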
Fig. 4 is a schematic flow chart of a text presentation method in the fourth embodiment of the present invention. The text presentation method may be executed by the text presentation server in fig. 2 and includes:
S401, acquiring a text to be processed;
the text to be processed comprises pre-change text content and post-change text content corresponding to at least one change item.
S402, judging whether the changed items are valid data or not;
In this embodiment, a preset field library exists for the change items, so step S402 includes: matching the change item against the preset field library. If the matching degree between the field of the change item and any field in the preset field library is greater than or equal to a set matching degree, the change item is judged to be valid data. Otherwise, if the matching degree between the field of the change item and every field in the preset field library is smaller than the set matching degree, the change item is judged to be invalid data; this case includes the field of the change item being empty.
If the change item is valid data, step S403 is executed.
If the change item is invalid data, step S404 is executed;
S403, performing text segmentation on the pre-change text content and post-change text content corresponding to at least part of the change items to obtain at least one text block;
In this embodiment, the at least part of the change items in step S403 are the change items that are valid data.
S404, analyzing the pre-change text content and the post-change text content to determine the valid data of the corresponding change item, and returning to step S403;
In this embodiment, steps S402 to S404 prevent the subsequent steps, and hence the formatting, from failing because a change item is missing from, or incorrect in, the pre-change text content and/or the post-change text content.
Alternatively, in other embodiments, if the change item is invalid data, the following is performed for that change item: the difference text between its pre-change text content and post-change text content is determined and highlighted, and entity feature word recognition is not performed on the pre-change text content and post-change text content of that change item.
S405, identifying entity characteristic words in the text content before the change and the text content after the change corresponding to at least part of the changed items;
Specifically, in this embodiment, step S405 includes: identifying entity feature words in each text block of the pre-change text content and post-change text content corresponding to at least part of the change items.
In step S405, the change items whose pre-change text content and post-change text content take part in entity feature word recognition are those that are valid data.
S406, distributing change state labels for the entity characteristic words of the text content before the change and the text content after the change;
S407, based on the entity feature words of the pre-change text content and the post-change text content and the change state labels assigned to those entity feature words, formatting the pre-change text content and the post-change text content respectively to obtain formatted pre-change text content and formatted post-change text content for display.
In this embodiment, for steps S405 to S407, refer to the embodiment of fig. 3.
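The alternative handling of invalid change items described in this embodiment (determining and highlighting the difference text instead of recognizing entity feature words) can be sketched with Python's standard `difflib` module. The `[[...]]` markers and the function name are placeholders chosen for illustration; a real display would presumably use a visual highlight.

```python
import difflib
from typing import Tuple


def highlight_difference(before: str, after: str) -> Tuple[str, str]:
    """Mark the character spans that differ between the two versions."""
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    out_before, out_after = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out_before.append(before[i1:i2])
            out_after.append(after[j1:j2])
            continue
        # "replace", "delete" and "insert" all count as difference text.
        if i1 < i2:
            out_before.append(f"[[{before[i1:i2]}]]")
        if j1 < j2:
            out_after.append(f"[[{after[j1:j2]}]]")
    return "".join(out_before), "".join(out_after)
```

This character-level comparison needs no knowledge of the change item's field, which is why it remains usable when the change item itself is missing or unrecognized.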
On the basis of any of the foregoing embodiments, in another embodiment, after the entity feature words are identified in the pre-change text content and post-change text content corresponding to at least part of the change items, the method may further include: storing the entity feature words identified from the pre-change text content and from the post-change text content into respective pre-established entity feature word lists;
Correspondingly, assigning change state labels to the entity feature words of the pre-change text content and the post-change text content includes: assigning the change state labels based on the corresponding entity feature word lists.
Assigning change state labels based on the entity feature word lists allows the entity feature words to be managed independently of the pre-change and post-change text content, which improves data processing efficiency.
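A minimal sketch of list-based label assignment is given below. It assumes the two entity feature word lists hold person names and compares them by membership; the "unchanged" label and the function name are assumptions added for completeness, whereas "increase" and "quit" follow the label names used in this description.

```python
from typing import Dict, List


def assign_name_labels(before_names: List[str],
                       after_names: List[str]) -> Dict[str, str]:
    """Assign a change state label to each name using only the two lists."""
    before_set, after_set = set(before_names), set(after_names)
    labels: Dict[str, str] = {}
    for name in before_names:
        # A name that disappears after the change has quit.
        labels[name] = "unchanged" if name in after_set else "quit"
    for name in after_names:
        # A name that only appears after the change is an increase.
        if name not in before_set:
            labels[name] = "increase"
    return labels
```

Because the function works on the lists alone, the original text content does not need to be re-read when labels are assigned.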
On the basis of any one of the above embodiments, in another embodiment, after the obtaining the text to be processed, the method may further include: desensitizing the text to be processed to obtain a desensitized text to be processed, wherein the desensitized text to be processed comprises desensitized text content before change and desensitized text content after change corresponding to the at least one change item;
Correspondingly, identifying entity feature words in the pre-change text content and post-change text content corresponding to at least part of the change items is: identifying entity feature words in the desensitized pre-change text content and the desensitized post-change text content corresponding to at least part of the change items.
The text to be processed is desensitized to obtain a desensitized text to be processed, which comprises the desensitized pre-change text content and the desensitized post-change text content; entity feature words are then identified in the desensitized content. This further reduces the amount of data processed, increases data security, and protects the privacy of sensitive data.
Illustratively, desensitizing the text to be processed to obtain a desensitized text to be processed includes: desensitizing the text to be processed based on a constructed second regular expression. The second regular expression is constructed on a principle similar to that of the first regular expression, which is not described here again.
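As an illustration of regex-based desensitization, the sketch below masks the middle four digits of any standalone 11-digit number, on the assumption that such numbers are mobile phone numbers. Which fields count as sensitive, and the actual second regular expression, are not given in the source, so the pattern here is purely hypothetical.

```python
import re

# Hypothetical pattern: an 11-digit number not embedded in a longer digit run.
PHONE_PATTERN = re.compile(r"(?<!\d)(\d{3})\d{4}(\d{4})(?!\d)")


def desensitize(text: str) -> str:
    """Mask the middle four digits of each 11-digit number in the text."""
    return PHONE_PATTERN.sub(r"\1****\2", text)
```

The lookarounds keep shorter or longer digit runs, such as amounts, untouched.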
Optionally, on the basis of any of the foregoing embodiments, in another embodiment, after the formatting, the method may further include: performing internal-link processing on the formatted pre-change text content and post-change text content so that linked content can be jumped to and accessed during display. Specifically, with internal-link processing, when natural person names and legal person names in the text content are displayed, the user can jump to the linked content to learn about the investments, positions held and so on of the natural person or legal person.
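A simple way to picture this linking step is to wrap a recognized name in a hyperlink that points to a detail page. The HTML output, the `/entity/` URL prefix, and the function name below are all assumptions for illustration.

```python
import html
from urllib.parse import quote


def add_internal_link(text: str, name: str, base_url: str = "/entity/") -> str:
    """Wrap the first occurrence of `name` in a link to its detail page."""
    safe_text = html.escape(text)
    safe_name = html.escape(name)
    link = f'<a href="{base_url}{quote(name)}">{safe_name}</a>'
    return safe_text.replace(safe_name, link, 1)
```

The name is percent-encoded for the URL and HTML-escaped for the visible text, so names containing non-ASCII or special characters remain safe to display.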
Optionally, on the basis of any of the foregoing embodiments, in another embodiment, after the formatting, the method further includes: configuring display format labels for the formatted pre-change text content and post-change text content so as to adjust their display formats. The display format labels include, for example, at least one of a label for adjusting line spacing and a label for adjusting paragraph spacing.
Fig. 5 is a schematic structural diagram of a text display apparatus according to a fifth embodiment of the present invention. The text display apparatus may be configured on the text presentation server in fig. 2 and includes:
a text obtaining unit 501, configured to obtain a to-be-processed text, where the to-be-processed text includes pre-change text content and post-change text content corresponding to at least one change item;
an entity identification unit 502, configured to identify entity feature words in pre-change text content and post-change text content corresponding to at least some of the change items;
a label assigning unit 503, configured to assign change state labels to the entity feature words of the pre-change text content and the post-change text content;
a formatting unit 504, configured to perform formatting processing on the text content before the change and the text content after the change respectively based on the entity feature words of the text content before the change and the text content after the change and the change status labels assigned to the entity feature words, so as to obtain a formatted text content before the change and a formatted text content after the change for displaying.
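How the four units might chain together can be sketched as a small pipeline. Everything here is a simplified stand-in: the `ChangeItem` type, the value-after-colon pattern, the "before"/"after" labels, and the bracket markup are assumptions, not the apparatus's actual design.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ChangeItem:
    """What the text obtaining unit hands on: one change item with both versions."""
    field: str
    before: str
    after: str


# Hypothetical rule: the entity feature word is the value after a colon.
VALUE_PATTERN = re.compile(r"[:：]\s*([^;；。]+)")


def identify(text: str) -> List[str]:
    """Entity identification unit (simplified)."""
    return [value.strip() for value in VALUE_PATTERN.findall(text)]


def assign_labels(before_words: List[str], after_words: List[str]) -> Dict[str, str]:
    """Label assigning unit (simplified): label words unique to one side."""
    labels = {w: "before" for w in before_words if w not in after_words}
    labels.update({w: "after" for w in after_words if w not in before_words})
    return labels


def format_text(text: str, labels: Dict[str, str]) -> str:
    """Formatting unit (simplified): attach each label to its word."""
    for word, label in labels.items():
        text = text.replace(word, f"[{label}:{word}]")
    return text


def run(item: ChangeItem) -> Tuple[str, str]:
    """Run identification, label assignment and formatting in order."""
    labels = assign_labels(identify(item.before), identify(item.after))
    return format_text(item.before, labels), format_text(item.after, labels)
```

Keeping each unit as a separate function mirrors the unit structure above and lets optional units, such as segmentation or desensitization, be inserted between the stages.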
Optionally, in an embodiment, the entity identifying unit 502 is specifically configured to perform text processing on at least part of text content before modification and text content after modification corresponding to at least part of the modification items based on a set semantic regularization analysis model, so as to identify entity feature words therein.
Optionally, in an embodiment, the entity identifying unit 502 is specifically configured to perform character matching on at least part of text content before modification and text content after modification, which correspond to at least part of the modification items, based on the constructed first regular expression, so as to identify the entity feature words therein.
Optionally, in an embodiment, the apparatus further includes: a text segmentation unit, configured to, before the entity feature words are identified in the pre-change text content and post-change text content corresponding to at least part of the change items, perform text segmentation on that pre-change text content and post-change text content to obtain at least one text block;
The entity identifying unit 502 is specifically configured to identify entity feature words in each text block of the pre-change text content and post-change text content corresponding to at least part of the change items.
Optionally, in an embodiment, the apparatus further includes: a data validity judging unit, configured to judge whether the change item is valid data after the text to be processed is acquired; if so, the entity identifying unit 502 identifies entity feature words in each text block of the pre-change text content and post-change text content corresponding to at least part of the change items.
Optionally, in an embodiment, the apparatus further includes: a change category analysis unit, configured to analyze the pre-change text content and the post-change text content to determine the valid data of the corresponding change item when the change item is invalid data.
Optionally, in an embodiment, the entity identifying unit 502 is further configured to store the entity feature words identified from the pre-change text content and the post-change text content into a pre-established entity feature word list respectively;
the label assigning unit 503 is specifically configured to assign a change state label to the entity feature words of the text content before the change and the text content after the change based on the corresponding entity feature word list.
Optionally, in an embodiment, the apparatus further includes: a desensitization unit, configured to perform desensitization processing on the to-be-processed text after the to-be-processed text is obtained, so as to obtain a desensitized to-be-processed text, where the desensitized to-be-processed text includes desensitized pre-change text content and desensitized post-change text content corresponding to the at least one change item;
Correspondingly, the entity identifying unit 502 is specifically configured to identify entity feature words in the desensitized pre-change text content and the desensitized post-change text content corresponding to at least part of the change items.
Optionally, in an embodiment, the desensitization unit is specifically configured to perform desensitization processing on the to-be-processed text based on the constructed second regular expression, so as to obtain a desensitized to-be-processed text.
Optionally, in an embodiment, the apparatus further includes: an in-linking unit, configured to perform internal-link processing on the formatted pre-change text content and post-change text content, so that linked content can be jumped to and accessed during display.
Optionally, in an embodiment, the apparatus further includes: a display configuration unit, configured to configure display format labels for the formatted pre-change text content and post-change text content so as to adjust their display formats.
Optionally, in an embodiment of the present invention, the apparatus further includes: a judging unit, configured to judge whether the changed item is valid data, and if the changed item is valid data, execute a step of identifying an entity feature word in a text content before change and a text content after change corresponding to the changed item; otherwise, if the changed item is invalid data, determining a difference text between the text content before the change and the text content after the change according to the changed item corresponding to the invalid data, and highlighting and marking the difference text.
Optionally, in an embodiment of the present invention, the determining unit is specifically configured to match the change item with a preset field library, and if a matching degree between a field of the change item and any field in the preset field library is greater than or equal to a set matching degree, determine that the change item is valid data; otherwise, if the matching degree between the field of the changed item and any field in the preset field library is smaller than the set matching degree, the changed item is considered as invalid data, wherein the condition that the matching degree between the field of the changed item and any field in the preset field library is smaller than the set matching degree includes that the field of the changed item is empty.
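The matching rule used to judge validity can be sketched as follows. The source does not say how the matching degree is computed; this sketch assumes a string-similarity ratio from Python's `difflib` and a threshold of 0.8, both of which are illustrative choices, as is the function name.

```python
import difflib
from typing import Iterable


def is_valid_change_item(field: str, field_library: Iterable[str],
                         threshold: float = 0.8) -> bool:
    """Judge a change item valid if its field matches any library field.

    An empty field is always treated as invalid data.
    """
    if not field or not field.strip():
        return False
    return any(
        difflib.SequenceMatcher(None, field, known).ratio() >= threshold
        for known in field_library
    )
```

A threshold below 1.0 tolerates small spelling variations in field names, at the cost of occasionally accepting a field that only resembles a library entry.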
The in-linking unit and the display configuration unit may be structures independent of the formatting unit.
The embodiment of the invention also provides a computer storage medium, wherein a computer executable program is stored on the computer storage medium, and the computer executable program is operated to implement the text display method in any embodiment of the invention.
As shown in fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present invention; the electronic device comprises a memory 601 and a processor 602, wherein the memory is used for storing a computer executable program, and the processor is used for running the computer executable program to implement the text presentation method according to any embodiment of the invention.
The text presentation server may be the electronic device shown in fig. 6.
The above embodiments are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and they shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (14)
1. A text presentation method, comprising:
acquiring a text to be processed, wherein the text to be processed comprises pre-change text content and post-change text content corresponding to at least one change item;
identifying entity characteristic words in the text content before and after the change corresponding to at least part of the change items;
distributing change state labels for the entity characteristic words of the text content before the change and the text content after the change; and
based on the entity feature words of the pre-change text content and the post-change text content and the change state labels assigned to the entity feature words, formatting the pre-change text content and the post-change text content respectively to obtain formatted pre-change text content and formatted post-change text content for display.
2. The method of claim 1, wherein identifying entity feature words in pre-alteration text content and post-alteration text content corresponding to at least some of the alteration items comprises: and performing text processing on at least part of the text content before the change and the text content after the change corresponding to at least part of the changed items based on a set semantic regularization analysis model to identify entity feature words in the text content.
3. The method of claim 1, wherein identifying entity feature words in pre-alteration text content and post-alteration text content corresponding to at least some of the alteration items comprises: and performing character matching on at least part of the text content before the change and the text content after the change corresponding to at least part of the changed items based on the constructed first regular expression so as to identify entity characteristic words in the text content.
4. The method of claim 1, wherein the identifying entity feature words in the pre-alteration text content and the post-alteration text content corresponding to at least part of the alteration items comprises: performing text segmentation on at least part of text content before change and text content after change corresponding to the changed items to obtain at least one text block;
the identifying of the entity feature words in the text content before the change and the text content after the change corresponding to at least part of the changed items specifically includes: and identifying entity characteristic words in each text block of the text content before the change and the text content after the change corresponding to at least part of the change items.
5. The method according to any one of claims 1-4, wherein the identifying entity feature words in the pre-alteration text content and the post-alteration text content corresponding to at least part of the alteration items comprises: respectively storing the entity characteristic words identified from the text content before the change and the text content after the change into a pre-established entity characteristic word list;
the allocating of the change state labels to the entity feature words of the text content before the change and the text content after the change comprises: and distributing change state labels for the entity characteristic words of the text content before the change and the text content after the change based on the corresponding entity characteristic word list.
6. The method according to any one of claims 1-5, wherein after obtaining the text to be processed, the method comprises: desensitizing the text to be processed to obtain a desensitized text to be processed, wherein the desensitized text to be processed comprises desensitized text content before change and desensitized text content after change corresponding to the at least one change item;
correspondingly, identifying entity feature words in the pre-change text content and post-change text content corresponding to at least part of the change items is: identifying entity feature words in the desensitized pre-change text content and the desensitized post-change text content corresponding to at least part of the change items.
7. The method according to claim 6, wherein the desensitizing treatment of the text to be processed to obtain a desensitized text to be processed comprises: and performing desensitization treatment on the text to be processed based on the constructed second regular expression to obtain a desensitized text to be processed.
8. The method according to any one of claims 1-7, further comprising: performing internal-link processing on the formatted pre-change text content and post-change text content so that linked content can be jumped to and accessed during display.
9. The method according to any one of claims 1-8, further comprising: and configuring display format labels for the formatted text content before the change and the text content after the change so as to adjust the display formats of the formatted text content before the change and the text content after the change.
10. The method according to any one of claims 1-9, wherein after obtaining the text to be processed, the method further comprises: judging whether the changed item is valid data or not, and if the changed item is valid data, identifying entity characteristic words in the text content before and after the change corresponding to the changed item; otherwise, if the changed item is invalid data, determining a difference text between the text content before the change and the text content after the change according to the changed item corresponding to the invalid data, and highlighting and marking the difference text; or analyzing the text content before the change and the text content after the change to determine the effective data of the corresponding change item, and returning to execute the step of identifying the entity characteristic words in the text content before the change and the text content after the change corresponding to the change item.
11. The method of claim 10, wherein said determining whether the changed item is valid data comprises: the change item is matched with a preset field library, and if the matching degree of the field of the change item and any field in the preset field library is greater than or equal to the set matching degree, the change item is judged to be valid data; otherwise, if the matching degree between the field of the changed item and any field in the preset field library is smaller than the set matching degree, the changed item is considered as invalid data, wherein the condition that the matching degree between the field of the changed item and any field in the preset field library is smaller than the set matching degree includes that the field of the changed item is empty.
12. A text presentation device, comprising:
the text acquisition unit is used for acquiring a text to be processed, wherein the text to be processed comprises pre-change text content and post-change text content corresponding to at least one change item;
the entity identification unit is used for identifying entity characteristic words in the text content before the change and the text content after the change corresponding to at least part of the change items;
a label distribution unit, configured to distribute a change state label to the entity feature words of the text content before the change and the text content after the change; and
and the formatting unit is used for respectively formatting the text content before the change and the text content after the change based on the entity characteristic words of the text content before the change and the text content after the change and the change state labels distributed to the entity characteristic words to obtain the formatted text content before the change and the text content after the change for displaying.
13. A computer storage medium having a computer-executable program stored thereon, the computer-executable program being operative to implement the text presentation method of any one of claims 1-11.
14. An electronic device, comprising a memory for storing a computer-executable program and a processor for executing the computer-executable program to implement the text presentation method of any one of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111198145.3A CN113901834A (en) | 2021-10-14 | 2021-10-14 | Text display method and device, computer storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111198145.3A CN113901834A (en) | 2021-10-14 | 2021-10-14 | Text display method and device, computer storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113901834A true CN113901834A (en) | 2022-01-07 |
Family
ID=79192085
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111198145.3A Pending CN113901834A (en) | 2021-10-14 | 2021-10-14 | Text display method and device, computer storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113901834A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005250690A (en) * | 2004-03-02 | 2005-09-15 | Ntt Electornics Corp | Information display system, information display device and identification information allocation device |
CN107644090A (en) * | 2017-09-26 | 2018-01-30 | 北京金堤科技有限公司 | A kind of modification information processing method and processing device |
CN109388805A (en) * | 2018-10-23 | 2019-02-26 | 重庆誉存大数据科技有限公司 | A kind of industrial and commercial analysis on altered project method extracted based on entity |
CN112330459A (en) * | 2020-10-22 | 2021-02-05 | 北京华彬立成科技有限公司 | Method and device for mining enterprise investment and financing event based on business data |
Non-Patent Citations (1)
Title |
---|
上海市软件行业协会组: "2019年软件工程论文专集", 31 December 2019, 上海科学技术出版社, pages: 79 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10095780B2 (en) | | Automatically mining patterns for rule based data standardization systems |
US10162823B2 (en) | | Populating user contact entries |
US20130066818A1 (en) | | Automatic Crowd Sourcing for Machine Learning in Information Extraction |
US20220121668A1 (en) | | Method for recommending document, electronic device and storage medium |
US11487801B2 (en) | | Dynamic data visualization from factual statements in text |
CN107741972A (en) | | A kind of searching method of picture, terminal device and storage medium |
WO2020023156A1 (en) | | Language agnostic data insight handling for user application data |
CN114610845A (en) | | Multisystem-based intelligent question answering method, device and equipment |
CN111259207A (en) | | Short message identification method, device and equipment |
CN113626576A (en) | | Method and device for extracting relational characteristics in remote supervision, terminal and storage medium |
US8620918B1 (en) | | Contextual text interpretation |
CN113127621A (en) | | Dialogue module pushing method, device, equipment and storage medium |
US9898467B1 (en) | | System for data normalization |
CN113434542B (en) | | Data relationship identification method and device, electronic equipment and storage medium |
CN113869789A (en) | | Risk monitoring method and device, computer equipment and storage medium |
CN117077668A (en) | | Risk image display method, apparatus, computer device, and readable storage medium |
CN114969385B (en) | | Knowledge graph optimization method and device based on document attribute assignment entity weight |
CN116450723A (en) | | Data extraction method, device, computer equipment and storage medium |
CN113792232B (en) | | Page feature calculation method, page feature calculation device, electronic equipment, page feature calculation medium and page feature calculation program product |
CN113901834A (en) | | Text display method and device, computer storage medium and electronic equipment |
CN115099680A (en) | | Risk management method, device, equipment and storage medium |
CN113761906B (en) | | Method, apparatus, device and computer readable medium for parsing document |
CN114297380A (en) | | Data processing method, device, equipment and storage medium |
CN114329164A (en) | | Method, apparatus, device, medium and product for processing data |
CN113157964A (en) | | Method and device for searching data set through voice and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| TA01 | Transfer of patent application right | Effective date of registration: 2023-08-01. Address after: Room 404-405, 504, Building B-17-1, Big Data Industrial Park, Kecheng Street, Yannan High-tech Zone, Yancheng, Jiangsu Province, 224000. Applicant after: Yancheng Tianyanchawei Technology Co.,Ltd. Address before: 224000 Room 501-503, Building B-17-1, Xuehai Road Big Data Industrial Park, Kecheng Street, Yannan High-tech Zone, Yancheng City, Jiangsu Province (CNK). Applicant before: Yancheng Jindi Technology Co.,Ltd. |