CN115759035A - Text processing method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN115759035A
Authority
CN
China
Prior art keywords
text
processed
preset
words
sentences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211583805.4A
Other languages
Chinese (zh)
Inventor
周相进
肖雪松
严骊
韩威俊
罗桂林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Minto Technology Co ltd
Original Assignee
Chengdu Minto Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Machine Translation (AREA)

Abstract

The application provides a text processing method, a text processing apparatus, an electronic device, and a computer-readable storage medium. The method comprises: acquiring a text to be processed, wherein the text to be processed comprises project-related information; and processing the text to be processed through a preset extraction model to obtain words and sentences representing preset intents, wherein the preset intents comprise at least one of target, plan, execution, check, and processing. In this way, the user can directly view the extracted words and sentences. Because the extracted words and sentences represent at least one of target, plan, execution, check, and processing, they express the core ideas of the different stages of the project described by the text to be processed. By viewing them, the user can understand more accurately what the text is meant to express, so that each user can carry out the operations corresponding to the text more accurately and obtain better results.

Description

Text processing method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a text processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Currently, for texts related to projects, production, and the like, users must read and understand the texts by themselves and then act on them. However, different users may understand the same text differently. Such deviations in understanding lead to deviations when each user performs the operations the text describes, which produces poor results.
Disclosure of Invention
An object of the embodiments of the present application is to provide a text processing method, an apparatus, an electronic device, and a computer-readable storage medium, so that each user can better understand a text and perform a corresponding operation based on the text, thereby generating a better operation result.
The invention is implemented as follows:
In a first aspect, an embodiment of the present application provides a text processing method, including: acquiring a text to be processed, wherein the text to be processed comprises project-related information; and processing the text to be processed through a preset extraction model to obtain words and sentences representing preset intents, wherein the preset intents comprise at least one of target, plan, execution, check, and processing.
In the embodiment of the application, after the text to be processed is obtained, it is processed through the preset extraction model to obtain words and sentences representing the preset intents, so that a user can directly view the extracted words and sentences. Moreover, because the extracted words and sentences represent at least one of target, plan, execution, check, and processing, they express the core ideas of the different stages of the project described by the text to be processed. By viewing them, the user can understand the meaning of the text more accurately, so that each user can carry out the operations corresponding to the text more accurately and obtain better results.
With reference to the technical solution provided by the first aspect, in some possible implementation manners, after the to-be-processed text is processed by using a preset extraction model to obtain a word or a sentence representing a preset intention, the method further includes: and filling the words and sentences into a preset table, and displaying the preset table.
In the embodiment of the application, the extracted words and sentences are filled in the preset table, and the preset table is displayed, so that the user can view the extracted words and sentences more visually, and can view the information corresponding to different stages of the project corresponding to the text to be processed more visually, and the user can understand the project conveniently.
With reference to the technical solution provided by the first aspect, in some possible implementation manners, after the to-be-processed text is processed through a preset extraction model to obtain a word or a sentence representing a preset intention, the method further includes: and generating a structured document based on the words and the sentences, wherein the structured document is a file formed according to a frame corresponding to the preset intention.
In the embodiment of the application, the structured document is generated based on the words and the sentences, and the structured document is a file formed by a frame corresponding to the preset intention, so that a user can more visually check the extracted words and sentences and can more visually see information corresponding to different stages of an item corresponding to the text to be processed, and the user can conveniently understand the item. In addition, the structured document can be generated, so that the user can conveniently and directly use the structured document.
With reference to the technical solution provided by the first aspect, in some possible implementation manners, generating a structured document based on the words and phrases includes: generating a target data packet based on the words and the sentences; transmitting the target data packet to a target management system; and generating a structured document based on the target data packet and the target management system.
In the embodiment of the application, the target data packet is generated based on the words and sentences, the target data packet is transmitted to the target management system, and the structured document is generated based on the target data packet and the target management system. Different users can therefore view the structured document from the target management system, so the structured document can be shared. This also makes the structured document convenient for users to view.
With reference to the technical solution provided by the first aspect, in some possible implementation manners, the text to be processed is an agricultural planting technology text or a task report text.
In the embodiment of the present application, an agricultural planting technology text often includes knowledge about planting a certain type of plant, for example the different planting operations required and the targets to be met at different stages of the plant's growth. This knowledge is usually complex, so users who read it directly often understand it differently and easily miss some points. If the text to be processed is an agricultural planting technology text, the words and sentences in it that represent at least one of target, plan, execution, check, and processing can be extracted. For example, if the text covers the target, plan, and execution process for planting potatoes, the target, plan, and execution can be extracted directly from it, which makes them easy for the user to view, understand, and carry out, so that a good planting result is achieved. In addition, a task report text often includes the requirements of a certain task, for example what the target of the task is and how it should be implemented (that is, specifically executed). These requirements are usually numerous, so users who read them directly often understand them differently and easily miss some points. If the text to be processed is a task report text, the words and sentences in it that represent at least one of target, plan, execution, check, and processing can be extracted, which makes them easy for the user to view, understand, and carry out, so that a better task result is achieved.
With reference to the technical solution provided by the first aspect, in some possible implementation manners, the extraction model includes a BiLSTM layer and a CRF layer, and the BiLSTM layer is connected to the CRF layer. The BiLSTM layer includes a word embedding layer, a bidirectional LSTM layer, and a fully connected layer, which are connected in sequence, and the fully connected layer is connected to the CRF layer. Processing the text to be processed through the preset extraction model to obtain words and sentences representing the preset intents includes: processing the text to be processed through the BiLSTM layer to obtain the score of each category for each word and sentence in the text to be processed; and processing the category scores of each word and sentence through the CRF layer to obtain the words and sentences.
In the embodiment of the application, the extraction model is set to have the structure, so that the extraction model can accurately extract words and sentences representing the preset intentions from the text to be processed, a user can more accurately understand items corresponding to the text to be processed based on the extracted words and sentences, and further more accurately execute operations corresponding to the text to be processed.
With reference to the technical solution provided by the first aspect, in some possible implementation manners, the loss function of the CRF layer is formed by a score of a real path and a total score of all paths, where the score of the real path is the highest score in all paths.
In a second aspect, an embodiment of the present application provides a text processing apparatus, including: the acquisition module is used for acquiring a text to be processed, wherein the text to be processed comprises project related information; and the processing module is used for processing the text to be processed through a preset extraction model to obtain words and sentences representing preset intentions, and the preset intentions comprise at least one of targets, plans, executions, checks and processes.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory, the processor and the memory connected; the memory is used for storing programs; the processor is configured to invoke a program stored in the memory, and to perform a method as provided in the foregoing first aspect embodiment and/or in conjunction with some possible implementations of the foregoing first aspect embodiment.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the method as set forth in the above first aspect embodiment and/or in combination with some possible implementations of the above first aspect embodiment.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart illustrating steps of a method for constructing an extraction model according to an embodiment of the present disclosure.
Fig. 2 is a flowchart illustrating steps of a text processing method according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a structured document provided in an embodiment of the present application.
Fig. 4 is a block diagram of a text processing apparatus according to an embodiment of the present application.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Different users may understand the same text differently. Such deviations in understanding lead to deviations when each user performs the operations the text describes, which produces poor results. After study, the present inventors propose the following embodiments to solve this problem.
Referring to Fig. 1, an embodiment of the present application provides a method for constructing an extraction model, which can be applied to various electronic devices, for example computers, tablet computers, and the like. The specific procedure and steps of the method for constructing the extraction model are described below.
The method for constructing the extraction model provided in the embodiment of the present application is not limited to the order shown below.
Step S101: construct an initial extraction model.
The initial extraction model comprises a BiLSTM layer and a CRF layer that are connected to each other. The BiLSTM layer comprises a word embedding layer, a bidirectional LSTM layer, and a fully connected layer, which are connected in sequence, and the fully connected layer is connected to the CRF layer.
Step S102: obtain a training set, where the training set includes texts that contain project-related information and have been labeled with the preset intents.
The project-related information may be information about the process and purpose of carrying out a certain event or project. For example, for a crayfish breeding technology text, the project-related information may be the target of crayfish breeding, the plan, the specific operations to execute, the checks on those operations, and the measures for handling the check results. As another example, for a project management text for constructing a building, the project-related information may be the target of constructing the building, the plan, the specific operations to execute, the checks on those operations, and the measures for handling the check results.
The preset intents (GPDCA) include at least one of target (Goal), plan (Plan), execution (Do), check (Check), and processing (Action). The target indicates what the goal of the project is, for example: carrying out pest and disease management for crayfish breeding. The plan represents the measures taken to achieve the target, for example: the plan for pest and disease management in crayfish breeding is to implement methods for preventing pests and diseases and the corresponding treatment methods after they occur. Execution represents the specific operations that carry out the plan, for example: the specific measures for preventing pests and diseases are replacing the pond water, promptly clearing residual bait and decayed matter from the pond, and disinfecting the crayfish pond regularly. The check represents the inspection of the target and the operations, for example: for pest and disease management in crayfish breeding, the check may be a daily patrol of the breeding quality of the crayfish. Processing represents the handling of the check results, for example: for pest and disease management in crayfish breeding, after the daily patrol of the breeding quality of the crayfish, adjustment measures may be taken according to the inspection findings.
It should be noted that each of the preset intents may be further subdivided. For example, the target may include subdivided intents such as primary target and secondary target, and execution may include subdivided intents such as task name and task requirement.
It is to be understood that acquiring the training set may specifically include: acquiring a plurality of initial texts containing project related information; and carrying out preset intention marking on the initial text, wherein the marked initial text forms a training set.
The initial text may be an agricultural planting technology text, such as a tomato planting technology text, a rice planting technology text, or a wheat planting technology text. It may also be a task report text, such as a text on building a certain building or a text on planning a certain meeting for a certain area. The initial text may be acquired by automatic capture, that is, by capturing the business data accumulated in an existing business system. This helps ensure the completeness of the acquired initial text and improves the efficiency of acquiring it. Where automatic capture is not possible, the initial text may also be obtained manually, for example by collecting from the network various types of unstructured text that contain project-related information.
When the initial text is labeled with the preset intents, it can be labeled with various existing tools or platforms. For example, using a labeling platform that has been set up, after the entity categories to be labeled are selected, each initial text is input into the labeling platform and labeled through it. It should be noted that each entity category corresponds to one of the preset intents, or to a subdivided intent of one of the preset intents.
In addition, marking may also be performed manually.
It should be further noted that the labeled data may be in JSON format and includes the labeled words and sentences, the label corresponding to each, and the specific position at which each appears in the initial text. For example, if the phrase "ciliate disease prevention" is labeled as "target", the corresponding record may be: {"target": {"ciliate disease prevention": [11, 16]}}, where [11, 16] denotes the specific position at which "ciliate disease prevention" appears in the initial text, that is, the 11th to 16th characters of the initial text.
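The labeled-record layout described above can be sketched in Python as follows. This is only an illustration: the helper names `make_annotation` and `span_text` are hypothetical, the sample text is invented, and treating the positions as 1-based and inclusive is an assumption that the description does not state explicitly.

```python
import json

def make_annotation(label, phrase, start, end):
    """Build one labeled record shaped as {label: {phrase: [start, end]}}."""
    if start > end:
        raise ValueError("start must not exceed end")
    return {label: {phrase: [start, end]}}

def span_text(text, start, end):
    """Return the characters at positions start..end.

    Assumption: positions are 1-based and inclusive.
    """
    return text[start - 1:end]

# Invented sample text in which the labeled phrase occupies positions 11..16.
text = "0123456789ABCDEF-rest"
record = make_annotation("target", "ABCDEF", 11, 16)

# Round-trip through JSON, the storage format mentioned in the description.
encoded = json.dumps(record, ensure_ascii=False)
decoded = json.loads(encoded)
```

Checking that `span_text(text, start, end)` equals the stored phrase is a cheap way to catch off-by-one errors in a labeling export.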
It is understood that the construction of the initial extraction model (step S101) and the acquisition of the training set (step S102) may be performed simultaneously or sequentially. Specifically, an initial extraction model can be constructed and a training set can be obtained at the same time; or an initial extraction model can be constructed first, and a training set is obtained after the initial extraction model is constructed; the training set can also be obtained first, and after the training set is obtained, an initial extraction model is constructed.
After the initial extraction model is constructed and the training set is obtained, the method may proceed to step S103.
Step S103: train the initial extraction model with the training set to obtain the extraction model.
Specifically, the training set may be randomly divided into a training data set and a verification data set, and the ratio of the two numbers may be 4. The training data set is input into the initial extraction model, and training proceeds according to parameters such as a preset number of training rounds and a preset learning rate. Training may stop when the preset number of training rounds is reached, and the trained model is taken as the extraction model. Alternatively, a loss value of the model may be calculated after each round of training, and training stops when the loss value falls within a preset range, with the trained model taken as the extraction model. The loss function of the CRF layer is composed of the score of the real path and the total score of all paths, where the score of the real path is the highest score among all paths.
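The random split and the two stopping conditions can be sketched as below. The function names are hypothetical, the 4:1 training-to-verification ratio is an assumption (the description gives only the figure 4), and modelling "the loss value reaches a preset range" as a single upper threshold is likewise an assumption.

```python
import random

def split_dataset(samples, train_parts=4, val_parts=1, seed=0):
    """Randomly split samples into training and verification sets.

    Assumption: a train_parts:val_parts ratio of 4:1.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    cut = len(items) * train_parts // (train_parts + val_parts)
    return items[:cut], items[cut:]

def should_stop(epoch, loss, max_epochs, loss_threshold):
    """Stop when the preset number of rounds is reached or the loss is low enough."""
    return epoch >= max_epochs or loss <= loss_threshold

train_set, val_set = split_dataset(range(10))
```

Fixing the seed keeps the split stable across runs, which makes loss values from different training runs comparable.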
After the extraction model is obtained according to the above construction method, the extraction model may be used to process a text to be processed including the item-related information, so as to obtain a word or a sentence representing a preset intention.
The following describes a specific flow and steps of a text processing method with reference to fig. 2. The embodiment of the application provides a text processing method which can be applied to various electronic devices.
It should be noted that the text processing method provided in the embodiment of the present application is not limited to the order shown in fig. 2 and below.
Step S201: acquire a text to be processed.
The text to be processed is a text that a user needs to read, and it comprises project-related information. For a description of the project-related information, reference may be made to the foregoing embodiment; it is not repeated here.
In addition, the text to be processed may be an agricultural planting technology text, such as a tomato planting technology text, a rice planting technology text, or a wheat planting technology text. An agricultural planting technology text often includes knowledge about planting a certain type of plant, for example the different planting operations required and the targets to be met at different stages of the plant's growth. This knowledge is usually complex, so users who read it directly often understand it differently and easily miss some points. If the text to be processed is an agricultural planting technology text, the words and sentences in it that represent at least one of target, plan, execution, check, and processing can subsequently be extracted. For example, if the text covers the target, plan, and execution process for planting potatoes, the target, plan, and execution can be extracted directly from it, which makes them easy for the user to view, understand, and carry out, so that a good planting result is achieved.
The text to be processed may also be a task report text, such as a text on building a certain building or a text on planning a certain meeting for a certain area. A task report text often includes the requirements of a certain task, for example what the target of the task is and how it should be implemented (that is, specifically executed). These requirements are usually numerous, so users who read them directly often understand them differently and easily miss some points. If the text to be processed is a task report text, the words and sentences in it that represent at least one of target, plan, execution, check, and processing can subsequently be extracted, which makes them easy for the user to view, understand, and carry out, so that a better task result is achieved.
After the text to be processed is acquired, the method may proceed to step S202.
Step S202: process the text to be processed through a preset extraction model to obtain words and sentences representing preset intents.
The preset intents include at least one of target, plan, execution, check, and processing. For a description of the preset intents, reference may be made to the foregoing embodiment; it is not repeated here.
The words and sentences may be single words or full sentences, and may be output as data in JSON format, for example: {"id": 1, "label": {"target": {"Soft shell disease prevention": [21, 25]}}}, {"id": 2, "label": {"plan": {"prevention method": [30, 35]}}}, and so on. Here, "id" is the sequence number of the extracted word or sentence. "label" is followed by the label of the word or sentence, that is, which of target, plan, execution, check, and processing it belongs to. The label is followed by the word or sentence itself, which in turn is followed by its specific position in the text.
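A minimal sketch of reading extraction results in this JSON shape is given below. The `Extraction` type and `parse_extractions` function are hypothetical names, and wrapping the records in a JSON array is an assumption about how the model output would be serialized.

```python
import json
from typing import List, NamedTuple

class Extraction(NamedTuple):
    id: int       # sequence number of the extracted word or sentence
    label: str    # one of target, plan, execution, check, processing
    phrase: str   # the extracted word or sentence
    start: int    # start position in the text
    end: int      # end position in the text

def parse_extractions(payload: str) -> List[Extraction]:
    """Flatten records shaped as {"id": n, "label": {label: {phrase: [start, end]}}}."""
    results = []
    for record in json.loads(payload):
        for label, phrases in record["label"].items():
            for phrase, (start, end) in phrases.items():
                results.append(Extraction(record["id"], label, phrase, start, end))
    return results

# Assumption: the records arrive wrapped in a JSON array.
payload = """[
  {"id": 1, "label": {"target": {"Soft shell disease prevention": [21, 25]}}},
  {"id": 2, "label": {"plan": {"prevention method": [30, 35]}}}
]"""
extractions = parse_extractions(payload)
```

Flattening the nested records into one tuple per phrase makes later steps, such as grouping by label, straightforward.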
In the embodiment of the application, after the text to be processed is obtained, it is processed through the preset extraction model to obtain words and sentences representing the preset intents, so that a user can directly view the extracted words and sentences. Moreover, because the extracted words and sentences represent at least one of target, plan, execution, check, and processing, they express the core ideas of the different stages of the project described by the text to be processed. By viewing them, the user can understand the meaning of the text more accurately, so that each user can carry out the operations corresponding to the text more accurately and obtain better results.
Furthermore, the extraction model comprises a BiLSTM layer and a CRF layer, and the BiLSTM layer is connected to the CRF layer. The BiLSTM layer comprises a word embedding layer, a bidirectional LSTM layer, and a fully connected layer, which are connected in sequence, and the fully connected layer is connected to the CRF layer. Correspondingly, processing the text to be processed through the preset extraction model to obtain the words and sentences representing the preset intents may specifically include: processing the text to be processed through the BiLSTM layer to obtain the score of each category for each word and sentence in the text to be processed; and processing the category scores of each word and sentence through the CRF layer to obtain the words and sentences.
In the embodiment of the application, the extraction model is set to have the structure, so that the extraction model can accurately extract words and sentences representing the preset intentions from the text to be processed, a user can more accurately understand items corresponding to the text to be processed based on the extracted words and sentences, and further more accurately execute operations corresponding to the text to be processed.
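The description does not say how the CRF layer turns per-category scores into a final label sequence. A common choice for linear-chain CRFs is Viterbi decoding, sketched here in plain Python as an assumption rather than as the patented procedure. The `emissions` values stand in for the category scores produced by the BiLSTM layer, and the `transitions` matrix and all numbers are invented for illustration.

```python
def viterbi_decode(emissions, transitions):
    """Find the highest-scoring label path.

    emissions[t][k]: score of label k at position t (from the BiLSTM layer).
    transitions[j][k]: score of moving from label j to label k.
    """
    num_labels = len(emissions[0])
    scores = list(emissions[0])
    backpointers = []
    for emission in emissions[1:]:
        new_scores, pointers = [], []
        for k in range(num_labels):
            # Best previous label for ending at label k.
            best_prev = max(range(num_labels),
                            key=lambda j: scores[j] + transitions[j][k])
            new_scores.append(scores[best_prev] + transitions[best_prev][k] + emission[k])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final label.
    best_last = max(range(num_labels), key=lambda k: scores[k])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[best_last]

# Invented scores for three labels (0 = other, 1 = target, 2 = plan).
emissions = [[2.0, 1.0, 0.0], [0.5, 2.0, 0.2], [0.1, 0.3, 1.5]]
transitions = [[0.0, 0.0, 0.0], [0.0, 0.5, -1.0], [0.0, -1.0, 0.5]]
best_path, best_score = viterbi_decode(emissions, transitions)
```

In this toy input the transition penalty between labels 1 and 2 outweighs the high emission score for label 2 at the last position, which is the kind of sequence-level constraint a CRF layer adds on top of per-position scores.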
It should be noted that the word embedding layer maps each word of the input text to be processed to a word vector, and may be implemented with a CBOW (Continuous Bag of Words), skip-gram, or GloVe model. For the principles of these models, reference may be made to the prior art; they are not described here. The bidirectional LSTM layer is a chain of repeated recurrent neural network (RNN) modules. The output of the word embedding layer enters the bidirectional LSTM layer, which learns context information and outputs the score of each word for each label. Each neuron in the fully connected layer is connected to all neurons in the previous layer. The output of the last fully connected layer is passed to a sigmoid function for classification, and the classification result is input to the CRF layer. Because classification at the token level is multi-label, a sigmoid function is used: several labels may be selected as correct answers at once, and each value is normalized into the range [0, 1], which better handles the poor precision that results from simple association between the probabilities of different labels.
Further, the loss function of the CRF layer is composed of the score of the real path, which is the highest score among all paths, and the total score of all paths.
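One common way to combine the score of the real path with the total score of all paths is the negative log-likelihood of a linear-chain CRF, that is, the log of the summed exponentiated path scores minus the real path score. The sketch below assumes that formulation, which the description does not spell out, and all numbers are invented. A brute-force enumeration is included only to cross-check the forward algorithm on a small input.

```python
import math
from itertools import product

def path_score(emissions, transitions, path):
    """Score of one label path: emission scores plus transition scores."""
    score = emissions[0][path[0]]
    for t in range(1, len(path)):
        score += transitions[path[t - 1]][path[t]] + emissions[t][path[t]]
    return score

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_total_score(emissions, transitions):
    """Forward algorithm: log of the total exponentiated score of all paths."""
    num_labels = len(emissions[0])
    alpha = list(emissions[0])
    for emission in emissions[1:]:
        alpha = [
            emission[k] + log_sum_exp([alpha[j] + transitions[j][k]
                                       for j in range(num_labels)])
            for k in range(num_labels)
        ]
    return log_sum_exp(alpha)

def brute_force_log_total(emissions, transitions):
    """Same quantity by enumerating every path; feasible only for tiny inputs."""
    num_labels = len(emissions[0])
    all_paths = product(range(num_labels), repeat=len(emissions))
    return log_sum_exp([path_score(emissions, transitions, p) for p in all_paths])

def crf_loss(emissions, transitions, real_path):
    """Assumed loss: log total score of all paths minus the real path score."""
    return log_total_score(emissions, transitions) - path_score(emissions, transitions, real_path)

emissions = [[2.0, 1.0, 0.0], [0.5, 2.0, 0.2], [0.1, 0.3, 1.5]]
transitions = [[0.0, 0.0, 0.0], [0.0, 0.5, -1.0], [0.0, -1.0, 0.5]]
loss = crf_loss(emissions, transitions, [0, 1, 1])
```

Under this formulation the loss shrinks as the real path's share of the total score grows, so minimizing it pushes the real path toward being the highest-scoring path.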
Optionally, after the text to be processed is processed through the preset extraction model to obtain words and sentences representing the preset intents, the text processing method may further include: filling the words and sentences into a preset table, and displaying the preset table.
After the words and sentences representing the preset intents are obtained, each word or sentence is filled into the corresponding item of the preset table according to its label.
For example, please refer to table 1, where table 1 is a preset table provided in the embodiments of the present application.
Table 1
Target | Plan | Execution | Check | Processing
When the preset table is Table 1 and an extracted word is {"id": 1, "label": {"target": {"Soft shell disease prevention": [21, 25]}}}, "Soft shell disease prevention" is filled under the Target item in Table 1 according to its target label. Correspondingly, when the label of an extracted word or sentence is another label, the word or sentence is filled under the corresponding item in Table 1.
It can be understood that, when no word corresponding to a certain tag exists in all the extracted words, the tag column in the preset table may be empty, that is, filling is not performed.
When the extraction model can output a subdivided intent of a certain intent, the preset table may be changed accordingly: an item for the subdivided intent is added under the corresponding intent, and the word or sentence corresponding to the subdivided intent is filled in at the position of that item.
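Filling the preset table can be sketched as grouping the extracted phrases by label. The column keys, the function names, and the plain-text rendering are assumptions made for illustration; the description does not specify how the table is stored or displayed.

```python
COLUMNS = ["target", "plan", "execution", "check", "processing"]

def fill_table(extractions):
    """Group (label, phrase) pairs under the matching column.

    Columns with no matching phrase stay empty, as the description allows.
    """
    table = {column: [] for column in COLUMNS}
    for label, phrase in extractions:
        if label in table:
            table[label].append(phrase)
    return table

def render_table(table):
    """Render the table as plain text, one row per line."""
    height = max(1, max(len(cells) for cells in table.values()))
    rows = [" | ".join(COLUMNS)]
    for i in range(height):
        rows.append(" | ".join(
            table[column][i] if i < len(table[column]) else ""
            for column in COLUMNS
        ))
    return "\n".join(rows)

table = fill_table([
    ("target", "Soft shell disease prevention"),
    ("plan", "prevention method"),
])
rendered = render_table(table)
```

Supporting subdivided intents would only require adding their keys to `COLUMNS`, which mirrors the way the description extends the preset table.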
In the embodiment of the present application, the extracted words and sentences are filled into the preset table and the preset table is displayed, so that the user can view the extracted words and sentences, and the information corresponding to the different stages of the project described by the text to be processed, more intuitively, which makes the project easier to understand.
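The table-filling step can be sketched as follows. The record layout mirrors the extraction example given for Table 1; the second record, the column order, and the function names fill_table and render are hypothetical and serve only as illustration.

```python
# Column names follow the preset intentions; the order is an assumption.
COLUMNS = ["target", "plan", "execute", "check", "process"]


def fill_table(records, columns=COLUMNS):
    """Place each extracted word or sentence under the column of its label."""
    table = {column: [] for column in columns}
    for record in records:
        for label, spans in record["label"].items():
            if label not in table:
                continue  # unknown labels are skipped in this sketch
            table[label].extend(spans.keys())
    return table


def render(table):
    """Render the table as two plain-text rows; empty columns stay empty."""
    header = " | ".join(column.capitalize() for column in table)
    row = " | ".join("; ".join(cells) for cells in table.values())
    return header + "\n" + row


records = [
    # Record taken from the example in the description.
    {"id": 1, "label": {"target": {"Soft Shell prevention": [21, 25]}}},
    # Hypothetical second record, made up for illustration only.
    {"id": 2, "label": {"plan": {"add calcium to feed": [3, 9]}}},
]
table = fill_table(records)
print(render(table))
```

Columns with no matching word or sentence remain empty, which corresponds to the case where a label column is not filled.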
As another optional implementation, after the text to be processed is processed through the preset extraction model to obtain the words and sentences representing the preset intentions, the text processing method may further include: generating a structured document based on the words and sentences, wherein the structured document is a file organized according to a framework corresponding to the preset intentions.
Referring to FIG. 3, the structured document may be a Word-format document covering the different stages of a project (i.e., target, plan, execution, inspection, and processing). It can be understood that goal management in FIG. 3 corresponds to the target in the preset intentions, and task management corresponds to the plan in the preset intentions.
In the embodiment of the present application, the structured document is generated based on the words and sentences. Because the structured document is a file organized according to a framework corresponding to the preset intentions, the user can view the extracted words and sentences, and the information corresponding to the different stages of the project described by the text to be processed, more intuitively, which makes the project easier to understand. In addition, because the structured document is generated directly, the user can conveniently use it as is.
Further, generating the structured document based on the words and sentences may specifically include: generating a target data packet based on the words and the sentences; transmitting the target data packet to a target management system; and generating a structured document based on the target data packet and the target management system.
The target data packet is a data packet generated from the words and sentences and suited to the target management system; it contains the words and sentences and the label corresponding to each of them. It should be noted that, for the specific principle of generating the target data packet based on the words and sentences, reference may be made to the prior art; it is not repeated here.
The target management system holds multiple metadata items. Each metadata item characterizes an object in the target management system, and each label in the target data packet corresponds to one metadata item. Transmitting the target data packet to the target management system may specifically include: uploading each word or sentence to the metadata position corresponding to its label.
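The packet generation and upload steps can be sketched as below. The application does not specify the packet format or the metadata names, so the JSON layout, the LABEL_TO_METADATA mapping, and the in-memory dictionary standing in for the target management system are all assumptions.

```python
import json

# Hypothetical mapping from each label to one metadata item of the
# target management system; the metadata names are invented.
LABEL_TO_METADATA = {
    "target": "goal_management",
    "plan": "task_management",
    "execute": "execution_record",
    "check": "inspection_record",
    "process": "handling_record",
}


def build_packet(extracted):
    """Pack (text, label) pairs into a JSON string (assumed packet format)."""
    items = [{"text": text, "label": label} for text, label in extracted]
    return json.dumps({"items": items}, ensure_ascii=False)


def upload(packet, system_store):
    """Place each word or sentence at the metadata position of its label."""
    for item in json.loads(packet)["items"]:
        key = LABEL_TO_METADATA[item["label"]]
        system_store.setdefault(key, []).append(item["text"])
    return system_store


packet = build_packet([("Soft Shell prevention", "target")])
store = upload(packet, {})  # a plain dict stands in for the real system
print(store)
```

In a real deployment the upload would go through whatever interface the target management system exposes; the dictionary here only makes the label-to-metadata correspondence concrete.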
It should be further noted that, for the specific principle of generating the structured document based on the target data packet and the target management system, reference may be made to the prior art; it is not repeated here.
In the embodiment of the present application, the target data packet is generated based on the words and sentences, the target data packet is transmitted to the target management system, and the structured document is generated based on the target data packet and the target management system, so that different users can look up the structured document in the target management system and the structured document can be shared. This also makes it convenient for users to view the structured document.
In addition, after the target data packet is transmitted to the target management system, other systems connected to the target management system may use the metadata in the target management system and the words and sentences corresponding to that metadata. For example, other systems may generate corresponding structured documents based on the metadata of the target management system and the corresponding words and sentences.
Referring to fig. 4, based on the same inventive concept, an embodiment of the present application further provides a text processing apparatus 100, where the apparatus 100 includes: an acquisition module 101 and a processing module 102.
The acquisition module 101 is configured to acquire a text to be processed, where the text to be processed includes project-related information.
The processing module 102 is configured to process the text to be processed through a preset extraction model to obtain words and sentences representing preset intentions, where the preset intentions include at least one of a target, a plan, an execution, an inspection, and a process.
Optionally, after the text to be processed is processed through the preset extraction model to obtain words and sentences representing the preset intention, the processing module 102 is further configured to fill the words and sentences into the preset table and display the preset table.
Optionally, after the text to be processed is processed through the preset extraction model to obtain words and sentences representing the preset intentions, the processing module 102 is further configured to generate a structured document based on the words and sentences, where the structured document is a file formed according to a frame corresponding to the preset intentions.
Optionally, the processing module 102 is specifically configured to generate a target data packet based on the words and sentences, transmit the target data packet to a target management system, and generate a structured document based on the target data packet and the target management system.
Optionally, the extraction model comprises a BiLSTM layer and a CRF layer, and the BiLSTM layer is connected with the CRF layer. The BiLSTM layer comprises a word embedding layer, a bidirectional LSTM layer, and a fully connected layer connected in sequence, and the fully connected layer is connected with the CRF layer. Correspondingly, the processing module 102 is specifically configured to process the text to be processed through the BiLSTM layer to obtain the category scores corresponding to each word and sentence in the text to be processed, and to process the category scores corresponding to each word and sentence through the CRF layer to obtain the words and sentences.
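How a CRF layer turns per-token category scores into a label sequence is commonly done with Viterbi decoding. The sketch below assumes that approach; the application does not name the decoding algorithm, and the scores, tokens, and two-label set are made up for illustration.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label path and its score."""
    n = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            # Best previous label for arriving at label j at step t.
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(
                score[best_i] + transitions[best_i][j] + emissions[t][j]
            )
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    last = max(range(n), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, max(score)


# Hypothetical example: label 0 is "O" (no intention), label 1 is "target".
LABELS = ["O", "target"]
tokens = ["for", "soft-shell", "prevention", "today"]
emissions = [[1.0, 0.2], [0.1, 1.5], [0.3, 1.2], [1.4, 0.1]]
transitions = [[0.2, 0.0], [0.0, 0.4]]

best_path, best_score = viterbi(emissions, transitions)
tagged = [(tok, LABELS[k]) for tok, k in zip(tokens, best_path)]
print(tagged)
```

Consecutive tokens that share the same intention label can then be joined to form the extracted word or sentence.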
Referring to FIG. 5, based on the same inventive concept, an embodiment of the present application provides a schematic block diagram of an electronic device 200, and the electronic device 200 may be configured to implement the above-mentioned text processing method. In the embodiment of the present application, the electronic device 200 may be, but is not limited to, a Personal Computer (PC), a smart phone, a tablet computer, a Personal Digital Assistant (PDA), a Mobile Internet Device (MID), and the like. Structurally, the electronic device 200 may include a processor 210 and a memory 220.
The processor 210 and the memory 220 are electrically connected, directly or indirectly, to enable data transmission or interaction; for example, the components may be electrically connected to each other via one or more communication buses or signal lines. The processor 210 may be an integrated circuit chip having signal processing capabilities. The processor 210 may also be a general-purpose processor, for example, a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a discrete gate or transistor logic device, or a discrete hardware component, which can implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present application. Further, a general-purpose processor may be a microprocessor or any conventional processor or the like.
The memory 220 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), or an Electrically Erasable Programmable Read-Only Memory (EEPROM). The memory 220 is used for storing a program, and the processor 210 executes the program after receiving an execution instruction.
It should be understood that the structure shown in fig. 5 is merely an illustration, and the electronic device 200 provided in the embodiments of the present application may have fewer or more components than those shown in fig. 5, or may have a different configuration than that shown in fig. 5. Further, the components shown in fig. 5 may be implemented by software, hardware, or a combination thereof.
It should be noted that, as those skilled in the art can clearly understand, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Based on the same inventive concept, embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed, the computer program performs the methods provided in the above embodiments.
The storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method of text processing, comprising:
acquiring a text to be processed, wherein the text to be processed comprises project related information;
and processing the text to be processed through a preset extraction model to obtain words and sentences representing preset intentions, wherein the preset intentions comprise at least one of target, plan, execution, inspection and processing.
2. The method according to claim 1, wherein after the text to be processed is processed through a preset extraction model to obtain a word or a sentence representing a preset intention, the method further comprises:
and filling the words and sentences into a preset table, and displaying the preset table.
3. The method according to claim 1, wherein after the text to be processed is processed through a preset extraction model to obtain a word or a sentence representing a preset intention, the method further comprises:
and generating a structured document based on the words and the sentences, wherein the structured document is a file formed according to a frame corresponding to the preset intention.
4. The method of claim 3, wherein generating a structured document based on the words and sentences comprises:
generating a target data packet based on the words and sentences;
transmitting the target data packet to a target management system;
and generating a structured document based on the target data packet and the target management system.
5. The method according to claim 1, wherein the text to be processed is an agro-farming technology text or a mission report text.
6. The method of claim 1, wherein the extraction model comprises a BiLSTM layer and a CRF layer, wherein the BiLSTM layer is connected to the CRF layer, the BiLSTM layer comprises a word embedding layer, a bidirectional LSTM layer and a fully connected layer connected in sequence, and the fully connected layer is connected to the CRF layer;
processing the text to be processed through a preset extraction model to obtain words and sentences representing preset intentions comprises:
processing the text to be processed through the BiLSTM layer to obtain the category scores corresponding to each word and sentence in the text to be processed;
and processing the category scores corresponding to each word and sentence through the CRF layer to obtain the words and sentences.
7. The method of claim 6, wherein the loss function of the CRF layer consists of the score of the real path and the total score of all paths, the score of the real path being the highest score among all paths.
8. A text processing apparatus, comprising:
the acquisition module is used for acquiring a text to be processed, wherein the text to be processed comprises project related information;
and the processing module is used for processing the text to be processed through a preset extraction model to obtain words and sentences representing preset intentions, wherein the preset intentions comprise at least one of target, plan, execution, inspection and processing.
9. An electronic device, comprising: a processor and a memory, the processor and the memory connected;
the memory is used for storing programs;
the processor is configured to execute a program stored in the memory to perform the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when executed by a computer, performs the method of any one of claims 1-7.
CN202211583805.4A 2022-12-09 2022-12-09 Text processing method and device, electronic equipment and computer readable storage medium Pending CN115759035A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211583805.4A CN115759035A (en) 2022-12-09 2022-12-09 Text processing method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115759035A true CN115759035A (en) 2023-03-07

Family

ID=85345211

Country Status (1)

Country Link
CN (1) CN115759035A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750801A (en) * 2015-03-24 2015-07-01 华迪计算机集团有限公司 Generation method and system of structured document
CN113435582A (en) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Text processing method based on sentence vector pre-training model and related equipment
CN113792818A (en) * 2021-10-18 2021-12-14 平安科技(深圳)有限公司 Intention classification method and device, electronic equipment and computer readable storage medium
CN114171147A (en) * 2021-11-30 2022-03-11 中国医学科学院北京协和医院 Novel medical text preprocessing system
CN114648029A (en) * 2022-03-31 2022-06-21 河海大学 Electric power field named entity identification method based on BiLSTM-CRF model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination