CN117076595A

CN117076595A - Text processing method, device, equipment and storage medium based on artificial intelligence

Info

Publication number: CN117076595A
Application number: CN202311177526.2A
Authority: CN
Inventors: 姜宽; 陈奕宇
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2023-09-12
Filing date: 2023-09-12
Publication date: 2023-11-17

Abstract

The application belongs to the fields of artificial intelligence and financial technology, and relates to a text processing method based on artificial intelligence, which comprises the following steps: acquiring initial service data from a service system; cleaning the initial service data based on a preset designated data type to obtain a corresponding unstructured text; preprocessing the unstructured text to obtain a corresponding target text; invoking a pre-constructed text processing model; and processing the target text based on the text processing model to obtain target formula analysis information and a target formula result corresponding to the target text. The application also provides a text processing device, computer equipment and a storage medium based on artificial intelligence. Furthermore, the application relates to blockchain technology: the text processing model can be stored in a blockchain. The application can be applied to formula text analysis scenarios in the financial field, realizes automatic analysis of the formulas in the target text, and improves the processing efficiency of formula information analysis of the target text.

Description

Text processing method, device, equipment and storage medium based on artificial intelligence

Technical Field

The application relates to the technical field of artificial intelligence development and the technical field of finance, in particular to a text processing method, a text processing device, computer equipment and a storage medium based on artificial intelligence.

Background

With the rapid development of big data technology, data mining of financial data is receiving increasing attention from financial technology companies such as insurance companies and banks. Data mining extracts valuable information from a large amount of text data in a specific service scenario and analyzes the extracted information to interpret the content of the text data. Among the many kinds of valuable information held within a financial technology company, formulas, as a data calculation tool, have a direct impact on the parsing of text content; it is therefore often necessary to accurately extract the formulas that appear in text data when data mining is performed.

In the prior art, a financial technology company mainly analyzes the formula information in its internal business data through a combination of manual processing and machine calculation: data analysts employed by the company perform formula analysis on the business data to obtain the corresponding formulas and summarize the formula information, and the extracted formulas are then input into a computing machine for data calculation to generate formula results. This data processing method consumes considerable resources and has low processing efficiency.

Disclosure of Invention

The embodiment of the application aims to provide a text processing method, a device, computer equipment and a storage medium based on artificial intelligence, so as to solve the technical problem that, in the existing approach, data analysts perform formula analysis on service data to obtain corresponding formulas and summarize formula information, and the extracted formulas are then input into a computing machine for data calculation to generate formula results, which consumes considerable labor cost and has low processing efficiency.

In order to solve the technical problems, the embodiment of the application provides a text processing method based on artificial intelligence, which adopts the following technical scheme:

acquiring initial service data to be processed from a service system;

cleaning the initial service data based on a preset designated data type to obtain a corresponding unstructured text;

preprocessing the unstructured text to obtain a corresponding target text;

invoking a pre-constructed text processing model; the text processing model is generated by training a preset language model based on a prompt template and pre-collected text data containing formulas;

and processing the target text based on the text processing model to obtain target formula analysis information and a target formula result corresponding to the target text.

Further, the step of cleaning the initial service data based on the preset designated data type to obtain a corresponding unstructured text specifically includes:

calling a preset cleaning tool;

cleaning the initial service data based on the cleaning tool to obtain first service data corresponding to the designated data type;

removing the first service data from the initial service data to obtain corresponding second service data;

and taking the second service data as the unstructured text.

Further, the step of preprocessing the unstructured text to obtain a corresponding target text specifically includes:

performing data standardization processing on the unstructured text to obtain a corresponding first text;

performing data conversion processing on the first text to obtain a corresponding second text;

and taking the second text as the target text.

Further, before the step of calling the pre-built text processing model, the method further comprises:

acquiring a preset number of pre-collected historical text data containing formulas;

labeling the historical text data based on a preset data labeling system to obtain corresponding labeling data;

processing the labeling data based on the prompt template, and constructing triples corresponding to the labeling data; each triple comprises the labeling data, formula analysis information corresponding to the labeling data and a formula result corresponding to the labeling data;

invoking the language model;

training and testing the language model by using the triples to construct the text processing model.

Further, the step of labeling the historical text data based on the preset data labeling system to obtain corresponding labeling data specifically includes:

invoking the data labeling system;

inputting the historical text data into the data labeling system;

acquiring a labeling document, input by a specified user, that corresponds to the historical text data;

and executing, based on the labeling document, the labeling processing corresponding to the historical text data in the data labeling system to obtain the labeling data corresponding to the historical text data.

Further, the step of training and testing the language model by using the triples to construct the text processing model specifically includes:

dividing the triples based on a preset dividing proportion to obtain a corresponding training set and a corresponding testing set;

training the language model based on the training set to obtain a trained language model;

testing the trained language model based on the test set, and judging whether the trained language model accords with a preset accuracy condition;

if yes, the trained language model is used as the text processing model.

Further, after the step of processing the target text based on the text processing model to obtain the target formula analysis information and the target formula result corresponding to the target text, the method further includes:

generating target processing data based on the target formula analysis information and the target formula result;

generating a data association relationship between the initial service data and the target processing data;

and storing the initial service data and the target processing data based on the data association relationship.
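The storage steps above can be sketched as follows. This is only an illustration under stated assumptions: the class `AssociationStore`, its in-memory dictionaries, and the use of a random shared key as the data association relationship are hypothetical and are not specified by the application.

```python
import uuid
from typing import Dict, Tuple


class AssociationStore:
    """Hypothetical in-memory store; a real system would likely use a database."""

    def __init__(self) -> None:
        self.initial_data: Dict[str, str] = {}
        self.processed_data: Dict[str, Dict[str, str]] = {}

    def save(self, initial: str, analysis: str, result: str) -> str:
        # Target processing data generated from the analysis information and the result.
        processed = {"formula_analysis": analysis, "formula_result": result}
        # Assumption: a shared key plays the role of the data association relationship.
        key = uuid.uuid4().hex
        self.initial_data[key] = initial
        self.processed_data[key] = processed
        return key

    def lookup(self, key: str) -> Tuple[str, Dict[str, str]]:
        # Follow the association from one key to both stored records.
        return self.initial_data[key], self.processed_data[key]


store = AssociationStore()
record_key = store.save(
    "medical fee (50-3 days) 100 yuan/day",
    "50 hospitalization days, 3 claim-free days, 100 yuan per day",
    "4700 yuan",
)
print(store.lookup(record_key))
```

Keeping the two records under one key lets either record be retrieved from the other without duplicating the initial service data.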

In order to solve the technical problems, the embodiment of the application also provides a text processing device based on artificial intelligence, which adopts the following technical scheme:

the first acquisition module is used for acquiring initial service data to be processed from the service system;

the cleaning module is used for cleaning the initial service data based on a preset designated data type to obtain a corresponding unstructured text;

the first processing module is used for preprocessing the unstructured text to obtain a corresponding target text;

the first calling module is used for calling a pre-constructed text processing model; the text processing model is generated by training a preset language model based on a prompt template and pre-collected text data containing formulas;

and the second processing module is used for processing the target text based on the text processing model to obtain target formula analysis information and a target formula result corresponding to the target text.

In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:

acquiring initial service data to be processed from a service system;

preprocessing the unstructured text to obtain a corresponding target text;

In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:

acquiring initial service data to be processed from a service system;

preprocessing the unstructured text to obtain a corresponding target text;

Compared with the prior art, the embodiment of the application has the following main beneficial effects:

the embodiment of the application firstly acquires initial service data to be processed from a service system; then cleaning the initial service data based on a preset designated data type to obtain a corresponding unstructured text; preprocessing the unstructured text to obtain a corresponding target text; subsequently calling a pre-constructed text processing model; and finally, processing the target text based on the text processing model to obtain target formula analysis information and a target formula result corresponding to the target text. After the initial business data to be processed, which is acquired from a business system, is cleaned and preprocessed to obtain the target text, the target text is processed by using the pre-constructed text processing model to obtain the target formula analysis information and the target formula result corresponding to the target text, so that the automatic analysis of the formulas in the target text is realized, the corresponding target formula analysis information and the target formula result are generated, the resource cost required by text processing of the target text is effectively saved, and the processing efficiency of formula information analysis of the target text is improved.

Drawings

In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow chart of one embodiment of an artificial intelligence based text processing method in accordance with the present application;

FIG. 3 is a schematic diagram of one embodiment of an artificial intelligence based text processing device in accordance with the present application;

FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (MPEG-1 Audio Layer III) players, MP4 (MPEG-4 Part 14) players, laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.

It should be noted that, the text processing method based on artificial intelligence provided by the embodiment of the application is generally executed by a server/terminal device, and correspondingly, the text processing device based on artificial intelligence is generally arranged in the server/terminal device.

The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow chart of one embodiment of an artificial intelligence based text processing method in accordance with the present application is shown. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs. The text processing method based on the artificial intelligence provided by the embodiment of the application can be applied to any scene needing formula text analysis, and can be applied to products of the scenes, such as financial formula text analysis in the field of financial insurance. The text processing method based on artificial intelligence comprises the following steps:

Step S201, obtaining initial service data to be processed from the service system.

In this embodiment, the electronic device (for example, the server/terminal device shown in fig. 1) on which the text processing method based on artificial intelligence runs may acquire the initial service data to be processed through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G/5G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connections now known or later developed. The initial service data to be processed refers to text data in the service system that needs to undergo formula information analysis. Illustratively, in the business scenario of text formula extraction for financial insurance, the service system may include an insurance system, a banking system, a transaction system, an order system, and the like. The initial service data may include insurance contracts, insurance promotional text, financial business specifications, financial news, and the like in the service system.

Step S202, cleaning the initial service data based on a preset designated data type to obtain a corresponding unstructured text.

In this embodiment, the specified data types may include non-text types and non-numeric types. The specific implementation process of cleaning the initial service data based on the preset specified data type to obtain the corresponding unstructured text will be described in further detail in the following specific embodiments, which will not be described herein.

And step S203, preprocessing the unstructured text to obtain a corresponding target text.

In this embodiment, the specific implementation process of preprocessing the unstructured text to obtain the corresponding target text will be described in further detail in the following specific embodiments, and will not be described herein.

Step S204, calling a pre-constructed text processing model.

In this embodiment, the text processing model is a model generated by training a preset language model based on a prompt template and pre-collected text data containing formulas. The training and generation process of the text processing model will be described in further detail in the following specific embodiments, and will not be described herein.

Step S205, processing the target text based on the text processing model, to obtain target formula analysis information and a target formula result corresponding to the target text.

In this embodiment, the target text may be input into the text processing model, and the target text is analyzed and processed by the text processing model, so as to output the target formula analysis information and the target formula result corresponding to the target text.

Firstly, obtaining initial service data to be processed from a service system; then cleaning the initial service data based on a preset designated data type to obtain a corresponding unstructured text; preprocessing the unstructured text to obtain a corresponding target text; subsequently calling a pre-constructed text processing model; and finally, processing the target text based on the text processing model to obtain target formula analysis information and a target formula result corresponding to the target text. After the initial business data to be processed, which is acquired from a business system, is cleaned and preprocessed to obtain the target text, the target text is processed by using the pre-constructed text processing model to obtain the target formula analysis information and the target formula result corresponding to the target text, so that the automatic analysis of the formulas in the target text is realized, the corresponding target formula analysis information and the target formula result are generated, the resource cost required by text processing of the target text is effectively saved, and the processing efficiency of formula information analysis of the target text is improved.
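Steps S201 to S205 can be sketched end to end as follows. This is a minimal illustration, not the application's implementation: `run_pipeline` and `toy_model` are hypothetical names, and `toy_model` is a regular-expression stand-in used only so that the sketch runs without a trained language model.

```python
import re
from typing import Any, Callable, List, Tuple

# A model maps a target text to (formula analysis information, formula result).
Model = Callable[[str], Tuple[str, str]]


def run_pipeline(initial_data: List[Any], model: Model) -> List[Tuple[str, str, str]]:
    results = []
    for record in initial_data:  # S201: data already fetched from the service system
        # S202: drop non-text, non-numeric records (bool is excluded explicitly
        # because it is a subclass of int in Python).
        if isinstance(record, bool) or not isinstance(record, (str, int, float)):
            continue
        # S203: minimal preprocessing, here only whitespace collapsing.
        target_text = " ".join(str(record).split())
        # S204 and S205: call the model and collect its two outputs.
        analysis, result = model(target_text)
        results.append((target_text, analysis, result))
    return results


def toy_model(text: str) -> Tuple[str, str]:
    """Rule-based placeholder for the trained text processing model (assumption)."""
    match = re.search(r"\((\d+)-(\d+) days\)\D*(\d+) yuan/day", text)
    if match is None:
        return ("no formula found", "")
    days, free, rate = (int(group) for group in match.groups())
    analysis = f"{days} hospitalization days, {free} claim-free days, {rate} yuan per day"
    return (analysis, f"{(days - free) * rate} yuan")


for row in run_pipeline(["medical fee (50-3 days)  100 yuan/day", None, b"scan"], toy_model):
    print(row)
```

In a real deployment the `model` argument would wrap the trained text processing model; the pipeline code itself would not need to change.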

In some alternative implementations, step S202 includes the steps of:

calling a preset cleaning tool.

In the present embodiment, the selection of the cleaning tool is not particularly limited, and for example, a cleaning tool based on Python language may be used.

And cleaning the initial service data based on the cleaning tool to obtain first service data corresponding to the specified data type.

In this embodiment, the specified data types include non-text types and non-numeric types. The Python-based cleaning tool may be used to clean the initial service data and pick out the non-text and non-numeric data in the initial service data, so as to obtain the corresponding first service data.

And eliminating the first service data from the initial service data to obtain corresponding second service data.

In this embodiment, the first service data, which belongs to the non-text and non-numeric types, is removed from the initial service data, so that the data containing text and numbers in the initial service data, that is, the second service data, is obtained.

And taking the second service data as the unstructured text.

The application calls the preset cleaning tool; then cleaning the initial service data based on the cleaning tool to obtain first service data corresponding to the designated data type; then, the first service data is removed from the initial service data to obtain corresponding second service data; and taking the second service data as the unstructured text. According to the application, the initial service data is cleaned based on the preset cleaning tool and the designated data type, so that the unstructured text can be rapidly and accurately screened from the initial service data, the generation efficiency of the unstructured text is improved, and the accuracy of the generated unstructured text is ensured.
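The cleaning procedure above can be sketched in Python as follows. The function names and the type test are assumptions made for illustration; the application does not name a specific cleaning tool or define how a record's type is detected.

```python
from typing import Any, List, Tuple


def is_specified_type(record: Any) -> bool:
    """Return True for records of the specified type (non-text and non-numeric)."""
    # bool is a subclass of int in Python, so it is treated as non-numeric explicitly.
    if isinstance(record, bool):
        return True
    return not isinstance(record, (str, int, float))


def clean_initial_data(initial_data: List[Any]) -> Tuple[List[Any], List[Any]]:
    # First service data: records matching the specified (non-text, non-numeric) type.
    first_positions = {i for i, r in enumerate(initial_data) if is_specified_type(r)}
    first_data = [initial_data[i] for i in sorted(first_positions)]
    # Second service data: what remains after removing the first service data.
    second_data = [r for i, r in enumerate(initial_data) if i not in first_positions]
    return first_data, second_data


initial = ["medical fee (50-3 days) 100 yuan/day", 4700, b"\x89PNG", None, 3.5]
removed, unstructured_text = clean_initial_data(initial)
print(unstructured_text)
```

Tracking positions rather than values keeps duplicate records intact when the first service data is removed.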

In some alternative implementations of the present embodiment, step S203 includes the steps of:

and carrying out data standardization processing on the unstructured text to obtain a corresponding first text.

In this embodiment, the unstructured text may be standardized by using a data standardization function included in the numpy library, so as to obtain the corresponding first text.

And carrying out data conversion processing on the first text to obtain a corresponding second text.

In this embodiment, a data conversion tool may be used to perform data conversion processing on the first text to obtain data in a format that the model can process, that is, the second text.

And taking the second text as the target text.

The method comprises the steps of carrying out data standardization processing on the unstructured text to obtain a corresponding first text; then, carrying out data conversion processing on the first text to obtain a corresponding second text; and taking the second text as the target text. According to the method, the target text with the data format which accords with the readable processing of the text processing model can be obtained quickly by carrying out standardized processing and data conversion processing on the obtained unstructured text, so that the smooth proceeding of the subsequent data processing flow of the target text through the text processing model is ensured.
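The two preprocessing stages can be sketched as follows. Note the assumptions: this sketch uses Python's standard library (Unicode NFKC normalization and whitespace collapsing) for standardization instead of a numpy function, and it assumes a JSON object with an `input` field as the model-readable format; neither choice is specified by the application.

```python
import json
import re
import unicodedata


def standardize(unstructured_text: str) -> str:
    """Data standardization: yields the first text."""
    # NFKC folds full-width digits and letters into their ASCII equivalents.
    normalized = unicodedata.normalize("NFKC", unstructured_text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", normalized).strip()


def convert(first_text: str) -> str:
    """Data conversion: yields the second text in an assumed JSON format."""
    return json.dumps({"input": first_text}, ensure_ascii=False)


def preprocess(unstructured_text: str) -> str:
    first_text = standardize(unstructured_text)
    second_text = convert(first_text)
    # The second text is used as the target text.
    return second_text


print(preprocess("medical fee  (50-3 days)\n100 yuan/day"))
```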

In some alternative implementations, before step S204, the electronic device may further perform the following steps:

acquiring a preset number of pre-acquired historical text data containing formulas.

In this embodiment, the historical text data refers to text data containing formulas that financial technology companies, such as insurance companies and banks, use to conduct business in financial technology business scenarios. For example, in the business scenario of text formula parsing for financial insurance, the historical text data may include insurance contracts, financial business specifications, insurance promotional text, financial information, and the like. The value of the preset number is not particularly limited, and may be set according to actual use requirements.

And labeling the historical text data based on a preset data labeling system to obtain corresponding labeling data.

In this embodiment, the foregoing labeling process is performed on the historical text data based on the preset data labeling system to obtain a specific implementation process of the corresponding labeling data, which will be described in further detail in the following specific embodiments, and will not be described herein.

Processing the labeling data based on the prompt template, and constructing triples corresponding to the labeling data; each triple includes the labeling data, formula analysis information corresponding to the labeling data, and a formula result corresponding to the labeling data.

In this embodiment, a prompt is an input form and template designed by researchers for a downstream task; it helps the language model "recall" the knowledge it "learned" during pre-training and is a means of activating the language model's capabilities. The prompt template is a Chain-of-Thought Prompting template. The structure of the triple is <input, chain-of-thought, output>, where input refers to the labeling data, chain-of-thought refers to the formula analysis information corresponding to the labeling data (comprising the analysis process and description information), and output refers to the formula result corresponding to the labeling data. For example, if the labeling data is "medical fee (50-3 days) 100 yuan/day", the chain-of-thought is the formula analysis information formulated according to the formula characteristics in the labeling data, such as: "50 days" is the number of hospitalization days, "3 days" is the number of claim-free days, and "100 yuan/day" is the daily benefit. The output is the structured entity information "total hospitalization benefit: 4700 yuan", since (50 - 3) × 100 = 4700.
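A triple and its rendering through a Chain-of-Thought style template might look like the sketch below. The `Triple` class, the field names, and the wording of `PROMPT_TEMPLATE` are hypothetical; the application does not disclose the actual template text.

```python
from dataclasses import dataclass


@dataclass
class Triple:
    input: str             # the labeling data
    formula_analysis: str  # the reasoning chain (analysis process and description)
    output: str            # the formula result


# Assumed template layout; the real template wording is not given in the source.
PROMPT_TEMPLATE = "Text: {input}\nReasoning: {formula_analysis}\nResult: {output}"


def build_triple(labeling_data: str, analysis: str, result: str) -> Triple:
    return Triple(input=labeling_data, formula_analysis=analysis, output=result)


def render_prompt(triple: Triple) -> str:
    # Fill the template so that the reasoning precedes the final result.
    return PROMPT_TEMPLATE.format(
        input=triple.input,
        formula_analysis=triple.formula_analysis,
        output=triple.output,
    )


example = build_triple(
    "medical fee (50-3 days) 100 yuan/day",
    '"50 days" is hospitalization days, "3 days" is claim-free days, '
    '"100 yuan/day" is the daily benefit',
    "total hospitalization benefit: 4700 yuan",
)
print(render_prompt(example))
```

Placing the reasoning before the result is what makes the training sample a chain-of-thought example: the model is shown the intermediate analysis that leads to the formula result.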

And calling the language model.

In this embodiment, the language model is not specifically limited and may be selected according to actual usage requirements; for example, models such as GPT-3, PaLM, Galactica, or LLaMA may be used.

And training and testing the language model by using the triples to construct the text processing model.

In this embodiment, the specific implementation process of training and testing the language model by using the triples to construct the text processing model will be described in further detail in the following specific embodiments, and will not be described herein.

The application first acquires a preset number of pre-collected historical text data containing formulas; then labels the historical text data based on a preset data labeling system to obtain corresponding labeling data; then processes the labeling data based on the prompt template and constructs triples corresponding to the labeling data; and subsequently calls the language model, and trains and tests the language model by using the triples to construct the text processing model. By using the data labeling system, the historical text data can be labeled rapidly and intelligently; by further processing the labeling data based on the prompt template, the triples required for model construction can be generated accurately, so that the language model can be trained and tested based on the obtained triples, the construction of the text processing model is completed rapidly, and the efficiency of creating the text processing model is improved.

In some optional implementations, the labeling processing is performed on the historical text data based on the preset data labeling system to obtain corresponding labeling data, and the method includes the following steps:

and calling the data labeling system.

In this embodiment, the data labeling system is a system platform constructed according to the processing requirement of actual sample data labeling, and the data labeling system can provide automatic labeling processing for sample data and can also be used for interacting with labeling service personnel, so as to assist labeling personnel in rapidly performing data labeling work in the data labeling system.

And inputting the historical text data into the data labeling system.

And acquiring the labeling document, input by the specified user, that corresponds to the historical text data.

In this embodiment, the specified user may be a business person responsible for the data labeling work. The labeling document is a document obtained by extracting and analyzing the formula information contained in the historical text data according to the actual service information extraction requirement. Illustratively, if the historical text data includes the content "medical fee (50-8 days) 100 yuan/day", the content of the prepared labeling document may include: "50 days" is the number of hospitalization days, "8 days" is the number of claim-free days, "100 yuan/day" is the daily benefit, and "4200 yuan" is inferred as the total benefit amount.

And based on the labeling document, executing the labeling processing corresponding to the historical text data in the data labeling system to obtain the labeling data corresponding to the historical text data.

In this embodiment, continuing the above example, if the historical text data includes the content "medical fee (50-8 days) 100 yuan/day", the labeling data may include the formula analysis information corresponding to the historical text data, namely that "50 days" is the number of hospitalization days, "8 days" is the number of claim-free days, and "100 yuan/day" is the daily benefit, and the formula result corresponding to the historical text data, namely "4200 yuan".
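The arithmetic behind this example is (50 - 8) × 100 = 4200. A labeling system could use the same arithmetic to sanity-check a labeled record, as in the sketch below; the record layout and the function names are hypothetical and are not part of the application.

```python
from typing import Dict, Union

Label = Dict[str, Union[str, int]]


def hospitalization_benefit(hospital_days: int, claim_free_days: int, daily_benefit: int) -> int:
    # Benefit is paid only for the days beyond the claim-free period.
    return (hospital_days - claim_free_days) * daily_benefit


def is_label_consistent(label: Label) -> bool:
    """Check that the labeled formula result matches the labeled quantities."""
    expected = hospitalization_benefit(
        int(label["hospital_days"]),
        int(label["claim_free_days"]),
        int(label["daily_benefit"]),
    )
    return expected == int(label["formula_result"])


labeled_record: Label = {
    "text": "medical fee (50-8 days) 100 yuan/day",
    "hospital_days": 50,
    "claim_free_days": 8,
    "daily_benefit": 100,
    "formula_result": 4200,
}
print(is_label_consistent(labeled_record))
```

A check of this kind catches labeling documents whose stated result disagrees with their own analysis before such records reach the training set.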

The application calls the data labeling system; then inputs the historical text data into the data labeling system; then acquires the labeling document, input by the specified user, that corresponds to the historical text data; and executes the labeling processing corresponding to the historical text data in the data labeling system based on the labeling document to obtain the labeling data corresponding to the historical text data. By using the data labeling system, the application can label the historical text data rapidly and intelligently, so as to accurately generate the labeling data required for constructing the text processing model, improve the generation efficiency of the labeling data and ensure the accuracy of the generated labeling data. The preset language model can then be trained and tested based on the obtained labeling data, so that the construction of the text processing model is completed rapidly and the construction efficiency of the text processing model is improved.

In some optional implementations of this embodiment, the training and testing the language model using the triples to construct the text processing model includes the steps of:

and dividing the triples based on a preset dividing proportion to obtain a corresponding training set and a corresponding testing set.

In this embodiment, the value of the dividing ratio is not limited and may be set according to actual use requirements; for example, the ratio of training set to test set may be set to 8:2.

And training the language model based on the training set to obtain a trained language model.

In this embodiment, the training set is used to train the language model so that the language model learns the latent relationship between the text and the formula in each triple and, given the labeling data in a triple as input, automatically generates the corresponding chain-of-thought and output, that is, the formula analysis information corresponding to the labeling data and the formula result corresponding to the labeling data.

And testing the trained language model based on the test set, and judging whether the trained language model accords with a preset accuracy condition.

In this embodiment, the accuracy condition may be that the accuracy achieved by the trained language model on the test set is greater than a preset accuracy threshold; if the accuracy of the trained language model is greater than the accuracy threshold, it is determined that the trained language model meets the accuracy condition. The value of the accuracy threshold is not specifically limited and may be set according to actual use requirements.

If yes, the trained language model is used as the text processing model.

The application obtains a corresponding training set and a corresponding testing set by dividing the triples based on a preset dividing proportion; then training the language model based on the training set to obtain a trained language model; subsequently, testing the trained language model based on the test set, and judging whether the trained language model accords with a preset accuracy condition; if yes, the trained language model is used as the text processing model. According to the application, the preset language model is trained and tested by using the triples, so that the text processing model is built rapidly and intelligently, and the construction efficiency of the text processing model is improved.
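The dividing and testing steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the function names, the exact-match accuracy metric, and the fixed shuffle seed are all hypothetical, and the trained language model is represented only by a `predict` callable that maps a text to a formula result.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (labeling data, formula analysis information, formula result)
Triple = Tuple[str, str, str]


def split_triples(
    triples: Sequence[Triple], train_ratio: float = 0.8, seed: int = 0
) -> Tuple[List[Triple], List[Triple]]:
    """Shuffle and divide the triples, e.g. 8:2 for train_ratio=0.8."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict: Callable[[str], str], test_set: Sequence[Triple]) -> float:
    """Assumed metric: share of test triples whose formula result is reproduced exactly."""
    if not test_set:
        return 0.0
    hits = sum(1 for text, _analysis, result in test_set if predict(text) == result)
    return hits / len(test_set)


def meets_accuracy_condition(acc: float, threshold: float) -> bool:
    # The description requires the accuracy to be greater than the threshold.
    return acc > threshold
```

If `meets_accuracy_condition` returns true, the trained language model would be taken as the text processing model; what happens otherwise (for example, further training) is not specified in this passage.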

In some optional implementations of this embodiment, after step S205, the electronic device may further perform the following steps:

and generating target processing data based on the target formula analysis information and the target formula result.

In this embodiment, the target processing data may be obtained by integrating the target formula analysis information with the target formula result.

And generating a data association relation between the initial service data and the target processing data.

In this embodiment, the above-mentioned data association relationship refers to constructing a data correspondence relationship between the initial service data and the target processing data.

In this embodiment, the initial service data and the target processing data are stored in correspondence with each other based on the data association relationship, so that the association between the initial service data and the target processing data can be looked up clearly later.

The application generates target processing data based on the target formula analysis information and the target formula result; then generates a data association relationship between the initial service data and the target processing data; and stores the initial service data and the target processing data based on the data association relationship. After the target text has been processed by the text processing model to obtain the corresponding target formula analysis information and target formula result, the target processing data is generated intelligently from them, and the initial service data and the target processing data are stored based on the generated data association relationship between the two. This improves the intelligence with which the initial service data and the target processing data are stored and helps ensure their data security.
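One way to picture the association-based storage is the sketch below. It is only an illustration under stated assumptions: the in-memory `AssociatedStore` class, the dictionary layout of the target processing data, and the use of a SHA-256 digest of the initial service data as the association key are hypothetical choices, not details given in the application.

```python
import hashlib
from typing import Dict, Tuple


def build_target_processing_data(analysis: str, result: str) -> Dict[str, str]:
    """Integrate the formula analysis information with the formula result."""
    return {"formula_analysis": analysis, "formula_result": result}


class AssociatedStore:
    """Keeps initial service data and target processing data under one shared key."""

    def __init__(self) -> None:
        self._initial: Dict[str, str] = {}
        self._processed: Dict[str, Dict[str, str]] = {}

    def save(self, initial_data: str, processed: Dict[str, str]) -> str:
        # The shared key plays the role of the data association relationship.
        key = hashlib.sha256(initial_data.encode("utf-8")).hexdigest()
        self._initial[key] = initial_data
        self._processed[key] = processed
        return key

    def lookup(self, key: str) -> Tuple[str, Dict[str, str]]:
        return self._initial[key], self._processed[key]
```

A caller would build the target processing data, call `save`, and later use the returned key with `lookup` to retrieve both records together.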

It should be understood that the sequence numbers of the steps in the foregoing embodiment do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation process of the embodiment of the present application.

It is emphasized that to further guarantee the privacy and security of the text processing model, the text processing model may also be stored in a blockchain node.

A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, where each data block contains a batch of network transaction information and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
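The idea of data blocks linked by cryptographic means can be illustrated with a toy hash chain. This sketch is an assumption-laden simplification: it has no network, consensus mechanism, or node storage, and the block fields and function names are hypothetical. It only shows how each block's hash depends on the previous block, so that tampering is detectable.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(index: int, payload: str, prev_hash: str) -> str:
    body = json.dumps(
        {"index": index, "payload": payload, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict], payload: str) -> List[Dict]:
    """Return a new chain with one more block linked to the current tail."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    }
    return chain + [block]


def is_valid(chain: List[Dict]) -> bool:
    """Recompute every hash and check that each block points at its predecessor."""
    prev_hash = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, block["payload"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True
```

Changing the payload of any earlier block makes `is_valid` return false, which is the anti-counterfeiting property the description refers to.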

Those skilled in the art will appreciate that all or part of the processes of the above-described methods may be implemented by computer readable instructions stored in a computer readable storage medium, and that the instructions, when executed, may include the steps of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.

With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an artificial intelligence-based text processing apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.

As shown in fig. 3, the text processing device 300 based on artificial intelligence according to the present embodiment includes: a first acquisition module 301, a cleaning module 302, a first processing module 303, a first invoking module 304, and a second processing module 305. Wherein:

a first acquisition module 301, configured to acquire initial service data to be processed from a service system;

the cleaning module 302 is configured to clean the initial service data based on a preset specified data type to obtain a corresponding unstructured text;

the first processing module 303 is configured to pre-process the unstructured text to obtain a corresponding target text;

a first calling module 304, configured to call a text processing model that is built in advance; the text processing model is generated by training a preset language model based on a Prompting template and pre-collected text data containing formulas;

and the second processing module 305 is configured to process the target text based on the text processing model, so as to obtain target formula analysis information and a target formula result corresponding to the target text.

In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the text processing method based on artificial intelligence in the foregoing embodiment one by one, which is not described herein again.
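The way the five modules hand data to one another can be sketched as a simple composition. This is a hypothetical illustration: the class name `TextProcessingPipeline` and the callable signatures are assumptions, and each stage is passed in as a plain function rather than implemented.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TextProcessingPipeline:
    """Hypothetical composition of modules 301 to 305 as plain callables."""
    acquire: Callable[[], str]               # first acquisition module 301
    clean: Callable[[str], str]              # cleaning module 302
    preprocess: Callable[[str], str]         # first processing module 303
    model: Callable[[str], Tuple[str, str]]  # model called by 304, applied by 305

    def run(self) -> Tuple[str, str]:
        initial_service_data = self.acquire()
        unstructured_text = self.clean(initial_service_data)
        target_text = self.preprocess(unstructured_text)
        # Returns (target formula analysis information, target formula result).
        return self.model(target_text)
```

Keeping the stages as injected callables means that any one of them, for example the text processing model, could be replaced without touching the others.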

In some alternative implementations of the present embodiment, the cleaning module 302 includes:

the first calling sub-module is used for calling a preset cleaning tool;

the cleaning sub-module is used for cleaning the initial service data based on the cleaning tool to obtain first service data corresponding to the designated data type;

the rejecting sub-module is used for rejecting the first service data from the initial service data to obtain corresponding second service data;

and the first determining submodule is used for taking the second service data as the unstructured text.
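The cleaning sub-modules can be pictured with the sketch below. It rests on assumptions that this passage does not state: the initial service data is modelled as a field-to-value mapping, and the designated data type is taken to be numeric and boolean values. The function names are hypothetical.

```python
from typing import Any, Dict, Mapping, Tuple

# Assumed designated data types; the real preset types are configuration-dependent.
DESIGNATED_TYPES = (int, float, bool)


def clean(
    initial: Mapping[str, Any], designated_types: tuple = DESIGNATED_TYPES
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split the initial data into first (designated type) and second (the rest)."""
    first = {k: v for k, v in initial.items() if isinstance(v, designated_types)}
    # Rejecting the first service data leaves the second service data.
    second = {k: v for k, v in initial.items() if k not in first}
    return first, second


def to_unstructured_text(second: Mapping[str, Any]) -> str:
    """Take the second service data as the unstructured text."""
    return "\n".join(str(v) for v in second.values())
```

With an input such as `{"policy_id": 1001, "remark": "medical fee (50-8 days) 100 yuan/day"}`, the remark is what remains as unstructured text.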

In some alternative implementations of the present embodiment, the first processing module 303 includes:

the first processing sub-module is used for carrying out data standardization processing on the unstructured text to obtain a corresponding first text;

the second processing sub-module is used for carrying out data conversion processing on the first text to obtain a corresponding second text;

and the second determining submodule is used for taking the second text as the target text.
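A possible shape for the two preprocessing sub-modules is sketched below. The concrete operations are assumptions made for illustration: Unicode NFKC normalization plus whitespace collapsing stands in for data standardization, and a small hypothetical symbol table stands in for data conversion. The application may use different rules.

```python
import re
import unicodedata

# Hypothetical conversion table; the actual conversion rules are not given here.
SYMBOL_MAP = {"\u00d7": "*", "\u00f7": "/"}


def standardize(unstructured_text: str) -> str:
    """Assumed standardization: fold full-width forms and collapse whitespace."""
    normalized = unicodedata.normalize("NFKC", unstructured_text)
    return re.sub(r"\s+", " ", normalized).strip()


def convert(first_text: str) -> str:
    """Assumed conversion: map typographic operators to ASCII ones."""
    for symbol, replacement in SYMBOL_MAP.items():
        first_text = first_text.replace(symbol, replacement)
    return first_text


def preprocess(unstructured_text: str) -> str:
    """The second text produced by convert() is taken as the target text."""
    return convert(standardize(unstructured_text))
```

Normalizing before conversion keeps the conversion table small, because full-width parentheses, digits, and spaces are already folded into their ASCII forms.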

In some optional implementations of this embodiment, the artificial intelligence based text processing apparatus further includes:

the second acquisition module is used for acquiring a preset number of historical text data containing formulas, which are acquired in advance;

the marking module is used for marking the historical text data based on a preset data marking system to obtain corresponding marking data;

the construction module is used for processing the labeling data based on the Prompting template and constructing triples corresponding to the labeling data; each triple comprises the labeling data, formula analysis information corresponding to the labeling data and a formula result corresponding to the labeling data;

the second calling module is used for calling the language model;

and the building module is used for training and testing the language model by using the triples so as to construct the text processing model.
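Constructing a triple and rendering it through a template can be sketched as follows. The template wording, the `Triple` type, and the dictionary keys of the labeled record are hypothetical; the application only states that each triple holds the labeling data, its formula analysis information, and its formula result.

```python
from typing import Mapping, NamedTuple

# Hypothetical template text; the real Prompting template is not given here.
PROMPT_TEMPLATE = "Text: {text}\nAnalysis: {analysis}\nResult: {result}"


class Triple(NamedTuple):
    text: str      # labeling data (the source text)
    analysis: str  # formula analysis information
    result: str    # formula result


def build_triple(labeled: Mapping[str, str]) -> Triple:
    return Triple(labeled["text"], labeled["analysis"], labeled["result"])


def render_training_example(triple: Triple) -> str:
    """Fill the template so the model sees text, reasoning, and result together."""
    return PROMPT_TEMPLATE.format(
        text=triple.text, analysis=triple.analysis, result=triple.result
    )
```

Placing the analysis between the text and the result mirrors the intent that the model learns to produce the intermediate reasoning before the final formula result.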

In some optional implementations of this embodiment, the labeling module includes:

the second calling sub-module is used for calling the data marking system;

the input sub-module is used for inputting the historical text data into the data labeling system;

the obtaining sub-module is used for obtaining the annotation document corresponding to the historical text data input by the appointed user;

and the labeling sub-module is used for executing labeling processing corresponding to the historical text data in the data labeling system based on the labeling document to obtain labeling data corresponding to the historical text data.

In some optional implementations of this embodiment, the building module includes:

the dividing sub-module is used for dividing the triples based on a preset dividing proportion to obtain a corresponding training set and a corresponding testing set;

the training sub-module is used for training the language model based on the training set to obtain a trained language model;

the test sub-module is used for testing the trained language model based on the test set and judging whether the trained language model accords with a preset accuracy condition or not;

and the third determining submodule is used for taking the trained language model as the text processing model if yes.

In some optional implementations of this embodiment, the artificial intelligence based text processing apparatus further includes:

the first generation module is used for generating target processing data based on the target formula analysis information and the target formula result;

the second generation module is used for generating a data association relation between the initial service data and the target processing data;

and the storage module is used for storing the initial service data and the target processing data based on the data association relation.

In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.

The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It should be noted that only the computer device 4 having components 41-43 is shown in the figure, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculations and/or information processing in accordance with predetermined or stored instructions, the hardware of which includes, but is not limited to, microprocessors, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, etc.

The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.

The memory 41 includes at least one type of readable storage medium including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or an internal memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash Card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store an operating system and various application software installed on the computer device 4, such as computer readable instructions of an artificial intelligence based text processing method. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.

The processor 42 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as computer readable instructions for executing the artificial intelligence based text processing method.

The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.

In the embodiment of the application, initial service data to be processed is first obtained from a service system; the initial service data is then cleaned based on a preset designated data type to obtain a corresponding unstructured text; the unstructured text is preprocessed to obtain a corresponding target text; a pre-constructed text processing model is subsequently called; and finally, the target text is processed based on the text processing model to obtain target formula analysis information and a target formula result corresponding to the target text. Because the target text obtained by cleaning and preprocessing the initial service data is processed with the pre-constructed text processing model, the formulas in the target text are analyzed automatically and the corresponding target formula analysis information and target formula result are generated, which effectively saves the resource cost required for text processing of the target text and improves the processing efficiency of formula information analysis of the target text.

The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of an artificial intelligence-based text processing method as described above.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.

It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the present application, but they do not limit the scope of the patent claims. The present application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application will be understood thoroughly and completely. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their technical features. All equivalent structures made using the content of the specification and the drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.

Claims

1. A text processing method based on artificial intelligence, comprising the steps of:

acquiring initial service data to be processed from a service system;

preprocessing the unstructured text to obtain a corresponding target text;

invoking a pre-constructed text processing model; the text processing model is generated by training a preset language model based on a Prompting template and pre-collected text data containing formulas;

2. The method for processing text based on artificial intelligence according to claim 1, wherein the step of cleaning the initial business data based on a preset specified data type to obtain a corresponding unstructured text specifically comprises:

calling a preset cleaning tool;

and taking the second service data as the unstructured text.

3. The artificial intelligence based text processing method according to claim 1, wherein the step of preprocessing the unstructured text to obtain the corresponding target text specifically comprises:

and taking the second text as the target text.

4. The artificial intelligence based text processing method according to claim 1, further comprising, before the step of calling a pre-built text processing model:

invoking the language model;

5. The text processing method based on artificial intelligence according to claim 4, wherein the step of labeling the historical text data based on a preset data labeling system to obtain corresponding labeled data specifically comprises:

invoking the data labeling system;

inputting the historical text data into the data labeling system;

6. The artificial intelligence based text processing method according to claim 4, wherein the training and testing the language model using the triples to construct the text processing model comprises:

if yes, the trained language model is used as the text processing model.

7. The artificial intelligence based text processing method according to claim 1, further comprising, after the step of processing the target text based on the text processing model to obtain target formula analysis information and a target formula result corresponding to the target text:

8. An artificial intelligence based text processing apparatus comprising:

9. A computer device comprising a memory having stored therein computer readable instructions which when executed implement the steps of the artificial intelligence based text processing method of any of claims 1 to 7.

10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the artificial intelligence based text processing method of any of claims 1 to 7.