CN112182058B - Content acquisition method, device, computer equipment and medium combining RPA and AI

Info

Publication number: CN112182058B
Application number: CN202010824571.2A
Authority: CN
Inventors: 胡一川; 汪冠春; 褚瑞; 李玮; 白龙飞
Original assignee: Beijing Laiye Network Technology Co Ltd; Laiye Technology Beijing Co Ltd
Current assignee: Beijing Laiye Network Technology Co Ltd; Laiye Technology Beijing Co Ltd
Priority date: 2020-08-17
Filing date: 2020-08-17
Publication date: 2024-04-09
Anticipated expiration: 2040-08-17
Also published as: CN112182058A

Abstract

The application provides a content acquisition method, apparatus, computer device and medium combining RPA and AI. The method comprises: acquiring a text to be identified by using a robotic process automation (RPA) method; matching first candidate content and second candidate content in the text to be identified by using the RPA method in combination with an extraction model, wherein the extraction model comprises exact matching items and fuzzy matching items, the first candidate content is obtained by matching based on the exact matching items, and the second candidate content is obtained by matching based on the fuzzy matching items; and determining target content from the first candidate content and the second candidate content based on a shallow neural network model in natural language processing (NLP) of artificial intelligence (AI). The application can reduce the time needed to acquire text content and make acquisition more convenient, thereby improving the practical performance of text content acquisition and its effect in industrial applications.

Description

Content acquisition method, device, computer equipment and medium combining RPA and AI

Technical Field

The present application relates to the field of computer technologies, and in particular to a content acquisition method, apparatus, computer device and medium combining RPA (Robotic Process Automation) and AI (Artificial Intelligence).

Background

Robotic process automation (RPA) uses specific "robot software" to simulate human operation of a computer and to execute process tasks automatically according to rules. Artificial intelligence (AI) is a technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence.

In application scenarios of natural language processing (NLP) in the field of computer technology, fuzzy matching of text is often required, and measures such as edit distance or word co-occurrence rate are commonly used as the basis for similarity. These measures, however, operate on the literal wording only. To identify the semantics contained in a text, a complex deep learning (DL) model is usually used to acquire the text content (such as characters, words and sentences in the text) and then analyze its semantics.

Acquiring text content in this way is time-consuming, performs poorly in practice, and may limit the effect of text content acquisition in industrial applications.

Disclosure of Invention

The present application aims to solve, at least to some extent, one of the technical problems in the related art.

Therefore, the purpose of the present application is to provide a content acquisition method, apparatus, computer device and medium combining RPA and AI that reduce the time needed to acquire text content and make acquisition more convenient, thereby improving the practical performance of text content acquisition and its effect in industrial applications.

To achieve the above object, a content acquisition method combining RPA and AI according to an embodiment of a first aspect of the present application includes: acquiring a text to be identified by using a robotic process automation (RPA) method; matching first candidate content and second candidate content in the text to be identified by using the RPA method in combination with an extraction model, wherein the extraction model comprises exact matching items and fuzzy matching items, the first candidate content is obtained by matching based on the exact matching items, and the second candidate content is obtained by matching based on the fuzzy matching items; and determining target content from the first candidate content and the second candidate content based on a shallow neural network model in natural language processing (NLP) of artificial intelligence (AI).

The content acquisition method combining RPA and AI provided by the embodiment of the first aspect of the present application implements a fully automated content acquisition process, divides that process into an exact matching process and a fuzzy matching process, and determines the text content with a shallow neural network model. This reduces the time needed to acquire text content and makes acquisition more convenient, which improves the practical performance of text content acquisition and its effect in industrial applications.

To achieve the above object, a content acquisition apparatus combining RPA and AI according to an embodiment of a second aspect of the present application includes: an acquisition module, configured to acquire a text to be identified by using a robotic process automation (RPA) method; a matching module, configured to match first candidate content and second candidate content in the text to be identified by using the RPA method in combination with an extraction model, wherein the extraction model comprises exact matching items and fuzzy matching items, the first candidate content is obtained by matching based on the exact matching items, and the second candidate content is obtained by matching based on the fuzzy matching items; and a determining module, configured to determine target content from the first candidate content and the second candidate content based on a shallow neural network model in natural language processing (NLP) of artificial intelligence (AI).

The content acquisition apparatus combining RPA and AI provided by the embodiment of the second aspect of the present application implements a fully automated content acquisition process, divides that process into an exact matching process and a fuzzy matching process, and determines the text content with a shallow neural network model. This reduces the time needed to acquire text content and makes acquisition more convenient, which improves the practical performance of text content acquisition and its effect in industrial applications.

To achieve the above object, a computer device according to an embodiment of a third aspect of the present application includes: at least one processor and memory; the memory stores computer-executable instructions; the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor executes the content acquisition method combining RPA and AI according to the embodiment of the first aspect of the present application.

The computer device provided by the embodiment of the third aspect of the present application implements a fully automated content acquisition process, divides that process into an exact matching process and a fuzzy matching process, and determines the text content with a shallow neural network model. This reduces the time needed to acquire text content and makes acquisition more convenient, which improves the practical performance of text content acquisition and its effect in industrial applications.

To achieve the above objective, a computer readable storage medium according to an embodiment of the fourth aspect of the present application stores computer executable instructions, and when a processor executes the computer executable instructions, the content acquisition method combining RPA and AI according to the embodiment of the first aspect of the present application is implemented.

The computer-readable storage medium provided by the embodiment of the fourth aspect of the present application implements a fully automated content acquisition process, divides that process into an exact matching process and a fuzzy matching process, and determines the text content with a shallow neural network model. This reduces the time needed to acquire text content and makes acquisition more convenient, which improves the practical performance of text content acquisition and its effect in industrial applications.

Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flow chart of a content acquisition method combining RPA and AI according to one embodiment of the present disclosure;

FIG. 2 is a schematic drawing of an extraction model in an embodiment of the present application;

FIG. 3 is a flow chart of a content acquisition method combining RPA and AI according to another embodiment of the invention;

FIG. 4 is a schematic diagram of a graphical model of an embodiment of the present application;

FIG. 5 is a flow chart of a content acquisition method combining RPA and AI according to another embodiment of the invention;

FIG. 6 is a schematic structural diagram of a content acquisition apparatus combining RPA and AI according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a content acquisition apparatus combining RPA and AI according to another embodiment of the present application;

FIG. 8 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. On the contrary, the embodiments of the present application include all alternatives, modifications, and equivalents as may be included within the spirit and scope of the appended claims.

Fig. 1 is a flowchart of a content acquisition method combining RPA and AI according to an embodiment of the present application.

This embodiment is described using the example in which the content acquisition method combining RPA and AI is configured in a content acquisition apparatus combining RPA and AI.

The content acquiring method combining the RPA and the AI in the present embodiment may be configured in the content acquiring apparatus combining the RPA and the AI, and the content acquiring apparatus combining the RPA and the AI may be provided in a server or may also be provided in a computer device, which is not limited in the embodiment of the present application.

The present embodiment takes an example in which the content acquisition method combining RPA and AI is configured in a computer device.

In terms of hardware, the execution subject of the embodiment of the present application may be, for example, a central processing unit (CPU) in a server or a computer device; in terms of software, it may be, for example, a related background service in a server or a computer device. This is not limited here.

In one application scenario, the server is the execution subject of the content acquisition method combining RPA and AI provided in this embodiment. The user may upload the text to be identified through a text upload interface provided by the terminal, or may input audio data through a voice input interface provided by the terminal, and the terminal sends the text to be identified or the audio data to a background server. The server receives the text to be identified uploaded by the user through the text upload interface, and/or receives the audio data input by the user through the voice input interface, parses the semantic content of the audio data, and converts the semantic content into text to obtain the text to be identified. The server then executes the content acquisition method combining RPA and AI and feeds the acquired content back to the terminal.

In another application scenario, the terminal is the execution subject of the content acquisition method combining RPA and AI provided in this embodiment. The user may upload the text to be identified through a text upload interface provided by the terminal, or may input audio data through a voice input interface provided by the terminal. The terminal directly executes the content acquisition method combining RPA and AI and then provides the acquired content to the user.

It should be noted that "acquisition" in the present application refers to a content acquisition process that combines robotic process automation (RPA) with artificial intelligence (AI). That is, the content acquisition process is fully automated, and it is further combined with AI to parse the text to be identified automatically from end to end and thereby identify the content in that text.

For example, the present application implements a fully automated content acquisition process, divides that process into an exact matching process and a fuzzy matching process, and determines the text content with a shallow neural network model. This reduces the time needed to acquire text content and makes acquisition more convenient, which improves the practical performance of text content acquisition and its effect in industrial applications.

Referring to fig. 1, the method includes:

S101: Acquire the text to be identified by using a robotic process automation (RPA) method.

The text from which content (for example, characters, words or sentences) is to be acquired may be referred to as the text to be identified, for example a contract text or an agreement between enterprises. A text is a representation of written language, typically a sentence or a combination of sentences that carries a complete, systematic meaning (message). A text may be a sentence, a paragraph or a discourse. In the embodiments of the present application, the text is an electronic text that a computer device can recognize, in any possible text format such as PDF or Word, which is not limited here.

To realize robotic process automation (RPA), when acquiring the text to be identified, the embodiment of the present application may specifically receive the text to be identified uploaded by the user through a text upload interface, and/or receive audio data input by the user through a voice input interface, parse the semantic content of the audio data, and convert the semantic content into text to obtain the text to be identified.

That is, a text upload interface may be configured on the computer device to detect whether the user invokes it to upload an electronic text; if so, the uploaded text is used as the text to be identified. In addition, to make RPA execution more flexible, the embodiment of the present application also supports configuring a voice input interface on the computer device. Audio data input by the user is received through the voice input interface, a built-in audio parsing algorithm (see the related art for details, which are not repeated here) parses the semantic content of the audio data, and the semantic content is converted into text to obtain the text to be identified.

S102: Match first candidate content and second candidate content in the text to be identified by using the RPA method in combination with an extraction model, wherein the extraction model comprises exact matching items and fuzzy matching items, the first candidate content is obtained by matching based on the exact matching items, and the second candidate content is obtained by matching based on the fuzzy matching items.

Referring to FIG. 2, FIG. 2 is a schematic diagram of an extraction model in an embodiment of the present application. The extraction model is used to match text content in the text to be identified and includes a plurality of matching items. In this embodiment, the matching items in the extraction model are divided into exact matching items and fuzzy matching items. In FIG. 2, the exact matching items are, for example, "[@v_identity]", "[reduce holdings|increase holdings]" and "[@r_number]", and the remaining matching items are fuzzy matching items.

The terms "exact" and "fuzzy" characterize the attributes of the matching items and do not limit the embodiments of the present application. When an exact matching item is used to match content in the text, content that completely matches what the exact matching item describes is extracted from the text; this extracted content may be referred to as first candidate content. When a fuzzy matching item is used to match content in the text, content that partially matches what the fuzzy matching item describes is extracted from the text; this extracted content may be referred to as second candidate content.

For example, assume that the text to be identified is: "A senior executive who holds 300,000 shares of the company (0.0284% of the company's total share capital) plans to reduce holdings of the company's shares by no more than 75,000 shares (0.0071% of the company's total share capital) through centralized bidding within six months after fifteen trading days from the date of this announcement, as confirmed with the supervisor." Assume further that the exact matching items are those in FIG. 2, for example "[@v_identity]", "[reduce holdings|increase holdings]" and "[@r_number]". The first candidate content identified in the text to be identified according to the exact matching items may then be as follows:

The exact matching item [@v_identity] matches 2 first candidate contents: senior executive (18, 2), supervisor (59, 1);

the exact matching item [reduce holdings|increase holdings] matches 1 first candidate content: reduce holdings (37, 1);

the exact matching item [@r_number] matches 4 first candidate contents: 300,000 (4, 3), 0.0284 (14, 1), 75,000 (43, 3), 0.0071 (53, 1). The two numbers in parentheses are the starting position and length of the corresponding first candidate content in the text to be identified, counted in segmented words of the original text. They may be used as the label of the node in the subsequent graph model, which is not limited here.

The matching process for the fuzzy matching items is the same as described above and is not repeated here.
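The exact matching step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the regular expressions, the `Candidate` type and the `match_exact` function are hypothetical names introduced here, and positions are counted in characters of an English sample sentence rather than in segmented words as in the example above.

```python
import re
from typing import NamedTuple


class Candidate(NamedTuple):
    item: str    # exact matching item that produced this candidate
    text: str    # matched content
    start: int   # character offset (assumption: characters, not segmented words)
    length: int  # length in characters


# Hypothetical stand-ins for the exact matching items shown in FIG. 2.
EXACT_ITEMS = {
    "@v_identity": re.compile(r"senior executive|supervisor"),
    "reduce|increase": re.compile(r"reduce holdings|increase holdings"),
    "@r_number": re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?"),
}


def match_exact(text: str) -> dict[str, list[Candidate]]:
    """Return the first candidate contents found for every exact matching item."""
    return {
        name: [
            Candidate(name, m.group(), m.start(), len(m.group()))
            for m in pattern.finditer(text)
        ]
        for name, pattern in EXACT_ITEMS.items()
    }


sample = (
    "A senior executive holding 300,000 shares (0.0284%) plans to "
    "reduce holdings by no more than 75,000 shares (0.0071%), "
    "as confirmed with the supervisor."
)
candidates = match_exact(sample)
for name, found in candidates.items():
    print(name, [(c.text, c.start, c.length) for c in found])
```

Each candidate keeps its position and length so that it can later serve as a node label in the graph model.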

Optionally, in some embodiments, to improve the efficiency of exact matching, reduce the time needed to acquire text content, and lower the time complexity of exact matching, when the extraction model contains a plurality of exact matching items, the exact matching items are stored in a double-array trie. The double-array trie is constructed according to the expression rule among the exact matching items, and the RPA method is then used in combination with the double-array trie to match the first candidate content in the text to be identified.

The double-array trie is a data storage structure that makes lookups efficient; it is in effect a deterministic finite automaton. Traversal starts at the root node and proceeds through the keyword from beginning to end: each character of the keyword determines the next state, and the edge labeled with that character is followed.

The expression rule describes the order in which the content of the exact matching items appears. For example, suppose the exact matching items are A, B, C, D and E, and the expression rule states that the content described by A usually precedes the content described by B and C, that the order of B and C relative to each other is not restricted, and that the content described by B precedes the content described by D and E. Then A is a parent node in the double-array trie, B and C are child nodes of A, and D and E are child nodes of B. During actual matching, the first candidate content in the text to be identified is found by traversing the trie structure; thanks to the performance of the double-array trie, the time complexity of exact matching is O(n).
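The automaton-style traversal can be illustrated with a plain dictionary-based trie. This is a simplified stand-in, not a double-array trie: it follows the same character-by-character state transitions but does not use the compact two-array layout, and this naive scan restarts at every position, so its worst case is O(n·L) for a longest keyword of length L. The keywords and the names `TrieNode`, `build_trie` and `scan` are assumptions made for the sketch.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> next state
        self.item = None    # exact matching item whose keyword ends here


def build_trie(keywords: dict[str, str]) -> TrieNode:
    """Build a trie from {keyword: exact matching item name}."""
    root = TrieNode()
    for word, item in keywords.items():
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.item = item
    return root


def scan(text: str, root: TrieNode) -> list[tuple[str, int, int]]:
    """Return (item, start, length) for the longest keyword found at each start."""
    hits = []
    for start in range(len(text)):
        node, longest = root, None
        for end in range(start, len(text)):
            node = node.children.get(text[end])  # follow the edge for this character
            if node is None:
                break
            if node.item is not None:
                longest = (node.item, start, end - start + 1)
        if longest is not None:
            hits.append(longest)
    return hits


# Hypothetical keywords for two of the exact matching items.
trie = build_trie({
    "senior executive": "@v_identity",
    "supervisor": "@v_identity",
    "reduce holdings": "reduce|increase",
    "increase holdings": "reduce|increase",
})
hits = scan("the supervisor will reduce holdings", trie)
print(hits)
```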

In the embodiment of the present application, matching the first candidate content and the second candidate content in the text to be identified by using the RPA method in combination with the extraction model may consist of first matching the first candidate content in the text to be identified and then matching the second candidate content, which is not limited here.

To improve the overall efficiency and accuracy of the content acquisition method, the embodiment of the present application also judges whether first candidate content can be matched for every exact matching item. Only if so is the RPA method used in combination with the fuzzy matching items to match the second candidate content in the text to be identified.

That is, if the extraction model includes a plurality of exact matching items, it is determined whether first candidate content corresponding to each exact matching item can be identified in the text to be identified. If one or more exact matching items cannot be matched in the text to be identified, it may be concluded that the target content cannot be determined from that text. If first candidate content corresponding to every exact matching item can be identified, the RPA method is then triggered to match the second candidate content in the text to be identified in combination with the fuzzy matching items.
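The gating logic described above might be sketched as below. The matcher functions are hypothetical placeholders (simple keyword lookups) standing in for whatever exact and fuzzy matching the extraction model performs; only the control flow, fuzzy matching runs only when every exact matching item found a candidate, is taken from the text.

```python
from typing import Callable, Optional

Matcher = Callable[[str], list]


def keyword_matcher(*keywords: str) -> Matcher:
    """Hypothetical matcher: report (keyword, start, length) for each keyword present."""
    def match(text: str) -> list:
        return [(kw, text.index(kw), len(kw)) for kw in keywords if kw in text]
    return match


def acquire_candidates(
    text: str,
    exact_matchers: dict[str, Matcher],
    fuzzy_matchers: dict[str, Matcher],
) -> Optional[tuple[dict, dict]]:
    """Run exact matching first; run fuzzy matching only if every exact item matched."""
    first = {name: match(text) for name, match in exact_matchers.items()}
    if any(not found for found in first.values()):
        return None  # target content cannot be determined from this text
    second = {name: match(text) for name, match in fuzzy_matchers.items()}
    return first, second


exact = {
    "@v_identity": keyword_matcher("senior executive", "supervisor"),
    "reduce|increase": keyword_matcher("reduce holdings", "increase holdings"),
}
fuzzy = {"shares_not_exceeding": keyword_matcher("no more than")}

accepted = acquire_candidates(
    "the supervisor will reduce holdings by no more than 100 shares", exact, fuzzy
)
rejected = acquire_candidates("the supervisor attended the meeting", exact, fuzzy)
print(accepted)
print(rejected)
```

Returning early when an exact matching item has no candidate avoids spending time on fuzzy matching for texts that cannot yield target content.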

S103: Determine target content from the first candidate content and the second candidate content based on a shallow neural network model in natural language processing (NLP) of artificial intelligence (AI).

It can be appreciated that, because the content of the text to be identified can be expressed in many ways, the content extracted by the extraction model is generally candidate content. That is, a plurality of first candidate contents and a plurality of second candidate contents may be matched in the text to be identified according to the exact matching items and fuzzy matching items in the extraction model. The present application therefore further supports determining the target content from the first candidate content and the second candidate content, where the target content is the more accurate text content identified in the text to be identified.

Specifically, the embodiment of the present application determines the target content from the first candidate content and the second candidate content based on a shallow neural network model in natural language processing (NLP) of artificial intelligence (AI).

Natural language processing (NLP) is a field of computer science, artificial intelligence and linguistics concerned with the interaction between computers and human (natural) language.

In this embodiment, a shallow neural network model from AI-based natural language processing (NLP) is used to analyze the vector representations of the first candidate content and the second candidate content and thereby determine the target content in the text to be identified, which effectively avoids the time complexity introduced by a complex deep network model.
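As a rough illustration of what a shallow model over vector representations could look like, the sketch below scores a pair of candidate vectors with a single hidden layer. The architecture, the weights and the function name `shallow_score` are all assumptions made for illustration; the source does not specify the network structure or how the vectors are produced.

```python
import math


def shallow_score(vec_a, vec_b, w_hidden, b_hidden, w_out, b_out):
    """Score a candidate pair with one tanh hidden layer and a sigmoid output.

    vec_a, vec_b: vector representations of two candidates (assumed given).
    w_hidden: one weight row per hidden unit, each of length len(vec_a) + len(vec_b).
    """
    x = list(vec_a) + list(vec_b)  # concatenate the two representations
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))  # squash to the range (0, 1)


# Toy weights chosen by hand purely for demonstration (not trained values).
w_hidden = [[1.0, 0.0, 0.0, 0.0]]
b_hidden = [0.0]
w_out = [1.0]
b_out = 0.0

neutral = shallow_score([0.0, 0.0], [0.0, 0.0], w_hidden, b_hidden, w_out, b_out)
positive = shallow_score([1.0, 0.0], [0.0, 0.0], w_hidden, b_hidden, w_out, b_out)
print(neutral, positive)
```

A single hidden layer keeps inference cost to a handful of multiply-adds per candidate pair, which is the kind of saving over a deep model that the paragraph above refers to.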

This embodiment implements a fully automated content acquisition process, divides that process into an exact matching process and a fuzzy matching process, and determines the text content with a shallow neural network model. This reduces the time needed to acquire text content and makes acquisition more convenient, which improves the practical performance of text content acquisition and its effect in industrial applications.

Fig. 3 is a flowchart of a content acquisition method combining RPA and AI according to another embodiment of the present application.

Referring to fig. 3, the step of determining target content from the first candidate content and the second candidate content based on the shallow neural network model in the artificial intelligence AI further includes:

S301: Take the first candidate content and the second candidate content as nodes, and connect at least some of the nodes with edges to construct a graph model, wherein each node corresponds to one exact matching item or one fuzzy matching item.

The graph model may be, for example, an undirected graph, or any other possible graph model, without limitation.

For example, referring to FIG. 4, FIG. 4 is a schematic diagram of a graph model according to an embodiment of the present application, in which the graph model is an undirected graph. Continuing the example above, assume that the text to be identified is: "A senior executive who holds 300,000 shares of the company (0.0284% of the company's total share capital) plans to reduce holdings of the company's shares by no more than 75,000 shares (0.0071% of the company's total share capital) through centralized bidding within six months after fifteen trading days from the date of this announcement, as confirmed with the supervisor." Assume further that the exact matching items are those in FIG. 2, for example "[@v_identity]", "[reduce holdings|increase holdings]" and "[@r_number]". The first candidate content identified in the text to be identified according to the exact matching items may then be as follows:

the exact matching item [@r_number] matches 4 first candidate contents: 300,000 (4, 3), 0.0284 (14, 1), 75,000 (43, 3), 0.0071 (53, 1). The two numbers in parentheses are the starting position and length of the corresponding first candidate content in the text to be identified, counted in segmented words of the original text. They may be used as the label of the node in the subsequent graph model.

FIG. 4 includes a plurality of nodes 40, each representing one candidate content. In this embodiment, FIG. 4 illustrates building the graph model from the first candidate contents: each node corresponds to one first candidate content, and the node is labeled with the starting position and length, counted in segmented words, of that first candidate content in the text to be identified. The nodes are arranged according to the expression rule of the exact matching items in the extraction model. That is, the first candidate contents in the first marker 41 in FIG. 4 are obtained by matching based on the exact matching item [@v_identity], the first candidate contents in the second marker 42 are obtained by matching based on the exact matching item [reduce holdings|increase holdings], and the first candidate contents in the third marker 43 are obtained by matching based on the exact matching item [@r_number]. If the expression rule in the extraction model is: exact matching item [@v_identity], exact matching item [reduce holdings|increase holdings], exact matching item [@r_number], then the candidate contents in the first marker 41 may be arranged to the left of the second marker 42, and the candidate contents in the third marker 43 may be arranged to the right of the second marker 42. This can be regarded as mapping each first candidate content to the position of its exact matching item in the extraction model, where it serves as an anchor point.

After the nodes shown in FIG. 4 are established, edges may be established between the first candidate contents corresponding to different exact matching items, and execution of the subsequent steps is triggered.
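The node and edge construction can be sketched as follows, using the (start, length) labels from the example. Connecting only candidates of adjacent exact matching items in the expression rule is an assumption of this sketch (the text only says edges join candidates of different exact matching items), and `build_graph` is a hypothetical name.

```python
from itertools import product

# Order of the exact matching items in the expression rule.
RULE_ORDER = ["@v_identity", "reduce|increase", "@r_number"]

# Node labels (start position, length) taken from the worked example.
CANDIDATES = {
    "@v_identity": [(18, 2), (59, 1)],
    "reduce|increase": [(37, 1)],
    "@r_number": [(4, 3), (14, 1), (43, 3), (53, 1)],
}


def build_graph(order, candidates):
    """Return (nodes, edges) of an undirected graph over first candidate contents."""
    nodes = [(item, label) for item in order for label in candidates[item]]
    edges = set()
    # Assumption: connect every candidate of one item to every candidate
    # of the next item in the expression rule.
    for left, right in zip(order, order[1:]):
        for a, b in product(candidates[left], candidates[right]):
            edges.add(frozenset({(left, a), (right, b)}))  # unordered pair
    return nodes, edges


nodes, edges = build_graph(RULE_ORDER, CANDIDATES)
print(len(nodes), "nodes,", len(edges), "edges")
```

Using `frozenset` for each edge reflects that the graph is undirected: the pair of endpoints has no order.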

S302: Determine the scoring value corresponding to each edge based on the shallow neural network model.

The scoring value can serve as the weight of the edge. A higher scoring value indicates that the candidate contents of the nodes joined by the edge are more accurate, and the scoring values can later be used to determine the optimal path in the graph model.

Of course, the shallow neural network model is only one possible way of determining the scoring value of an edge. In practice, the scoring value may be determined in any other feasible way, for example with conventional programming techniques (such as simulation methods and engineering methods), or with genetic algorithms and artificial neural network methods.
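Once every edge has a scoring value, an optimal path can be selected by dynamic programming over the groups of candidates. The sketch below assumes that a path takes one candidate per exact matching item and that its score is the sum of its edge scores; the `proximity_score` function is a toy stand-in for the model's scoring value, not the scoring described in the source.

```python
def proximity_score(a, b):
    """Toy stand-in for the model's edge score: nearer candidates score higher."""
    return 1.0 / (1 + abs(a[0] - b[0]))


def best_path(layers, score):
    """Pick one node per layer so that the summed edge score is maximal."""
    # best maps each node of the current layer to (total score, path ending there).
    best = {node: (0.0, [node]) for node in layers[0]}
    for layer in layers[1:]:
        following = {}
        for node in layer:
            total, path = max(
                (
                    (prev_total + score(prev, node), prev_path)
                    for prev, (prev_total, prev_path) in best.items()
                ),
                key=lambda pair: pair[0],
            )
            following[node] = (total, path + [node])
        best = following
    return max(best.values(), key=lambda pair: pair[0])


# (start, length) labels from the example, grouped in expression-rule order.
layers = [
    [(18, 2), (59, 1)],                   # [@v_identity]
    [(37, 1)],                            # reduce/increase holdings
    [(4, 3), (14, 1), (43, 3), (53, 1)],  # [@r_number]
]
total, path = best_path(layers, proximity_score)
print(path, round(total, 4))
```

With this toy score the selected path is senior executive (18, 2), reduce holdings (37, 1), 75,000 (43, 3), which is the reading a person would choose for the sample sentence.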

Optionally, referring to fig. 5, in some embodiments, the step of determining the score value corresponding to the edge based on the shallow neural network model further includes:

S501: Dividing the text to be identified according to the position information of the first candidate content to obtain text segments.

The location information therein may be, for example, the starting location in the label of the example node 40 of fig. 4 described above.

That is, the first candidate contents corresponding to two adjacent exact matching items are determined, giving two first candidate contents; then, according to the position information of these two first candidate contents, the text segment lying between the two positions is extracted from the text to be identified.
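As a minimal sketch of this division step, assuming each first candidate content is labeled with a zero-based `(start, length)` pair, the segment between two candidates can be cut out as follows. The function name `segment_between` and the sample sentence are hypothetical.

```python
def segment_between(text, left, right):
    """Return the text strictly between two candidates.

    left, right: (start, length) labels of two first candidate contents.
    Returns None when the right candidate starts before the left one ends,
    i.e. when no valid text segment can be formed.
    """
    left_end = left[0] + left[1]
    if right[0] < left_end:
        return None
    return text[left_end:right[0]]

# Hypothetical text to be identified.
text = "reduce no more than 75000 shares"
left = (text.index("reduce"), len("reduce"))
right = (text.index("75000"), len("75000"))
segment = segment_between(text, left, right)
```

Returning `None` for an impossible ordering mirrors the case in which an edge is discarded because no text segment exists between its two nodes.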

S502: Determining a target fuzzy matching item corresponding to the text segment, wherein the target fuzzy matching item belongs to a plurality of fuzzy matching items.

Then, the fuzzy matching item corresponding to the text segment between the two positions can be determined and used as the target fuzzy matching item, wherein at least part of the content of the text segment can be obtained by matching based on the target fuzzy matching item.

Alternatively, since the two pieces of position information have corresponding exact match items, one fuzzy match item between the two exact match items may also be regarded as the target fuzzy match item.

For example, referring also to fig. 2, suppose the two exact matching items are "[reduction or increase]" and "[@r_number]", and one fuzzy matching item "<stock of the company no more>" lies between them. That fuzzy matching item may be regarded as the target fuzzy matching item, and at least part of the content between the first candidate content matched by "[reduction or increase]" and the first candidate content matched by "[@r_number]" (that is, the text segment obtained by the division above) may be matched based on it.

For example, when the target fuzzy matching item is "<stock of the company is not more than>", the text content described by the target fuzzy matching item is: the stock of the company is not exceeded.

S503: Inputting the text segment and the text content described by the target fuzzy matching item into the shallow neural network model, and taking the output value of the shallow neural network model as the score value corresponding to the edge.

That is, the text segment and the text content described by the target fuzzy matching item are input into the shallow neural network model, and the output value of the model is used as the score value corresponding to the edge. In this way, the embodiment of the application implements pruning at the fuzzy matching layer: on the basis of exact matching, the text to be identified is pruned according to the first candidate contents obtained by exact matching, and fuzzy matching is performed only on the text segments obtained by pruning. This further improves the efficiency of text content acquisition, improves the application performance of the extraction model, and effectively assists the industrial application of text content acquisition.

Optionally, in some embodiments, first character information and first word information of the text segment may be acquired; second character information and second word information of the text content may be acquired; the first character information, the first word information, the second character information and the second word information are input together into the shallow neural network model, which analyzes a similarity value between the text segment and the text content; and the similarity value output by the shallow neural network model is used as the score value.

The character information is, for example, the name of a character, the meaning of a character, context information of a character, and the like, and the word information is, for example, the name of a word, the sense of a word, context information of a word, and the like, where the context information describes the semantic content or the context of the character or word in the text segment. The character information and word information corresponding to the text segment may be referred to as the first character information and the first word information, and the character information and word information corresponding to the text content may be referred to as the second character information and the second word information.

The shallow neural network model has learned the correspondence between various sample character information and sample word information and the sample similarity values associated with them. Therefore, by inputting the first character information, the first word information, the second character information and the second word information into the shallow neural network model, the similarity value output by the model can be obtained directly and used as the score value.

In the embodiment of the application, the first character information, the first word information, the second character information and the second word information are input together into the shallow neural network model. A vector representation of the text segment and a vector representation of the text content can be obtained based on the shallow neural network model, and the similarity between the text segment and the text content is then calculated from these vector representations, which is not limited here.
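The idea of comparing two texts through vector representations that combine character information and word information can be illustrated with a minimal sketch. This is only a stand-in for the learned shallow neural network model: it assumes simple character counts and whitespace-separated word counts as features and uses cosine similarity, and the names `features`, `cosine` and `similarity` are hypothetical.

```python
import math
from collections import Counter

def features(text):
    """Combine character-level and word-level counts into one sparse vector."""
    chars = Counter(("c", ch) for ch in text if not ch.isspace())
    words = Counter(("w", w) for w in text.split())
    return chars + words

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity(segment, content):
    """Similarity value between a text segment and a text content."""
    return cosine(features(segment), features(content))
```

Because the character features are shared even when word boundaries differ, a segment that is segmented differently from the text content can still obtain a non-zero similarity, which is the motivation for combining the two kinds of information.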

In order to avoid the influence of the overlong text segment on the recognition accuracy as much as possible, in the embodiment of the present application, a ratio value between the length of the text segment and the length of the text content may also be determined, and the score value output by the shallow neural network model is correspondingly adjusted according to the ratio value, for example, when the ratio value is too large, the score value is correspondingly reduced, which is not limited.
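A minimal sketch of such an adjustment is given below. The source only states that the score value is reduced when the ratio value is too large; the threshold `max_ratio` and the proportional scaling rule are assumptions made for illustration, and `adjust_score` is a hypothetical name.

```python
def adjust_score(score, segment, content, max_ratio=2.0):
    """Reduce the score when the segment is much longer than the content.

    max_ratio is an assumed threshold: ratios at or below it leave the
    score unchanged; above it the score shrinks in proportion to the excess.
    """
    if not content:
        return 0.0
    ratio = len(segment) / len(content)
    if ratio <= max_ratio:
        return score
    return score * max_ratio / ratio
```

With the default threshold, a segment four times as long as the text content has its score halved, while a segment of comparable length keeps its score.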

A higher similarity value indicates that the text segment and the text content are more similar, and a lower similarity value indicates that they are less similar. The similarity value is therefore used as the weight of the edge between nodes, and when the weight is largest, the candidate contents corresponding to the nodes connected by that edge are the matched target content. By combining character information and word information to compare the similarity between the text segment and the text content, the advantages of character-based matching and of word-based matching are fused; because word meaning and context information are taken into account, the influence of word segmentation on recognition accuracy can be effectively avoided, and recognition accuracy is improved as a whole.

S303: Determining a target path from the graph model according to the score value, and taking the first candidate content and the second candidate content corresponding to the nodes connected by the edges on the target path as the target content.

For example, referring to the above example together with fig. 4, the search for the target path is illustrated with the two exact matching items "[reduction or increase]" and "[@r_number]". The extraction model divides the text to be identified according to the position information of the nodes corresponding to the first candidate contents, obtaining a plurality of text segments, and then determines, from among these text segments, the one with the highest similarity to the text content (the stock of the company) described by the remaining template node, that is, the target fuzzy matching item "<stock of the company is not more than>". The process is as follows:

For the nodes (37,1) and (4,3), no text segment can be formed, so the edge is discarded and the weight corresponding to the edge is 0. For the nodes (37,1) and (14,1), no text segment can be formed, so the edge is discarded and the weight corresponding to the edge is 0. For the nodes (37,1) and (43,3), the text segment is "own company shares no more", which is highly similar to the text content (own company shares no more), and the similarity is recorded as the weight of the edge. For the nodes (37,1) and (43,3), the text segment is "the company shares do not exceed 75,000 (which account for the company total share)", whose similarity to the text content (which does not exceed the company shares) is lower, and that similarity is recorded as the weight of the edge. The remaining edges are handled in the same way. After the weights of all the edges on the undirected graph are obtained, the target path with the highest weight is determined, and the first candidate content and the second candidate content corresponding to the nodes connected by the edges on the target path are taken as the target content.
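The selection of the target path can be sketched as a search over layers of nodes, one layer per exact matching item. The sketch below is an illustration under stated assumptions rather than the method of the application: it assumes the weight of a path is the sum of its edge weights, treats missing or zero-weight edges as discarded, and uses the hypothetical names `best_path`, `layers` and `weights`. The node (60, 3) and all weight values are invented for the example.

```python
def best_path(layers, weights):
    """Find the highest-weight path through layered candidate nodes.

    layers: list of lists of (start, length) labels, in expression-rule order.
    weights: dict mapping (left_label, right_label) to a score value.
    Returns (total_weight, path), or None when no complete path exists.
    """
    # Best known (total, path) ending at each node of the current layer.
    best = {node: (0.0, [node]) for node in layers[0]}
    for layer in layers[1:]:
        nxt = {}
        for node in layer:
            for prev, (total, path) in best.items():
                w = weights.get((prev, node), 0.0)
                if w <= 0.0:
                    continue  # discarded edge
                cand = (total + w, path + [node])
                if node not in nxt or cand[0] > nxt[node][0]:
                    nxt[node] = cand
        best = nxt
    if not best:
        return None
    return max(best.values(), key=lambda item: item[0])

layers = [[(37, 1)], [(4, 3), (14, 1), (43, 3), (60, 3)]]
weights = {((37, 1), (43, 3)): 0.9, ((37, 1), (60, 3)): 0.3}
result = best_path(layers, weights)
```

Keeping only the best partial path per node means each layer is processed once, so the cost grows with the number of edges rather than with the number of possible paths.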

In this embodiment, pruning at the fuzzy matching layer is implemented: on the basis of exact matching, the text to be identified is pruned according to the first candidate contents obtained by exact matching, and fuzzy matching is performed on the text segments obtained by pruning, so that the efficiency of text content acquisition is further improved, the application performance of the extraction model is improved, and the industrial application of text content acquisition is effectively assisted. By combining character information and word information to compare the similarity between the text segment and the text content, the advantages of character-based matching and of word-based matching can be fused; because word meaning and context information are considered, the influence of word segmentation on recognition accuracy can be effectively avoided, and recognition accuracy is improved as a whole.

Fig. 6 is a schematic structural diagram of a content acquisition apparatus combining RPA and AI according to an embodiment of the present application.

Referring to fig. 6, the apparatus 600 includes:

an obtaining module 601, configured to obtain a text to be identified by using a robotic process automation (RPA) method;

a matching module 602, configured to match a first candidate content and a second candidate content in the text to be identified by using the RPA method in combination with an extraction model, where the extraction model includes an exact matching item and a fuzzy matching item, the first candidate content is obtained by matching based on the exact matching item, and the second candidate content is obtained by matching based on the fuzzy matching item;

a determining module 603, configured to determine the target content from the first candidate content and the second candidate content based on a shallow neural network model in natural language processing (NLP) of artificial intelligence (AI).

Optionally, in some embodiments, the obtaining module 601 is specifically configured to:

receiving a text to be identified uploaded by a user through a text uploading interface; and/or,

and receiving audio data input by a user through a voice input interface, analyzing semantic content in the audio data, and performing text conversion on the semantic content to obtain a text to be identified.

Optionally, in some embodiments, the number of exact matches in the extraction model is a plurality, the storage structure of the plurality of exact matches is a double-array tree, and the double-array tree is constructed according to the expression rule between the plurality of exact matches, where the matching module 602 is specifically configured to:

and matching the first candidate content in the text to be identified by adopting an RPA method and combining the double-array tree.
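For illustration only, dictionary-style exact matching that yields (start, length) labels can be sketched with an ordinary nested-dictionary trie. This is a simplified stand-in and not a double-array tree: the compact double-array storage and the construction from expression rules are not reproduced, and the names `build_trie` and `find_matches` as well as the sample entries are hypothetical.

```python
def build_trie(entries):
    """Build a nested-dict trie; the key None marks the end of an entry."""
    root = {}
    for entry in entries:
        node = root
        for ch in entry:
            node = node.setdefault(ch, {})
        node[None] = entry
    return root

def find_matches(text, trie):
    """Return (start, length) for every entry occurrence found in text."""
    matches = []
    for start in range(len(text)):
        node = trie
        for offset in range(start, len(text)):
            node = node.get(text[offset])
            if node is None:
                break
            if None in node:
                matches.append((start, offset - start + 1))
    return matches

# Hypothetical entries for one exact matching item.
trie = build_trie(["reduce", "increase"])
text = "plans to reduce or increase"
matches = find_matches(text, trie)
```

A double-array layout stores the same transitions in two flat integer arrays, which is why it is commonly chosen when the set of entries is large.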

Optionally, in some embodiments, the matching module 602 is further configured to:

judging whether first candidate contents corresponding to each accurate matching item can be obtained by matching;

if yes, the RPA method is combined with the fuzzy matching item, and the second candidate content in the text to be identified is matched.

Optionally, in some embodiments, referring to fig. 7, the determining module 603 includes:

a building unit 6031, configured to use the first candidate content and the second candidate content as nodes, and connect at least some nodes with edges, so as to build a graph model, where each node corresponds to an exact match or a fuzzy match;

a determining unit 6032, configured to determine a score value corresponding to the edge based on the shallow neural network model;

and an acquisition unit 6033 for determining a target path from the graph model according to the score value, and taking the first candidate content and the second candidate content corresponding to the nodes connected by the edges on the target path as target contents.

Optionally, in some embodiments, the determining unit 6032 is specifically configured to:

dividing the text to be identified according to the position information of the first candidate content to obtain a text segment;

determining a target fuzzy matching item corresponding to the text fragment, wherein the target fuzzy matching item belongs to a plurality of fuzzy matching items;

and inputting text contents described by the text fragments and the target fuzzy matching items into a shallow neural network model, and taking an output value of the shallow neural network model as a scoring value corresponding to the edge.

Optionally, in some embodiments, the determining unit 6032 is further configured to:

acquiring first character information and first word information of the text segment;

acquiring second character information and second word information of the text content;

inputting the first character information, the first word information, the second character information and the second word information together into the shallow neural network model, and analyzing a similarity value between the text segment and the text content by using the shallow neural network model; and

and taking the similarity value output by the shallow neural network model as a scoring value.

Optionally, in some embodiments, referring to fig. 7, the determining module 603 further includes:

an adjusting unit 6034 for acquiring the length of the text segment, acquiring the length of the text content, determining a ratio value between the length of the text segment and the length of the text content, and adjusting the scoring value according to the ratio value.

The content acquisition apparatus combining RPA and AI provided in the embodiment of the present application may be used to execute the above method embodiments. Its implementation principle and technical effects are similar and are not repeated here.

Fig. 8 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present application. As shown in fig. 8, the computer device 80 provided in this embodiment includes: at least one processor 801 and a memory 802. The computer device 80 further includes a communication component 803. The processor 801, the memory 802, and the communication component 803 are connected via a bus 804.

In a specific implementation, the at least one processor 801 executes computer-executable instructions stored in the memory 802, such that the at least one processor 801 performs the content acquisition method combining RPA and AI described above.

The specific implementation process of the processor 801 may refer to the above-mentioned method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.

In the embodiment shown in fig. 8, it should be understood that the processor may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be embodied directly as being executed by a hardware processor, or executed by a combination of hardware and software modules in a processor.

The memory may comprise high-speed RAM, and may further comprise non-volatile memory (NVM), such as at least one disk memory.

The bus may be an industry standard architecture (Industry Standard Architecture, ISA) bus, a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, among others. The buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, the buses in the drawings of the present application are not limited to only one bus or one type of bus.

The present application also provides a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement the content acquisition method combining RPA and AI described above.

The above-described readable storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. A readable storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.

An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. In the alternative, the readable storage medium may be integral to the processor. The processor and the readable storage medium may reside in an application specific integrated circuit (Application Specific Integrated Circuit, ASIC). Alternatively, the processor and the readable storage medium may reside as discrete components in a device.

It should be noted that in the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.

It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.

Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.

The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims

1. A content acquisition method combining RPA and AI, the method comprising:

acquiring a text to be identified by adopting a robot process automation RPA method;

matching a first candidate content and a second candidate content in the text to be identified by adopting the RPA method and combining an extraction model, wherein the extraction model comprises: an exact matching item and a fuzzy matching item, the first candidate content is obtained by matching based on the exact matching item, and the second candidate content is obtained by matching based on the fuzzy matching item;

determining target content from the first candidate content and the second candidate content based on a shallow neural network model among natural language processing (Natural Language Processing, NLP) of the artificial intelligence AI;

the determining, based on the shallow neural network model in the artificial intelligence AI, target content from the first candidate content and the second candidate content includes:

respectively taking the first candidate content and the second candidate content as nodes, and connecting at least part of the nodes by adopting edges to construct a graph model, wherein each node corresponds to one exact matching item or fuzzy matching item;

determining a scoring value corresponding to the edge based on the shallow neural network model;

determining a target path from the graph model according to the scoring value, and taking the first candidate content and the second candidate content corresponding to nodes connected by each edge on the target path as the target content;

The determining the scoring value corresponding to the edge based on the shallow neural network model comprises the following steps:

dividing the text to be identified according to the position information of the first candidate content to obtain text fragments;

determining a target fuzzy matching item corresponding to the text segment, wherein the target fuzzy matching item belongs to a plurality of fuzzy matching items;

inputting the text segment and the text content described by the target fuzzy matching item into the shallow neural network model, and taking the output value of the shallow neural network model as the scoring value corresponding to the edge;

inputting the text content described by the text segment and the target fuzzy matching item into the shallow neural network model, and taking the output value of the shallow neural network model as the scoring value corresponding to the edge, wherein the method comprises the following steps:

acquiring first character information and first word information of the text segment;

acquiring second character information and second word information of the text content;

the first character information, the first word information, the second character information and the second word information are input into the shallow neural network model together, and the similarity value between the text segment and the text content is analyzed by adopting the shallow neural network model; and

taking the similarity value output by the shallow neural network model as the scoring value.

2. The method of claim 1, wherein the acquiring a text to be identified by adopting a robot process automation RPA method comprises:

receiving audio data input by a user through a voice input interface, analyzing semantic content in the audio data, and performing text conversion on the semantic content to obtain the text to be identified.

3. The method of claim 1, wherein the number of exact matching items in the extraction model is a plurality, the storage structure of the plurality of exact matching items is a double-array tree, the double-array tree is constructed according to expression rules among the plurality of exact matching items, and the matching the first candidate content in the text to be identified by adopting the RPA method and combining the extraction model comprises:

and matching the first candidate content in the text to be identified by adopting the RPA method and combining the double-array tree.

4. A method according to claim 1 or 3, wherein said matching the first candidate content and the second candidate content in the text to be identified by adopting the RPA method and combining an extraction model comprises:

judging whether the first candidate content corresponding to each exact matching item can be obtained by matching;

and if so, adopting the RPA method to combine the fuzzy matching item and match the second candidate content in the text to be identified.

5. The method as recited in claim 1, further comprising:

acquiring the length of the text segment and acquiring the length of the text content;

determining a ratio value between the length of the text segment and the length of the text content;

and adjusting the scoring value according to the ratio value.

6. A content acquisition apparatus combining RPA and AI, the apparatus comprising:

the acquisition module is used for acquiring a text to be identified by adopting a Robot Process Automation (RPA) method;

the matching module is used for matching the first candidate content and the second candidate content in the text to be identified by adopting the RPA method and combining an extraction model, and the extraction model comprises: an exact matching item and a fuzzy matching item, the first candidate content is obtained by matching based on the exact matching item, and the second candidate content is obtained by matching based on the fuzzy matching item;

a determining module, configured to determine target content from the first candidate content and the second candidate content based on a shallow neural network model in natural language processing NLP of artificial intelligence AI;

The determining module includes:

the construction unit is used for respectively taking the first candidate content and the second candidate content as nodes, adopting edges to connect at least part of the nodes so as to construct a graph model, and each node corresponds to one accurate matching item or fuzzy matching item;

the determining unit is used for determining the scoring value corresponding to the edge based on the shallow neural network model;

the acquisition unit is used for determining a target path from the graph model according to the scoring value, and taking the first candidate content and the second candidate content which correspond to nodes connected by each edge on the target path as the target content;

the determining unit is specifically configured to:

The determining unit is further configured to:

7. The apparatus of claim 6, wherein the acquisition module is specifically configured to:

8. The apparatus of claim 6, wherein the number of exact matches in the extraction model is a plurality, the storage structure of the exact matches is a double-array tree, and the double-array tree is constructed according to expression rules among the exact matches, wherein the matching module is specifically configured to:

9. The apparatus of claim 6 or 8, wherein the matching module is further to:

10. The apparatus of claim 6, wherein the determination module further comprises:

and the adjusting unit is used for acquiring the length of the text segment, acquiring the length of the text content, determining a ratio value between the length of the text segment and the length of the text content, and adjusting the scoring value according to the ratio value.

11. A computer device, comprising: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the content acquisition method combining RPA and AI of any one of claims 1-5.

12. A computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement the content acquisition method combining RPA and AI of any one of claims 1-5.