CN114154461A - Text data processing method, device and system - Google Patents

Info

Publication number
CN114154461A
Authority
CN
China
Prior art keywords
text
target
processing
operator
text data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010933970.2A
Other languages
Chinese (zh)
Inventor
陶冶
陈伟
周安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd filed Critical 4Paradigm Beijing Technology Co Ltd
Priority to CN202010933970.2A priority Critical patent/CN114154461A/en
Priority to PCT/CN2021/117271 priority patent/WO2022052959A1/en
Publication of CN114154461A publication Critical patent/CN114154461A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a text data processing method, a device and a system, wherein the method comprises the following steps: adding a text processing operator for processing text data in an operator set of a machine learning platform, wherein the text processing operator comprises: an abstract extraction operator and/or a text similarity analysis operator; and processing text data based on the text processing operator in the machine learning platform.

Description

Text data processing method, device and system
Technical Field
The present invention relates to the field of text data processing technologies, and more particularly, to a text data processing method, a text data processing apparatus, a system including at least one computing apparatus and at least one storage apparatus, and a readable storage medium.
Background
With the development of artificial intelligence, the value of data is increasingly prominent, and it is more and more common to need natural language processing techniques that automatically output the abstract of an article or compare the similarity of two articles. Meeting these needs mainly involves training the corresponding text processing model, running predictions with it, and bringing it online.
However, training, prediction, and online deployment of a text processing model have to be carried out by professionals in natural language processing. People who lack natural language processing experience can hardly put these capabilities into practice in a business scenario. The skill level of the personnel therefore greatly limits the adoption of article abstract and text similarity capabilities in business scenarios, so that a large amount of valuable text data is wasted and cannot be utilized.
Disclosure of Invention
An object of the present disclosure is to provide a new technical solution for processing text data.
According to a first aspect of the present disclosure, there is provided a text data processing method, including:
adding a text processing operator for processing text data in an operator set of a machine learning platform, wherein the text processing operator comprises: an abstract extraction operator and/or a text similarity analysis operator;
and processing text data based on the text processing operator in the machine learning platform.
Optionally, the performing text data processing based on the text processing operator in the machine learning platform includes:
providing a user configuration interface, wherein the user configuration interface comprises an operator display area and a canvas area used for creating a data processing flow chart; the operator display area displays the text processing operator for processing the text data;
acquiring uploaded target text data;
in response to an operation of creating a target data processing flow diagram, creating the target data processing flow diagram in the canvas area according to the target text data and the text processing operator;
and in response to an operation of running the target data processing flow chart, running the text processing operator according to the target data processing flow chart to process the target text data and obtain a text processing result.
Optionally, the creating, in response to the operation of creating the target data processing flowchart, the target data processing flowchart in the canvas area according to the target text data and the text processing operator includes:
in response to an operation of selecting the target text data, presenting the target text data in the canvas area;
in response to an operation of selecting the text processing operator in the operator display area, presenting the text processing operator in the canvas area;
and responding to the operation of connecting the target text data and the text processing operator, and connecting the target text data and the text processing operator in the canvas area to obtain the target data processing flow chart.
Optionally, the method further includes:
responding to a request for resource configuration of the text processing operator, and providing a first configuration interface of the text processing operator;
and acquiring resource configuration parameters of the text processing operator through the first configuration interface so as to operate the text processing operator according to the resource configuration parameters.
Optionally, the acquiring the uploaded target text data includes:
providing an entry for uploading data;
and in response to an operation of uploading the target text data, acquiring the target text data uploaded through the entry.
Optionally, the text processing operator is an abstract extraction operator for performing abstract extraction processing on the text data;
the step of operating the text processing operator according to the target data processing flow chart to process the target text data to obtain a text processing result comprises the following steps:
operating the abstract extraction operator to perform the following processing on the target text data:
performing sentence segmentation on target line text data in the target text data to obtain a plurality of sentences; the target line text data is the text data of any line in the target text data;
determining the similarity between every two sentences, and regularizing the similarity between every two sentences to obtain a similarity matrix;
obtaining scores of other sentences according to the similarity matrix and preset scores of the specified sentences; wherein the other sentences are sentences except the specified sentences in the plurality of sentences;
selecting a set number of sentences from the plurality of sentences as abstract sentences according to the scores;
and generating, according to the abstract sentences, a text abstract corresponding to the target line text data as the text processing result.
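The scoring and selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as claimed: the sentence-splitting rule, the row normalization used for "regularizing", the damped iterative propagation, and the names `split_sentences`, `normalize_rows`, `score_sentences`, `select_top`, `damping`, and `iterations` are all hypothetical choices. The pairwise similarity matrix is taken as an input.

```python
import re


def split_sentences(text):
    """Break one line of text into sentences on end punctuation (assumed rule)."""
    parts = re.split(r"[.!?。！？]+", text)
    return [p.strip() for p in parts if p.strip()]


def normalize_rows(sim):
    """Scale each row of the similarity matrix so that it sums to 1 (assumed scheme)."""
    out = []
    for row in sim:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def score_sentences(norm_sim, preset, damping=0.85, iterations=50):
    """Derive scores for the other sentences from the matrix and the preset scores.

    `preset` maps the index of a specified sentence to its fixed score.
    The damped propagation below is an assumption borrowed from graph ranking.
    """
    n = len(norm_sim)
    scores = [preset.get(i, 1.0 / n) for i in range(n)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in preset:
                updated.append(preset[i])  # specified sentences keep their score
            else:
                incoming = sum(norm_sim[j][i] * scores[j] for j in range(n) if j != i)
                updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def select_top(scores, k):
    """Return the indices of the k highest-scoring sentences."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

For example, with three sentences where the first is the specified sentence, `select_top(score_sentences(normalize_rows(sim), {0: 1.0}), 2)` returns the specified sentence together with the sentence most similar to it.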
Optionally, the determining the similarity between each two sentences includes:
encoding each sentence to obtain a sentence vector of each sentence;
and for every two sentences, determining the cosine value between the corresponding sentence vectors as the similarity between the two sentences.
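A minimal sketch of the encode-then-cosine step follows. The disclosure does not say which encoder produces the sentence vectors, so the word-count encoder here is a stand-in assumption, and the names `encode`, `cosine`, and `similarity_matrix` are hypothetical.

```python
import math


def encode(sentence, vocabulary):
    """Toy encoder (assumption): count how often each vocabulary word occurs."""
    words = sentence.lower().split()
    return [float(words.count(term)) for term in vocabulary]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(sentences):
    """Pairwise cosine similarity between the vectors of all sentences."""
    vocabulary = sorted({w for s in sentences for w in s.lower().split()})
    vectors = [encode(s, vocabulary) for s in sentences]
    return [[cosine(a, b) for b in vectors] for a in vectors]
```

Any encoder that maps a sentence to a fixed-length vector could replace `encode` without changing the cosine step.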
Optionally, generating, according to the abstract sentences, a text abstract corresponding to the target line text data as the text processing result includes:
acquiring the order in which the abstract sentences appear in the target line text data;
and sorting the abstract sentences according to that order, and adding a preset punctuation mark after each abstract sentence to obtain the text abstract corresponding to the target line text data as the text processing result.
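The ordering step can be sketched in a few lines. The function name `build_abstract`, the default punctuation mark, and the `joiner` placed between sentences are illustrative assumptions; the disclosure only states that the sentences are sorted by their original order and that a preset punctuation mark follows each one.

```python
def build_abstract(sentences, selected_indices, punctuation=".", joiner=" "):
    """Restore original order for the selected sentences and punctuate each one.

    `sentences` is the full list from sentence segmentation and
    `selected_indices` holds the positions chosen as abstract sentences.
    """
    ordered = sorted(selected_indices)
    return joiner.join(sentences[i] + punctuation for i in ordered)
```

For text without spaces between sentences, such as Chinese, `joiner=""` and `punctuation="。"` would be the natural settings.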
Optionally, the text processing operator is a text similarity analysis operator for performing similarity analysis processing on the text data;
the step of operating the text processing operator according to the target data processing flow chart to process the target text data to obtain a text processing result comprises the following steps:
operating the text similarity analysis operator to perform the following processing on the target text data:
respectively encoding first text data and second text data which are positioned on the same line in the target text data to obtain a first vector of the first text data and a second vector of the second text data; the first text data is located in a first target column in the target text data, and the second text data is located in a second target column in the target text data;
and determining the prediction similarity between the first text data and the second text data according to the first vector and the second vector as the text processing result.
Optionally, the determining the prediction similarity between the first text data and the second text data according to the first vector and the second vector includes:
determining cosine similarity of the first vector and the second vector;
determining a maximum of squares of the first vector and the second vector;
determining an absolute value of a difference of the first vector and the second vector;
determining a dot product of the first vector and the second vector;
splicing the cosine similarity, the maximum value of the square, the absolute value of the difference, the dot product, the first vector and the second vector to obtain target splicing vectors of the first text data and the second text data;
acquiring a mapping function between the splicing vector and the similarity;
and determining the prediction similarity between the first text data and the second text data according to the mapping function and the target splicing vector.
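The feature construction above can be sketched as follows. Several points are assumptions because the text leaves them open: the "maximum of squares" and the "absolute value of the difference" are read element-wise, the dot product is read as a scalar, and the mapping function is shown as a logistic function over a weighted sum with placeholder weights. The names `splice_features` and `predict_similarity` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def splice_features(u, v):
    """Concatenate the six parts in the order listed in the text."""
    cos = cosine(u, v)
    max_sq = [max(a * a, b * b) for a, b in zip(u, v)]  # element-wise (assumed)
    abs_diff = [abs(a - b) for a, b in zip(u, v)]       # element-wise (assumed)
    dot = sum(a * b for a, b in zip(u, v))              # scalar (assumed)
    return [cos] + max_sq + abs_diff + [dot] + list(u) + list(v)


def predict_similarity(features, weights, bias=0.0):
    """Assumed mapping function: logistic function of a weighted sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

Under this reading, two vectors of dimension d yield a spliced vector of length 4d + 2. In practice the weights would come from training on labelled text pairs, which this sketch does not cover.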
Optionally, the method further includes:
responding to a request for configuring a processing object of the text similarity analysis operator, and providing a second configuration interface of the text similarity analysis operator;
and acquiring the first target column and the second target column through the second configuration interface, so that the text similarity analysis operator performs similarity analysis processing on the text data of the first target column and the text data of the second target column in the target text data.
Optionally, the method further includes:
and displaying the text processing result.
Optionally, the performing text data processing based on the text processing operator in the machine learning platform includes:
in response to a request for enabling the text processing operator to be online, packaging the text processing operator to obtain a target pre-estimation service for a target user;
and running the target pre-estimation service, processing the text data provided to the target pre-estimation service by the target user, and returning a corresponding text processing result to the target user.
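As a rough illustration of packaging an operator into a service, the sketch below wraps a callable in a small class that accepts text from a user and returns the result. The class name `PreEstimationService`, the request and response shapes, and the toy operator are all assumptions; the disclosure does not specify a service interface.

```python
class PreEstimationService:
    """Hypothetical wrapper that exposes one text processing operator as a service."""

    def __init__(self, name, operator):
        self.name = name
        self._operator = operator

    def handle(self, request):
        """Process the text in `request` and return the result to the caller."""
        result = self._operator(request["text"])
        return {"service": self.name, "user": request["user"], "result": result}


def first_sentence(text):
    # Toy operator (assumption): keep only the first sentence.
    return text.split(".")[0].strip() + "."


service = PreEstimationService("abstract_extraction", first_sentence)
response = service.handle({"user": "target_user", "text": "One idea. Another idea."})
```

Here `response["result"]` is `"One idea."`. A deployed service would sit behind a network endpoint, which is outside the scope of this sketch.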
Optionally, the step of packaging the text processing operator in response to the request for bringing the text processing operator online to obtain the target pre-estimation service includes:
responding to a request for creating pre-estimation service, and providing at least one pre-estimation module for a user to select;
acquiring a target pre-estimation module selected by the user and providing a model selection interface;
acquiring the text processing operator selected by the user through the model selection interface;
and packaging the text processing operator according to the target pre-estimation module to obtain the target pre-estimation service.
Optionally, the method further includes:
responding to a request for configuring the target pre-estimation service, and providing a third configuration interface;
and acquiring configuration information of the target pre-estimation service through the third configuration interface so as to operate the target pre-estimation service according to the configuration information.
Optionally, the configuration information includes at least one of: the GPU resources used, the CPU resources used, the memory resources used, and the address of the image invoked at runtime.
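One way to picture this configuration information is as a small record with a validation step, sketched below. The field names, units, defaults, and the example image address are hypothetical; the disclosure lists only the four kinds of information.

```python
from dataclasses import dataclass


@dataclass
class ServiceConfig:
    """Hypothetical container for the four configuration items named in the text."""
    gpu_count: int = 0        # GPU resources used
    cpu_cores: float = 1.0    # CPU resources used
    memory_mb: int = 1024     # memory resources used
    image_address: str = ""   # address of the image invoked at runtime


def validate(config):
    """Return a list of problems; an empty list means the config is usable."""
    problems = []
    if config.gpu_count < 0:
        problems.append("gpu_count must not be negative")
    if config.cpu_cores <= 0:
        problems.append("cpu_cores must be positive")
    if config.memory_mb <= 0:
        problems.append("memory_mb must be positive")
    if not config.image_address:
        problems.append("image_address must not be empty")
    return problems
```

For instance, `validate(ServiceConfig(image_address="registry.example.com/text-ops:1.0"))` returns an empty list, where the address is a made-up placeholder.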
Optionally, the method further includes:
and responding to the request for viewing the state of the target pre-estimation service, and displaying the state of the target pre-estimation service.
Optionally, the state of the target pre-estimation service includes at least one of: the resources occupied by the target pre-estimation service while running, the online history of the target pre-estimation service, the version number of the target pre-estimation service, the deployment time of the target pre-estimation service, and the running time of the target pre-estimation service.
According to a second aspect of the present disclosure, there is provided a processing apparatus of text data, comprising:
an operator adding module, configured to add a text processing operator for processing text data in an operator set of a machine learning platform, where the text processing operator includes: an abstract extraction operator and/or a text similarity analysis operator;
and the text processing module is used for processing text data based on the text processing operator in the machine learning platform.
Optionally, the text processing module is further configured to:
providing a user configuration interface, wherein the user configuration interface comprises an operator display area and a canvas area used for creating a data processing flow chart; the operator display area displays the text processing operator for processing the text data;
acquiring uploaded target text data;
in response to an operation of creating a target data processing flow diagram, creating the target data processing flow diagram in the canvas area according to the target text data and the text processing operator;
and in response to an operation of running the target data processing flow chart, running the text processing operator according to the target data processing flow chart to process the target text data and obtain a text processing result.
Optionally, the creating, in response to the operation of creating the target data processing flowchart, the target data processing flowchart in the canvas area according to the target text data and the text processing operator includes:
in response to an operation of selecting the target text data, presenting the target text data in the canvas area;
in response to an operation of selecting the text processing operator in the operator display area, presenting the text processing operator in the canvas area;
and responding to the operation of connecting the target text data and the text processing operator, and connecting the target text data and the text processing operator in the canvas area to obtain the target data processing flow chart.
Optionally, the apparatus further includes:
a module for providing a first configuration interface of the text processing operator in response to a request for resource configuration of the text processing operator;
and a module for acquiring resource configuration parameters of the text processing operator through the first configuration interface, so as to operate the text processing operator according to the resource configuration parameters.
Optionally, the acquiring the uploaded target text data includes:
providing an entry for uploading data;
and in response to an operation of uploading the target text data, acquiring the target text data uploaded through the entry.
Optionally, the text processing operator is an abstract extraction operator for performing abstract extraction processing on the text data;
the step of operating the text processing operator according to the target data processing flow chart to process the target text data to obtain a text processing result comprises the following steps:
operating the abstract extraction operator to perform the following processing on the target text data:
performing sentence segmentation on target line text data in the target text data to obtain a plurality of sentences; the target line text data is the text data of any line in the target text data;
determining the similarity between every two sentences, and regularizing the similarity between every two sentences to obtain a similarity matrix;
obtaining scores of other sentences according to the similarity matrix and preset scores of the specified sentences; wherein the other sentences are sentences except the specified sentences in the plurality of sentences;
selecting a set number of sentences from the plurality of sentences as abstract sentences according to the scores;
and generating, according to the abstract sentences, a text abstract corresponding to the target line text data as the text processing result.
Optionally, the determining the similarity between each two sentences includes:
encoding each sentence to obtain a sentence vector of each sentence;
and for every two sentences, determining the cosine value between the corresponding sentence vectors as the similarity between the two sentences.
Optionally, generating, according to the abstract sentences, a text abstract corresponding to the target line text data as the text processing result includes:
acquiring the order in which the abstract sentences appear in the target line text data;
and sorting the abstract sentences according to that order, and adding a preset punctuation mark after each abstract sentence to obtain the text abstract corresponding to the target line text data as the text processing result.
Optionally, the text processing operator is a text similarity analysis operator for performing similarity analysis processing on the text data;
the step of operating the text processing operator according to the target data processing flow chart to process the target text data to obtain a text processing result comprises the following steps:
operating the text similarity analysis operator to perform the following processing on the target text data:
respectively encoding first text data and second text data which are positioned on the same line in the target text data to obtain a first vector of the first text data and a second vector of the second text data; the first text data is located in a first target column in the target text data, and the second text data is located in a second target column in the target text data;
and determining the prediction similarity between the first text data and the second text data according to the first vector and the second vector as the text processing result.
Optionally, the determining the prediction similarity between the first text data and the second text data according to the first vector and the second vector includes:
determining cosine similarity of the first vector and the second vector;
determining a maximum of squares of the first vector and the second vector;
determining an absolute value of a difference of the first vector and the second vector;
determining a dot product of the first vector and the second vector;
splicing the cosine similarity, the maximum value of the square, the absolute value of the difference, the dot product, the first vector and the second vector to obtain target splicing vectors of the first text data and the second text data;
acquiring a mapping function between the splicing vector and the similarity;
and determining the prediction similarity between the first text data and the second text data according to the mapping function and the target splicing vector.
Optionally, the apparatus further includes:
a module for providing a second configuration interface for the text similarity analysis operator in response to a request for configuration of a processing object for the text similarity analysis operator;
and a module for acquiring the first target column and the second target column through the second configuration interface, so that the text similarity analysis operator performs similarity analysis processing on the text data of the first target column and the text data of the second target column in the target text data.
Optionally, the apparatus further includes:
a module for displaying the text processing result.
Optionally, the text processing module is further configured to:
in response to a request for enabling the text processing operator to be online, packaging the text processing operator to obtain a target pre-estimation service for a target user;
and running the target pre-estimation service, processing the text data provided to the target pre-estimation service by the target user, and returning a corresponding text processing result to the target user.
Optionally, the step of packaging the text processing operator in response to the request for bringing the text processing operator online to obtain the target pre-estimation service includes:
responding to a request for creating pre-estimation service, and providing at least one pre-estimation module for a user to select;
acquiring a target pre-estimation module selected by the user and providing a model selection interface;
acquiring the text processing operator selected by the user through the model selection interface;
and packaging the text processing operator according to the target pre-estimation module to obtain the target pre-estimation service.
Optionally, the apparatus further includes:
a module for providing a third configuration interface in response to a request for configuring the target pre-estimation service;
and a module for acquiring configuration information of the target pre-estimation service through the third configuration interface, so as to operate the target pre-estimation service according to the configuration information.
Optionally, the configuration information includes at least one of: the GPU resources used, the CPU resources used, the memory resources used, and the address of the image invoked at runtime.
Optionally, the apparatus further includes:
a module for displaying the state of the target pre-estimation service in response to a request for viewing the state of the target pre-estimation service.
Optionally, the state of the target pre-estimation service includes at least one of: the resources occupied by the target pre-estimation service while running, the online history of the target pre-estimation service, the version number of the target pre-estimation service, the deployment time of the target pre-estimation service, and the running time of the target pre-estimation service.
According to a third aspect of the present disclosure, there is provided a system comprising at least one computing device and at least one storage device, wherein the at least one storage device is configured to store instructions for controlling the at least one computing device to perform the method according to the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to the first aspect of the present disclosure.
According to the text data processing method of the present disclosure, the text processing operator is added to the operator set of the machine learning platform in advance, so it can be called directly in the machine learning platform to process text data, and no text processing flow needs to be rebuilt each time text data is processed. A user can process text data without specialized natural language processing knowledge or scenario-specific experience; the operator works out of the box, which lowers the barrier to use.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a block diagram of one example of a hardware configuration of an electronic device that may be used to implement embodiments of the present disclosure;
FIG. 2 is a flow chart of a method of processing text data according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of one example of a method of processing text data according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of uploading target text data according to an embodiment of the present disclosure;
FIG. 5 is a flow chart of another example of a method of processing text data according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a target data processing flow chart according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a selection interface for a pre-estimation service according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a model selection interface according to an embodiment of the present disclosure;
FIG. 9 is a block schematic diagram of a text data processing apparatus according to an embodiment of the present disclosure;
FIG. 10 is a block schematic diagram of a system according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Various embodiments and examples according to embodiments of the present invention are described below with reference to the accompanying drawings.
<Hardware configuration>
Fig. 1 is a block diagram showing a hardware configuration of an electronic apparatus 1000 that can implement an embodiment of the present disclosure.
The electronic device 1000 may be a laptop, desktop, cell phone, tablet, etc. As shown in fig. 1, the electronic device 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600, a speaker 1700, a microphone 1800, and the like. The processor 1100 may be a central processing unit (CPU), a microprocessor (MCU), or the like. The memory 1200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1300 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1400 is capable of wired or wireless communication, for example, and may specifically include Wi-Fi communication, Bluetooth communication, 2G/3G/4G/5G communication, and the like. The display device 1500 is, for example, a liquid crystal display panel, a touch panel, or the like. The input device 1600 may include, for example, a touch screen, a keyboard, a somatosensory input, and the like. A user can output voice information through the speaker 1700 and input voice information through the microphone 1800.
The electronic device shown in fig. 1 is merely illustrative and is in no way meant to limit the disclosure, its application, or uses. In an embodiment of the present disclosure, the memory 1200 of the electronic device 1000 is used for storing instructions for controlling the processor 1100 to operate so as to execute any one of the methods provided by the embodiments of the present disclosure. It will be understood by those skilled in the art that although a plurality of devices are shown for the electronic device 1000 in fig. 1, the present disclosure may relate to only some of them, e.g. the electronic device 1000 may involve only the processor 1100 and the memory 1200. The skilled person can design the instructions according to the solution disclosed herein. How the instructions control the operation of the processor is well known in the art and will not be described in detail here.
<Method embodiments>
<Embodiment one>
In the present embodiment, a method for processing text data is provided. The processing method of the text data may be implemented by an electronic device. The electronic device may be the electronic device 1000 as shown in fig. 1.
As shown in fig. 2, the text data processing method of the present embodiment may include the following steps S2100 to S2200:
step S2100, adding a text processing operator for processing text data in an operator set of the machine learning platform.
In one embodiment of the present disclosure, the text processing operator may include a summarization operator and/or a text similarity analysis operator.
In an embodiment where the text processing operator includes a summarization operator, the summarization operator may be used to perform summarization processing on the text data to obtain a summary of the text data.
In an embodiment where the text processing operator includes a text similarity analysis operator, the text similarity analysis operator may be configured to perform similarity analysis processing on the first text data and the second text data to obtain a similarity between the first text data and the second text data.
The operators provided in the operator set of the machine learning platform in this embodiment can be directly called by the machine learning platform.
Step S2200, processing the text data based on the text processing operator in the machine learning platform.
According to the text data processing method of this embodiment, the text processing operator is added to the operator set of the machine learning platform in advance, so that it can be called directly in the machine learning platform to process text data, and a corresponding text processing flow does not need to be rebuilt each time text data is processed. A user can complete the processing of text data without specialized knowledge of natural language processing or scenario-specific experience; the operator works out of the box, which lowers the threshold for use.
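As a rough illustration of steps S2100 and S2200, the following Python sketch keeps an operator set as a dictionary and calls a registered operator by name. The names `OPERATOR_SET`, `register_operator`, `run_operator`, and the toy `first_sentence_summary` operator are hypothetical and are not part of the disclosure, which does not specify how the operator set is stored or invoked.

```python
from typing import Callable, Dict

# Hypothetical in-memory operator set; the disclosure does not say how it is stored.
OPERATOR_SET: Dict[str, Callable[..., object]] = {}


def register_operator(name: str):
    """Step S2100 (sketch): add an operator to the platform's operator set."""
    def decorator(func):
        OPERATOR_SET[name] = func
        return func
    return decorator


@register_operator("first_sentence_summary")
def first_sentence_summary(text: str) -> str:
    # Toy stand-in for a text processing operator: keep the text up to the first period.
    return text.split(".")[0].strip() + "."


def run_operator(name: str, *args):
    """Step S2200 (sketch): call a registered operator directly by name."""
    if name not in OPERATOR_SET:
        raise KeyError(f"operator not registered: {name}")
    return OPERATOR_SET[name](*args)
```

Because the operator is registered once and looked up by name, a caller does not need to rebuild the processing logic for each new piece of text data.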
< example two >
On the basis of the first embodiment, the present embodiment provides a method for performing text data processing based on a text processing operator in a machine learning platform, which may specifically include steps S3100 to S3400 shown in fig. 3:
step S3100, providing a user configuration interface, wherein the user configuration interface comprises an operator display area and a canvas area.
The operator display area of this embodiment displays text processing operators for processing text data, including an abstract extraction operator and/or a text similarity analysis operator. The canvas area of this embodiment may be used to create a data processing flow chart from the operators displayed in the operator display area.
A data processing flow chart may be used to represent a corresponding data processing flow. In one embodiment, the data processing flow chart is a directed acyclic graph. A directed acyclic graph, DAG for short, is a directed graph without loops.
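A minimal sketch of why the acyclic property matters: a flow chart without loops always admits a run order in which every node is executed after its upstream nodes. The example below uses Python's standard `graphlib` module; the function name `execution_order` and the node names are hypothetical, and the disclosure does not state how the platform schedules its flow charts.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(edges):
    """Return a valid run order for a flow chart given (upstream, downstream) edges.

    Raises ValueError if the edges contain a loop, i.e. the chart is not a DAG.
    """
    sorter = TopologicalSorter()
    for upstream, downstream in edges:
        # The downstream node depends on (must run after) the upstream node.
        sorter.add(downstream, upstream)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        raise ValueError("flow chart contains a loop") from exc
```

For a chart with a single edge from a data node to an operator node, the data node comes first in the returned order.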
Step S3200, acquiring the uploaded target text data.
The target text data in this embodiment may be uploaded to the machine learning platform in advance, and when step S3200 is executed, the target text data in the machine learning platform may be directly called.
In an embodiment of the present disclosure, acquiring the uploaded target text data may further include:
providing an entry for uploading data; and, in response to an operation of uploading the target text data, acquiring the target text data uploaded through the entry.
For example, as shown in fig. 4, several uploading modes may be provided for the target text data, including local upload (supporting the csv, tsv, txt, parquet, and orc formats), import from FTP (supporting the csv, tsv, and txt formats), import from HDFS (supporting the csv, tsv, txt, parquet, and orc formats), shallow copy from HDFS (supporting the parquet format), import from a database (supporting databases such as Oracle, MySQL, Teradata, and Vertica), and import from Hive (supporting Simple Auth or Kerberos authentication). The user may select an uploading mode as required. An upload button, that is, the entry for uploading data, is provided; by clicking it, the user selects the target text data to be uploaded and triggers the operation of uploading the target text data.
Step S3300, in response to the operation of creating the target data processing flow chart, creating the target data processing flow chart in the canvas area according to the target text data and the text processing operator.
In one embodiment of the present disclosure, in response to the operation of creating the target data processing flowchart, the creating of the target data processing flowchart in the canvas area according to the target text data and the text processing operator may include steps S3310 to S3330 as follows:
in step S3310, in response to the operation of selecting the target text data, the target text data is displayed in the canvas area.
In an embodiment of the present disclosure, the operation of selecting the target text data may be the aforementioned operation of uploading the target text data. By uploading the target text data, the target text data can be directly presented in the canvas area.
In another embodiment of the present disclosure, the user configuration interface may further include a data display area, and in a case where the user uploads the target text data to the machine learning platform in advance, the data display area may display the target text data.
On the basis of this embodiment, the operation of selecting the target text data may be an operation of dragging the target text data in the data display area to the canvas area. The user drags the target text data in the data display area to the canvas area, and the electronic device can be triggered to display the dragged target text data in the canvas area.
On the basis of the present embodiment, the operation of selecting the target text data may alternatively be a selection operation performed on the target text data in the data display area. The selection operation may include a double-click operation, a single-click operation, or an operation of right-clicking and then clicking a selection button. By performing the selection operation on the target text data in the data display area, the user causes the target text data to be displayed in the canvas area.
Step S3320, in response to the operation of selecting the text processing operator in the operator display area, displaying the text processing operator in the canvas area.
In an embodiment of the present disclosure, the operation of selecting the text processing operator in the operator display area may be an operation of dragging the text processing operator in the operator display area to the canvas area. The user drags the text processing operator in the operator display area to the canvas area, and the electronic device can be triggered to display the dragged text processing operator in the canvas area.
In another embodiment of the present disclosure, the operation of selecting the text processing operator in the operator display area may also be a selection operation performed on the text processing operator in the operator display area. The selection operation may include a double-click operation, a single-click operation, or an operation of right-clicking and then clicking a selection button. By performing the selection operation on the text processing operator in the operator display area, the user causes the text processing operator to be displayed in the canvas area.
Step S3330, responding to the operation of connecting the target text data and the text processing operator, and connecting the target text data and the text processing operator in the canvas area to obtain a target data processing flow chart.
In one embodiment of the present disclosure, in the canvas area, the target text data has downstream connection points for representing the output of the data; the text processing operator may have at least an upstream connection point for representing an input of data.
The user may trigger an operation of connecting the target text data and the text processing operator in the canvas area by clicking a downstream connection point of the target text data and an upstream connection point of the text processing operator, respectively, and the electronic device connects the target text data and the text processing operator displayed in the canvas area in response to the operation.
In another embodiment of the present disclosure, the user may place a text processing operator below the target text data in the canvas area, which triggers the operation of connecting the target text data and the text processing operator; in response, the electronic device automatically connects the downstream connection point of the target text data to the upstream connection point of the text processing operator displayed in the canvas area.
In one embodiment of the present disclosure, target text data and text processing operators are connected in a canvas area, and the resulting target data processing flow diagram may be as shown in fig. 6.
In one embodiment of the present disclosure, the method may further include:
responding to a request for resource configuration of a text processing operator, and providing a first configuration interface of the text processing operator; and acquiring resource configuration parameters of the text processing operator through the first configuration interface so as to operate the text processing operator according to the resource configuration parameters.
The resource configuration parameters in this embodiment may include CPU utilization, GPU utilization, memory usage, and the like.
Step S3400, in response to the operation of running the target data processing flow chart, running the text processing operator according to the target data processing flow chart to process the target text data and obtain a text processing result.
In an embodiment where the text processing operator is an abstract extraction operator, running the text processing operator according to the target data processing flow chart, in response to the operation of running the target data processing flow chart, to process the target text data and obtain a text processing result may include: running the abstract extraction operator to perform the following processing steps S3410 to S3450 on the target text data:
Step S3410, performing sentence segmentation on the target line text data in the target text data to obtain a plurality of sentences.
The target line text data is text data of any line in the target text data.
In one embodiment of the present disclosure, the target text data may be structured data, and each line of data may represent a complete article. Specifically, each line of data of the target text data may be traversed as the target line text data.
The sentence segmentation may be performed on the target line text data according to preset punctuation marks, with the part between any two adjacent preset punctuation marks taken as one sentence.
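The segmentation in step S3410 can be sketched as follows. The set of marks in `PRESET_PUNCTUATION` is an assumption made for illustration (the disclosure leaves the preset punctuation configurable), and the function name `split_sentences` is hypothetical. The sketch also keeps the mark that followed each sentence, since a later step may reuse it.

```python
import re

# Assumed preset punctuation marks; the disclosure does not list the actual set.
PRESET_PUNCTUATION = "。！？.!?"


def split_sentences(line_text, punctuation=PRESET_PUNCTUATION):
    """Split one line of text at the preset punctuation marks.

    Returns a list of (sentence, following_mark) pairs; empty pieces are dropped.
    """
    # A capturing group makes re.split keep the delimiters in the result.
    pattern = "([" + re.escape(punctuation) + "])"
    parts = re.split(pattern, line_text)
    pairs = []
    for i in range(0, len(parts), 2):
        sentence = parts[i].strip()
        mark = parts[i + 1] if i + 1 < len(parts) else ""
        if sentence:
            pairs.append((sentence, mark))
    return pairs
```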
Step S3420, determining a similarity between each two sentences, and regularizing the similarity between each two sentences to obtain a similarity matrix.
In one embodiment of the present disclosure, determining the similarity between each two sentences may include steps S3421 to S3422 as follows:
step S3421, encode each sentence to obtain a sentence vector of each sentence.
In an embodiment of the present disclosure, each sentence may be encoded through an embedding model, so as to obtain the sentence vector of the corresponding sentence.
In step S3422, for each two sentences, the cosine of the angle between the corresponding sentence vectors is determined as the similarity between the two sentences.
For example, if the sentence vector of the 1st sentence is denoted A and the sentence vector of the 2nd sentence is denoted B, the similarity $S_{1,2}$ between the 1st sentence and the 2nd sentence can be expressed as:
$$S_{1,2} = \cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$
Once the similarity between every two sentences has been obtained, the similarity matrix may be constructed from these similarities.
Specifically, in the similarity matrix, the element in the i-th row and the j-th column may be the similarity between the i-th sentence and the j-th sentence, where i, j ∈ [1, N], and N is the number of sentences obtained by performing sentence segmentation on the target line text data.
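The construction of the similarity matrix in step S3420 might look like the sketch below, which takes already-encoded sentence vectors as plain lists of floats. The embedding model itself is outside the sketch. The disclosure does not spell out the regularization applied to the similarities, so the row normalization in `normalize_rows` (each row scaled to sum to 1) is only an assumption, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Element [i][j] is the cosine similarity of sentence i and sentence j."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def normalize_rows(matrix):
    """Assumed regularization: scale each row so that it sums to 1."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([v / total for v in row] if total else list(row))
    return result
```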
Step S3430, obtaining the scores of other sentences according to the similarity matrix and the preset score of the specified sentence.
Wherein the other sentences are sentences other than the specified sentences in the plurality of sentences.
In an embodiment of the present disclosure, the specified sentence may be the n-th sentence, where n is set in advance by the user according to an application scenario or specific requirements, n ∈ [1, N], and N is the number of sentences obtained by performing sentence segmentation on the target line text data. For example, n may be 1, in which case the 1st sentence is the specified sentence.
The preset score of the specified sentence may likewise be preset according to an application scenario or specific requirements. For example, the preset score of the specified sentence may be 1.
In general, the first sentence of an article is one of its more important sentences, and therefore the 1st sentence may be taken as the specified sentence.
In an embodiment of the present disclosure, the score f(k) of the k-th sentence, obtained according to the similarity matrix and the preset score of the n-th sentence (the specified sentence), may be expressed as:
f(k)=α*S*f(k-1)+(1-α)*y
where k is an integer in [1, N] other than n, S is the similarity matrix, α is a preset parameter greater than 0 and less than 1, and f(k-1) is the score of the (k-1)-th sentence.
y is a vector for representing a specified sentence, the value of the nth element in the vector is a first set value, and the values of the other elements are second set values. The first setting value and the second setting value are different values set according to an application scenario or a specific requirement, for example, the first setting value may be 1, and the second setting value may be 0.
For example, in the case where the 1st sentence is the specified sentence, y may be represented as [1, 0, …, 0, 0]. In the case where the 2nd sentence is the specified sentence, y may be represented as [0, 1, …, 0, 0]. In the case where the N-th sentence is the specified sentence, y may be represented as [0, 0, …, 0, 1].
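Since S is a matrix and y is a vector, one way to read the formula above is as a repeated update of a whole score vector, starting from y. The sketch below follows that reading, which is an assumption rather than something the disclosure states; the function name `propagate_scores`, the default α of 0.85, and the fixed iteration count are likewise hypothetical. It expects S to be normalized (for example, rows summing to 1) so that the scores stay bounded.

```python
def propagate_scores(S, n, alpha=0.85, first_value=1.0, second_value=0.0, iterations=50):
    """Iterate f <- alpha * S * f + (1 - alpha) * y, starting from f = y.

    S: N x N similarity matrix as a list of lists (assumed normalized).
    n: 1-based index of the specified sentence.
    Returns one score per sentence.
    """
    size = len(S)
    # y marks the specified sentence with the first set value, all others with the second.
    y = [second_value] * size
    y[n - 1] = first_value
    f = list(y)
    for _ in range(iterations):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(size)) + (1 - alpha) * y[i]
            for i in range(size)
        ]
    return f
```

With two sentences, a uniform matrix `[[0.5, 0.5], [0.5, 0.5]]`, n = 1 and α = 0.5, the scores settle at 0.75 for the specified sentence and 0.25 for the other.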
In step S3440, a set number of sentences are selected from the plurality of sentences as summary sentences based on the scores.
In one embodiment of the present disclosure, a set number of sentences with the highest score may be selected as the summary sentences. Specifically, the plurality of sentences may be sorted in descending order according to the order of scores from high to low, and the sorting value of each sentence may be recorded. And selecting the sentences of which the sequencing values are less than or equal to the set number as abstract sentences.
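The selection in step S3440 amounts to a descending sort followed by a cut at the set number. A minimal sketch, with the hypothetical name `select_summary_indices`:

```python
def select_summary_indices(scores, set_number):
    """Return the 0-based indices of the `set_number` highest-scoring sentences."""
    # ranked[r] is the sentence whose sorting value is r + 1 (highest score first).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the sentences whose sorting value is less than or equal to set_number.
    return ranked[:set_number]
```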
Step S3450, generating a text abstract corresponding to the text data of the target line according to the abstract sentence as a text processing result.
In one embodiment of the present disclosure, generating the text abstract corresponding to the target line text data according to the abstract sentences, as the text processing result, may include steps S3451 to S3452 as follows:
in step S3451, the order of each abstract statement in the target line text data is obtained.
Step S3452, the abstract sentences are sequenced according to the sequence, and preset punctuations are added behind each abstract sentence to obtain a text abstract corresponding to the text data of the target line as a text processing result.
In this embodiment, the abstract sentences may be sorted according to the sequence in the target line text data, and a preset punctuation is added after each abstract sentence, so as to obtain the corresponding text abstract.
The punctuation added after each abstract statement can be preset according to an application scene or specific requirements, or can be determined according to the punctuation of the corresponding abstract statement in the target line text data.
In the embodiment where the punctuation added after each abstract sentence is set in advance according to an application scenario or specific requirements, the punctuation added after the abstract sentences may be the same or different. For example, the punctuation added after every abstract sentence other than the last may be a comma, and the punctuation added after the last abstract sentence may be a period or an exclamation point.
In the embodiment where the punctuation added after each abstract sentence is determined according to the punctuation of that abstract sentence in the target line text data, the punctuation that follows each abstract sentence in the target line text data may be added after the corresponding abstract sentence. For the last abstract sentence, in the case where its punctuation in the target line text data is a punctuation other than a comma, that punctuation may be added after the last abstract sentence; in the case where its punctuation in the target line text data is a comma, the punctuation added after the last abstract sentence may be a period or an exclamation point.
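Steps S3451 and S3452, in the variant that reuses each sentence's original punctuation, can be sketched as below. The function name `build_summary`, the `joiner` parameter, and the choice of a period as the replacement for a trailing comma are assumptions made for illustration.

```python
def build_summary(sentences, marks, selected, fallback_end=".", joiner=" "):
    """Assemble a text abstract from the selected sentences.

    sentences: all sentences of the target line, in original order.
    marks: the punctuation that followed each sentence in the target line.
    selected: indices of the abstract sentences, in any order.
    """
    # Step S3451: restore the order in which the sentences appear in the line.
    ordered = sorted(selected)
    pieces = []
    for position, idx in enumerate(ordered):
        mark = marks[idx]
        is_last = position == len(ordered) - 1
        # Step S3452: reuse the original mark, but never end on a comma.
        if is_last and mark in (",", "，", ""):
            mark = fallback_end
        pieces.append(sentences[idx] + mark)
    return joiner.join(pieces)
```

For text without spaces between sentences, such as Chinese, `joiner` can be set to an empty string.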
Therefore, batch abstract extraction processing can be performed on the text data of multiple lines in the target text data through the abstract extraction operator of the embodiment, so that the text abstract of each line of text data is obtained.
In an embodiment where the text processing operator is a text similarity analysis operator, running the text processing operator according to the target data processing flow chart, in response to the operation of running the target data processing flow chart, to process the target text data and obtain a text processing result may include: running the text similarity analysis operator to perform the following processing steps S3460 to S3470 on the target text data:
step S3460, respectively encoding the first text data and the second text data located in the same line in the target text data to obtain a first vector of the first text data and a second vector of the second text data.
The first text data is located in a first target column in the target text data, and the second text data is located in a second target column in the target text data.
In one embodiment of the present disclosure, the target text data may be multiple columns of structured data. The method may further comprise:
responding to a request for configuring a processing object for the text similarity analysis operator, and providing a second configuration interface of the text similarity analysis operator; and acquiring the first target column and the second target column through a second configuration interface, so that the text similarity analysis operator performs similarity analysis processing on the text data of the first target column and the text data of the second target column in the target text data.
Specifically, the user may fill in the sequence number corresponding to the first target column and the sequence number corresponding to the second target column through the second configuration interface.
The electronic equipment acquires the sequence number input by the user through the second configuration interface so as to acquire the first target column and the second target column corresponding to the sequence number, and then the text similarity analysis operator performs similarity analysis processing on the text data of the first target column and the text data of the second target column in the target text data.
The target text data in this embodiment may include M lines, and then, the M lines of data may be traversed, and the text data of the first target column and the second target column in each line may be respectively used as the first text data and the second text data.
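The traversal described above can be sketched with Python's standard `csv` module. The sketch assumes comma-separated input and 1-based column sequence numbers; neither is stated in the disclosure, and the function name `iter_text_pairs` is hypothetical.

```python
import csv
import io


def iter_text_pairs(csv_text, first_col, second_col):
    """Yield (first_text, second_text) for every line of the target text data.

    first_col and second_col are the sequence numbers of the two target
    columns, assumed here to be 1-based.
    """
    reader = csv.reader(io.StringIO(csv_text))
    for row in reader:
        yield row[first_col - 1], row[second_col - 1]
```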
For the first text data and the second text data located in the same line, the first text data may be encoded according to an embedding model to obtain a first vector, and the second text data may be encoded to obtain a second vector.
Step S3470, determining a prediction similarity between the first text data and the second text data as a text processing result according to the first vector and the second vector.
In one embodiment of the present disclosure, determining the prediction similarity between the first text data and the second text data according to the first vector and the second vector may include steps S3471 to S3477 as follows:
in step S3471, cosine similarity between the first vector and the second vector is determined.
For example, if the first vector of the first text data is denoted $v_1$ and the second vector of the second text data is denoted $v_2$, the cosine similarity $\cos(v_1, v_2)$ of the first vector and the second vector can be determined by:
$$\cos(v_1, v_2) = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert}$$
In step S3472, the maximum of the squares of the first vector and the second vector is determined.
Specifically, the square of the first vector and the square of the second vector may be calculated respectively, and the larger of the two squares is taken as the maximum of the squares of the first vector and the second vector, which may be expressed as $\max(v_1, v_2)^2$.
In step S3473, the absolute value of the difference between the first vector and the second vector is determined.
The absolute value of the difference between the first vector and the second vector may be expressed as $|v_1 - v_2|$.
In step S3474, a dot product of the first vector and the second vector is determined.
Step S3475, the cosine similarity, the maximum of the square, the absolute value of the difference, the dot product, the first vector and the second vector are spliced to obtain a target splicing vector of the first text data and the second text data.
Specifically, the cosine similarity, the maximum value of the square, the absolute value of the difference, the dot product, the first vector and the second vector may be spliced according to a preset sequence to obtain a target splicing vector of the first text data and the second text data.
The preset sequence may be preset according to an application scenario or a specific requirement.
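Steps S3471 to S3475 can be sketched as a single feature-building function. The disclosure does not say whether the maximum of the squares and the absolute difference are taken per element; the sketch assumes they are, and the splicing order used here (cosine, maximum of squares, absolute difference, dot product, first vector, second vector) is just one possible preset sequence. The name `splice_features` is hypothetical.

```python
import math


def splice_features(v1, v2):
    """Build a target splicing vector from two equal-length text vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    cos = dot / (norm1 * norm2) if norm1 and norm2 else 0.0
    # Assumed element-wise: the larger of the two squares at each position.
    max_sq = [max(a * a, b * b) for a, b in zip(v1, v2)]
    # Assumed element-wise absolute difference.
    abs_diff = [abs(a - b) for a, b in zip(v1, v2)]
    return [cos] + max_sq + abs_diff + [dot] + list(v1) + list(v2)
```

For vectors of dimension d, the resulting splicing vector has 4d + 2 entries.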
Step S3476, a mapping function between the stitching vector and the similarity is obtained.
In one embodiment of the present disclosure, the mapping function may be trained in advance from training samples. Each training sample may include a value of a concatenation vector of the corresponding two pieces of text data and a degree of similarity between the two pieces of text data that is labeled in advance.
Specifically, the step of training the mapping function according to the training samples may include steps S3476-1 to S3476-3 as follows:
step S3476-1, according to the value of the splicing vector of each training sample, and with the preset undetermined parameter of the machine learning algorithm as a variable, determining the prediction similarity expression of each training sample.
For example, for the m-th training sample $x_m$, the prediction similarity expression of the training sample may be expressed as $F(x_m)$.
And step S3476-2, constructing a loss function according to the prediction similarity expression and the corresponding labeling similarity of each training sample.
In one embodiment of the present disclosure, a cross entropy loss function between the predicted similarity expression and the labeled similarity for the training samples may be determined.
In this embodiment, an engineer may determine in advance according to experience whether two texts corresponding to each training sample are similar, and label the corresponding labeling similarity as 0 or 1. For example, the corresponding labeling similarity may be labeled as 1 when the two texts corresponding to the training sample are similar, and labeled as 0 when the two texts corresponding to the training sample are dissimilar.
When the prediction similarity expression corresponding to the m-th training sample is denoted $p_m$, the labeling similarity corresponding to the m-th training sample is denoted $y_m$, and the number of training samples is Z, the loss function L can be expressed, in the standard binary cross-entropy form, as:
$$L = -\frac{1}{Z} \sum_{m=1}^{Z} \left[ y_m \log p_m + (1 - y_m) \log(1 - p_m) \right]$$
and step S3476-3, determining undetermined parameters according to the loss function, and finishing the training of the mapping function.
In an embodiment of the present disclosure, the mapping function may be obtained by determining the value of the parameter to be determined when the loss function is minimum.
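The disclosure does not name the machine learning algorithm behind the mapping function. As one possible instance, the sketch below assumes a logistic regression model whose weights and bias play the role of the undetermined parameters, and minimizes the cross-entropy loss with plain gradient descent. The function names, the learning rate, and the number of epochs are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, x):
    """Mapping function (assumed logistic): splicing vector -> similarity in (0, 1)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def cross_entropy(weights, bias, samples, labels):
    """Mean binary cross-entropy between predictions and 0/1 labels."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(samples, labels):
        p = min(max(predict(weights, bias, x), eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(samples)


def train(samples, labels, lr=0.5, epochs=500):
    """Determine the undetermined parameters by gradient descent on the loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y  # gradient of the loss w.r.t. the logit
            for k, xi in enumerate(x):
                grad_w[k] += err * xi
            grad_b += err
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias
```

After training, calling `predict` on a target splicing vector returns the prediction similarity for the corresponding pair of texts.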
Step S3477, determining a prediction similarity between the first text data and the second text data according to the mapping function and the target concatenation vector.
Specifically, the target stitching vector obtained in step S3475 may be input into the mapping function, so that the similarity corresponding to the target stitching vector may be obtained, that is, the prediction similarity between the first text data and the second text data.
By the text similarity analysis operator of the embodiment, similarity analysis can be performed on first text data and second text data which are positioned in the same line in a first target column and a second target column in the target text data in batch, so that prediction similarity between the first text data and the second text data of each line is obtained.
On the basis of any of the above embodiments, the method may further include: and displaying the text processing result for the user to view.
< example three >
On the basis of the foregoing first embodiment or second embodiment, the present embodiment provides a method for performing text data processing based on a text processing operator in a machine learning platform, which may specifically include steps S5100 to S5200 shown in fig. 5:
Step S5100, in response to a request for bringing the text processing operator online, packaging the text processing operator to obtain a target pre-estimation service for a target user.
In an embodiment of the present disclosure, packaging the text processing operator, in response to the request for bringing the text processing operator online, to obtain the target pre-estimation service may include steps S5110 to S5140 as follows:
Step S5110, in response to a request for creating a pre-estimation service, providing at least one pre-estimation module for the user to select.
In one embodiment of the present disclosure, a button may be provided for triggering a request to create a pre-estimation service, and a user may trigger a request to create a pre-estimation service by clicking on the button.
The pre-estimation module provided in this embodiment may be set in advance according to an application scenario or specific requirements. For example, the at least one pre-estimation module may include at least one of: a self-learning module, a batch pre-estimation module, a GBDT real-time pre-estimation module, a TensorFlow real-time pre-estimation module, an H2O real-time pre-estimation module, a PMML real-time pre-estimation module, a custom operator pre-estimation module, and a custom application module, as shown in FIG. 6.
Step S5120, obtaining the target pre-estimation module selected by the user, and providing a model selection interface.
The user may perform a selection operation on any of the pre-estimation modules shown in FIG. 7, in response to which the electronic device provides the model selection interface.
In one embodiment, the selection operation for the target pre-estimation module may be an operation of double-clicking the target pre-estimation module, or an operation of right-clicking the target pre-estimation module and then clicking a selection button in the pop-up menu.
In one embodiment of the present disclosure, the model selection interface may be as shown in FIG. 8.
Step S5130, a text processing operator selected by the user through the model selection interface is obtained.
In the example shown in fig. 8, the user may input a search condition through the input box, and the electronic device may present a model operator matching the search condition for selection by the user.
Step S5140, packaging the text processing operator according to the target pre-estimation module to obtain the target pre-estimation service.
Step S5200, running the target pre-estimation service, processing the text data provided by the target user to the target pre-estimation service, and returning a corresponding text processing result to the target user.
In an embodiment of the present disclosure, the target user may provide the text data obtained in real time to the target pre-estimation service, so that the target pre-estimation service processes the text data provided by the target user, and returns a corresponding text processing result to the target user.
In the case that the text processing operator is the abstract extraction operator, the target pre-estimation service may perform abstract extraction processing on the text data provided by the target user in the manner described in the foregoing second embodiment, which is not described herein again.
In the case that the text processing operator is the text similarity analysis operator, the target pre-estimation service may perform similarity analysis on the text data provided by the target user in the manner described in the foregoing second embodiment, which is not described herein again.
According to the embodiment, the text processing operator is packaged into the target pre-estimation service, the text data provided by the target user is processed in real time according to the target pre-estimation service, and the corresponding text processing result is returned to the target user. Thus, real-time text data provided by the target user can be subjected to text processing.
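The packaging idea can be illustrated with a small wrapper that exposes one operator behind a request/response interface. The class `EstimationService`, the function `package_operator`, and the request and response dictionaries are hypothetical; the disclosure does not describe the service interface or the transport used to reach it.

```python
from typing import Any, Callable, Dict


class EstimationService:
    """Hypothetical wrapper that exposes one operator as a request/response service."""

    def __init__(self, name: str, operator: Callable[..., Any]):
        self.name = name
        self._operator = operator
        self.request_count = 0

    def handle(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Step S5200 (sketch): process text data supplied by the target user."""
        self.request_count += 1
        try:
            result = self._operator(**request["inputs"])
            return {"service": self.name, "status": "ok", "result": result}
        except Exception as exc:
            # Report the failure to the caller instead of crashing the service.
            return {"service": self.name, "status": "error", "message": str(exc)}


def package_operator(name: str, operator: Callable[..., Any]) -> EstimationService:
    """Step S5140 (sketch): wrap the selected operator into a service object."""
    return EstimationService(name, operator)
```

A caller would pass the operator's keyword arguments under the `inputs` key and read the outcome from the `result` field of the response.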
In one embodiment of the present disclosure, the method may further include:
responding to a request for configuring the target pre-estimation service, and providing a third configuration interface; and acquiring configuration information of the target pre-estimation service through a third configuration interface so as to operate the target pre-estimation service according to the configuration information.
In one embodiment of the disclosure, the configuration information may include at least one of: the GPU resources used, the CPU resources used, the memory resources used, and the address of the image invoked at run time.
In one embodiment of the present disclosure, the method may further include: and responding to the request for checking the state of the target pre-estimation service, and displaying the state of the target pre-estimation service.
Specifically, a button may be provided for triggering the request for viewing the state of the target pre-estimation service, and the user may trigger the request by clicking the button.
In one embodiment of the present disclosure, the state of the target pre-estimation service may include at least one of: the resources occupied by the target pre-estimation service during operation, the online history of the target pre-estimation service, the version number of the target pre-estimation service, the deployment time of the target pre-estimation service, and the running time of the target pre-estimation service.
< apparatus embodiment >
In the present embodiment, a text data processing apparatus 5000 is provided, as shown in fig. 9, including an operator addition module 5100 and a text processing module 5200. The operator adding module 5100 is configured to add a text processing operator for processing text data in an operator set of the machine learning platform, where the text processing operator includes: an abstract extraction operator and/or a text similarity analysis operator; the text processing module 5200 is used for performing text data processing based on text processing operators in the machine learning platform.
In one embodiment of the present disclosure, text processing module 5200 may also be configured to:
providing a user configuration interface, wherein the user configuration interface comprises an operator display area and a canvas area used for creating a data processing flow chart; the operator display area displays a text processing operator for processing text data;
acquiring uploaded target text data;
in response to the operation of creating the target data processing flow chart, creating the target data processing flow chart in the canvas area according to the target text data and the text processing operator;
in response to an operation of running the target data processing flow chart, running the text processing operator according to the target data processing flow chart to process the target text data and obtain a text processing result.
In one embodiment of the present disclosure, creating the target data processing flow chart in the canvas area according to the target text data and the text processing operator, in response to the operation of creating the target data processing flow chart, includes:
in response to an operation of selecting the target text data, displaying the target text data in the canvas area;
in response to an operation of selecting the text processing operator in the operator display area, displaying the text processing operator in the canvas area;
in response to an operation of connecting the target text data and the text processing operator, connecting the target text data and the text processing operator in the canvas area to obtain the target data processing flow chart.
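The flow chart built in the canvas area can be pictured as a small graph whose nodes are data sets and operators and whose connections say which operator processes which data. The following is a minimal sketch under that assumption; the `FlowChart` class and its method names are hypothetical, and connections are assumed to be added in the order in which they should run.

```python
from dataclasses import dataclass, field


@dataclass
class FlowChart:
    """Hypothetical in-memory form of a data processing flow chart."""
    nodes: dict = field(default_factory=dict)  # node name -> list of rows, or a callable operator
    edges: list = field(default_factory=list)  # (source name, target name) connections

    def add_data(self, name, rows):
        # Corresponds to placing the target text data in the canvas area.
        self.nodes[name] = rows

    def add_operator(self, name, fn):
        # Corresponds to placing a text processing operator in the canvas area.
        self.nodes[name] = fn

    def connect(self, source, target):
        # Corresponds to connecting two nodes in the canvas area.
        self.edges.append((source, target))

    def run(self):
        # Apply each target operator to every row produced by its source node.
        results = {}
        for source, target in self.edges:
            rows = results[source] if source in results else self.nodes[source]
            results[target] = [self.nodes[target](row) for row in rows]
        return results
```

For example, adding a data node, adding an operator node, connecting the two, and calling `run()` returns the operator output for every row of the data node, keyed by the operator name.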
In an embodiment of the present disclosure, the processing apparatus 5000 of the text data may further include:
a module for providing a first configuration interface of the text processing operator in response to a request for resource configuration of the text processing operator;
a module for acquiring resource configuration parameters of the text processing operator through the first configuration interface, so that the text processing operator runs according to the resource configuration parameters.
In one embodiment of the present disclosure, acquiring the uploaded target text data includes:
providing an entry for uploading data;
in response to an operation of uploading the target text data, acquiring the target text data uploaded through the entry.
In one embodiment of the present disclosure, the text processing operator is an abstract extraction operator for performing abstract extraction processing on text data;
running the text processing operator according to the target data processing flow chart to process the target text data and obtain the text processing result includes:
running the abstract extraction operator to perform the following processing on the target text data:
performing sentence breaking processing on target line text data in the target text data to obtain a plurality of sentences; the target line text data is text data of any line in the target text data;
determining the similarity between every two sentences, and regularizing the similarity between every two sentences to obtain a similarity matrix;
obtaining scores of other sentences according to the similarity matrix and preset scores of the specified sentences; wherein, the other sentences are sentences except the specified sentences in the plurality of sentences;
selecting a set number of sentences from the plurality of sentences as abstract sentences according to the scores;
and generating a text abstract corresponding to the target line text data as a text processing result according to the abstract statement.
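The processing steps above can be sketched as follows. The disclosure does not fix the regularization or the scoring formula here, so this sketch makes several assumptions: the pairwise similarity is a simple word-overlap stand-in, regularization is taken to mean scaling each row of the matrix so that it sums to 1, and scores are spread from the specified sentence to the other sentences by repeatedly multiplying by the matrix while the specified sentence keeps its preset score. All function names are hypothetical.

```python
import re


def split_sentences(line):
    """Sentence breaking: split one line of text on end-of-sentence punctuation."""
    parts = re.split(r"[.!?。！？]+", line)
    return [p.strip() for p in parts if p.strip()]


def word_overlap(a, b):
    """Stand-in similarity (assumption): shared words divided by all words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def similarity_matrix(sentences, sim=word_overlap):
    """Pairwise similarities, with each row scaled to sum to 1 where possible."""
    n = len(sentences)
    matrix = [[0.0 if i == j else sim(sentences[i], sentences[j]) for j in range(n)]
              for i in range(n)]
    for row in matrix:
        total = sum(row)
        if total > 0:
            row[:] = [value / total for value in row]
    return matrix


def score_sentences(matrix, specified=0, preset=1.0, rounds=20):
    """Propagate the preset score of the specified sentence to the other sentences."""
    n = len(matrix)
    scores = [0.0] * n
    scores[specified] = preset
    for _ in range(rounds):
        updated = [sum(matrix[i][j] * scores[j] for j in range(n)) for i in range(n)]
        updated[specified] = preset  # the specified sentence keeps its preset score
        scores = updated
    return scores


def pick_abstract_sentences(scores, count):
    """Indices of the `count` highest-scoring sentences."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:count]
```

With this scheme a sentence that shares nothing with the specified sentence, directly or through other sentences, ends with a score of 0 and is selected last.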
In one embodiment of the present disclosure, determining the similarity between each two sentences includes:
encoding each sentence to obtain a sentence vector of each sentence;
for every two sentences, determining the cosine value between the corresponding sentence vectors as the similarity between the two sentences.
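A minimal sketch of this similarity computation is given below. The disclosure does not say here which encoder produces the sentence vectors, so a bag-of-words count vector over a shared vocabulary is used purely as a stand-in; `encode`, `cosine`, and `pairwise_cosine` are hypothetical names.

```python
import math
from collections import Counter


def encode(sentence, vocabulary):
    """Stand-in encoder (assumption): word counts over a fixed vocabulary."""
    counts = Counter(sentence.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_cosine(sentences):
    """Similarity between every two sentences, as a square matrix."""
    vocabulary = sorted({word for s in sentences for word in s.lower().split()})
    vectors = [encode(s, vocabulary) for s in sentences]
    return [[cosine(u, v) for v in vectors] for u in vectors]
```

In practice the stand-in encoder would be replaced by whatever sentence encoder the operator uses; the cosine step is unchanged.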
In an embodiment of the present disclosure, generating, according to the abstract sentences, the text abstract corresponding to the target line text data as the text processing result includes:
acquiring the order in which each abstract sentence appears in the target line text data;
sorting the abstract sentences according to that order and adding a preset punctuation mark after each abstract sentence, to obtain the text abstract corresponding to the target line text data as the text processing result.
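The assembly step can be sketched in a few lines. The function name `build_abstract`, the default punctuation mark, and the choice to join sentences with a single space are assumptions for illustration.

```python
def build_abstract(sentences, selected_indices, punctuation="."):
    """Restore the original order of the selected sentences and punctuate each one."""
    ordered = sorted(selected_indices)  # position in the target line text data
    return " ".join(sentences[i] + punctuation for i in ordered)
```

Sorting by index means the abstract reads in the same order as the source line, regardless of the order in which the sentences were ranked.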
In one embodiment of the present disclosure, the text processing operator is a text similarity analysis operator for performing similarity analysis processing on text data;
running the text processing operator according to the target data processing flow chart to process the target text data and obtain the text processing result includes:
running the text similarity analysis operator to perform the following processing on the target text data:
respectively encoding first text data and second text data which are positioned on the same line in the target text data to obtain a first vector of the first text data and a second vector of the second text data; the first text data is located in a first target column in the target text data, and the second text data is located in a second target column in the target text data;
and determining the prediction similarity between the first text data and the second text data according to the first vector and the second vector as a text processing result.
In one embodiment of the present disclosure, determining the predicted similarity between the first text data and the second text data from the first vector and the second vector comprises:
determining cosine similarity of the first vector and the second vector;
determining a maximum of squares of the first vector and the second vector;
determining an absolute value of a difference between the first vector and the second vector;
determining a dot product of the first vector and the second vector;
splicing the cosine similarity, the maximum value of the square, the absolute value of the difference, the dot product, the first vector and the second vector to obtain a target splicing vector of the first text data and the second text data;
acquiring a mapping function between the splicing vector and the similarity;
and determining the prediction similarity between the first text data and the second text data according to the mapping function and the target splicing vector.
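The feature splicing and mapping steps can be sketched as below. Several points are interpretations rather than statements of the disclosure: the maximum of squares and the absolute difference are computed element by element, the dot product is taken as a single scalar, and the mapping function is modeled as a logistic function of a weighted sum whose weights would come from training. The names `splice_features` and `predict_similarity` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def splice_features(u, v):
    """Concatenate the pairwise features with the two input vectors."""
    cos = cosine(u, v)
    max_sq = [max(a * a, b * b) for a, b in zip(u, v)]  # element-wise (assumption)
    abs_diff = [abs(a - b) for a, b in zip(u, v)]       # element-wise (assumption)
    dot = sum(a * b for a, b in zip(u, v))              # scalar (assumption)
    return [cos] + max_sq + abs_diff + [dot] + list(u) + list(v)


def predict_similarity(features, weights, bias=0.0):
    """Stand-in mapping function (assumption): logistic of a weighted sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

For vectors of dimension d the spliced vector has 4d + 2 entries under these interpretations, so the weight vector must have the same length.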
In an embodiment of the present disclosure, the processing apparatus 5000 of the text data may further include:
a module for providing a second configuration interface for the text similarity analysis operator in response to a request for configuring the processing object for the text similarity analysis operator;
a module for acquiring the first target column and the second target column through the second configuration interface, so that the text similarity analysis operator performs similarity analysis processing on the text data of the first target column and the text data of the second target column in the target text data.
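To show how the two configured columns drive the operator, the sketch below reads tabular text data and applies a similarity function to the two target columns of every row. It assumes the target text data is available as CSV text with a header row; `analyze_columns` and the word-overlap stand-in are hypothetical and only illustrate the row-wise pairing.

```python
import csv
import io


def word_overlap(a, b):
    """Stand-in similarity (assumption): shared words divided by all words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def analyze_columns(csv_text, first_column, second_column, similarity=word_overlap):
    """Compare the two configured columns on each row and return one score per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [similarity(row[first_column], row[second_column]) for row in reader]
```

The column names passed to `analyze_columns` play the role of the first target column and the second target column obtained through the second configuration interface.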
In an embodiment of the present disclosure, the processing apparatus 5000 of the text data may further include:
a module for displaying the text processing result.
In one embodiment of the present disclosure, text processing module 5200 may also be configured to:
in response to a request for online processing of the text processing operator, packaging the text processing operator to obtain target pre-estimation service for a target user;
and running the target pre-estimation service, processing the text data provided to the target pre-estimation service by the target user, and returning a corresponding text processing result to the target user.
In an embodiment of the present disclosure, packaging the text processing operator to obtain the target pre-estimation service, in response to the request for bringing the text processing operator online, includes:
responding to a request for creating pre-estimation service, and providing at least one pre-estimation module for a user to select;
acquiring a target estimation module selected by a user and providing a model selection interface;
acquiring a text processing operator selected by a user through a model selection interface;
and packaging the text processing operator according to the target estimation module to obtain target estimation service.
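One way to picture the packaging step is to treat each pre-estimation module as a wrapper that turns an operator into a callable service. The disclosure does not describe the modules at this level of detail, so the two modules below (`realtime_module`, `batch_module`), the request and response shapes, and the `ESTIMATION_MODULES` registry are all hypothetical.

```python
def realtime_module(operator):
    """Hypothetical pre-estimation module: one text per request."""
    def service(request):
        return {"result": operator(request["text"])}
    return service


def batch_module(operator):
    """Hypothetical pre-estimation module: a list of texts per request."""
    def service(request):
        return {"results": [operator(text) for text in request["texts"]]}
    return service


# Modules offered to the user for selection (assumed names).
ESTIMATION_MODULES = {"realtime": realtime_module, "batch": batch_module}


def package_operator(operator, module_name):
    """Wrap the selected operator with the selected module to obtain a service."""
    return ESTIMATION_MODULES[module_name](operator)
```

The returned service can then be called with the text data supplied by the target user, and its return value is the text processing result sent back to that user.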
In an embodiment of the present disclosure, the processing apparatus 5000 of the text data may further include:
a module for providing a third configuration interface in response to a request for configuring a target pre-estimated service;
a module for acquiring the configuration information of the target pre-estimation service through the third configuration interface, so that the target pre-estimation service runs according to the configuration information.
In one embodiment of the disclosure, the configuration information includes at least one of: the GPU resources used, the CPU resources used, the memory resources used, and the address of the image called at run time.
In an embodiment of the present disclosure, the processing apparatus 5000 of the text data may further include:
a module for displaying the state of the target pre-estimation service in response to a request to view that state.
In one embodiment of the present disclosure, the state of the target pre-estimation service includes at least one of: the resources occupied by the target pre-estimation service while running, the online history of the target pre-estimation service, the version number of the target pre-estimation service, the deployment time of the target pre-estimation service, and the running time of the target pre-estimation service.
It will be appreciated by those skilled in the art that the text data processing apparatus 5000 may be implemented in various ways. For example, it may be implemented by configuring a processor with instructions: the instructions may be stored in a ROM and, when the device is started, read from the ROM into a programmable device to implement the text data processing apparatus 5000. As another example, the text data processing apparatus 5000 may be solidified into a dedicated device (e.g., an ASIC). The text data processing apparatus 5000 may be divided into mutually independent units, or these units may be combined together. The text data processing apparatus 5000 may be implemented by one of the implementations described above, or by a combination of two or more of them.
In this embodiment, the processing device 5000 of the text data may have various implementation forms, for example, the processing device 5000 of the text data may be any functional module running in a software product or an application program providing a text processing service, or a peripheral insert, a plug-in, a patch, etc. of the software product or the application program, and may also be the software product or the application program itself.
< System embodiment >
In this embodiment, as shown in fig. 10, a system 6000 including at least one computing device 6100 and at least one storage device 6200 is also provided. The at least one storage device 6200 is configured to store executable instructions, and the instructions are used to control the at least one computing device 6100 to perform a method according to any embodiment of the present disclosure.
In this embodiment, the system 6000 may be a device such as a mobile phone, a tablet computer, a palm computer, a desktop computer, a notebook computer, a workstation, a game machine, or a distributed system formed by a plurality of devices.
< computer-readable storage Medium >
In this embodiment, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any embodiment of the present disclosure.
The present invention may be an apparatus, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), is personalized with state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (10)

1. A method for processing text data, comprising:
adding a text processing operator for processing text data in an operator set of a machine learning platform, wherein the text processing operator comprises: an abstract extraction operator and/or a text similarity analysis operator;
and processing text data based on the text processing operator in the machine learning platform.
2. The method of claim 1, wherein the text data processing based on the text processing operator in the machine learning platform comprises:
providing a user configuration interface, wherein the user configuration interface comprises an operator display area and a canvas area used for creating a data processing flow chart; the operator display area displays the text processing operator for processing the text data;
acquiring uploaded target text data;
in response to an operation of creating a target data processing flow chart, creating the target data processing flow chart in the canvas area according to the target text data and the text processing operator;
in response to an operation of running the target data processing flow chart, running the text processing operator according to the target data processing flow chart to process the target text data to obtain a text processing result.
3. The method of claim 2, wherein the creating, in response to the operation of creating a target data processing flow chart, of the target data processing flow chart in the canvas area according to the target text data and the text processing operator comprises:
in response to an operation of selecting the target text data, presenting the target text data in the canvas area;
in response to an operation of selecting the text processing operator in the operator display area, presenting the text processing operator in the canvas area;
in response to an operation of connecting the target text data and the text processing operator, connecting the target text data and the text processing operator in the canvas area to obtain the target data processing flow chart.
4. The method of claim 2, further comprising:
responding to a request for resource configuration of the text processing operator, and providing a first configuration interface of the text processing operator;
and acquiring resource configuration parameters of the text processing operator through the first configuration interface so as to operate the text processing operator according to the resource configuration parameters.
5. The method of claim 2, wherein the acquiring uploaded target text data comprises:
providing an entry for uploading data;
in response to an operation of uploading the target text data, acquiring the target text data uploaded through the entry.
6. The method of claim 2, wherein the text processing operator is an abstract extraction operator for performing abstract extraction processing on text data;
the running the text processing operator according to the target data processing flow chart to process the target text data to obtain a text processing result comprises:
running the abstract extraction operator to perform the following processing on the target text data:
performing sentence breaking processing on target line text data in the target text data to obtain a plurality of sentences; the target line text data is text data of any line in the target text data;
determining the similarity between every two sentences, and regularizing the similarity between every two sentences to obtain a similarity matrix;
obtaining scores of other sentences according to the similarity matrix and preset scores of the specified sentences; wherein the other sentences are sentences except the specified sentences in the plurality of sentences;
selecting a set number of sentences from the plurality of sentences as abstract sentences according to the scores;
generating, according to the abstract sentences, a text abstract corresponding to the target line text data as the text processing result.
7. The method of claim 6, wherein the determining a similarity between every two sentences comprises:
encoding each sentence to obtain a sentence vector of each sentence;
for every two sentences, determining the cosine value between the corresponding sentence vectors as the similarity between the two sentences.
8. A processing apparatus of text data, comprising:
an operator adding module, configured to add a text processing operator for processing text data in an operator set of a machine learning platform, where the text processing operator includes: an abstract extraction operator and/or a text similarity analysis operator;
and the text processing module is used for processing text data based on the text processing operator in the machine learning platform.
9. A system comprising at least one computing device and at least one storage device, wherein the at least one storage device is to store instructions for controlling the at least one computing device to perform the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202010933970.2A 2020-09-08 2020-09-08 Text data processing method, device and system Pending CN114154461A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010933970.2A CN114154461A (en) 2020-09-08 2020-09-08 Text data processing method, device and system
PCT/CN2021/117271 WO2022052959A1 (en) 2020-09-08 2021-09-08 Method, device and system for processing text data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010933970.2A CN114154461A (en) 2020-09-08 2020-09-08 Text data processing method, device and system

Publications (1)

Publication Number Publication Date
CN114154461A true CN114154461A (en) 2022-03-08

Family

ID=80460658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010933970.2A Pending CN114154461A (en) 2020-09-08 2020-09-08 Text data processing method, device and system

Country Status (2)

Country Link
CN (1) CN114154461A (en)
WO (1) WO2022052959A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114911553A (en) * 2022-03-28 2022-08-16 携程旅游信息技术(上海)有限公司 Text processing task construction method, device, equipment and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4174787A3 (en) 2021-11-01 2023-05-10 Rehrig Pacific Company Delivery system
CN114996441B (en) * 2022-04-27 2024-01-12 京东科技信息技术有限公司 Document processing method, device, electronic equipment and storage medium
CN116932092B (en) * 2023-09-18 2024-01-09 之江实验室 Method, device, medium and equipment for automatically generating operator calling code

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268441A (en) * 2017-01-04 2018-07-10 科大讯飞股份有限公司 Sentence similarity computational methods and apparatus and system
CN109816114A (en) * 2018-12-29 2019-05-28 大唐软件技术股份有限公司 A kind of generation method of machine learning model, device
US20190243898A1 (en) * 2018-02-05 2019-08-08 International Business Machines Corporation Statistical preparation of data using semantic clustering
CN111125348A (en) * 2019-11-25 2020-05-08 北京明略软件系统有限公司 Text abstract extraction method and device
CN111597327A (en) * 2020-04-22 2020-08-28 哈尔滨工业大学 Public opinion analysis-oriented unsupervised multi-document abstract generation method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003248676A (en) * 2002-02-22 2003-09-05 Communication Research Laboratory Solution data compiling device and method, and automatic summarizing device and method
CN106126620A (en) * 2016-06-22 2016-11-16 北京鼎泰智源科技有限公司 Method of Chinese Text Automatic Abstraction based on machine learning
CN108009135B (en) * 2016-10-31 2021-05-04 深圳市北科瑞声科技股份有限公司 Method and device for generating document abstract
CN110377881B (en) * 2019-06-11 2023-04-07 创新先进技术有限公司 Integration method, device and system of text processing service
CN110688104A (en) * 2019-09-04 2020-01-14 北京三快在线科技有限公司 Visualization flow processing method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
WO2022052959A1 (en) 2022-03-17

Similar Documents

Publication Publication Date Title
US20230049258A1 (en) Inputting images to electronic devices
CN114154461A (en) Text data processing method, device and system
US20210089936A1 (en) Opinion snippet detection for aspect-based sentiment analysis
US9460083B2 (en) Interactive dashboard based on real-time sentiment analysis for synchronous communication
US10025980B2 (en) Assisting people with understanding charts
KR20210040316A (en) Method for generating user interactive information processing model and method for processing user interactive information
EP3599617A1 (en) Computer system and method of presenting information related to basis of predicted value output by predictor, data carrier
US11341336B2 (en) Recording medium, conversation control method, and information processing apparatus
US20160378852A1 (en) Question and answer system emulating people and clusters of blended people
US11061982B2 (en) Social media tag suggestion based on product recognition
CN110688844A (en) Text labeling method and device
US11183076B2 (en) Cognitive content mapping and collating
US10360401B2 (en) Privacy protection in network input methods
CN112990625A (en) Method and device for allocating annotation tasks and server
CN111506775A (en) Label processing method and device, electronic equipment and readable storage medium
US10783141B2 (en) Natural language processing social-based matrix refactorization
US20190005949A1 (en) Linguistic profiling for digital customization and personalization
CN111813948A (en) Information processing method and device and electronic equipment
US11809481B2 (en) Content generation based on multi-source content analysis
US20220317823A1 (en) Semi-virtualized portable command center
KR20200009812A (en) Method and system for supporting spell checking within input interface of mobile device
CN115080039A (en) Front-end code generation method, device, computer equipment, storage medium and product
US20170220585A1 (en) Sentence set extraction system, method, and program
KR20190012492A (en) Apparatus and method for generating automatic sentence
CN111191795A (en) Method, device and system for training machine learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination