CN112966038A - Method and device for extracting structured data from unstructured data - Google Patents

Method and device for extracting structured data from unstructured data

Info

Publication number
CN112966038A
Authority
CN
China
Prior art keywords
data
document
target document
structured
classification label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110262891.8A
Other languages
Chinese (zh)
Inventor
陈洲
张志恒
沈云
莫钧涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guotai Epoint Software Co Ltd
Original Assignee
Guotai Epoint Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guotai Epoint Software Co Ltd
Priority to CN202110262891.8A
Publication of CN112966038A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/174 Form filling; Merging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a method and a device for extracting structured data from unstructured data, belonging to the technical field of computers. The method comprises the following steps: acquiring a target document; performing data segmentation on the target document to obtain a plurality of data fragments in the target document; sequentially inputting the plurality of data fragments into a pre-trained data classification model to obtain each classification label included in each data fragment and the data content corresponding to each classification label; storing each classification label and the corresponding data content into a structured database to obtain structured data; and displaying the structured data through a form. This addresses the problem that, when data is entered in unstructured form, different personnel enter it in different, non-uniform ways, so that document entry and review are inefficient. Because the unstructured data in the target document can be displayed as structured data, document entry and review efficiency can be improved, and the accuracy of extracting the unstructured data can be improved.

Description

Method and device for extracting structured data from unstructured data
Technical Field
The application relates to a method and a device for extracting structured data from unstructured data, and belongs to the technical field of computers.
Background
The current government procurement system requires a buyer to enter purchasing requirements in structured form to support a series of downstream functions such as intelligent bid evaluation.
At present, a buyer's purchasing requirements are divided into a goods list, technical requirements, and the like, and are usually presented as a Word document.
However, different personnel may enter purchasing requirements in different ways, so the resulting documents do not share a uniform format, which makes document entry and review inefficient.
Disclosure of Invention
The application provides a method and a device for extracting structured data from unstructured data, which can address the problem that, when data is entered in unstructured form, different personnel enter it in different, non-uniform ways, so that document entry and review are inefficient. The application provides the following technical solutions:
in a first aspect, a method for extracting structured data from unstructured data is provided, the method comprising:
acquiring a target document, wherein the target document comprises unstructured data to be extracted;
performing data segmentation on the target document to obtain a plurality of data fragments in the target document;
sequentially inputting the plurality of data fragments into a pre-trained data classification model to obtain each classification label included in each data fragment and the data content corresponding to each classification label; the data classification model is obtained in advance by training with a plurality of groups of training data, and each group of training data comprises a plurality of sample data fragments and the classification label annotated for each sample data fragment;
storing each classification label and corresponding data content into a structured database to obtain structured data;
and displaying the structured data through a form.
Optionally, performing data segmentation on the target document to obtain a plurality of data fragments in the target document includes:
extracting text content in the target document through a file content extraction tool;
and performing data segmentation on the text content at preset punctuation marks to obtain the plurality of data fragments.
Optionally, the classification label is determined based on data extraction requirements of the unstructured data.
Optionally, before sequentially inputting the plurality of data fragments into the pre-trained data classification model to obtain each classification label included in each data fragment and the data content corresponding to each classification label, the method further includes:
obtaining a sample document;
performing data segmentation on the sample document to obtain a plurality of sample data fragments in the sample document;
labeling each sample data fragment according to the data extraction requirement to obtain a corresponding classification label;
inputting the sample data fragments into a preset neural network model to obtain a model result;
and training the neural network model based on a preset loss function, the model result and the classification label to obtain the data classification model.
Optionally, the sample document includes unstructured data related to the data extraction requirements.
Optionally, the data classification model is built based on the Bidirectional Encoder Representations from Transformers (BERT) model.
Optionally, the displaying the structured data through a form includes:
and displaying the structured data in a form through a webpage.
Optionally, the target document is a Word document in which the unstructured data to be extracted is stored in a non-fixed format, and a stock of historical documents of this kind already exists.
In a second aspect, an apparatus for extracting structured data from unstructured data is provided, the apparatus comprising:
the document acquisition module is used for acquiring a target document, and the target document comprises unstructured data to be extracted;
the data segmentation module is used for performing data segmentation on the target document to obtain a plurality of data fragments in the target document;
the data classification module is used for sequentially inputting the plurality of data fragments into a pre-trained data classification model to obtain each classification label included in each data fragment and the data content corresponding to each classification label; the data classification model is obtained in advance by training with a plurality of groups of training data, and each group of training data comprises a plurality of sample data fragments and the classification label annotated for each sample data fragment;
the structured storage module is used for storing each classification label and corresponding data content into a structured database to obtain structured data;
and the data display module is used for displaying the structured data through a form.
Optionally, the target document is a Word document in which the unstructured data to be extracted is stored in a non-fixed format, and a stock of historical documents of this kind already exists.
The beneficial effects of this application are as follows. A target document comprising unstructured data to be extracted is acquired. Data segmentation is performed on the target document to obtain a plurality of data fragments. The data fragments are sequentially input into a pre-trained data classification model to obtain each classification label included in each data fragment and the data content corresponding to each classification label; the data classification model is obtained in advance by training with a plurality of groups of training data, and each group of training data comprises a plurality of sample data fragments and the classification label annotated for each sample data fragment. Each classification label and the corresponding data content are stored into a structured database to obtain structured data, and the structured data is displayed through a form. This addresses the problem that, when data is entered in unstructured form, different personnel enter it in different, non-uniform ways, so that document entry and review are inefficient. Because the unstructured data in the target document can be displayed as structured data, document entry and review efficiency can be improved.
In addition, after the target document is subjected to data segmentation, each data segment is classified and identified by using the data classification model, so that the accuracy of data classification can be improved, and the accuracy of structured data extraction is improved.
The foregoing description is only an overview of the technical solutions of the present application. To make the technical solutions clearer, and so that they can be implemented according to the content of the description, the preferred embodiments of the present application are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow diagram of a method for extracting structured data from unstructured data as provided by one embodiment of the present application;
FIG. 2 is a block diagram of an apparatus for extracting structured data from unstructured data as provided by one embodiment of the present application;
FIG. 3 is a block diagram of an apparatus for extracting structured data from unstructured data according to one embodiment of the present application.
Detailed Description
The following detailed description of embodiments of the present application will be described in conjunction with the accompanying drawings and examples. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.
First, several terms referred to in the present application will be described.
Bidirectional Encoder Representations from Transformers (BERT): the encoder of a bidirectional Transformer, which learns good feature representations for words by running self-supervised learning on a massive corpus. Self-supervised learning refers to supervised learning that runs on data without manual labeling. The network architecture of BERT is a multi-layer Transformer structure. The Transformer is an encoder-decoder structure formed by stacking several encoders and decoders. The encoder converts the input corpus into feature vectors; the decoder takes the encoder output and the results predicted so far as input, and outputs the conditional probability of the final result.
Optionally, each embodiment of the present application is described with an electronic device having processing capability as the execution subject. The electronic device may be a desktop computer, a notebook computer, a tablet computer, a mobile phone, a server, and the like; this embodiment does not limit the device type of the electronic device.
FIG. 1 is a flow chart of a method for extracting structured data from unstructured data according to one embodiment of the present application. The method at least comprises the following steps:
step 101, a target document is obtained, wherein the target document comprises unstructured data to be extracted.
The target document may be input by a user through the current electronic device; or, it is transmitted by other devices, and the source of the target document is not limited in this embodiment.
In this embodiment, the unstructured data to be extracted is stored in the target document in a non-fixed format, and a stock of historical documents already exists; extracting from these existing documents reduces the entry cost. In one example, the target document is a Word document in which the unstructured data to be extracted is stored in a non-fixed format rather than in a table, and a stock of such historical documents exists.
For example, the target document contains: "Land parcel listing time: October 30, 2018 to 16:30 on November 12, 2018." Here, "October 30, 2018" and "16:30 on November 12, 2018" are the unstructured data to be extracted.
Step 102, performing data segmentation on the target document to obtain a plurality of data fragments in the target document.
In one example, performing data segmentation on the target document to obtain a plurality of data fragments in the target document includes: extracting text content in the target document through a file content extraction tool; and performing data segmentation on the text content at preset punctuation marks to obtain the plurality of data fragments.
The file content extraction tool is used for extracting the text content of the target document. For example, the file content extraction tool may be Apache Tika, a toolkit from the Apache Software Foundation for extracting file content, or a self-developed tool; this embodiment does not limit the implementation of the file content extraction tool.
The preset punctuation marks include, but are not limited to, at least one of the following: periods, commas, and semicolons. The preset punctuation marks can also be adapted to the data segmentation requirement; this embodiment does not limit how they are chosen.
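The punctuation-based segmentation of step 102 can be sketched as follows. This is a minimal illustration, not the implementation described in the application: the function name `split_into_fragments` and the constant `PRESET_PUNCTUATION` are hypothetical, and the text is assumed to have already been extracted from the document.

```python
import re
from typing import List

# Assumed preset punctuation marks: full-width and ASCII periods, commas, semicolons.
PRESET_PUNCTUATION = "。，；.,;"


def split_into_fragments(text: str, punctuation: str = PRESET_PUNCTUATION) -> List[str]:
    """Split text at any preset punctuation mark and drop empty fragments."""
    pattern = "[" + re.escape(punctuation) + "]"
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


sample = "Goods list: 20 laptops; delivery within 30 days. Warranty: 3 years"
print(split_into_fragments(sample))
```

Splitting on every period also breaks decimal numbers such as "3.5" apart, which is one reason the punctuation set may need to be adapted to the documents at hand.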
Step 103, sequentially inputting the plurality of data fragments into a pre-trained data classification model to obtain each classification label included in each data fragment and the data content corresponding to each classification label; the data classification model is obtained in advance by training with a plurality of groups of training data, and each group of training data comprises a plurality of sample data fragments and the classification label annotated for each sample data fragment.
The classification label is determined based on the data extraction requirement for the unstructured data. For example, if the data extraction requirement is to extract purchasing information, and the purchasing information includes a goods list and technical requirements, the classification labels include each goods name in the goods list and each requirement in the technical requirements. As another example, if the data extraction requirement is to extract listing time information, the classification labels include the listing start time and the listing deadline.
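One way to picture the relationship between a data extraction requirement and its classification labels is a lookup table that constrains what the model may return for a fragment. The sketch below is only an assumption about how this could be organized; the requirement keys, label names, and `validate_labels` function are hypothetical and do not come from the application.

```python
from typing import Dict, List

# Hypothetical mapping from a data extraction requirement to its classification labels.
LABELS_BY_REQUIREMENT: Dict[str, List[str]] = {
    "listing_time": ["listing_start_time", "listing_deadline"],
    "purchasing_info": ["goods_name", "technical_requirement"],
}


def validate_labels(requirement: str, extracted: Dict[str, str]) -> Dict[str, str]:
    """Keep only label/content pairs whose label belongs to the requirement."""
    allowed = set(LABELS_BY_REQUIREMENT[requirement])
    return {label: content for label, content in extracted.items() if label in allowed}


# Shape of one fragment's result in step 103: classification label -> data content.
model_output = {
    "listing_start_time": "October 30, 2018",
    "listing_deadline": "16:30 on November 12, 2018",
    "unrelated_label": "ignored",
}
print(validate_labels("listing_time", model_output))
```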
Optionally, before the plurality of data fragments are sequentially input into the pre-trained data classification model to obtain each classification label included in each data fragment and the data content corresponding to each classification label, the data classification model needs to be obtained through training.
The training process of the data classification model comprises the following steps: obtaining a sample document; performing data segmentation on the sample document to obtain a plurality of sample data fragments in the sample document; labeling each sample data fragment according to the data extraction requirement to obtain a corresponding classification label; inputting the sample data fragments into a preset neural network model to obtain a model result; and training the neural network model based on a preset loss function, the model result, and the annotated classification labels to obtain the data classification model.
Wherein the sample document includes unstructured data related to data extraction requirements.
Labeling each sample data fragment may be performed by using an automatic labeling tool or by using a user to label the sample data fragment manually, and the classification label labeling manner is not limited in this embodiment.
The preset loss function is used to minimize the difference between the model result and the annotated classification labels. The preset loss function includes, but is not limited to, at least one of the following: a negative log-likelihood loss function, an L1 loss function, and an L2 loss function. In other implementations, the preset loss function may also be another type of loss function, which is not enumerated here.
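To make the three named loss functions concrete, the following sketch computes each of them for a single sample in plain Python. It assumes the model result is a probability distribution over labels and uses mean-reduced L1 and L2 variants, which is one common convention; the application does not specify these details, and the function names are illustrative.

```python
import math
from typing import Sequence


def negative_log_loss(probabilities: Sequence[float], true_index: int) -> float:
    """Negative log of the probability assigned to the annotated label."""
    return -math.log(probabilities[true_index])


def l1_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute difference between prediction and target."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def l2_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared difference between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Model result: probabilities over three hypothetical labels; label 0 is annotated.
probs = [0.7, 0.2, 0.1]
one_hot = [1.0, 0.0, 0.0]
print(negative_log_loss(probs, 0), l1_loss(probs, one_hot), l2_loss(probs, one_hot))
```

All three values shrink toward zero as the probability assigned to the annotated label approaches 1, which is the sense in which training minimizes the difference between the model result and the labels.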
Illustratively, the data classification model is built based on the BERT model. In other words, the preset neural network model includes a BERT model, and certainly, the preset neural network model may also be a combination of the BERT model and other neural network models, and the implementation manner of the data classification model is not limited in this embodiment.
Step 104, storing each classification label and the corresponding data content into a structured database to obtain structured data.
For example, for the data fragment "Land parcel listing time: October 30, 2018 to 16:30 on November 12, 2018", the corresponding classification labels are the listing start time and the listing deadline. The data content corresponding to the listing start time is October 30, 2018; the data content corresponding to the listing deadline is 16:30 on November 12, 2018. Accordingly, after storage in the structured database, the structured data shown in the following table is obtained.
Table 1:
Listing start time | October 30, 2018
Listing deadline | 16:30 on November 12, 2018
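Step 104 can be illustrated with a small relational table in which each row holds one classification label and its data content. The sketch uses an in-memory SQLite database purely as a stand-in for the structured database; the table name, column names, and document identifier are assumptions made for the example.

```python
import sqlite3
from typing import Dict, List, Tuple


def store_structured(conn: sqlite3.Connection, document_id: str, extracted: Dict[str, str]) -> None:
    """Store each classification label and its data content as one row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS structured_data ("
        "document_id TEXT, label TEXT, content TEXT, "
        "PRIMARY KEY (document_id, label))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO structured_data VALUES (?, ?, ?)",
        [(document_id, label, content) for label, content in extracted.items()],
    )
    conn.commit()


def load_structured(conn: sqlite3.Connection, document_id: str) -> List[Tuple[str, str]]:
    """Return the (label, content) rows of one document, ordered by label."""
    cursor = conn.execute(
        "SELECT label, content FROM structured_data WHERE document_id = ? ORDER BY label",
        (document_id,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
store_structured(conn, "doc-1", {
    "listing_start_time": "October 30, 2018",
    "listing_deadline": "16:30 on November 12, 2018",
})
print(load_structured(conn, "doc-1"))
```

Keying rows on the document and label means that re-running extraction on the same document overwrites earlier values instead of duplicating them.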
Step 105, displaying the structured data through a form.
In one example, displaying the structured data through a form includes: displaying the structured data in the form of a form through a web page.
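Displaying the structured data as a web form can be sketched by rendering one input field per classification label. The function name `render_form` and the markup layout are assumptions for illustration; a real page would add styling and a submit action.

```python
from html import escape
from typing import Dict


def render_form(structured: Dict[str, str]) -> str:
    """Render one labelled, pre-filled input per classification label."""
    rows = []
    for label, content in structured.items():
        name = escape(label, quote=True)
        value = escape(content, quote=True)
        rows.append(f'  <label>{name} <input name="{name}" value="{value}"></label>')
    return "<form>\n" + "\n".join(rows) + "\n</form>"


print(render_form({
    "listing_start_time": "October 30, 2018",
    "listing_deadline": "16:30 on November 12, 2018",
}))
```

Escaping both labels and contents matters here because the values come from arbitrary document text and would otherwise be able to break the markup.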
In summary, in the method for extracting structured data from unstructured data provided in this embodiment, a target document comprising unstructured data to be extracted is acquired. Data segmentation is performed on the target document to obtain a plurality of data fragments. The data fragments are sequentially input into a pre-trained data classification model to obtain each classification label included in each data fragment and the data content corresponding to each classification label; the data classification model is obtained in advance by training with a plurality of groups of training data, and each group of training data comprises a plurality of sample data fragments and the classification label annotated for each sample data fragment. Each classification label and the corresponding data content are stored into a structured database to obtain structured data, and the structured data is displayed through a form. This addresses the problem that, when data is entered in unstructured form, different personnel enter it in different, non-uniform ways, so that document entry and review are inefficient. Because the unstructured data in the target document can be displayed as structured data, document entry and review efficiency can be improved.
In addition, after the target document is subjected to data segmentation, each data segment is classified and identified by using the data classification model, so that the accuracy of data classification can be improved, and the accuracy of structured data extraction is improved.
FIG. 2 is a block diagram of an apparatus for extracting structured data from unstructured data according to one embodiment of the present application. The device comprises at least the following modules: a document acquisition module 210, a data segmentation module 220, a data classification module 230, a structured storage module 240, and a data display module 250.
The document acquisition module 210 is configured to acquire a target document, where the target document includes unstructured data to be extracted;
the data segmentation module 220 is configured to perform data segmentation on the target document to obtain a plurality of data fragments in the target document;
the data classification module 230 is configured to sequentially input the plurality of data fragments into a pre-trained data classification model to obtain each classification label included in each data fragment and the data content corresponding to each classification label; the data classification model is obtained in advance by training with a plurality of groups of training data, and each group of training data comprises a plurality of sample data fragments and the classification label annotated for each sample data fragment;
a structured storage module 240, configured to store each classification tag and corresponding data content in a structured database to obtain structured data;
and a data display module 250, configured to display the structured data through a form.
Optionally, the target document is a Word document in which the unstructured data to be extracted is stored in a non-fixed format, and a stock of historical documents of this kind already exists.
For relevant details reference is made to the above-described method embodiments.
It should be noted that: the device for extracting structured data from unstructured data provided in the above embodiments is only illustrated by the above division of each functional module when extracting structured data from unstructured data, and in practical applications, the above function allocation may be completed by different functional modules as needed, that is, the internal structure of the device for extracting structured data from unstructured data is divided into different functional modules to complete all or part of the above described functions. In addition, the apparatus for extracting structured data from unstructured data and the method for extracting structured data from unstructured data provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in method embodiments and are not described herein again.
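The module division of FIG. 2 can be pictured as five interchangeable callables wired into one pipeline. The sketch below is an assumption about one possible arrangement, not the device described here: the class `ExtractionDevice`, its field names, and the toy stand-ins (in particular `toy_classify`, which merely splits "label: content" text and is not a trained model) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExtractionDevice:
    acquire: Callable[[str], str]               # document acquisition module 210
    segment: Callable[[str], List[str]]         # data segmentation module 220
    classify: Callable[[str], Dict[str, str]]   # data classification module 230
    store: Callable[[Dict[str, str]], None]     # structured storage module 240
    display: Callable[[Dict[str, str]], str]    # data display module 250

    def run(self, source: str) -> str:
        text = self.acquire(source)
        structured: Dict[str, str] = {}
        for fragment in self.segment(text):     # fragments are classified in order
            structured.update(self.classify(fragment))
        self.store(structured)
        return self.display(structured)


def toy_classify(fragment: str) -> Dict[str, str]:
    """Stand-in for the trained model: treat 'label: content' text as already labelled."""
    label, separator, content = fragment.partition(":")
    return {label.strip(): content.strip()} if separator else {}


database: Dict[str, str] = {}                   # stand-in for the structured database
device = ExtractionDevice(
    acquire=lambda source: source,              # the "document" is already plain text
    segment=lambda text: [part.strip() for part in text.split(";") if part.strip()],
    classify=toy_classify,
    store=database.update,
    display=lambda data: "\n".join(f"{key}: {value}" for key, value in data.items()),
)
print(device.run("start: 2018-10-30; deadline: 2018-11-12 16:30"))
```

Because each module is just a callable, the functions can be redistributed across more or fewer modules without changing the pipeline's overall behaviour.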
FIG. 3 is a block diagram of an apparatus for extracting structured data from unstructured data according to one embodiment of the present application. The apparatus comprises at least a processor 301 and a memory 302.
Processor 301 may include one or more processing cores, for example a 4-core processor or an 8-core processor. The processor 301 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 301 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a Central Processing Unit (CPU), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 301 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 301 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 302 may include one or more computer-readable storage media, which may be non-transitory. Memory 302 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 302 is used to store at least one instruction for execution by processor 301 to implement a method for extracting structured data from unstructured data as provided by method embodiments herein.
In some embodiments, the means for extracting the structured data from the unstructured data may further comprise: a peripheral interface and at least one peripheral. The processor 301, memory 302 and peripheral interface may be connected by bus or signal lines. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripheral devices include, but are not limited to: radio frequency circuit, touch display screen, audio circuit, power supply, etc.
Of course, the device for extracting structured data from unstructured data may also include fewer or more components, which is not limited by the embodiment.
Optionally, the present application further provides a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the method for extracting structured data from unstructured data of the above method embodiment.
Optionally, the present application further provides a computer product, which includes a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the method for extracting structured data from unstructured data of the above-mentioned method embodiment.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method of extracting structured data from unstructured data, the method comprising:
acquiring a target document, wherein the target document comprises unstructured data to be extracted;
performing data segmentation on the target document to obtain a plurality of data fragments in the target document;
sequentially inputting the plurality of data fragments into a pre-trained data classification model to obtain each classification label included in each data fragment and the data content corresponding to each classification label; the data classification model is obtained in advance by training with a plurality of groups of training data, and each group of training data comprises a plurality of sample data fragments and the classification label annotated for each sample data fragment;
storing each classification label and corresponding data content into a structured database to obtain structured data;
and displaying the structured data through a form.
2. The method of claim 1, wherein performing data segmentation on the target document to obtain a plurality of data fragments in the target document comprises:
extracting text content in the target document through a file content extraction tool;
and performing data segmentation on the text content at preset punctuation marks to obtain the plurality of data fragments.
3. The method of claim 1, wherein the classification label is determined based on data extraction requirements of the unstructured data.
4. The method of claim 3, wherein before the plurality of data fragments are sequentially input into the pre-trained data classification model to obtain each classification label included in each data fragment and the data content corresponding to each classification label, the method further comprises:
obtaining a sample document;
performing data segmentation on the sample document to obtain a plurality of sample data fragments in the sample document;
labeling each sample data fragment according to the data extraction requirement to obtain a corresponding classification label;
inputting the sample data fragments into a preset neural network model to obtain a model result;
and training the neural network model based on a preset loss function, the model result and the classification label to obtain the data classification model.
5. The method of claim 4, wherein the sample document includes unstructured data related to the data extraction requirements.
6. The method of claim 1, wherein the data classification model is built on a Bidirectional Encoder Representations from Transformers (BERT) model.
7. The method of claim 1, wherein the displaying the structured data through a form comprises:
and displaying the structured data in a form through a webpage.
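Displaying the structured data as a form on a webpage could be as simple as rendering the label and content pairs into an HTML table, as in the sketch below. The function name `render_html_form` and the markup layout are assumptions made for illustration; the claim does not specify how the page is built.

```python
from html import escape


def render_html_form(rows):
    """Render (classification_label, data_content) pairs as an HTML table.

    Both fields are escaped so that extracted text cannot inject markup.
    """
    body = "".join(
        f"<tr><th>{escape(label)}</th><td>{escape(content)}</td></tr>"
        for label, content in rows
    )
    return f"<table>{body}</table>"


if __name__ == "__main__":
    print(render_html_form([("project_name", "Road repair"), ("budget", "120000")]))
```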
8. The method of claim 1, wherein the target document is a Word document in which the unstructured data to be extracted is stored in a non-fixed format, and for which a stock of historical documents exists.
9. An apparatus for extracting structured data from unstructured data, the apparatus comprising:
the document acquisition module is used for acquiring a target document, and the target document comprises unstructured data to be extracted;
the data segmentation module is used for performing data segmentation on the target document to obtain a plurality of data fragments in the target document;
the data classification module is used for sequentially inputting the plurality of data fragments into a pre-trained data classification model to obtain each classification label included in each data fragment and the data content corresponding to each classification label; wherein the data classification model is trained in advance using a plurality of groups of training data, and each group of training data comprises: a plurality of labeled sample data fragments and the classification label corresponding to each sample data fragment;
the structured storage module is used for storing each classification label and corresponding data content into a structured database to obtain structured data;
and the data display module is used for displaying the structured data through a form.
10. The apparatus of claim 9, wherein the target document is a Word document in which the unstructured data to be extracted is stored in a non-fixed format, and for which a stock of historical documents exists.
CN202110262891.8A 2021-03-11 2021-03-11 Method and device for extracting structured data from unstructured data Pending CN112966038A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110262891.8A CN112966038A (en) 2021-03-11 2021-03-11 Method and device for extracting structured data from unstructured data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110262891.8A CN112966038A (en) 2021-03-11 2021-03-11 Method and device for extracting structured data from unstructured data

Publications (1)

Publication Number Publication Date
CN112966038A (en) 2021-06-15

Family

ID=76277202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110262891.8A Pending CN112966038A (en) 2021-03-11 2021-03-11 Method and device for extracting structured data from unstructured data

Country Status (1)

Country Link
CN (1) CN112966038A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344083A (en) * 2021-06-16 2021-09-03 安徽容知日新科技股份有限公司 Data labeling method and device and computing equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270604A (en) * 2020-10-14 2021-01-26 招商银行股份有限公司 Information structuring processing method and device and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN110287961B (en) Chinese word segmentation method, electronic device and readable storage medium
CN112597312A (en) Text classification method and device, electronic equipment and readable storage medium
CN111985229B (en) Sequence labeling method and device and computer equipment
CN107590291A (en) A kind of searching method of picture, terminal device and storage medium
CN112328655B (en) Text label mining method, device, equipment and storage medium
CN113268615A (en) Resource label generation method and device, electronic equipment and storage medium
CN112084334A (en) Corpus label classification method and device, computer equipment and storage medium
CN112559687A (en) Question identification and query method and device, electronic equipment and storage medium
CN112507663A (en) Text-based judgment question generation method and device, electronic equipment and storage medium
CN112528013A (en) Text abstract extraction method and device, electronic equipment and storage medium
CN112951233A (en) Voice question and answer method and device, electronic equipment and readable storage medium
CN112966038A (en) Method and device for extracting structured data from unstructured data
CN113869456A (en) Sampling monitoring method and device, electronic equipment and storage medium
CN112100493B (en) Document ordering method, device, equipment and storage medium
CN113420542A (en) Dialog generation method and device, electronic equipment and storage medium
CN112989043A (en) Reference resolution method and device, electronic equipment and readable storage medium
CN117195886A (en) Text data processing method, device, equipment and medium based on artificial intelligence
CN115730603A (en) Information extraction method, device, equipment and storage medium based on artificial intelligence
CN114943306A (en) Intention classification method, device, equipment and storage medium
CN114757154A (en) Job generation method, device and equipment based on deep learning and storage medium
CN115203364A (en) Software fault feedback processing method, device, equipment and readable storage medium
CN112734205A (en) Model confidence degree analysis method and device, electronic equipment and computer storage medium
CN113779934A (en) Multi-modal information extraction method, device, equipment and computer-readable storage medium
CN113743721A (en) Marketing strategy generation method and device, computer equipment and storage medium
CN112364131A (en) Corpus processing method and related device thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210615)
RJ01 Rejection of invention patent application after publication