CN108920656A - Document properties description content extracting method and device - Google Patents
- Publication number
- CN108920656A (application CN201810718897.XA)
- Authority
- CN
- China
- Prior art keywords
- document
- description content
- document information
- extracted
- properties
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and device for extracting the description content of document attributes. The method includes: obtaining the document information of a text whose attributes are to be extracted; inputting the document information into a pre-trained attribute extraction model to perform model computation and obtain a computation result; and determining, according to the computation result, the content in the document information that corresponds to the attributes to be extracted. The invention allows document attribute information to be read quickly.
Description
Technical field
The present invention relates to the field of information processing, and in particular to a method and device for extracting the description content of document attributes.
Background art
When a user reads a large number of documents on one topic, the user cares most about a few points of interest. These points of interest are the attributes of the text. For example, when a user needs to read tens of thousands of bidding documents, reading only the points of interest would let the user quickly find the specific bidding documents of interest. However, because the points of interest cannot be located quickly in the text, the user's reading speed is greatly reduced. If the points of interest in a document could be listed explicitly, the user could quickly navigate to the documents of interest.
No effective solution has yet been proposed for the problem in the related art that document content cannot be extracted quickly.
Summary of the invention
The main purpose of the invention is to provide a method and device for extracting the description content of document attributes, so as to solve the problem that document content cannot be extracted quickly.
To achieve this purpose, according to one aspect of the invention, a document attribute description content extraction method is provided. The method includes: obtaining the document information of a text whose attributes are to be extracted; inputting the document information into a pre-trained attribute extraction model to perform model computation and obtain a computation result; and determining, according to the computation result, the description content in the document information that corresponds to the document attributes.
Further, after the description content corresponding to the document attributes is determined according to the computation result, the method also includes: marking, in a preset manner, the description content in the document information that corresponds to the document attributes to be extracted.
Further, marking the description content in a preset manner includes: marking the description content corresponding to each document attribute to be extracted in the document information with a background color of a different color.
Further, before the document information is input into the pre-trained attribute extraction model for model computation, the method also includes: collecting a preset number of model training samples; labeling the paragraphs and sentences in the model training samples to obtain labeled sample content; and performing deep learning on the labeled sample content with a neural network to obtain the trained attribute extraction model.
Further, performing deep learning on the labeled sample content with a neural network to obtain the trained attribute extraction model includes: converting the words in the labeled samples into numeric vectors; and training on the numeric vectors with an LSTM to obtain the trained attribute extraction model.
To achieve the above purpose, according to another aspect of the invention, a document attribute description content extraction device is also provided. The device includes: an acquiring unit, configured to obtain the document information of a text whose attributes are to be extracted; a computing unit, configured to input the document information into a pre-trained attribute extraction model to perform model computation and obtain a computation result; and a determining unit, configured to determine, according to the computation result, the description content in the document information that corresponds to the document attributes.
Further, the device also includes a marking unit, configured to mark, in a preset manner, the description content in the document information that corresponds to the document attributes to be extracted, after that description content has been determined according to the computation result.
Further, the marking unit is configured to mark the description content corresponding to each document attribute to be extracted in the document information with a background color of a different color.
To achieve the above purpose, according to another aspect of the invention, a storage medium is also provided. The storage medium includes a stored program, and when the program runs it controls the device on which the storage medium resides to execute the document attribute description content extraction method of the invention.
To achieve the above purpose, according to another aspect of the invention, a processor is also provided. The processor is configured to run a program, and the program, when running, executes the document attribute description content extraction method of the invention.
The invention obtains the document information of a text whose attributes are to be extracted; inputs the document information into a pre-trained attribute extraction model to perform model computation and obtain a computation result; and determines, according to the computation result, the description content in the document information that corresponds to the document attributes. This solves the problem that document content cannot be extracted quickly and thereby allows document attribute information to be read quickly.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a document attribute description content extraction method according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the result of extracting text attribute description paragraphs according to an embodiment of the invention; and
Fig. 3 is a schematic diagram of a document attribute description content extraction device according to an embodiment of the invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with one another. The invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the scope of protection of this application.
It should be noted that the terms "first", "second" and the like in the description, the claims, and the above drawings are used to distinguish similar objects and do not describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of this application described herein can be implemented. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
An embodiment of the invention provides a document attribute description content extraction method.
Fig. 1 is a flowchart of the document attribute description content extraction method according to an embodiment of the invention. As shown in Fig. 1, the method includes the following steps:
Step S102: obtain the document information of the text whose attributes are to be extracted;
Step S104: input the document information into the pre-trained attribute extraction model to perform model computation and obtain a computation result;
Step S106: determine, according to the computation result, the description content in the document information that corresponds to the document attributes.
By obtaining the document information of the text whose attributes are to be extracted, inputting it into the pre-trained attribute extraction model to obtain a computation result, and determining from that result the description content corresponding to the document attributes, this embodiment solves the problem that document content cannot be extracted quickly and allows document attribute information to be read quickly.
In the embodiments of the invention, the document whose attributes are to be extracted may be a file in any of several formats, such as Word format or table format. After the document information is obtained, it can be input into the pre-trained attribute extraction model for model computation. The attribute extraction model is trained on a large number of documents, each of which carries the attributes to be extracted together with the position and content of each attribute in that document. After training on a large number of documents, the model can determine the text content corresponding to an attribute to be extracted from its likely position in the document, from adjacent keywords, or from the keywords it contains. In this way the attributes the user cares about can be extracted in the shortest time, which improves reading efficiency.
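The flow of steps S102 to S106 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the names `StubModel`, `acquire_document`, and `extract_attributes` are hypothetical, and the stub labels sentences by simple keyword matching as a stand-in for the trained attribute extraction model.

```python
from typing import Dict, List


class StubModel:
    """Hypothetical stand-in for the pre-trained attribute extraction model.

    It labels a sentence with an attribute when a known keyword appears in it,
    which only mimics the keyword cue mentioned in the text.
    """

    def __init__(self, keywords: Dict[str, str]):
        self.keywords = keywords  # keyword -> attribute name

    def predict(self, sentences: List[str]) -> List[str]:
        labels = []
        for sentence in sentences:
            label = "O"  # "O" marks a sentence that belongs to no attribute
            for keyword, attribute in self.keywords.items():
                if keyword in sentence:
                    label = attribute
                    break
            labels.append(label)
        return labels


def acquire_document(text: str) -> List[str]:
    """Step S102: obtain the document information as a list of sentences."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def extract_attributes(text: str, model: StubModel) -> Dict[str, List[str]]:
    sentences = acquire_document(text)   # S102: obtain document information
    labels = model.predict(sentences)    # S104: model computation
    result: Dict[str, List[str]] = {}    # S106: determine description content
    for sentence, label in zip(sentences, labels):
        if label != "O":
            result.setdefault(label, []).append(sentence)
    return result


doc = "Project name: centrifuge purchase\nGeneral notice\nBudget amount: 700,000 yuan"
model = StubModel({"Project name": "title", "Budget": "budget"})
extracted = extract_attributes(doc, model)
```

In a real system the stub would be replaced by the trained model; the surrounding three-step structure would stay the same.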
Optionally, after the description content corresponding to the document attributes to be extracted is determined according to the computation result, that description content is marked in the document information in a preset manner.
Optionally, marking the content corresponding to the attributes to be extracted in a preset manner includes: marking the description content corresponding to each document attribute to be extracted in the document information with a background color of a different color.
After the description content of the document attributes to be extracted is determined, it can be displayed in several ways. For example, the content corresponding to each attribute can be marked with a different color: an attribute name and the content corresponding to that attribute in the document are displayed in the same color, and different attributes are distinguished by different colors. This makes it easy for the user to quickly read the content corresponding to each type of attribute.
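One possible way to mark attributes with background colors is to render the document as HTML. The sketch below assumes HTML output and a hypothetical color table `ATTRIBUTE_COLORS`; the text itself does not prescribe an output format or specific colors.

```python
import html
from typing import Dict, List, Tuple

# Hypothetical colour assignment; the text only requires that each
# attribute gets its own background colour.
ATTRIBUTE_COLORS: Dict[str, str] = {
    "title": "#fff2a8",
    "budget": "#b8e6b8",
}


def highlight(sentences: List[Tuple[str, str]]) -> str:
    """Render (sentence, attribute) pairs as HTML paragraphs.

    Sentences whose attribute has no colour (for example "O") stay unmarked.
    """
    parts = []
    for sentence, attribute in sentences:
        text = html.escape(sentence)
        color = ATTRIBUTE_COLORS.get(attribute)
        if color is None:
            parts.append(f"<p>{text}</p>")
        else:
            parts.append(
                f'<p><span style="background-color: {color}" '
                f'title="{html.escape(attribute)}">{text}</span></p>'
            )
    return "\n".join(parts)


page = highlight([
    ("Project name: centrifuge purchase", "title"),
    ("General notice", "O"),
    ("Budget amount: 700,000 yuan", "budget"),
])
```

The `title` attribute on each span carries the attribute name, so a reader hovering over a highlighted passage can see which attribute it belongs to.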
Optionally, before the document information is input into the pre-trained attribute extraction model for model computation, a preset number of model training samples are collected; the paragraphs and sentences in the model training samples are labeled to obtain labeled sample content; and deep learning is performed on the labeled sample content with a neural network to obtain the trained attribute extraction model.
Optionally, performing deep learning on the labeled sample content with a neural network to obtain the trained attribute extraction model includes: converting the words in the labeled samples into numeric vectors; and training on the numeric vectors with an LSTM to obtain the trained attribute extraction model.
In the model training process, representative training documents are collected first, and the data in the documents is labeled with each sentence treated as a unit. A paragraph corresponding to a document attribute begins with the prefix B-. For example, for the attribute "project name", the beginning is labeled B-title, the subsequent sentences are labeled I-title, and the closing sentence is labeled E-title. If the document attribute corresponds to a single sentence, that sentence is labeled S-title. Sentences that do not belong to any attribute are labeled O. The words in the documents are then converted into numeric vectors (word embedding), and the attribute labeling model is trained with an LSTM (the labels here are attribute tags at the sentence level). Training is repeated until a satisfactory model is obtained.
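The B-/I-/E-/S-/O labeling scheme described above can be illustrated with a short Python sketch. The function name `bioes_tags` and the input representation (one attribute name or `None` per sentence) are assumptions made for illustration; consecutive sentences with the same attribute are treated as one paragraph.

```python
from typing import List, Optional


def bioes_tags(attributes: List[Optional[str]]) -> List[str]:
    """Convert per-sentence attribute names into B-/I-/E-/S-/O tags.

    attributes[i] is the attribute that the i-th sentence describes, or None
    when the sentence belongs to no attribute.
    """
    tags = []
    n = len(attributes)
    for i, attr in enumerate(attributes):
        if attr is None:
            tags.append("O")
            continue
        starts = i == 0 or attributes[i - 1] != attr      # first sentence of a run
        ends = i == n - 1 or attributes[i + 1] != attr    # last sentence of a run
        if starts and ends:
            tags.append(f"S-{attr}")   # single-sentence description
        elif starts:
            tags.append(f"B-{attr}")   # beginning of a paragraph
        elif ends:
            tags.append(f"E-{attr}")   # end of a paragraph
        else:
            tags.append(f"I-{attr}")   # inside a paragraph
    return tags


tags = bioes_tags(["title", "title", "title", None, "budget"])
# tags == ["B-title", "I-title", "E-title", "O", "S-budget"]
```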
Optionally, during attribute extraction, if two or more pieces of text are extracted for the same attribute, the probability that each piece of text is the content of that attribute can be calculated, and the piece with the highest probability is chosen as the text content corresponding to the attribute.
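Choosing the most probable candidate per attribute can be sketched as follows. The function name `pick_best` and the candidate tuples are hypothetical, and the probabilities are made-up illustrative values that would in practice come from the extraction model.

```python
from typing import Dict, List, Tuple


def pick_best(candidates: List[Tuple[str, str, float]]) -> Dict[str, str]:
    """Keep, for each attribute, the candidate text with the highest probability.

    Each candidate is (attribute, text, probability).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for attribute, text, probability in candidates:
        if attribute not in best or probability > best[attribute][1]:
            best[attribute] = (text, probability)
    return {attribute: text for attribute, (text, _) in best.items()}


chosen = pick_best([
    ("budget", "Budget amount: 700,000 yuan", 0.91),
    ("budget", "Bidding document price: 300 yuan per package", 0.34),
    ("title", "Ultracentrifuge purchase", 0.88),
])
```

When two candidates tie, this sketch keeps the first one encountered; the text does not say how ties should be resolved.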
An embodiment of the invention also provides a specific implementation, and the technical solution of the invention is described below with reference to it.
The technical solution of this embodiment can serve as a dictionary-based method for extracting text attribute description paragraphs. It identifies text attribute description sentences or paragraphs with a neural-network-based deep learning method. The overall flow is as follows:
1. Collect representative training documents.
2. Label the sample data. The sample documents are labeled by attribute: each attribute description sentence or paragraph is tagged with its attribute, and sentences that do not belong to any attribute are labeled as other.
3. Learn from the labeled data with a neural-network-based deep learning method and train the attribute labeling model.
4. Extract the feature attributes from documents with the trained model.
The neural-network-based deep learning method of this embodiment for identifying text attribute description sentences or paragraphs can be implemented through the following steps:
Step 1: before training the model, collect representative training documents.
Step 2: label the data. The specific steps are as follows:
Each sentence is treated as a unit. A paragraph corresponding to a document attribute begins with B-; for example, the beginning of "project name" is labeled B-title, the subsequent sentences are labeled I-title, and the closing sentence is labeled E-title. If the document attribute corresponds to a single sentence, it is labeled S-title. Sentences that do not belong to any attribute are labeled O.
Step 3: learn from the labeled data. A neural-network-based deep learning method is used here, for example word embedding plus LSTM. The specific steps are:
1. First convert the words into numeric vectors (word embedding).
2. Then train the attribute labeling model with the LSTM (the labels here are attribute tags at the sentence level).
Step 4: extract the feature attributes from documents with the trained model.
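To make Step 3 more concrete, the sketch below shows the two building blocks it names: a word-to-vector lookup (word embedding) and the forward pass of a single LSTM cell, written in plain Python. It is an illustration under assumptions, not the model of the invention: the weights are random and untrained, the dimensions and all names (`build_vocab`, `lstm_step`, `encode_sentence`) are hypothetical, and actual training by backpropagation would normally be done with a deep learning framework.

```python
import math
import random
from typing import Dict, List

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def build_vocab(sentences: List[List[str]]) -> Dict[str, int]:
    """Map every word to an integer index; index 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for word in sentence:
            vocab.setdefault(word, len(vocab))
    return vocab


def random_matrix(rows: int, cols: int) -> List[List[float]]:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W, b):
    """One LSTM time step.

    W has 4 * hidden rows (input, forget, candidate, output gates), each of
    length len(x) + len(h); b has 4 * hidden entries.
    """
    hidden = len(h)
    xh = x + h  # concatenate the input vector and the previous hidden state
    z = [sum(w * v for w, v in zip(row, xh)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(v) for v in z[0:hidden]]                    # input gate
    f = [sigmoid(v) for v in z[hidden:2 * hidden]]           # forget gate
    g = [math.tanh(v) for v in z[2 * hidden:3 * hidden]]     # candidate cell state
    o = [sigmoid(v) for v in z[3 * hidden:4 * hidden]]       # output gate
    c_new = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c, i, g)]
    h_new = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_new)]
    return h_new, c_new


def encode_sentence(words, vocab, embeddings, W, b, hidden):
    """Run the LSTM over the words of a sentence and return the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for word in words:
        x = embeddings[vocab.get(word, 0)]  # word -> numeric vector
        h, c = lstm_step(x, h, c, W, b)
    return h


EMBED_DIM, HIDDEN = 4, 3
corpus = [["project", "name", "centrifuge"], ["budget", "amount"]]
vocab = build_vocab(corpus)
embeddings = random_matrix(len(vocab), EMBED_DIM)
W = random_matrix(4 * HIDDEN, EMBED_DIM + HIDDEN)
b = [0.0] * (4 * HIDDEN)
sentence_vector = encode_sentence(corpus[0], vocab, embeddings, W, b, HIDDEN)
```

In a full sentence-level tagger, a vector like `sentence_vector` would feed a classifier over the B-/I-/E-/S-/O tags; that classifier and the training loop are omitted here.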
Fig. 2 is a schematic diagram of the result of extracting text attribute description paragraphs according to an embodiment of the invention. As shown in Fig. 2, identifying text attribute description sentences or paragraphs means finding the descriptions of the relevant attributes in a natural-language text and marking their position and type. For categories such as project name, budget amount, project content description, bidding document price, contact information, and qualification requirements, the corresponding content is identified and marked in the text. In the example, the project name corresponds to "Shandong Provincial Maternal and Child Health Care Hospital Key Laboratory of Fertility Regulation project equipment procurement (second batch), second ultracentrifuge procurement"; the budget amount corresponds to 70.000000 (in units of ten thousand yuan); the project content description is "ultracentrifuge"; the bidding document price is 300 yuan per package; and the contact information that is marked includes the purchaser (Shandong Provincial Maternal and Child Health Care Hospital), its address and contact person, and the agency's address, contact person, and telephone number. This lets the user read the information in the shortest time.
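Turning a predicted tag sequence back into marked attribute content can be sketched as below. The function `decode_tags` is a hypothetical helper that assumes the B-/I-/E-/S-/O sentence tags described earlier in this document; the sample sentences are simplified stand-ins for the bidding document in Fig. 2.

```python
from typing import Dict, List


def decode_tags(sentences: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Group sentences into attribute descriptions from B-/I-/E-/S-/O tags."""
    result: Dict[str, List[str]] = {}
    current: List[str] = []
    current_attr = None
    for sentence, tag in zip(sentences, tags):
        if tag == "O":
            current, current_attr = [], None
            continue
        prefix, attr = tag.split("-", 1)
        if prefix == "S":
            # A single sentence is a complete description on its own
            result.setdefault(attr, []).append(sentence)
            current, current_attr = [], None
        elif prefix == "B":
            current, current_attr = [sentence], attr
        elif attr == current_attr:
            # I- or E- tag that continues the open paragraph
            current.append(sentence)
            if prefix == "E":
                result.setdefault(attr, []).append(" ".join(current))
                current, current_attr = [], None
        else:
            # Inconsistent tag sequence: discard the open paragraph
            current, current_attr = [], None
    return result


sentences = [
    "Project name: ultracentrifuge procurement",
    "Budget amount: 70.000000 (ten thousand yuan)",
    "Purchaser: Shandong Provincial Maternal and Child Health Care Hospital",
    "Agency address, contact person and telephone",
    "Other notice text",
]
tags = ["S-title", "S-budget", "B-contact", "E-contact", "O"]
decoded = decode_tags(sentences, tags)
```

Discarding paragraphs with an inconsistent tag sequence is one design choice among several; a more tolerant decoder could instead close the open paragraph at the last consistent sentence.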
With the above method, the user can read the text rapidly and quickly locate the points of interest, which improves reading efficiency.
It should be noted that the steps shown in the flowchart of the drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowchart, in some cases the steps shown or described can be executed in an order different from the one given here.
An embodiment of the invention provides a document attribute description content extraction device, which can be used to execute the document attribute description content extraction method of the embodiments of the invention.
Fig. 3 is a schematic diagram of the document attribute description content extraction device according to an embodiment of the invention. As shown in Fig. 3, the device includes:
an acquiring unit 10, configured to obtain the document information of a text whose attributes are to be extracted;
a computing unit 20, configured to input the document information into a pre-trained attribute extraction model to perform model computation and obtain a computation result;
a determining unit 30, configured to determine, according to the computation result, the description content in the document information that corresponds to the document attributes.
This embodiment uses the acquiring unit 10 to obtain the document information of the text whose attributes are to be extracted, the computing unit 20 to input the document information into the pre-trained attribute extraction model and obtain a computation result, and the determining unit 30 to determine, according to the computation result, the content in the document information that corresponds to the attributes to be extracted. This solves the problem that document content cannot be extracted quickly and allows document attribute information to be read quickly.
Optionally, the device also includes a marking unit, configured to mark, in a preset manner, the description content in the document information that corresponds to the document attributes to be extracted, after that description content has been determined according to the computation result.
Optionally, the marking unit is configured to mark the description content corresponding to each document attribute to be extracted in the document information with a background color of a different color.
The document attribute description content extraction device includes a processor and a memory. The acquiring unit, computing unit, determining unit, and so on are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, which fetches the corresponding program unit from the memory. One or more kernels may be provided, and document attribute information is read quickly by adjusting kernel parameters.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
An embodiment of the invention provides a storage medium on which a program is stored. When the program is executed by a processor, it implements the document attribute description content extraction method.
An embodiment of the invention provides a processor configured to run a program, and the program, when running, executes the document attribute description content extraction method.
An embodiment of the invention provides a device that includes a processor, a memory, and a program stored in the memory and runnable on the processor. When executing the program, the processor implements the following steps: obtaining the document information of a text whose attributes are to be extracted; inputting the document information into a pre-trained attribute extraction model to perform model computation and obtain a computation result; and determining, according to the computation result, the description content in the document information that corresponds to the document attributes. The device here may be a server, a PC, a tablet, a mobile phone, or the like.
This application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: obtaining the document information of a text whose attributes are to be extracted; inputting the document information into a pre-trained attribute extraction model to perform model computation and obtain a computation result; and determining, according to the computation result, the description content in the document information that corresponds to the document attributes.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction means, and the instruction means implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are executed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and a memory.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants of them are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements that are not expressly listed, or elements that are inherent to the process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity, or device that includes the element.
The above are only embodiments of this application and are not intended to limit it. For those skilled in the art, various modifications and changes to this application are possible. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within the scope of the claims of this application.
Claims (10)
1. A document attribute description content extraction method, characterized by comprising:
obtaining the document information of a text whose attributes are to be extracted;
inputting the document information into a pre-trained attribute extraction model to perform model computation and obtain a computation result; and
determining, according to the computation result, the description content in the document information that corresponds to the document attributes.
2. The method according to claim 1, characterized in that, after the description content in the document information that corresponds to the document attributes is determined according to the computation result, the method further comprises:
marking, in a preset manner, the description content in the document information that corresponds to the document attributes to be extracted.
3. The method according to claim 2, characterized in that marking, in a preset manner, the description content in the document information that corresponds to the document attributes to be extracted comprises:
marking the description content corresponding to each document attribute to be extracted in the document information with a background color of a different color.
4. The method according to claim 1, characterized in that, before the document information is input into the pre-trained attribute extraction model for model computation, the method further comprises:
collecting a preset number of model training samples;
labeling the paragraphs and sentences in the model training samples to obtain labeled sample content; and
performing deep learning on the labeled sample content with a neural network to obtain the trained attribute extraction model.
5. The method according to claim 4, characterized in that performing deep learning on the labeled sample content with a neural network to obtain the trained attribute extraction model comprises:
converting the words in the labeled samples into numeric vectors; and
training on the numeric vectors with an LSTM to obtain the trained attribute extraction model.
6. A document attribute description content extraction device, characterized by comprising:
an acquiring unit, configured to obtain the document information of a text whose attributes are to be extracted;
a computing unit, configured to input the document information into a pre-trained attribute extraction model to perform model computation and obtain a computation result; and
a determining unit, configured to determine, according to the computation result, the description content in the document information that corresponds to the document attributes.
7. The device according to claim 6, characterized in that the device further comprises:
a marking unit, configured to mark, in a preset manner, the description content in the document information that corresponds to the document attributes to be extracted, after the description content in the document information that corresponds to the document attributes has been determined according to the computation result.
8. The device according to claim 7, characterized in that the marking unit is configured to:
mark the description content corresponding to each document attribute to be extracted in the document information with a background color of a different color.
9. a kind of storage medium, which is characterized in that the storage medium includes the program of storage, wherein run in described program
When control the storage medium where equipment perform claim require any one of 1 to 5 described in document properties description content mention
Take method.
10. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, executes the document properties description content extraction method according to any one of claims 1 to 5.
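The flow recited in claims 5 to 8 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. In particular, the LSTM training of claim 5 is replaced here by a simple lookup table that memorises the label of each word id, purely so the example runs with the standard library alone. All function names, the attribute labels `AMOUNT` and `DEADLINE`, the colour values, and the use of HTML `span` elements as the "preset manner" of marking are assumptions made for illustration.

```python
import html
from typing import Dict, List, Tuple

Tagged = List[Tuple[str, str]]  # (word, label) pairs; "O" means "no attribute"

# Hypothetical attribute labels and background colours (not from the patent).
COLORS: Dict[str, str] = {"AMOUNT": "#fff2a8", "DEADLINE": "#b8e6ff"}


def build_vocab(samples: List[Tagged]) -> Dict[str, int]:
    """Claim 5, first step: map each word to a number; id 0 is for unknown words."""
    vocab = {"<unk>": 0}
    for sentence in samples:
        for word, _label in sentence:
            vocab.setdefault(word, len(vocab))
    return vocab


def train_lookup_model(samples: List[Tagged], vocab: Dict[str, int]) -> Dict[int, str]:
    """Stand-in for LSTM training: memorise the label seen for each word id."""
    model: Dict[int, str] = {}
    for sentence in samples:
        for word, label in sentence:
            model[vocab[word]] = label
    return model


def acquire(text: str) -> List[str]:
    """Acquiring unit: obtain the document information as a list of tokens."""
    return text.split()


def compute(tokens: List[str], vocab: Dict[str, int], model: Dict[int, str]) -> List[str]:
    """Operation unit: feed the numeric ids to the model and return one label per token."""
    ids = [vocab.get(token, 0) for token in tokens]
    return [model.get(i, "O") for i in ids]


def determine(tokens: List[str], labels: List[str]) -> Tagged:
    """Determination unit: keep only the tokens that describe an attribute."""
    return [(t, l) for t, l in zip(tokens, labels) if l != "O"]


def mark(tokens: List[str], labels: List[str]) -> str:
    """Marking unit: give each attribute's content its own background colour."""
    parts = []
    for token, label in zip(tokens, labels):
        text = html.escape(token)
        if label != "O":
            color = COLORS.get(label, "#dddddd")
            text = f'<span style="background-color: {color}">{text}</span>'
        parts.append(text)
    return " ".join(parts)


# Hypothetical labeled sample and input document.
samples = [[("budget", "O"), ("500000", "AMOUNT"), ("due", "O"), ("2018-07-03", "DEADLINE")]]
vocab = build_vocab(samples)
model = train_lookup_model(samples, vocab)

tokens = acquire("budget 500000 due 2018-07-03 Beijing")
labels = compute(tokens, vocab, model)
extracted = determine(tokens, labels)
marked = mark(tokens, labels)
```

In a real system the lookup table would be replaced by a trained sequence model, and the word ids would typically be turned into dense vectors before being fed to it. The unit boundaries (acquire, compute, determine, mark) would stay the same.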
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810718897.XA CN108920656A (en) | 2018-07-03 | 2018-07-03 | Document properties description content extracting method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108920656A (en) | 2018-11-30 |
Family
ID=64423581
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810718897.XA (Pending) CN108920656A (en) | 2018-07-03 | 2018-07-03 | Document properties description content extracting method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108920656A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110125791A1 (en) * | 2009-11-25 | 2011-05-26 | Microsoft Corporation | Query classification using search result tag ratios |
CN105320714A (en) * | 2014-10-22 | 2016-02-10 | 武汉理工大学 | Interactive retrieval method for content retrieval and labeling information active service |
CN106502988A (en) * | 2016-11-02 | 2017-03-15 | 深圳市空谷幽兰人工智能科技有限公司 | The method and apparatus that a kind of objective attribute target attribute is extracted |
CN108182295A (en) * | 2018-02-09 | 2018-06-19 | 重庆誉存大数据科技有限公司 | A kind of Company Knowledge collection of illustrative plates attribute extraction method and system |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109685056A (en) * | 2019-01-04 | 2019-04-26 | 达而观信息科技(上海)有限公司 | Obtain the method and device of document information |
CN109685056B (en) * | 2019-01-04 | 2023-04-04 | 达而观信息科技(上海)有限公司 | Method and device for acquiring document information |
CN109815500A (en) * | 2019-01-25 | 2019-05-28 | 杭州绿湾网络科技有限公司 | Management method, device, computer equipment and the storage medium of unstructured official document |
CN111797886A (en) * | 2019-04-08 | 2020-10-20 | 京瓷办公信息系统株式会社 | Generating OCR training data for neural networks by parsing PDL files |
CN112418776A (en) * | 2019-08-23 | 2021-02-26 | 珠海金山办公软件有限公司 | Document deadline management method and device |
CN111191130A (en) * | 2019-12-30 | 2020-05-22 | 泰康保险集团股份有限公司 | Information extraction method, device, equipment and computer readable storage medium |
CN111611794A (en) * | 2020-05-18 | 2020-09-01 | 众能联合数字技术有限公司 | General engineering information extraction method based on industry rules and TextCNN model |
CN112099870A (en) * | 2020-08-28 | 2020-12-18 | 深圳前海微众银行股份有限公司 | Document processing method and device, electronic equipment and computer readable storage medium |
CN112099870B (en) * | 2020-08-28 | 2023-12-26 | 深圳前海微众银行股份有限公司 | Document processing method, device, electronic equipment and computer readable storage medium |
CN111985478A (en) * | 2020-09-02 | 2020-11-24 | 深圳壹账通智能科技有限公司 | Text positioning playing method and device, computer equipment and readable storage medium |
CN113065154A (en) * | 2021-03-19 | 2021-07-02 | 深信服科技股份有限公司 | Document detection method, device, equipment and storage medium |
CN113065154B (en) * | 2021-03-19 | 2023-12-29 | 深信服科技股份有限公司 | Document detection method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108920656A (en) | Document properties description content extracting method and device | |
CN106815192B (en) | Model training method and device and sentence emotion recognition method and device | |
CN105243055B (en) | Based on multilingual segmenting method and device | |
CN109992763A (en) | Language marks processing method, system, electronic equipment and computer-readable medium | |
CN106951571A (en) | A kind of method and apparatus for giving application mark label | |
CN109448793B (en) | Method and system for labeling, searching and information labeling of right range of gene sequence | |
CN103631874B (en) | UGC label classification determining method and device for social platform | |
CN108256537A (en) | A kind of user gender prediction method and system | |
CN107679208A (en) | A kind of searching method of picture, terminal device and storage medium | |
CN110533018A (en) | A kind of classification method and device of image | |
CN109101476A (en) | A kind of term vector generates, data processing method and device | |
CN109002443A (en) | A kind of classification method and device of text information | |
CN109683773A (en) | Corpus labeling method and device | |
CN110197188A (en) | Method, system, equipment and the storage medium of business scenario prediction, classification | |
CN110019669A (en) | A kind of text searching method and device | |
CN107330009A (en) | Descriptor disaggregated model creation method, creating device and storage medium | |
CN110309256A (en) | The acquisition methods and device of event data in a kind of text | |
CN110717312A (en) | Text labeling method and device | |
CN110647504B (en) | Method and device for searching judicial documents | |
CN111897955B (en) | Comment generation method, device, equipment and storage medium based on encoding and decoding | |
CN110008445A (en) | Event extraction method and device, electronic equipment | |
CN111126053B (en) | Information processing method and related equipment | |
CN105786929B (en) | A kind of information monitoring method and device | |
CN109558580A (en) | A kind of text analyzing method and device | |
CN109583473A (en) | A kind of generation method and device of characteristic |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181130 |