CN109800303A - Document information extraction method, storage medium and terminal - Google Patents
- Publication number: CN109800303A
- Application number: CN201811621569.4A
- Authority: CN (China)
- Prior art keywords
- keyword
- information
- document
- text
- extracting method
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a document information extraction method, a storage medium and a terminal. The method comprises: obtaining the text information and text position information of a document, the text information corresponding to the text position information; extracting keywords from the text information using a trained morpheme classification model; and setting a hyperlink corresponding to each keyword. Each keyword, its hyperlink, its text position information, the document attribute information of the document containing it, and its keyword category are then stored. The invention can extract technical-term keywords, product keywords, category keywords and attribute keywords from information documents in a vertical (industry-specific) domain, making document lookup more accurate, improving search matching and improving the user's search experience.
Description
Technical field
The present invention relates to the field of document retrieval, and more specifically to a document information extraction method, a storage medium and a terminal.
Background art
There are currently two methods for extracting the text of an information document. The first uses OCR: the information document is converted into images, and the result is output after layout analysis, line and character segmentation, and character recognition. The second parses the information document directly, extracts the text information and outputs the result. Both methods, however, focus only on extracting the text of the information document. They do not describe the vertical-domain technical-term keywords, product keywords, category keywords and attribute keywords that characterize the original document content, nor the relationships between keywords. This has become a bottleneck limiting information retrieval in vertical industry domains, so research into information extraction from information documents is particularly important.
Summary of the invention
The technical problem to be solved by the present invention is to provide, in view of the above drawbacks of the prior art, a document information extraction method, a storage medium and a terminal.
The technical solution adopted by the present invention to solve this technical problem is a document information extraction method, comprising:
obtaining the text information and text position information of a document, the text information corresponding to the text position information;
extracting keywords from the text information using a trained morpheme classification model; and
setting a hyperlink corresponding to each keyword.
Further, in the document information extraction method of the present invention, the document is a PDF document, and obtaining the text information and text position information of the document comprises:
recognizing the text information in the PDF document using optical character recognition (OCR), while obtaining the position of the text information within a given page of the document and the number of that page.
Further, in the document information extraction method of the present invention, the text position information comprises x-axis, y-axis and z-axis information of the text information, where the x-axis and y-axis information give the position of the text information within a given page of the document, and the z-axis information gives the number of the page of the document on which the text information appears.
Further, in the document information extraction method of the present invention, extracting keywords from the text information using the trained morpheme classification model comprises:
extracting keywords from the text information using the trained morpheme list in the trained morpheme classification model, the parts of speech of the trained morpheme list, the relevance between the trained morpheme list and preset resources, and preset target morphemes.
Further, in the document information extraction method of the present invention, after extracting keywords from the text information using the trained morpheme classification model and before setting the hyperlinks corresponding to the keywords, the method further comprises:
performing keyword decoding and keyword classification on the keywords, where keyword decoding means decoding the data according to the file structure of the document, and keyword classification means classifying according to preset classification modes, the preset classification modes comprising a technical-term keyword mode, a product keyword mode, a category keyword mode and an attribute keyword mode.
Further, in the document information extraction method of the present invention, after setting the hyperlinks corresponding to the keywords, the method further comprises:
storing each keyword, its hyperlink, its text position information, the document attribute information of the document containing it, and its keyword category, where the document attribute information comprises the document title, the document generation date and the document version number.
Further, in the document information extraction method of the present invention, after storing each keyword, its hyperlink, its text position information, the document attribute information of the document containing it, and its keyword category, the method further comprises:
receiving a keyword; and
looking up the search results corresponding to the keyword, the search results comprising the document title, the document generation date, the document version number, the keyword, the text position information of the keyword and the hyperlink of the keyword.
Further, document information extracting method of the present invention searches retrieval corresponding with the keyword described
As a result after, the method also includes:
Document where opening the keyword according to the hyperlink, and believed according to the corresponding text position of the keyword
Breath locating and displaying goes out the keyword position.
In addition, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the document information extraction method described above.
In addition, the present invention also provides a terminal comprising a processor, the processor being configured to implement the steps of the document information extraction method described above when executing a computer program stored in a memory.
The document information extraction method, storage medium and terminal of the present invention have the following beneficial effects. The method comprises: obtaining the text information and text position information of a document, the text information corresponding to the text position information; extracting keywords from the text information using a trained morpheme classification model; setting a hyperlink corresponding to each keyword; and storing each keyword, its hyperlink, its text position information, the document attribute information of the document containing it, and its keyword category. The invention can extract technical-term keywords, product keywords, category keywords and attribute keywords from information documents in a vertical domain, making document lookup more accurate, improving search matching and improving the user's search experience.
Brief description of the drawings
The present invention is further explained below with reference to the accompanying drawings and embodiments, in which:
Fig. 1 is a flow chart of the document information extraction method provided by an embodiment of the invention;
Fig. 2 is a flow chart of the document information extraction method provided by an embodiment of the invention;
Fig. 3 is a flow chart of the document information extraction method provided by an embodiment of the invention;
Fig. 4 is a schematic structural diagram of a terminal of the present invention.
Detailed description of the embodiments
To give a clearer understanding of the technical features, objects and effects of the present invention, specific embodiments of the invention are now described in detail with reference to the accompanying drawings.
Embodiment
As shown in Fig. 1, the document information extraction method of this embodiment comprises:
S1: obtain the text information and text position information of a document, the text information corresponding to the text position information. Optionally, the document includes but is not limited to Word, PDF, Excel, TXT, PPT and WPS documents, and contains text information. Each piece of text information in the document has corresponding text position information, through which that text information can be located. Preferably, the document is a PDF document, and obtaining its text information and text position information comprises: recognizing the text information in the PDF document using optical character recognition, while obtaining the position of the text information within a given page of the document and the number of that page.
Further, a coordinate system with an x-axis, a y-axis and a z-axis is established in the document. The x-axis and y-axis lie within each page of the document and locate the position of the text information on the page; the z-axis represents the page number of the document and locates the page on which the text information appears. Each piece of text position information obtained therefore comprises x-axis, y-axis and z-axis information of the text information: the x-axis and y-axis information give the position of the text information within a given page, and the z-axis information gives its page number in the document. Together, the x-axis, y-axis and z-axis information allow the position of the text information in the document to be located quickly and accurately.
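The x/y/z scheme above can be modelled with a small record type. The sketch below is illustrative only: the names `TextPosition`, `TextSpan` and `locate` are hypothetical, `z` is assumed to be a 1-based page number, and the units of `x` and `y` are assumed to be whatever the OCR engine or PDF parser reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextPosition:
    """Where a piece of text sits: x/y on the page, z = page number."""
    x: float  # horizontal offset on the page (units as reported by the parser)
    y: float  # vertical offset on the page
    z: int    # page number within the document (assumed 1-based)


@dataclass(frozen=True)
class TextSpan:
    """A unit of extracted text paired with its position."""
    text: str
    position: TextPosition


def locate(spans: list[TextSpan], term: str) -> list[TextPosition]:
    """Return the position of every span whose text contains `term`."""
    return [span.position for span in spans if term in span.text]


spans = [
    TextSpan("Supply voltage 3.3 V", TextPosition(72.0, 140.5, 1)),
    TextSpan("Operating temperature", TextPosition(72.0, 160.0, 1)),
    TextSpan("Supply current 20 mA", TextPosition(80.0, 95.0, 3)),
]
print(locate(spans, "Supply"))
# -> [TextPosition(x=72.0, y=140.5, z=1), TextPosition(x=80.0, y=95.0, z=3)]
```

Keeping the page number as a separate field, rather than folding it into a single offset, mirrors the text's split between in-page coordinates and the page index.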
S2: extract keywords from the text information using a trained morpheme classification model. The trained morpheme classification model is obtained by training on a corpus containing various training morphemes, and comprises a trained morpheme list, the parts of speech of the trained morpheme list, the relevance between the trained morpheme list and preset resources, and preset target morphemes. Accordingly, extracting keywords from the text information comprises: extracting keywords from the text information using the trained morpheme list, its parts of speech, its relevance to the preset resources, and the preset target morphemes in the model.
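The patent does not say how these four model components are combined, so the following is only a minimal rule-based sketch under stated assumptions: a hypothetical morpheme list maps each trained morpheme to a part of speech and a relevance score against the preset resources, preset target morphemes are always kept, and any other morpheme is kept only if it is a noun with relevance at or above a threshold. A genuinely trained model would learn these values from the corpus rather than hard-code them.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MorphemeEntry:
    pos: str          # part of speech of the trained morpheme
    relevance: float  # relevance to the preset resources, 0..1


# Hypothetical trained morpheme list (a real one would be learned from a corpus).
MORPHEME_LIST = {
    "mosfet": MorphemeEntry("noun", 0.9),
    "gate voltage": MorphemeEntry("noun", 0.8),
    "datasheet": MorphemeEntry("noun", 0.3),
    "switch": MorphemeEntry("verb", 0.6),
}
# Hypothetical preset target morphemes: kept whenever they occur.
TARGET_MORPHEMES = {"datasheet"}


def extract_keywords(text, lexicon=MORPHEME_LIST, targets=TARGET_MORPHEMES,
                     min_relevance=0.5, allowed_pos=("noun",)):
    """Return lexicon morphemes found in `text` that pass the filters."""
    lowered = text.lower()
    keywords = []
    for morpheme, entry in lexicon.items():
        # Whole-word match so that e.g. "switch" does not fire on "switches".
        if not re.search(r"\b" + re.escape(morpheme) + r"\b", lowered):
            continue
        if morpheme in targets or (
            entry.pos in allowed_pos and entry.relevance >= min_relevance
        ):
            keywords.append(morpheme)
    return keywords


text = "This datasheet describes a MOSFET whose gate voltage can switch the load."
print(extract_keywords(text))  # -> ['mosfet', 'gate voltage', 'datasheet']
```

Substring scanning sidesteps word segmentation; for Chinese source documents, which this method targets, a segmenter would normally run first.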
Optionally, after extracting keywords from the text information using the trained morpheme classification model and before setting the hyperlinks corresponding to the keywords, the document information extraction method of this embodiment further comprises:
performing keyword decoding and keyword classification on the keywords, where keyword decoding means decoding the data according to the file structure of the document, and keyword classification means classifying according to preset classification modes, the preset classification modes comprising a technical-term keyword mode, a product keyword mode, a category keyword mode and an attribute keyword mode.
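As an illustration of the four preset classification modes, the sketch below assigns a keyword to a mode by looking it up in per-mode term sets. The `KeywordMode` enum, the `MODE_LEXICON` entries and the exact-match rule are all assumptions made for this example. Keyword decoding is not shown, because the patent gives no detail on the file-structure-dependent decoding.

```python
from enum import Enum


class KeywordMode(Enum):
    TECHNICAL_TERM = "technical term"
    PRODUCT = "product"
    CATEGORY = "category"
    ATTRIBUTE = "attribute"


# Hypothetical per-mode term sets; entries are lower-case.
MODE_LEXICON = {
    KeywordMode.TECHNICAL_TERM: {"slew rate", "pwm"},
    KeywordMode.PRODUCT: {"lm358", "stm32f103"},
    KeywordMode.CATEGORY: {"operational amplifier", "microcontroller"},
    KeywordMode.ATTRIBUTE: {"supply voltage", "package"},
}


def classify_keyword(keyword):
    """Return the KeywordMode whose term set contains `keyword`, else None."""
    normalized = keyword.strip().lower()
    for mode, terms in MODE_LEXICON.items():
        if normalized in terms:
            return mode
    return None  # keyword fits none of the preset modes


print(classify_keyword("LM358"))  # -> KeywordMode.PRODUCT
```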
S3: set a hyperlink corresponding to each keyword. A hyperlink is set for every keyword extracted from the text information, with a one-to-one correspondence between keywords and hyperlinks. The hyperlink contains the text position information of the corresponding text, so the position of the keyword in the document can be located quickly through the hyperlink.
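One way to embed the text position information in a hyperlink is to carry it in the URL fragment. In this hypothetical sketch the fragment holds `page`, `x` and `y` fields. `#page=N` is a PDF open parameter that many viewers honour; `x` and `y` are an assumed extension that a custom viewer would have to interpret. `make_keyword_link` and `parse_keyword_link` are illustrative names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def make_keyword_link(doc_url, x, y, page):
    """Build a hyperlink whose fragment carries the keyword's position."""
    fragment = urlencode({"page": page, "x": x, "y": y})
    return f"{doc_url}#{fragment}"


def parse_keyword_link(link):
    """Recover (document URL, x, y, page) from a link built above."""
    parts = urlsplit(link)
    params = parse_qs(parts.fragment)
    doc_url = parts._replace(fragment="").geturl()  # URL without the fragment
    return (doc_url, float(params["x"][0]), float(params["y"][0]),
            int(params["page"][0]))


link = make_keyword_link("https://example.com/docs/lm358.pdf", 72.0, 140.5, 3)
print(link)  # -> https://example.com/docs/lm358.pdf#page=3&x=72.0&y=140.5
print(parse_keyword_link(link))
```

Putting the position in the fragment keeps it client-side: the document URL stays cacheable, and the viewer alone decides how to scroll to the coordinates.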
This embodiment can extract technical-term keywords, product keywords, category keywords and attribute keywords from information documents in a vertical domain, making document lookup more accurate.
Embodiment
As shown in Fig. 2, on the basis of the above embodiment, after setting the hyperlinks corresponding to the keywords, the document information extraction method of this embodiment further comprises a storage step:
S4: establish a database and store each keyword, its hyperlink, its text position information, the document attribute information of the document containing it, and its keyword category, where the document attribute information comprises the document title, the document generation date and the document version number. In the database, each keyword together with its hyperlink, its text position information, the document attribute information of its document and its keyword category forms one stored record. During later retrieval, the keyword serves as the matching object, and the complete stored record is obtained by keyword matching. Because a document may contain several keywords, and different documents may contain the same keyword, one keyword may have multiple stored records.
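A stored record as described in S4 could map onto a single relational table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name `keyword_record`, its column names, the index on `keyword` and the helper functions are assumptions, not a schema taken from the patent. It also shows one keyword owning several records.

```python
import sqlite3

# Hypothetical schema: one row per stored record described in S4.
SCHEMA = """
CREATE TABLE IF NOT EXISTS keyword_record (
    keyword     TEXT    NOT NULL,
    hyperlink   TEXT    NOT NULL,
    x           REAL    NOT NULL,
    y           REAL    NOT NULL,
    page        INTEGER NOT NULL,  -- z-axis information
    doc_title   TEXT    NOT NULL,
    doc_date    TEXT    NOT NULL,  -- document generation date, ISO 8601
    doc_version TEXT    NOT NULL,
    category    TEXT               -- keyword category; NULL if unclassified
);
CREATE INDEX IF NOT EXISTS idx_keyword ON keyword_record (keyword);
"""


def store_record(conn, record):
    """Insert one record given as a dict keyed by column name."""
    conn.execute(
        "INSERT INTO keyword_record VALUES (:keyword, :hyperlink, :x, :y, :page,"
        " :doc_title, :doc_date, :doc_version, :category)",
        record,
    )


def count_records(conn, keyword):
    """Number of stored records for one keyword."""
    row = conn.execute(
        "SELECT COUNT(*) FROM keyword_record WHERE keyword = ?", (keyword,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# The same keyword appears in two different documents, giving two records.
for title, url, page in [
    ("LM358 datasheet", "https://example.com/lm358.pdf", 1),
    ("Op-amp selection guide", "https://example.com/guide.pdf", 4),
]:
    store_record(conn, {
        "keyword": "LM358", "hyperlink": f"{url}#page={page}",
        "x": 72.0, "y": 140.5, "page": page, "doc_title": title,
        "doc_date": "2018-12-28", "doc_version": "1.0", "category": "product",
    })
conn.commit()
print(count_records(conn, "LM358"))  # -> 2
```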
Optionally, the database may be stored on a separately provided server, or set up on a cloud platform.
This embodiment can extract technical-term keywords, product keywords, category keywords and attribute keywords from information documents in a vertical domain and build a dedicated database, making document lookup more accurate.
Embodiment
As shown in Fig. 3, on the basis of the above embodiments, after storing each keyword, its hyperlink, its text position information, the document attribute information of the document containing it, and its keyword category, the document information extraction method of this embodiment further comprises a retrieval step:
S5: receive a keyword. Optionally, the keyword may be received through an input device, received and recognized through a voice input device, or received by using a camera to scan a barcode or QR code on an electronic component.
S6: look up the search results corresponding to the keyword. The search proceeds as follows: determine by matching whether the received keyword is in the database. If the received keyword matches a keyword in the database, read the stored records for that keyword to obtain the search results; if it matches no keyword in the database, there is no data for that keyword. The search results comprise the document title, the document generation date, the document version number, the keyword, the text position information of the keyword and the hyperlink of the keyword.
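The matching in S6 can be sketched as an exact, case-insensitive lookup over the stored records. This is an assumption, since the patent only says the received keyword is matched against the database; `SearchResult`, `build_index` and `search` are hypothetical names, and an empty list stands for the case where the database holds no data for the keyword.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    doc_title: str
    doc_date: str
    doc_version: str
    keyword: str
    x: float
    y: float
    page: int
    hyperlink: str


def build_index(records):
    """Group stored records by normalized keyword for exact matching."""
    index = {}
    for record in records:
        index.setdefault(record.keyword.casefold(), []).append(record)
    return index


def search(index, query):
    """Return every record matching `query`; [] means no data for it."""
    return index.get(query.strip().casefold(), [])


records = [
    SearchResult("LM358 datasheet", "2018-06-01", "2.1", "LM358",
                 72.0, 140.5, 1, "https://example.com/lm358.pdf#page=1"),
    SearchResult("Op-amp selection guide", "2017-03-15", "1.0", "LM358",
                 90.0, 300.0, 4, "https://example.com/guide.pdf#page=4"),
    SearchResult("STM32F103 reference manual", "2018-01-10", "3.0", "STM32F103",
                 60.0, 80.0, 12, "https://example.com/stm32.pdf#page=12"),
]
index = build_index(records)
print(len(search(index, " lm358 ")))  # -> 2
print(search(index, "LM324"))         # -> []
```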
Optionally, after looking up the search results corresponding to the keyword, the document information extraction method of this embodiment further comprises a result display step:
S7: open the document containing the keyword via the hyperlink, and locate and display the position of the keyword according to its text position information. Each piece of text position information comprises x-axis, y-axis and z-axis information of the text information: the x-axis and y-axis information give the position of the text information within a given page of the document, and the z-axis information gives its page number in the document. Together they allow the position of the text information in the document to be located quickly and accurately.
Optionally, if the search results contain multiple keyword records, they are displayed in a predetermined order: for example, by document generation date, by the order in which the keyword occurs within the document, or with the documents in which the keyword occurs most frequently shown first. The display windows may be cascaded, tiled horizontally, tiled vertically, or arranged in a checkerboard grid. Multiple occurrences of a keyword in the same document can be shown by splitting the display window.
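The three example orderings can be expressed as sort keys. In this hypothetical sketch, dates are ISO 8601 strings (so string order equals chronological order) and newest documents come first; reading order is page, then `y`, then `x`, with `y` assumed to grow downward as in typical OCR output; and "frequency" counts how many hits each document contributes to the result set. `Hit` and `order_hits` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_title: str
    doc_date: str  # ISO 8601, so string order equals date order
    page: int
    y: float       # assumed to grow downward from the top of the page
    x: float


def order_hits(hits, mode):
    """Return the hits sorted by one of the example orderings."""
    if mode == "date":  # newest document first; ties keep input order
        return sorted(hits, key=lambda h: h.doc_date, reverse=True)
    if mode == "position":  # reading order within each document
        return sorted(hits, key=lambda h: (h.doc_title, h.page, h.y, h.x))
    if mode == "frequency":  # documents contributing the most hits first
        freq = Counter(h.doc_title for h in hits)
        return sorted(
            hits, key=lambda h: (-freq[h.doc_title], h.doc_title, h.page, h.y, h.x)
        )
    raise ValueError(f"unknown ordering mode: {mode!r}")


a = Hit("Datasheet", "2018-06-01", 2, 50.0, 72.0)
b = Hit("Guide", "2017-03-15", 4, 300.0, 90.0)
c = Hit("Guide", "2017-03-15", 1, 20.0, 10.0)
hits = [a, b, c]
for mode in ("date", "position", "frequency"):
    print(mode, [(h.doc_title, h.page) for h in order_hits(hits, mode)])
```

Python's `sorted` is stable even with `reverse=True`, so hits from the same document keep their incoming order under the date ordering.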
Optionally, after the position of the keyword is located and displayed, the keyword can be emphasized by highlighting, underlining, a background colour or similar means, making it easy for the user to see.
This embodiment can extract technical-term keywords, product keywords, category keywords and attribute keywords from information documents in a vertical domain and support retrieval by keyword, making document lookup more accurate, improving search matching and improving the user's search experience.
Optionally, the document information extraction methods described above are applied to electronic component documents, which here include component parameter documents, component usage instruction documents, order documents, component circuit documents and the like.
This embodiment also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the document information extraction method described above.
Embodiment
As shown in Fig. 4, this embodiment also provides a terminal comprising a processor, the processor being configured to implement the steps of the document information extraction method described above when executing a computer program stored in a memory. Optionally, the terminal includes but is not limited to a smartphone, tablet computer, laptop, desktop computer or server.
The present invention can extract technical-term keywords, product keywords, category keywords and attribute keywords from information documents in a vertical domain, making document lookup more accurate, improving search matching and improving the user's search experience.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate clearly the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each specific application, but such implementations should not be regarded as going beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above embodiments merely illustrate the technical concepts and features of the present invention. Their purpose is to enable those skilled in the art to understand and implement the invention, and they do not limit its scope of protection. All equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope covered by the claims.
Claims (10)
1. A document information extraction method, characterized by comprising:
obtaining the text information and text position information of a document, the text information corresponding to the text position information;
extracting keywords from the text information using a trained morpheme classification model; and
setting a hyperlink corresponding to each keyword.
2. The document information extraction method according to claim 1, characterized in that the document is a PDF document, and obtaining the text information and text position information of the document comprises:
recognizing the text information in the PDF document using optical character recognition, while obtaining the position of the text information within a given page of the document and the number of that page.
3. The document information extraction method according to claim 1, characterized in that the text position information comprises x-axis, y-axis and z-axis information of the text information, wherein the x-axis and y-axis information are the position of the text information within a given page of the document, and the z-axis information is the page number of the text information in the document.
4. The document information extraction method according to claim 1, characterized in that extracting keywords from the text information using the trained morpheme classification model comprises:
extracting keywords from the text information using the trained morpheme list in the trained morpheme classification model, the parts of speech of the trained morpheme list, the relevance between the trained morpheme list and preset resources, and preset target morphemes.
5. The document information extraction method according to claim 1, characterized in that after extracting keywords from the text information using the trained morpheme classification model and before setting the hyperlinks corresponding to the keywords, the method further comprises:
performing keyword decoding and keyword classification on the keywords, wherein keyword decoding refers to decoding data according to the file structure of the document, and keyword classification refers to classifying according to preset classification modes, the preset classification modes comprising a technical-term keyword mode, a product keyword mode, a category keyword mode and an attribute keyword mode.
6. The document information extraction method according to claim 5, characterized in that after setting the hyperlinks corresponding to the keywords, the method further comprises:
storing the keyword, the hyperlink corresponding to the keyword, the text position information corresponding to the keyword, the document attribute information of the document containing the keyword, and the keyword category, wherein the document attribute information comprises the document title, the document generation date and the document version number.
7. The document information extraction method according to claim 6, characterized in that after storing the keyword, the hyperlink corresponding to the keyword, the text position information corresponding to the keyword, the document attribute information of the document containing the keyword, and the keyword category, the method further comprises:
receiving a keyword; and
looking up search results corresponding to the keyword, the search results comprising the document title, the document generation date, the document version number, the keyword, the text position information corresponding to the keyword and the hyperlink corresponding to the keyword.
8. The document information extraction method according to claim 7, characterized in that after looking up the search results corresponding to the keyword, the method further comprises:
opening the document containing the keyword according to the hyperlink, and locating and displaying the position of the keyword according to the text position information corresponding to the keyword.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the document information extraction method according to any one of claims 1-8.
10. A terminal, characterized in that the terminal comprises a processor, the processor being configured to implement the steps of the document information extraction method according to any one of claims 1-8 when executing a computer program stored in a memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811621569.4A CN109800303A (en) | 2018-12-28 | 2018-12-28 | Document information extraction method, storage medium and terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109800303A (en) | 2019-05-24 |
Family ID: 66557897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811621569.4A Withdrawn CN109800303A (en) | Document information extraction method, storage medium and terminal | 2018-12-28 | 2018-12-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109800303A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101118560A (en) * | 2006-08-03 | 2008-02-06 | 株式会社东芝 | Keyword outputting apparatus, keyword outputting method, and keyword outputting computer program product |
CN101004737A (en) * | 2007-01-24 | 2007-07-25 | 贵阳易特软件有限公司 | Individualized document processing system based on keywords |
CN101546328A (en) * | 2008-03-27 | 2009-09-30 | 株式会社东芝 | Search keyword improvement apparatus, server and method |
CN104036027A (en) * | 2014-06-27 | 2014-09-10 | 吴涛军 | Methods and systems for connection establishment and information transmission between positions of electronic documents |
CN104200380A (en) * | 2014-09-18 | 2014-12-10 | 北京国双科技有限公司 | Promotion information positioning method and device |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909112A (en) * | 2019-10-18 | 2020-03-24 | 深圳价值在线信息科技股份有限公司 | Data extraction method, device, terminal equipment and medium |
CN110837788B (en) * | 2019-10-31 | 2022-10-28 | 北京深度制耀科技有限公司 | PDF document processing method and device |
CN110837788A (en) * | 2019-10-31 | 2020-02-25 | 北京深度制耀科技有限公司 | PDF document processing method and device |
CN111563212A (en) * | 2020-04-28 | 2020-08-21 | 北京字节跳动网络技术有限公司 | Inner chain adding method and device |
CN112434148A (en) * | 2020-12-04 | 2021-03-02 | 房桂丽 | Intelligent robot response method and device based on artificial intelligence |
CN112596646A (en) * | 2020-12-21 | 2021-04-02 | 维沃移动通信有限公司 | Information display method and device and electronic equipment |
CN112597422A (en) * | 2020-12-30 | 2021-04-02 | 深圳市世强元件网络有限公司 | PDF file segmentation method and PDF file loading method in webpage |
CN112711657A (en) * | 2021-01-06 | 2021-04-27 | 北京中科深智科技有限公司 | Question-answering method and question-answering system |
CN113282752A (en) * | 2021-06-09 | 2021-08-20 | 江苏联著实业股份有限公司 | Object classification method and system based on semantic mapping |
CN113392626A (en) * | 2021-06-22 | 2021-09-14 | 上海维算科技有限公司 | Method and device for generating medical document and storage medium |
CN113298914A (en) * | 2021-07-28 | 2021-08-24 | 北京明略软件系统有限公司 | Knowledge chunk extraction method and device, electronic equipment and storage medium |
CN115525611A (en) * | 2022-08-16 | 2022-12-27 | 北京矩阵分解科技有限公司 | Method, device and equipment for inquiring key words in portable document format file |
CN117131301A (en) * | 2023-10-24 | 2023-11-28 | 苏州阿基米德网络科技有限公司 | Webpage end browsing method of medical equipment document |
CN117131301B (en) * | 2023-10-24 | 2024-01-05 | 苏州阿基米德网络科技有限公司 | Webpage end browsing method of medical equipment document |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20190524 |