CN110489570A - Candidate whole-network bibliography real-time update platform and system - Google Patents
- Publication number
- CN110489570A (application number CN201910722763.XA)
- Authority
- CN
- China
- Prior art keywords
- character string
- image
- recognition unit
- candidate
- bibliography
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2379—Updates performed during online database operations; commit processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/382—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using citations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/383—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/28—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
- G06V30/287—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Library & Information Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Character Discrimination (AREA)
Abstract
The present invention relates to a candidate whole-network bibliography real-time update platform system, comprising: a web page screen capture device, which captures the web page screen the user is editing to obtain a web page screenshot; a text box detection device, which identifies, based on text box imaging features, the image region occupied by each text box in the web page screenshot; an OCR recognition device, which performs OCR on each image region to obtain the corresponding multiple character strings; a character string sorting device, which ranks the character strings of all image regions together by frequency of occurrence and takes a preset number of the most frequent character strings as the latest keywords; and a search update device, which resets the search for candidate whole-network bibliography based on the latest keywords. The invention allows the search keywords to be updated in real time as the text is edited.
Description
Technical field
The present invention relates to the field of paper editing, and in particular to a candidate whole-network bibliography real-time update platform and system.
Background art
Source material is the foundation of thesis writing. After the topic has been chosen and the study design and the necessary observations and experiments have been completed, collecting and processing material is the next step in preparing to write.
Material for thesis writing falls into two classes: first-hand and second-hand. First-hand material, also called primary or direct material, is what the author learns through personal investigation, research, or observation; records made during experiments or observations belong to this class. Second-hand material, also called secondary or indirect material, is professional or literature information on the subject and is mainly accumulated through everyday study. Once enough material has been gathered, it must be processed so that it is systematic and organized and therefore easy to use. Both classes are indispensable for thesis writing. When applying them in a paper, the author should distinguish primary from secondary; literature in particular should be quoted appropriately, only after it has been fully digested, and should not overshadow the author's own work. First-hand material must be used truthfully, accurately, and without error.
In the current era of information explosion, collecting material for a thesis purely by hand is very inefficient. The whole-network literature is generally searched with keywords typed in by the user. However, keywords chosen subjectively in this way are inherently imprecise and cannot accurately and comprehensively reflect the actual content of the paper the user is currently editing.
Summary of the invention
To solve the above problem, the present invention provides a candidate whole-network bibliography real-time update platform that automatically selects the keywords used to search for candidate whole-network bibliography from the ranking of the frequency of occurrence of each character string in the web page screen the user is editing, making keyword extraction more intelligent. More importantly, the weight applied when counting a character string's occurrences depends on where the string appears, so that character strings appearing in tables or formulas are weighted more heavily when the keywords are determined.
According to one aspect of the present invention, a candidate whole-network bibliography real-time update platform is provided, the platform comprising:
A web page screen capture device, arranged in the terminal that runs the web page, for capturing the web page screen the user is editing to obtain a web page screenshot;
A text box detection device, connected to the web page screen capture device, for receiving the web page screenshot and identifying, based on text box imaging features, the image region occupied by each text box in the web page screenshot;
An OCR recognition device, connected to the text box detection device, for performing OCR on each received image region to obtain the corresponding multiple character strings;
A character string sorting device, connected to the OCR recognition device, for ranking the character strings of all image regions together by frequency of occurrence and taking a preset number of the most frequent character strings as the latest keywords;
A search update device, connected to the character string sorting device, for resetting the search for candidate whole-network bibliography based on the received latest keywords, so as to obtain multiple documents corresponding to the multiple reference periodicals needed for the paper being edited;
Wherein, the OCR recognition device further comprises an OCR recognition unit, a table recognition unit, and a formula recognition unit, the OCR recognition unit being used to perform OCR on each received image region to obtain the corresponding multiple character strings;
Wherein, the table recognition unit is connected to the OCR recognition unit and is used to determine whether each character string obtained by the OCR recognition unit lies within a table in its image region, and to assign a different multiple to the character string's frequency of occurrence based on the result;
Wherein, the formula recognition unit is connected to the OCR recognition unit and is used to determine whether each character string obtained by the OCR recognition unit lies within a formula in its image region, and to assign a different multiple to the character string's frequency of occurrence based on the result.
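The ranking performed by the character string sorting device, together with the location-dependent multiples, can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: the `RecognizedString` type, the function names, and the rule that a formula location takes precedence over a table location are all assumptions made for this sketch. The default multiples of 2 and 4 are the example values the description gives for N and M.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognizedString:
    """One character string returned by OCR, with location flags (hypothetical type)."""
    text: str
    in_table: bool = False
    in_formula: bool = False


def occurrence_weight(item, table_weight, formula_weight):
    # Assumption: a formula location wins over a table location,
    # and a string outside both is counted once.
    if item.in_formula:
        return formula_weight
    if item.in_table:
        return table_weight
    return 1


def latest_keywords(items, preset_quantity, table_weight=2, formula_weight=4):
    """Return the preset_quantity strings with the highest weighted counts."""
    counts = Counter()
    for item in items:
        counts[item.text] += occurrence_weight(item, table_weight, formula_weight)
    # most_common sorts by count, highest first; ties keep first-seen order.
    return [text for text, _ in counts.most_common(preset_quantity)]


if __name__ == "__main__":
    sample = [
        RecognizedString("network"),
        RecognizedString("network"),
        RecognizedString("security", in_table=True),
        RecognizedString("entropy", in_formula=True),
    ]
    print(latest_keywords(sample, preset_quantity=2))
```

In this sketch a single occurrence inside a formula can outrank two occurrences in ordinary text, which is the kind of tilt toward tables and formulas that the summary describes.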
According to another aspect of the present invention, a candidate whole-network bibliography real-time update system is also provided, characterized in that the system comprises a memory and a processor, the processor being connected to the memory; the memory stores instructions executable by the processor; and the processor calls the executable instructions in the memory so as to use the candidate whole-network bibliography real-time update platform described above to carry out a method of updating, in real time and according to the state of text editing, the keywords used to search for candidate whole-network bibliography.
The standard format of bibliography entries can be summarized briefly by entry type as follows:
The type of a bibliography entry (that is, of the cited source) is identified by a single letter, specifically:
M: monograph; C: collection of papers; N: newspaper article; J: journal article; D: dissertation; R: report; document types not covered above are marked with the letter "Z".
For English references, the following two points should also be noted:
1. Author names follow the "surname first, given name after" principle. The format is the surname followed by the initials of the given names; for example, Malcolm Richard Cowley is written: Cowley, M.R. If there are two authors, the first author is written the same way, while for the second author the initial of the given name comes first and the surname last; for example, Frank Norris and Irving Gordon are written: Norris, F. & I. Gordon.
2. Titles of books, newspapers, and periodicals are set in italics, for example: Mastering English Literature, English Weekly.
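The author-name rule for English references can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes each name is given as space-separated given names followed by a single-word surname; it covers only the one-author and two-author cases described above.

```python
from typing import Optional


def format_author(full_name: str, surname_first: bool = True) -> str:
    """Format 'Given [Middle ...] Surname' using initials for the given names."""
    *given, surname = full_name.split()
    if not given:
        # A single-word name has no initials to add.
        return surname
    initials = "".join(f"{name[0]}." for name in given)
    if surname_first:
        return f"{surname}, {initials}"
    return f"{initials} {surname}"


def format_authors(first: str, second: Optional[str] = None) -> str:
    """First author is surname-first; a second author is initials-first."""
    if second is None:
        return format_author(first)
    return f"{format_author(first)} & {format_author(second, surname_first=False)}"


if __name__ == "__main__":
    print(format_authors("Malcolm Richard Cowley"))
    print(format_authors("Frank Norris", "Irving Gordon"))
```

Names with multi-word surnames or particles (for example "van Gogh") would need extra handling that this sketch does not attempt.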
The present invention has at least the following two key inventive points:
(1) the image region occupied by each text box in the web page screenshot is identified, character strings are extracted from each image region, and the keywords used to search for candidate whole-network bibliography are selected automatically from the ranking of the frequency of occurrence of the character strings of all image regions, making keyword extraction more intelligent;
(2) the weight applied when counting a character string's occurrences is determined by where the string appears, so that character strings appearing in tables or formulas are weighted more heavily when the keywords are determined.
Brief description of the drawings
Embodiments of the present invention are described below with reference to the accompanying drawings, in which:
Fig. 1 is a structural block diagram of the candidate whole-network bibliography real-time update platform according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the interface showing the search results for candidate whole-network bibliography obtained by the candidate whole-network bibliography real-time update platform according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the candidate whole-network bibliography real-time update platform and the corresponding system of the present invention are described in detail below with reference to the accompanying drawings.
In the prior art, a user writing a paper usually needs to search the whole-network literature to obtain multiple references to consult and cite. The search is generally driven by keywords typed in by the user; however, keywords chosen subjectively in this way are inherently imprecise and cannot accurately and comprehensively reflect the actual content of the paper the user is currently editing.
To overcome this deficiency, the present invention provides a candidate whole-network bibliography real-time update platform and a corresponding system that effectively solve this technical problem.
Fig. 1 is a structural block diagram of the candidate whole-network bibliography real-time update platform according to an embodiment of the present invention. The platform comprises:
A web page screen capture device, arranged in the terminal that runs the web page, for capturing the web page screen the user is editing to obtain a web page screenshot;
A text box detection device, connected to the web page screen capture device, for receiving the web page screenshot and identifying, based on text box imaging features, the image region occupied by each text box in the web page screenshot;
An OCR recognition device, connected to the text box detection device, for performing OCR on each received image region to obtain the corresponding multiple character strings;
A character string sorting device, connected to the OCR recognition device, for ranking the character strings of all image regions together by frequency of occurrence and taking a preset number of the most frequent character strings as the latest keywords;
A search update device, connected to the character string sorting device, for resetting the search for candidate whole-network bibliography based on the received latest keywords, so as to obtain multiple documents corresponding to the multiple reference periodicals needed for the paper being edited;
Wherein, the OCR recognition device further comprises an OCR recognition unit, a table recognition unit, and a formula recognition unit, the OCR recognition unit being used to perform OCR on each received image region to obtain the corresponding multiple character strings;
Wherein, the table recognition unit is connected to the OCR recognition unit and is used to determine whether each character string obtained by the OCR recognition unit lies within a table in its image region, and to assign a different multiple to the character string's frequency of occurrence based on the result;
Wherein, the formula recognition unit is connected to the OCR recognition unit and is used to determine whether each character string obtained by the OCR recognition unit lies within a formula in its image region, and to assign a different multiple to the character string's frequency of occurrence based on the result.
Next, the specific structure of the candidate whole-network bibliography real-time update platform of the present invention is described further.
In the candidate whole-network bibliography real-time update platform:
In the table recognition unit, determining whether each character string obtained by the OCR recognition unit lies within a table in its image region, and assigning a different multiple to the character string's frequency of occurrence based on the result, comprises: when a character string obtained by the OCR recognition unit is determined to lie within a table in its image region, increasing the character string's frequency of occurrence by N, where N is a natural number greater than 1.
In the candidate whole-network bibliography real-time update platform:
In the formula recognition unit, determining whether each character string obtained by the OCR recognition unit lies within a formula in its image region, and assigning a different multiple to the character string's frequency of occurrence based on the result, comprises: when a character string obtained by the OCR recognition unit is determined to lie within a formula in its image region, increasing the character string's frequency of occurrence by M, where M is a natural number greater than 1.
In the candidate whole-network bibliography real-time update platform:
In the table recognition unit, determining whether each character string obtained by the OCR recognition unit lies within a table in its image region, and assigning a different multiple to the character string's frequency of occurrence based on the result, comprises: when a character string obtained by the OCR recognition unit is determined not to lie within a table in its image region, increasing the character string's frequency of occurrence by 1.
In the candidate whole-network bibliography real-time update platform:
In the formula recognition unit, determining whether each character string obtained by the OCR recognition unit lies within a formula in its image region, and assigning a different multiple to the character string's frequency of occurrence based on the result, comprises: when a character string obtained by the OCR recognition unit is determined not to lie within a formula in its image region, increasing the character string's frequency of occurrence by 1.
In the candidate whole-network bibliography real-time update platform:
In the OCR recognition device, M is greater than N.
In the candidate whole-network bibliography real-time update platform:
In the OCR recognition device, M is 4 and N is 2.
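As a worked illustration of these values, suppose (as an assumption, since the text does not state how the two units' results are combined) that each occurrence of a character string contributes M when it lies in a formula, N when it lies in a table, and 1 otherwise. The symbols a, b, c, and F(s) below are introduced only for this example: a, b, and c are the numbers of occurrences of a string s in ordinary text, in tables, and in formulas respectively, and F(s) is the resulting weighted frequency.

```latex
F(s) = a \cdot 1 + b \cdot N + c \cdot M, \qquad N = 2,\; M = 4
% Example: a = 3 occurrences in ordinary text, b = 1 in a table, c = 1 in a formula
F(s) = 3 \cdot 1 + 1 \cdot 2 + 1 \cdot 4 = 9
```

Under this reading, a string seen once in a formula (weighted frequency 4) ranks above a string seen three times in ordinary text (weighted frequency 3).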
In the candidate whole-network bibliography real-time update platform:
In the OCR recognition device, the OCR recognition unit, the table recognition unit, and the formula recognition unit are each implemented with an ASIC chip of a different model.
The candidate whole-network bibliography real-time update platform may further include:
An instant display device, connected to the character string sorting device, for displaying in real time the result of ranking the character strings of all image regions together by frequency of occurrence.
Meanwhile in order to overcome above-mentioned deficiency, the present invention has also built a kind of candidate the whole network bibliography real-time update system,
Obtained system includes: memory and processor, and the processor is connect with the memory;
Wherein, the memory, for storing the executable instruction of the processor;
Wherein, the processor, for calling the executable instruction in the memory, to realize using as described above
Candidate the whole network bibliography real-time update platform according to text editing situation to realize for searching for candidate the whole network bibliography
The method of the real-time update of keyword.
Fig. 2 is a schematic diagram of the interface showing the search results for candidate whole-network bibliography obtained by the candidate whole-network bibliography real-time update platform according to an embodiment of the present invention.
As shown in Fig. 2, after the search update device resets the search for candidate whole-network bibliography based on the received latest keywords, it obtains multiple documents corresponding to the multiple reference periodicals needed for the paper being edited, and information about those documents is displayed on the interface of Fig. 2;
Wherein, in Fig. 2, the reset search returned a total of 1329298 documents, and the result information for these 1329298 documents is displayed; because of space limits, the results are shown as a paginated list in Fig. 2;
Wherein, the first page lists two periodical articles. The first is titled "An Analysis of Computer Network Security", its author is Liu Zhiqiang, and its source is Heilongjiang Science and Technology Information, issue 31, 2011; the second is titled "An Exploratory Analysis of Computer Network Security", its author is Wu Hailiang, and its source is Sci-tech Pioneering Monthly, issue 13, 2011.
In addition, OCR (Optical Character Recognition) is the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses a character recognition method to translate the shapes into computer text. That is, for printed characters, the text in a paper document is converted optically into a black-and-white dot-matrix image file, and recognition software converts the text in the image into a text format that word processing software can edit further. How to remove errors or use auxiliary information to raise recognition accuracy is the most important problem in OCR, and the term ICR (Intelligent Character Recognition) arose from it. The main indicators of the performance of an OCR system are: rejection rate, misrecognition rate, recognition speed, friendliness of the user interface, product stability, ease of use, and feasibility.
The concept of OCR was first proposed in 1929 by the German scientist Tauschek; later the American scientist Handel also proposed the idea of using technology to recognize text. The earliest research on recognizing printed Chinese characters was done by Casey and Nagy of IBM, who in 1966 published the first paper on Chinese character recognition, using template matching to recognize 1000 printed Chinese characters.
As early as the 1960s and 1970s, countries around the world began research on OCR. Early research concentrated on recognition methods for text, and the text recognized was limited to the digits 0 to 9. Japan, which likewise uses an ideographic script, began studying the basic theory of OCR around 1960, initially with digits as the target; between 1965 and 1970 some simple products began to appear, such as postal code recognition systems for printed characters, which recognized the postal code on mail and helped post offices sort letters by region. This is also why postal codes have remained part of the address format promoted in every country.
In the early 1970s, Japanese scholars began research on Chinese character recognition and did a great deal of work. China started research on OCR technology relatively late, beginning to study recognition of digits, English letters, and symbols only in the 1970s and Chinese character recognition at the end of the 1970s. In 1986, China launched the "863" high-technology program, and research on Chinese character recognition entered a substantive stage. Professor Ding Xiaoqing of Tsinghua University and the Chinese Academy of Sciences each carried out development and released Chinese OCR products in succession, which are now the leading Chinese character OCR technology in China. Early OCR software failed to meet practical requirements because of factors such as recognition rate and degree of productization. At the same time, hardware was costly and slow, so the technology did not reach a practical level, and only a few organizations, such as information departments and news publishers, used OCR software. After 1990, with the wide use of flatbed scanners and the spread of informatization and office automation in China, OCR technology developed much further, and its recognition accuracy and speed came to meet users' requirements.
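Two of the indicators named above, rejection rate and misrecognition rate, can be computed from simple tallies. The sketch below is only illustrative: the text does not define the indicators, so it assumes both are fractions of all characters presented to the recognizer, and the `OcrTally` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrTally:
    """Counts of characters by recognition outcome (hypothetical type)."""
    correct: int
    misrecognized: int  # recognized, but as the wrong character
    rejected: int       # the recognizer declined to give an answer

    @property
    def total(self) -> int:
        return self.correct + self.misrecognized + self.rejected


def rejection_rate(tally: OcrTally) -> float:
    # Assumed definition: rejected characters over all characters presented.
    if tally.total == 0:
        raise ValueError("tally is empty")
    return tally.rejected / tally.total


def misrecognition_rate(tally: OcrTally) -> float:
    # Assumed definition: wrongly recognized characters over all characters presented.
    if tally.total == 0:
        raise ValueError("tally is empty")
    return tally.misrecognized / tally.total


if __name__ == "__main__":
    tally = OcrTally(correct=90, misrecognized=4, rejected=6)
    print(rejection_rate(tally), misrecognition_rate(tally))
```

Some evaluations instead divide misrecognitions by the number of accepted characters only; the denominator should be stated whenever such figures are reported.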
It should be understood that each part of the present invention can be implemented in hardware, software, firmware, or a combination of them. In the above embodiments, multiple steps or methods can be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented with any one or a combination of the following technologies known in the art: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiment methods can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, includes one or a combination of the steps of the method embodiments.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that they are exemplary and are not to be construed as limiting the invention; those of ordinary skill in the art can change, modify, replace, and vary the above embodiments within the scope of the invention.
Claims (10)
1. A candidate whole-network bibliography real-time update platform, characterized by comprising:
A web page screen capture device, arranged in the terminal that runs the web page, for capturing the web page screen the user is editing to obtain a web page screenshot;
A text box detection device, connected to the web page screen capture device, for receiving the web page screenshot and identifying, based on text box imaging features, the image region occupied by each text box in the web page screenshot;
An OCR recognition device, connected to the text box detection device, for performing OCR on each received image region to obtain the corresponding multiple character strings;
A character string sorting device, connected to the OCR recognition device, for ranking the character strings of all image regions together by frequency of occurrence and taking a preset number of the most frequent character strings as the latest keywords;
A search update device, connected to the character string sorting device, for resetting the search for candidate whole-network bibliography based on the received latest keywords, so as to obtain multiple documents corresponding to the multiple reference periodicals needed for the paper being edited;
Wherein, the OCR recognition device further comprises an OCR recognition unit, a table recognition unit, and a formula recognition unit, the OCR recognition unit being used to perform OCR on each received image region to obtain the corresponding multiple character strings;
Wherein, the table recognition unit is connected to the OCR recognition unit and is used to determine whether each character string obtained by the OCR recognition unit lies within a table in its image region, and to assign a different multiple to the character string's frequency of occurrence based on the result;
Wherein, the formula recognition unit is connected to the OCR recognition unit and is used to determine whether each character string obtained by the OCR recognition unit lies within a formula in its image region, and to assign a different multiple to the character string's frequency of occurrence based on the result.
2. candidate's the whole network bibliography real-time update platform as described in claim 1, it is characterised in that:
In the Table recognition unit, determine whether each character string that the OCR recognition unit obtains is located at where it
It within the scope of the table of image-region, and include: to work as based on the different multiples that definitive result authorizes the frequency of occurrence of the character string
Determine that the character string that the OCR recognition unit obtains is located within the scope of the table of its place image-region, by the character string
Frequency of occurrence increases n times, and wherein N is natural number and is greater than 1.
3. candidate's the whole network bibliography real-time update platform as claimed in claim 2, it is characterised in that:
In the formulas solutions unit, determine whether each character string that the OCR recognition unit obtains is located at where it
It within the scope of the formula of image-region, and include: to work as based on the different multiples that definitive result authorizes the frequency of occurrence of the character string
Determine that the character string that the OCR recognition unit obtains is located within the scope of the formula of its place image-region, by the character string
Frequency of occurrence increases M times, and wherein M is natural number and is greater than 1.
4. The candidate whole-network bibliography real-time update platform according to claim 3, characterized in that:
In the table recognition unit, determining whether each character string obtained by the OCR recognition unit lies within a table range of the image region in which it is located, and assigning a different multiple to the occurrence count of the character string based on the determination result, comprises: when it is determined that a character string obtained by the OCR recognition unit does not lie within the table range of the image region in which it is located, increasing the occurrence count of the character string by 1.
5. The candidate whole-network bibliography real-time update platform according to claim 4, characterized in that:
In the formula recognition unit, determining whether each character string obtained by the OCR recognition unit lies within a formula range of the image region in which it is located, and assigning a different multiple to the occurrence count of the character string based on the determination result, comprises: when it is determined that a character string obtained by the OCR recognition unit does not lie within the formula range of the image region in which it is located, increasing the occurrence count of the character string by 1.
6. The candidate whole-network bibliography real-time update platform according to claim 5, characterized in that:
In the OCR recognition device, M is greater than N.
7. The candidate whole-network bibliography real-time update platform according to claim 6, characterized in that:
In the OCR recognition device, M is 4 and N is 2.
8. The candidate whole-network bibliography real-time update platform according to claim 7, characterized in that:
In the OCR recognition device, the OCR recognition unit, the table recognition unit and the formula recognition unit are each implemented using ASIC chips of different models.
9. The candidate whole-network bibliography real-time update platform according to claim 8, characterized in that the platform further comprises:
A real-time display device, connected to the character string sorting device, for displaying in real time the result of uniformly sorting the multiple character strings of each image region by occurrence count.
10. A candidate whole-network bibliography real-time update system, characterized in that the system comprises a memory and a processor, the processor being connected to the memory;
The memory is configured to store instructions executable by the processor;
The processor is configured to invoke the executable instructions in the memory so as to use the candidate whole-network bibliography real-time update platform according to any one of claims 1-9 to implement a method for updating in real time, according to the text editing state, the keywords used to search the candidate whole-network bibliography.
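The weighting in claims 2 to 7 and the ordering in claim 9 can be illustrated with a minimal Python sketch. It is not the claimed ASIC implementation. The `Occurrence` record, its `in_table` and `in_formula` flags, and the function names are hypothetical; only the values M = 4 and N = 2 come from claim 7. The sketch assumes each occurrence is counted once, with the formula multiple taking precedence when a string lies in both a table and a formula. The claims could also be read as the two units each adding their own increment.

```python
from collections import Counter
from dataclasses import dataclass

# Multiples from claim 7: a formula hit adds M = 4, a table hit adds N = 2.
M_FORMULA = 4
N_TABLE = 2


@dataclass(frozen=True)
class Occurrence:
    """One OCR'd character string plus hypothetical region flags."""
    text: str
    in_table: bool = False
    in_formula: bool = False


def increment(occ: Occurrence) -> int:
    """Return how much this occurrence adds to the string's count.

    Assumption: one increment per occurrence, with formula taking
    precedence over table when both flags are set.
    """
    if occ.in_formula:
        return M_FORMULA
    if occ.in_table:
        return N_TABLE
    return 1


def rank_strings(occurrences):
    """Sum weighted counts per string and sort by count, highest first."""
    counts = Counter()
    for occ in occurrences:
        counts[occ.text] += increment(occ)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


sample = [
    Occurrence("entropy", in_formula=True),
    Occurrence("entropy"),
    Occurrence("accuracy", in_table=True),
    Occurrence("accuracy", in_table=True),
    Occurrence("dataset"),
]
ranking = rank_strings(sample)
```

Under these assumptions, "entropy" scores 5 (one formula hit plus one plain hit), "accuracy" scores 4 (two table hits), and "dataset" scores 1, so a string seen once inside a formula outranks one seen three times in plain text.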
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910722763.XA CN110489570A (en) | 2019-08-06 | 2019-08-06 | Candidate the whole network bibliography real-time update platform and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910722763.XA CN110489570A (en) | 2019-08-06 | 2019-08-06 | Candidate the whole network bibliography real-time update platform and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110489570A (en) | 2019-11-22 |
Family
ID=68549576
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910722763.XA Pending CN110489570A (en) | 2019-08-06 | 2019-08-06 | Candidate the whole network bibliography real-time update platform and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110489570A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111880697A (en) * | 2020-08-07 | 2020-11-03 | 北京搜狗科技发展有限公司 | Encyclopedic data processing method and device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102542273A (en) * | 2011-12-02 | 2012-07-04 | 方正国际软件有限公司 | Detection method and system for complex formula areas in document image |
CN102591475A (en) * | 2011-12-29 | 2012-07-18 | 北京百度网讯科技有限公司 | Content input method and system for online editor |
CN103559310A (en) * | 2013-11-18 | 2014-02-05 | 广东利为网络科技有限公司 | Method for extracting key word from article |
CN104615640A (en) * | 2014-11-28 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Method and device for providing searching keywords and carrying out searching |
CN105264486A (en) * | 2012-12-18 | 2016-01-20 | 汤姆森路透社全球资源公司 | Mobile-enabled systems and processes for intelligent research platform |
CN109144954A (en) * | 2018-09-18 | 2019-01-04 | 天津字节跳动科技有限公司 | Edit resource recommendation method, device and the electronic equipment of document |
Non-Patent Citations (1)
Title |
---|
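Issue 03: "Extraction and analysis of formulas and text in document images with complex layouts", China Master's Theses Full-text Database, Information Science and Technology series * |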
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102053991B (en) | | Method and system for multi-language document retrieval |
Wilkinson et al. | | Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections |
JP2012529108A (en) | | Lighting system and language detection |
CN109344914A (en) | | A kind of method and system of the Text region of random length end to end |
US8208726B2 (en) | | Method and system for optical character recognition using image clustering |
CN109062792A (en) | | A kind of Open Source Code detection method based on String matching and characteristic matching |
CN102591475A (en) | | Content input method and system for online editor |
Isheawy et al. | | Optical character recognition (ocr) system |
CN107562843B (en) | | News hot phrase extraction method based on title high-frequency segmentation |
Valy et al. | | A new khmer palm leaf manuscript dataset for document analysis and recognition: Sleukrith set |
CN108197119A (en) | | The archives of paper quality digitizing solution of knowledge based collection of illustrative plates |
US10970489B2 (en) | | System for real-time expression of semantic mind map, and operation method therefor |
CN110209759B (en) | | Method and device for automatically identifying page |
CN109074355B (en) | | Method and medium for ideographic character analysis |
Fischer et al. | | Handwritten historical document analysis, recognition, and retrieval-state of the art and future trends |
Shapira et al. | | Massive multi-document summarization of product reviews with weak supervision |
CN110489570A (en) | | Candidate the whole network bibliography real-time update platform and system |
CN112464907A (en) | | Document processing system and method |
Ohta et al. | | CRF-based bibliography extraction from reference strings focusing on various token granularities |
CN100444194C (en) | | Automatic extraction device, method and program of essay title and correlation information |
CN107562932A (en) | | The academic reference of books data in literature acquisition method of Chinese |
Karambelkar et al. | | Automated Text Extraction from Images using Optical Character Recognition. |
CN113722421A (en) | | Contract auditing method and system and computer readable storage medium |
JP2010092108A (en) | | Similar sentence extraction program, method, and apparatus |
CN105335416A (en) | | Content extraction method, content extraction apparatus and content extraction system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20211130. Address after: 1501-1, floor 15, No. 19, Chegongzhuang West Road, Haidian District, Beijing 100048. Applicant after: Super Intellectual Property Consultant (Beijing) Co.,Ltd. Address before: 12a-3-110, block D, 12 / F, No. 28, information road, Haidian District, Beijing 100085. Applicant before: Beijing Ruyou Education Technology Co.,Ltd. |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20191122 |