CN105868177A - Universal formula search method - Google Patents


Publication number
CN105868177A
Authority
CN
China
Prior art keywords
mathematical
document
mathematical formulae
index
checked
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610171766.5A
Other languages
Chinese (zh)
Inventor
赵华
孟凡
孟一凡
吕清
蔡迢阳
任玉伟
董冬立
马程程
刘少松
张旭论
王旭丹
贾苗
刘金星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Normal University
Original Assignee
Hebei Normal University
Application filed by Hebei Normal University
Priority to CN201610171766.5A
Publication of CN105868177A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a universal formula search method. The method comprises the following steps: establishing a universal formula search engine; running a plurality of web crawler processes in a searcher; extracting, by an indexer, the mathematical formulas from documents in an original web page database; performing, by a querier, a mathematical formula query using a mathematical formula index and a mathematical symbol dictionary database; and returning, by the querier, the documents in the original web page database that contain the queried mathematical formula to the user and displaying them on a search results interface. The beneficial effects are as follows: the method provides scientific researchers and teaching staff with a fast and accurate mathematical formula search scheme, provides theoretical support for scientific research, and provides an extensive and accurate way of gathering material for teaching and science popularization; it also provides dedicated mathematical formula search interfaces for the major literature databases, expanding their scope of application and business, and provides paid download interfaces that increase the revenue of those databases.

Description

A universal formula search method
Technical field
The invention belongs to the technical field of search engines and relates to a universal formula search method.
Background art
There are generally two approaches to searching for mathematical formulas.
The first approach is evolutionary: an existing text search system is extended with search functions adapted to mathematics. Working from a string representation of the mathematical formula, it searches for formulas using conventional text retrieval methods. Because it builds on mature text search systems, this approach requires little work.
The second approach is to create an entirely new mathematical formula search system that thoroughly collects and indexes mathematical material and searches using the structure in the content representation of the formulas. Because it starts completely from scratch, this approach requires more time and effort. It requires not only the careful use and integration of various computer algebra and symbolic manipulation techniques, but also the development of novel indexing and search techniques, and research in this area has hardly begun. Of course, the mathematical expression parsing techniques used in computer algebra systems and compilers have already been developed and implemented; these techniques can and should be used.
The first approach is used by the DLMF (Digital Library of Mathematical Functions) and the ActiveMath system. Mathematical formulas are converted into text form before being indexed. The search string resembles LaTeX commands, and the search is performed after it has been converted into a character string. This allows mathematical material to be searched alongside plain text, but it cannot provide a powerful mathematical formula search function.
A similar method is to use an XML-based XQuery search engine. Both methods share the advantage of relying on existing technology, but neither provides a search method fully oriented towards mathematical formulas.
The second approach is taken by the MBase system, which uses the pattern matching of a programming language to find OMDoc-encoded mathematical terms in a knowledge base [24]. The search engine of the HELM system indexes structural metadata extracted from mathematical formulas represented in Content MathML in order to provide efficient retrieval. The rationale is that the metadata approximates the formula structure and can serve as a filter over a large database of terms. However, because the complete structural information of the formula is lost, semantic equivalence cannot be guaranteed.
Summary of the invention
The technical problem to be solved by the invention is to provide a universal formula search method that can precisely search web documents by mathematical formula.
To solve the above technical problem, the invention adopts the following technical solution: a universal formula search method comprising the following steps:
(1) Establish a universal formula search engine. The universal formula search engine comprises:
a searcher, for roaming the Internet to discover and collect mathematical formulas;
an indexer, for building a mathematical formula index;
a querier, for converting a query statement into a query task, passing it to the indexer, completing the query, and returning the results to the user;
(2) Establish a mathematical symbol dictionary database and assign an ID number to each kind of mathematical symbol as its mathematical symbol dictionary ID;
(3) The searcher runs multiple web crawler processes. The web crawlers collect web pages from the network and determine whether a document in a web page contains a mathematical formula; if it does, the document is downloaded, compressed, and stored in the original web page database;
(4) The indexer extracts the mathematical formulas from the documents in the original web page database and, using the mathematical symbol dictionary database, builds a mathematical formula index over them;
(5) After receiving a query request, the querier performs a mathematical formula query using the mathematical formula index and the mathematical symbol dictionary database and obtains the matching mathematical formulas;
(6) The querier returns the documents in the original web page database that contain the matching mathematical formulas to the user and displays them on the search results interface.
The mathematical formula index is built as follows:
A. Convert the mathematical formulas in the document into text-string form;
B. Segment each text-string formula, decomposing it into mathematical symbols, and record the position of each mathematical symbol within the text-string formula;
C. Using the mathematical symbol dictionary database, convert each mathematical symbol into the document mathematical symbol ID that corresponds to its mathematical symbol dictionary ID;
D. Build a document mathematical symbol index from the document mathematical symbol IDs and the mathematical symbols;
F. Build a document mathematical symbol index table from the document mathematical symbol index.
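The index-building steps above can be sketched as a small positional inverted index. This is only an illustration under stated assumptions: the regular-expression tokenizer, the `SymbolDictionary` class, and the `build_index` function are hypothetical names that do not appear in the patent, and the patent does not specify the index layout.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: LaTeX commands, then numbers, then any other
# single non-space character. The patent does not specify the rules.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\d+(?:\.\d+)?|\S")


def tokenize(formula):
    """Step B: split a text-form formula into (position, symbol) pairs."""
    return list(enumerate(TOKEN_RE.findall(formula)))


class SymbolDictionary:
    """Assigns one integer ID to each distinct symbol (the dictionary ID)."""

    def __init__(self):
        self.ids = {}

    def id_for(self, symbol):
        # A symbol seen for the first time receives the next unused ID.
        return self.ids.setdefault(symbol, len(self.ids) + 1)


def build_index(documents, dictionary):
    """Steps C to F: map each symbol ID to its (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, formula in documents.items():
        for position, symbol in tokenize(formula):
            index[dictionary.id_for(symbol)].append((doc_id, position))
    return dict(index)


docs = {"doc1": "ax^{2}+bx+c=0", "doc2": r"\sin(3x)"}
symbols = SymbolDictionary()
index = build_index(docs, symbols)
print(index[symbols.id_for("x")])
```

Keeping the position alongside the document ID is what later allows a query to check that symbols appear in the same order as in the query formula.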
The mathematical formula query steps are as follows:
A. The querier receives the mathematical formula to be queried;
B. Determine whether the formula to be queried is in text format; if it is not, convert it into a text-format formula to be queried;
C. Segment the text-format formula to be queried, decomposing it into the mathematical symbols to be queried, and record the position of each symbol within the formula to be queried;
D. Using the mathematical symbol dictionary database, convert each mathematical symbol to be queried into the query mathematical symbol ID that corresponds to its mathematical symbol dictionary ID;
E. Look up the document mathematical symbol IDs in the index table and obtain the query results that match the query mathematical symbol IDs;
F. Apply the KNN algorithm to the query results to obtain the matching mathematical formulas;
G. Return to the user the documents in the original web page database that contain the matching mathematical formulas.
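A minimal sketch of query steps C to E follows, assuming a positional index that maps each symbol ID to a set of (document, position) pairs. It performs exact, in-order matching only; the KNN step F is not shown. The names `index_formulas` and `find_documents` and the tokenizer are hypothetical.

```python
import re

# Hypothetical tokenizer: LaTeX commands, numbers, or single characters.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\d+(?:\.\d+)?|\S")


def index_formulas(formulas):
    """Build symbol -> ID and ID -> {(doc_id, position)} maps for the demo."""
    symbol_ids, index = {}, {}
    for doc_id, formula in formulas.items():
        for pos, tok in enumerate(TOKEN_RE.findall(formula)):
            sid = symbol_ids.setdefault(tok, len(symbol_ids) + 1)
            index.setdefault(sid, set()).add((doc_id, pos))
    return index, symbol_ids


def find_documents(query, index, symbol_ids):
    """Return the documents in which the query symbols occur consecutively."""
    tokens = TOKEN_RE.findall(query)                 # step C: segment
    ids = [symbol_ids.get(tok) for tok in tokens]    # step D: symbols to IDs
    if not ids or None in ids:
        return set()                                 # unknown symbol: no match
    # Step E: start from every posting of the first symbol, then require each
    # following symbol at the next position in the same document.
    candidates = set(index.get(ids[0], set()))
    for offset, sid in enumerate(ids[1:], start=1):
        postings = index.get(sid, set())
        candidates = {(d, p) for (d, p) in candidates if (d, p + offset) in postings}
    return {d for d, _ in candidates}


index, symbol_ids = index_formulas({"doc1": "ax^{2}+bx+c=0", "doc2": "bx+c"})
print(find_documents("x^{2}", index, symbol_ids))
```

Exact matching is the simplest possible combination rule; a similarity-based step such as the KNN computation in step F would be needed to return formulas that differ from the query.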
The mathematical formula index comprises a representation-oriented Presentation index and a semantics-oriented Content index.
The input modes by which the querier receives the mathematical formula to be queried comprise structured document input and image acquisition input. The structured document input modes are LaTeX structured document input, Word equation editor structured document input, and PDF structured document input. The image acquisition input modes comprise picture insertion, screen capture, camera capture, scanner capture, and document camera capture.
The documents returned to the user include the derivation of the formula and documentary material related to the formula; the structured document format of the documents returned to the user is PPT, Word, LaTeX, or PDF.
The documents returned to the user are prioritized according to similarity, citation frequency, and public rating, and are sorted by priority and displayed on the search results interface.
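The patent names three ranking signals (similarity, citation frequency, and public rating) but does not say how they are combined. The sketch below assumes a simple weighted sum; the weights, the 0 to 5 rating scale, and the names `Result`, `priority`, and `rank` are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    similarity: float  # assumed to lie in [0, 1]
    citations: int     # citation frequency
    rating: float      # public rating, assumed to lie in [0, 5]


def priority(result, max_citations, weights=(0.6, 0.25, 0.15)):
    """Weighted sum of the three signals; the weights are assumptions."""
    w_sim, w_cit, w_rate = weights
    citation_score = result.citations / max_citations if max_citations else 0.0
    return (w_sim * result.similarity
            + w_cit * citation_score
            + w_rate * result.rating / 5.0)


def rank(results):
    """Sort results from highest to lowest priority."""
    max_citations = max((r.citations for r in results), default=0)
    return sorted(results, key=lambda r: priority(r, max_citations), reverse=True)


results = [
    Result("A", similarity=0.90, citations=10, rating=4.0),
    Result("B", similarity=0.70, citations=100, rating=4.5),
    Result("C", similarity=0.95, citations=0, rating=2.0),
]
print([r.title for r in rank(results)])
```

With these weights a heavily cited document can outrank a slightly more similar one, which is one possible reading of "division of priority"; other combinations, such as sorting by similarity first and breaking ties with the other signals, would be equally consistent with the text.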
The method includes interfaces to CNKI (China National Knowledge Infrastructure), Wanfang, and Longyuan Journals respectively, provides a paid download function for the retrieved literature, and supports the paid downloads of CNKI, Wanfang, and Longyuan Journals respectively.
The beneficial effects of the invention are as follows: the invention provides scientific researchers and teaching staff with a fast and accurate mathematical formula search scheme, provides theoretical support for scientific research, and provides an extensive and accurate way of gathering material for teaching and science popularization; the invention provides dedicated mathematical formula search interfaces for the major literature databases, expanding their scope of application and business, and provides paid download interfaces that increase the revenue of the literature databases. In summary, the invention offers considerable economic and social benefits.
Brief description of the drawings
Fig. 1 is a flow chart of extracting mathematical formulas from web pages and converting them.
Fig. 2 is a flow chart of building the mathematical symbol index.
Fig. 3 is a flow chart of a mathematical formula query.
Detailed description of the embodiments
The invention is further described below with reference to Figs. 1-3 and specific embodiments.
Embodiment one:
Build the index library by indexing common formulas and their corresponding IDs according to the original mathematical dictionary. Prepare the raw data, preprocess it, and set the parameter K. The formula is entered in the browser and converted in the background, using word segmentation, into the LaTeX form ax^{2}+bx+c=0. Run the searcher, which runs multiple crawler programs that search web pages in parallel. Pages containing ax^{2}+bx+c=0 are downloaded, compressed, and stored in the original web page database. Using word segmentation, the content is decomposed into ax^{2}, bx, and c, which are indexed. The web page content is combined with the index table and the original mathematical dictionary to generate the formulas to be output. The KNN algorithm is then applied with the parameter K: a priority queue of size K, ordered by Euclidean distance from largest to smallest, is maintained to store the training tuples (that is, the formulas to be output). The LaTeX form of the original formula from the web page is used as the test tuple. K tuples are chosen at random from the training tuples as the initial nearest neighbours, the distance from the test tuple to each of these K tuples is computed, and the training tuple labels and distances are stored in the priority queue. When the traversal is complete, the majority class of the K tuples in the priority queue is computed and taken as the class of the test tuple. After the test tuples have been tested, the error rate is computed and training is repeated with different values of K; the K value with the lowest error rate is finally taken and converted to a similarity. Finally, the top 30% of results by similarity are returned to the user.
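The term-level decomposition in this embodiment (ax^{2}+bx+c=0 into ax^{2}, bx, and c) can be approximated by splitting the LaTeX string at operators that lie outside any brace group. This is a sketch under that assumption, not the patent's segmentation algorithm; `split_terms` is a hypothetical name. Unlike the embodiment, the sketch also keeps the right-hand side 0 as a term.

```python
def split_terms(latex, separators="+-="):
    """Split a LaTeX string at operators that sit outside any {...} group."""
    terms, current, depth = [], [], 0
    for ch in latex:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if depth == 0 and ch in separators:
            # A top-level operator closes the current term.
            if current:
                terms.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        terms.append("".join(current))
    return terms


print(split_terms("ax^{2}+bx+c=0"))   # ['ax^{2}', 'bx', 'c', '0']
print(split_terms("x^{a+b}+1"))       # ['x^{a+b}', '1']
```

Tracking brace depth keeps an exponent such as {a+b} intact. Signs are discarded by this sketch, so a real segmenter would need to record them separately.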
Embodiment two:
Build the index library by indexing common formulas and their corresponding IDs according to the original mathematical dictionary. Prepare the raw data, preprocess it, and set the parameter K. The formula is entered in the browser and converted in the background, using word segmentation, into the LaTeX form \sin\left(3x+\frac{\pi}{6}\right). Run the searcher, which runs multiple crawler programs that search web pages in parallel. Pages containing \sin\left(3x+\frac{\pi}{6}\right) are downloaded, compressed, and stored in the original web page database. Using word segmentation, the content is decomposed into \sin, \left(, 3x, \frac{\pi}{6}, and \right), which are indexed. The web page content is combined with the index table and the original mathematical dictionary to generate the formulas to be output. The KNN algorithm is then applied with the parameter K: a priority queue of size K, ordered by Euclidean distance from largest to smallest, is maintained to store the training tuples (that is, the formulas to be output). The LaTeX form of the original formula from the web page is used as the test tuple. K tuples are chosen at random from the training tuples as the initial nearest neighbours, the distance from the test tuple to each of these K tuples is computed, and the training tuple labels and distances are stored in the priority queue. When the traversal is complete, the majority class of the K tuples in the priority queue is computed and taken as the class of the test tuple. After the test tuples have been tested, the error rate is computed and training is repeated with different values of K; the K value with the lowest error rate is finally taken and converted to a similarity. Finally, the top 30% of results by similarity are returned to the user.
Embodiment three:
Build the index library by indexing common formulas and their corresponding IDs according to the original mathematical dictionary. Prepare the raw data, preprocess it, and set the parameter K. The formula is entered in the browser and converted in the background, using word segmentation, into the LaTeX form \lim_{n \rightarrow \infty}\left(1+\frac{1}{n}\right)^{n}. Run the searcher, which runs multiple crawler programs that search web pages in parallel. Pages containing \lim_{n \rightarrow \infty}\left(1+\frac{1}{n}\right)^{n} are downloaded, compressed, and stored in the original web page database. Using word segmentation, the content is decomposed into \lim_{n \rightarrow \infty} and \left(1+\frac{1}{n}\right)^{n}, which are indexed. The web page content is combined with the index table and the original mathematical dictionary to generate the formulas to be output. The KNN algorithm is then applied with the parameter K: a priority queue of size K, ordered by Euclidean distance from largest to smallest, is maintained to store the training tuples (that is, the formulas to be output). The LaTeX form of the original formula from the web page is used as the test tuple. K tuples are chosen at random from the training tuples as the initial nearest neighbours, the distance from the test tuple to each of these K tuples is computed, and the training tuple labels and distances are stored in the priority queue. When the traversal is complete, the majority class of the K tuples in the priority queue is computed and taken as the class of the test tuple. After the test tuples have been tested, the error rate is computed and training is repeated with different values of K; the K value with the lowest error rate is finally taken and converted to a similarity. Finally, the top 30% of results by similarity are returned to the user.
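The KNN step shared by the three embodiments can be sketched with a bounded priority queue. Several things here are assumptions: formulas are taken to be already encoded as numeric feature vectors (the patent does not say how), the queue is seeded with the first K training tuples rather than K random ones, and `knn_classify` is a hypothetical name.

```python
import heapq
import math
from collections import Counter


def knn_classify(test_vector, training, k):
    """Return the majority label among the k nearest training tuples.

    training is a list of (vector, label) pairs.
    """
    # heapq is a min-heap, so distances are negated: the farthest of the
    # current k neighbours is always at heap[0] and is the first replaced.
    heap = []
    for vector, label in training:
        distance = math.dist(test_vector, vector)  # Euclidean distance
        if len(heap) < k:
            heapq.heappush(heap, (-distance, label))
        elif distance < -heap[0][0]:
            heapq.heapreplace(heap, (-distance, label))
    votes = Counter(label for _, label in heap)
    return votes.most_common(1)[0][0]


training = [
    ((1.0, 0.0), "quadratic"),
    ((1.0, 1.0), "quadratic"),
    ((5.0, 5.0), "trigonometric"),
    ((6.0, 5.0), "trigonometric"),
    ((0.0, 1.0), "quadratic"),
]
print(knn_classify((0.0, 0.0), training, k=3))
```

Selecting K as the embodiments describe would amount to running this classifier over a held-out set for several values of K and keeping the value with the lowest error rate.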
The invention establishes a universal formula search engine, which comprises:
a searcher, for roaming the Internet to discover and collect mathematical formulas;
an indexer, for building a mathematical formula index;
a querier, for converting a query statement into a query task, passing it to the indexer, completing the query, and returning the results to the user.
The invention establishes a mathematical symbol dictionary database and assigns an ID number to each kind of mathematical symbol as its mathematical symbol dictionary ID.
The searcher runs multiple web crawler processes. The web crawlers collect web pages from the network and determine whether a document in a web page contains a mathematical formula; if it does, the document is downloaded, compressed, and stored in the original web page database.
The indexer extracts the mathematical formulas from the documents in the original web page database and, using the mathematical symbol dictionary database, builds a mathematical formula index over them.
After receiving a query request, the querier performs a mathematical formula query using the mathematical formula index and the mathematical symbol dictionary database and obtains the matching mathematical formulas.
The querier returns the documents in the original web page database that contain the matching mathematical formulas to the user and displays them on the search results interface.
The mathematical formula index is built as follows:
A. Convert the mathematical formulas in the document into text-string form;
B. Segment each text-string formula, decomposing it into mathematical symbols, and record the position of each mathematical symbol within the text-string formula;
C. Using the mathematical symbol dictionary database, convert each mathematical symbol into the document mathematical symbol ID that corresponds to its mathematical symbol dictionary ID;
D. Build a document mathematical symbol index from the document mathematical symbol IDs and the mathematical symbols;
F. Build a document mathematical symbol index table from the document mathematical symbol index.
The mathematical formula query steps are as follows:
A. The querier receives the mathematical formula to be queried;
B. Determine whether the formula to be queried is in text format; if it is not, convert it into a text-format formula to be queried;
C. Segment the text-format formula to be queried, decomposing it into the mathematical symbols to be queried, and record the position of each symbol within the formula to be queried;
D. Using the mathematical symbol dictionary database, convert each mathematical symbol to be queried into the query mathematical symbol ID that corresponds to its mathematical symbol dictionary ID;
E. Look up the document mathematical symbol IDs in the index table and obtain the query results that match the query mathematical symbol IDs;
F. Apply the KNN algorithm to the query results to obtain the matching mathematical formulas;
G. Return to the user the documents in the original web page database that contain the matching mathematical formulas.
The mathematical formula index comprises a representation-oriented Presentation index and a semantics-oriented Content index.
The input modes by which the querier receives the mathematical formula to be queried comprise structured document input and image acquisition input. The structured document input modes are LaTeX structured document input, Word equation editor structured document input, and PDF structured document input. The image acquisition input modes comprise picture insertion, screen capture, camera capture, scanner capture, and document camera capture.
The documents returned to the user include the derivation of the formula and documentary material related to the formula; the structured document format of the documents returned to the user is PPT, Word, LaTeX, or PDF.
The documents returned to the user are prioritized according to similarity, citation frequency, and public rating, and are sorted by priority and displayed on the search results interface.
The method includes interfaces to CNKI (China National Knowledge Infrastructure), Wanfang, and Longyuan Journals respectively, provides a paid download function for the retrieved literature, and supports the paid downloads of CNKI, Wanfang, and Longyuan Journals respectively.
The invention solves the problem of entering a mathematical formula into the search box. When a user needs to retrieve a formula, the algorithm of the invention provides multiple input modes, including structured document input and image acquisition input. Structured document input can offer the currently common LaTeX, Word equation editor, and PDF structured document modes, while also providing interfaces for other structured document modes; image acquisition input supports picture insertion and screen capture, and can also offer input from a camera, scanner, or document camera.
The invention solves the problem of precise online (and offline) search for mathematical formulas. After entering the formula to be retrieved using one of the input modes, the user clicks "Search", and the algorithm of the invention retrieves the derivation of the formula and the documentary material related to it in various structured document formats, including common formats such as PPT, Word, LaTeX, and PDF; website material related to the formula is also retrieved.
The invention prioritizes the retrieved formula results according to similarity, citation frequency, public rating, and so on, and sorts them by priority for display on the search results interface.
The search engine algorithm of the invention includes interfaces to the current major literature databases in order to provide paid download of the retrieved literature, and supports the paid downloads of each major retrieval service.
The invention uses a linear discriminant analysis algorithm and a principal component analysis algorithm to perform similarity matching on formula images and outputs data according to the degree of similarity. The invention uses JavaScript to generate LaTeX code automatically in the browser from the formula filled into a form. For LaTeX code similarity, the invention matches and outputs data using a cosine algorithm based on the vector space model. The invention includes an independently developed distributed full-text retrieval system for full-text retrieval of structured documents such as PDF.
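As an illustration of cosine matching on LaTeX code, the sketch below turns each LaTeX string into a bag-of-tokens count vector and computes the cosine of the angle between the two vectors. The tokenizer and the name `cosine_similarity` are assumptions; the patent does not describe how the vectors are built.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: LaTeX commands, numbers, or single characters.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\d+(?:\.\d+)?|\S")


def cosine_similarity(latex_a, latex_b):
    """Cosine of the angle between the token-count vectors of two strings."""
    a = Counter(TOKEN_RE.findall(latex_a))
    b = Counter(TOKEN_RE.findall(latex_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


print(cosine_similarity(r"\frac{1}{n}", r"\frac{1}{m}"))
```

A bag-of-tokens vector ignores symbol order, so x+y and y+x score as identical. That may or may not be desirable for formula search, and positional information would be needed to tell them apart.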
The searcher of the invention runs multiple web crawler processes and is responsible for crawling web documents on the network that contain mathematics-related content. The indexer is responsible for building the mathematical index. The querier is responsible for converting a query statement into a query task, passing it to the indexer, completing the query, and returning the results to the user. A computer algebra system is helpful for index building and query processing and can perform the necessary computations.
In the invention, the web crawlers first collect web pages from the network and determine whether each page contains mathematical formula content; if it does, the document is downloaded, compressed and otherwise processed, and stored in the original web page database. Next, the mathematical information, such as mathematical formulas, is extracted from the original web pages and converted in format: mathematical formulas in multiple formats are converted to LaTeX form. The indexer then builds an index over the mathematical formulas. To support both semantics-based and representation-based formula queries, the indexer builds a representation-oriented Presentation index and a semantics-oriented Content index. Two kinds of index are provided to support the two query modes.
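The crawler's "contains a formula" test and the compressed storage can be sketched as follows. The detection heuristics (MathML tags and a few LaTeX delimiters and commands) are assumptions, as are the names `contains_formula` and `store_if_math`; a plain dictionary stands in for the original web page database.

```python
import re
import zlib

# Hypothetical heuristics: a MathML <math> tag, common LaTeX delimiters,
# or a couple of LaTeX constructs. The patent does not list the criteria.
MATH_HINTS = re.compile(
    r"<math[\s>]|\$\$.+?\$\$|\\\(.+?\\\)|\\\[.+?\\\]"
    r"|\\begin\{equation\}|\\frac\{",
    re.DOTALL,
)


def contains_formula(html):
    """Return True if the page text looks like it contains a formula."""
    return MATH_HINTS.search(html) is not None


def store_if_math(url, html, store):
    """Compress and store the page only when it contains a formula."""
    if not contains_formula(html):
        return False
    store[url] = zlib.compress(html.encode("utf-8"))
    return True


store = {}
page = r"<p>Solve \(ax^{2}+bx+c=0\) for x.</p>"
print(store_if_math("http://example.com/quadratic", page, store))
print(zlib.decompress(store["http://example.com/quadratic"]).decode("utf-8") == page)
```

Filtering before download and compressing before storage keeps the page database limited to mathematical documents, which is the purpose of the searcher in the description above.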
After receiving a query request, the querier parses the query statement, then finds the mathematical material that satisfies the conditions in the index and returns the query results. The query results are evaluated and sorted before being returned to the user.
The first problem that mathematical symbol indexing must solve is the design of the mathematical symbol dictionary: the various mathematical symbols need to be classified and added to the dictionary. Mathematical symbols can be broadly divided into variables, numbers, operators, mathematical functions, keywords, and so on. Each mathematical symbol is assigned an ID number.
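One way to picture the dictionary design is a table that stores, for each symbol, its ID number and its class. The classification rules below are illustrative assumptions (in particular, treating every unrecognized token as a keyword), and `SymbolClass`, `classify`, and `build_dictionary` are hypothetical names.

```python
import re
from enum import Enum


class SymbolClass(Enum):
    VARIABLE = "variable"
    NUMBER = "number"
    OPERATOR = "operator"
    FUNCTION = "function"
    KEYWORD = "keyword"


# Small illustrative sets; a real dictionary would be far larger.
FUNCTIONS = {r"\sin", r"\cos", r"\tan", r"\log", r"\ln", r"\exp", r"\lim"}
OPERATORS = set("+-*/=^_<>")


def classify(symbol):
    """Assign a symbol to one of the classes named in the description."""
    if symbol in FUNCTIONS:
        return SymbolClass.FUNCTION
    if symbol in OPERATORS:
        return SymbolClass.OPERATOR
    if re.fullmatch(r"\d+(\.\d+)?", symbol):
        return SymbolClass.NUMBER
    if re.fullmatch(r"[A-Za-z]", symbol):
        return SymbolClass.VARIABLE
    return SymbolClass.KEYWORD  # assumption: \frac, \left, brackets, etc.


def build_dictionary(symbols):
    """Map each distinct symbol to (ID number, class), IDs starting at 1."""
    dictionary = {}
    for symbol in symbols:
        if symbol not in dictionary:
            dictionary[symbol] = (len(dictionary) + 1, classify(symbol))
    return dictionary


dictionary = build_dictionary([r"\sin", "(", "3", "x", "+", r"\frac", "x"])
print(dictionary["x"])
```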
The index-building process mainly comprises the following steps: a. Text conversion: convert mathematical formulas in different formats into text-string form. b. Segment the mathematical formula, decomposing it into a sequence of individual mathematical symbols. c. Using the mathematical symbol dictionary, convert each mathematical symbol into its corresponding ID. d. Index each mathematical symbol. Segmentation is performed by a mathematical formula parser, which carries out lexical and syntactic analysis of the formula, decomposes it into individual mathematical symbols, and records the position of each symbol within the formula.
The mathematical formula query process mainly comprises the following steps: a. Accept the mathematical formula to be queried. b. Text conversion: if the queried formula is not in text format, convert it to text format. c. Segment the formula, decomposing it into mathematical symbols, and record the combination relationships between the symbols. d. Using the mathematical symbol dictionary, convert the mathematical symbols into ID numbers. e. Query each mathematical symbol separately in the index table; these lookups can be executed in parallel. f. Combine the query results to obtain the final result.
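Steps e and f of the query process can be sketched with a thread pool that looks up each symbol ID independently and then combines the postings. Taking the intersection as the combining operation is an assumption (the text only says the results are combined), and `lookup` and `query_documents` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup(index, symbol_id):
    """Step e: fetch the set of documents that contain one symbol ID."""
    return index.get(symbol_id, set())


def query_documents(index, symbol_ids):
    """Look up every symbol ID in parallel, then intersect the results."""
    if not symbol_ids:
        return set()
    with ThreadPoolExecutor() as pool:
        postings = list(pool.map(lambda sid: lookup(index, sid), symbol_ids))
    # Step f (assumed): a document matches only if it contains every symbol.
    return set.intersection(*postings)


# Toy index: symbol ID -> documents containing that symbol.
index = {1: {"doc1", "doc2"}, 2: {"doc1"}, 3: {"doc1", "doc2"}}
print(query_documents(index, [1, 2, 3]))
```

For an in-memory dictionary the thread pool adds overhead rather than speed; parallel lookups pay off when each lookup involves disk or network access, as would be the case in a distributed index.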
The indexing and query flow for the mathematical symbol index is essentially the same as for text search, with additional special handling for mathematical formulas, in particular their word segmentation, which is implemented by extending Lucene.
The embodiments described above are only preferred embodiments of the invention and are not an exhaustive list of its possible embodiments. Any obvious change made by a person of ordinary skill in the art without departing from the principle and spirit of the invention shall be considered to fall within the scope of the claims of the invention.

Claims (8)

1. A universal formula search method, characterized in that it comprises the following steps:
(1) Establish a universal formula search engine. The universal formula search engine comprises:
a searcher, for roaming the Internet to discover and collect mathematical formulas;
an indexer, for building a mathematical formula index;
a querier, for converting a query statement into a query task, passing it to the indexer, completing the query, and returning the results to the user;
(2) Establish a mathematical symbol dictionary database and assign an ID number to each kind of mathematical symbol as its mathematical symbol dictionary ID;
(3) The searcher runs multiple web crawler processes. The web crawlers collect web pages from the network and determine whether a document in a web page contains a mathematical formula; if it does, the document is downloaded, compressed, and stored in the original web page database;
(4) The indexer extracts the mathematical formulas from the documents in the original web page database and, using the mathematical symbol dictionary database, builds a mathematical formula index over them;
(5) After receiving a query request, the querier performs a mathematical formula query using the mathematical formula index and the mathematical symbol dictionary database and obtains the matching mathematical formulas;
(6) The querier returns the documents in the original web page database that contain the matching mathematical formulas to the user and displays them on the search results interface.
2. The general formula searching method according to claim 1, characterized in that the mathematical formula index is built as follows:
A. Convert the mathematical formula in the document into a mathematical formula in text-string form;
B. Tokenize the text-string mathematical formula, decomposing it into mathematical symbols, and at the same time record the position of each mathematical symbol within the text-string mathematical formula;
C. Using the mathematical symbol dictionary database, convert each mathematical symbol into a document mathematical symbol ID corresponding to its mathematical symbol dictionary ID;
D. Build a document mathematical symbol index from the document mathematical symbol IDs and the mathematical symbols;
F. Build a document mathematical symbol index table from the document mathematical symbol index.
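The index-building steps of claim 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the formulas are taken to be LaTeX strings already, `SYMBOL_DICT` is a made-up dictionary with arbitrary IDs, a symbol's position is taken to be its token ordinal, and the index table is modelled as an inverted index from symbol ID to `(document, position)` pairs. None of these details are specified by the claim.

```python
import re
from collections import defaultdict

# Hypothetical mathematical symbol dictionary: one ID per kind of symbol.
SYMBOL_DICT = {"\\frac": 1, "\\sqrt": 2, "+": 3, "-": 4, "=": 5, "^": 6,
               "{": 7, "}": 8, "a": 9, "b": 10, "c": 11, "x": 12, "2": 13}

# A token is a LaTeX command such as \frac, or any single non-space character.
TOKEN_PATTERN = re.compile(r"\\[A-Za-z]+|\S")


def tokenize(formula: str) -> list:
    """Step B: split the formula into symbols and record each token's ordinal."""
    return [(m.group(), pos)
            for pos, m in enumerate(TOKEN_PATTERN.finditer(formula))]


def to_symbol_ids(tokens: list) -> list:
    """Step C: map symbols to dictionary IDs; unknown symbols are skipped."""
    return [(SYMBOL_DICT[sym], pos) for sym, pos in tokens if sym in SYMBOL_DICT]


def build_index(documents: dict) -> dict:
    """Steps D and F: inverted index from symbol ID to (doc_id, position) pairs."""
    index = defaultdict(list)
    for doc_id, formula in documents.items():
        for symbol_id, pos in to_symbol_ids(tokenize(formula)):
            index[symbol_id].append((doc_id, pos))
    return dict(index)
```

For example, `build_index({"d1": "a^2+b^2=c^2"})` records symbol ID 9 (the assumed ID of `a`) at position 0 of document `d1`.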
3. The general formula searching method according to claim 2, characterized in that the mathematical formula query steps are as follows:
A. The requestor receives the mathematical formula to be queried;
B. Determine whether the mathematical formula to be queried is in text format; if it is not, convert it into a text-format mathematical formula to be queried;
C. Tokenize the text-format mathematical formula to be queried, decomposing it into mathematical symbols to be queried, and at the same time record the position of each mathematical symbol to be queried within the formula;
D. According to the mathematical symbol dictionary database, convert each mathematical symbol to be queried into a to-be-queried mathematical symbol ID corresponding to its mathematical symbol dictionary ID;
E. Query the document mathematical symbol IDs in the index table to obtain the query results that match the to-be-queried mathematical symbol IDs;
F. Apply the KNN algorithm to the query results to obtain the matched mathematical formulas;
G. Return to the user the documents in the raw page database that contain the matched mathematical formulas.
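Steps C to F of claim 3 can be sketched in Python as follows. The claim names the KNN algorithm but not its features or distance measure, so this sketch assumes each formula is represented by counts of its symbol IDs and compared by Euclidean distance. `SYMBOL_DICT`, `symbol_id_counts`, `distance`, and `knn_query` are hypothetical names, and position information is ignored for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical symbol dictionary (symbol -> dictionary ID).
SYMBOL_DICT = {"a": 1, "b": 2, "c": 3, "x": 4, "+": 5, "=": 6, "^": 7, "2": 8}
TOKEN_PATTERN = re.compile(r"\\[A-Za-z]+|\S")


def symbol_id_counts(formula: str) -> Counter:
    """Steps C and D: tokenize and count dictionary IDs (unknown symbols skipped)."""
    return Counter(SYMBOL_DICT[t]
                   for t in TOKEN_PATTERN.findall(formula) if t in SYMBOL_DICT)


def distance(u: Counter, v: Counter) -> float:
    """Euclidean distance between two symbol-count vectors (an assumed metric)."""
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in set(u) | set(v)))


def knn_query(query: str, indexed_formulas: dict, k: int = 2) -> list:
    """Steps E and F: keep formulas sharing a symbol ID, then return the k nearest."""
    q = symbol_id_counts(query)
    candidates = {}
    for doc_id, formula in indexed_formulas.items():
        counts = symbol_id_counts(formula)
        if set(counts) & set(q):  # shares at least one symbol ID with the query
            candidates[doc_id] = counts
    # Sort by distance, breaking ties by document ID for a stable order.
    ranked = sorted(candidates, key=lambda d: (distance(q, candidates[d]), d))
    return ranked[:k]
```

Because the candidates are first filtered by shared symbol IDs, a formula with no symbol in common with the query never reaches the nearest-neighbour stage.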
4. The general formula searching method according to claim 1, characterized in that the mathematical formula index includes a presentation-oriented Presentation index and a semantics-oriented Content index.
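The difference between the two indexes of claim 4 can be illustrated with a short Python sketch. It assumes, and this is not stated in the claim, that the Presentation and Content indexes correspond to the Presentation and Content markup of MathML. The fragments for x squared and the helper `index_terms` are illustrative only.

```python
import xml.etree.ElementTree as ET

# The same formula, x squared, in the two MathML flavours.
PRESENTATION = "<msup><mi>x</mi><mn>2</mn></msup>"           # how it is laid out
CONTENT = "<apply><power/><ci>x</ci><cn>2</cn></apply>"      # what it means


def index_terms(markup: str) -> list:
    """Flatten a MathML fragment into (tag, text) terms in document order."""
    return [(el.tag, (el.text or "").strip())
            for el in ET.fromstring(markup).iter()]
```

Here `index_terms(PRESENTATION)` yields layout terms such as `msup`, while `index_terms(CONTENT)` yields the operator `power` applied to `x` and `2`, which is the kind of distinction a presentation-oriented and a semantics-oriented index would capture.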
5. The general formula searching method according to claim 1, characterized in that the input modes by which the requestor receives the mathematical formula to be queried include a structured document input mode and an image capture input mode; the structured document input mode is a LaTeX structured document input mode, a Word formula editor structured document input mode, and a PDF structured document input mode; the image capture input mode includes picture insertion, screenshot, camera capture, scanner capture, and document camera capture.
6. The general formula searching method according to claim 1, characterized in that the documents returned to the user include the derivation of the formula and documents and materials related to the formula; the structured document format of the documents returned to the user is PPT, Word, LaTeX, or PDF.
7. The general formula searching method according to claim 1, characterized in that the documents returned to the user are assigned priorities according to similarity, citation frequency, and public rating, and are sorted by priority and displayed on the search interface.
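The ordering described in claim 7 can be sketched in Python as follows. The claim does not say how the three criteria are combined, so this sketch assumes a simple lexicographic priority: similarity first, then citation frequency, then public rating, each in descending order. The `Result` class and `rank_results` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    similarity: float   # similarity to the queried formula, higher is better
    citations: int      # citation frequency of the document
    rating: float       # public rating of the document


def rank_results(results: list) -> list:
    """Sort by similarity, then citations, then rating, all in descending order."""
    return sorted(results,
                  key=lambda r: (r.similarity, r.citations, r.rating),
                  reverse=True)
```

A weighted sum of the three values would be an equally plausible reading of the claim; the tuple key is used here only because it makes the priority order explicit.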
8. The general formula searching method according to claim 1, characterized in that it includes interfaces to CNKI (China National Knowledge Infrastructure), Wanfang Data, and Longyuan Journals respectively, provides a paid download function for the retrieved documents, and supports paid downloads from CNKI, Wanfang Data, and Longyuan Journals respectively.
CN201610171766.5A 2016-03-24 2016-03-24 Universal formula search method Pending CN105868177A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610171766.5A CN105868177A (en) 2016-03-24 2016-03-24 Universal formula search method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610171766.5A CN105868177A (en) 2016-03-24 2016-03-24 Universal formula search method

Publications (1)

Publication Number Publication Date
CN105868177A true CN105868177A (en) 2016-08-17

Family

ID=56625332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610171766.5A Pending CN105868177A (en) 2016-03-24 2016-03-24 Universal formula search method

Country Status (1)

Country Link
CN (1) CN105868177A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107153640A (en) * 2017-05-08 2017-09-12 成都准星云学科技有限公司 A kind of segmenting method towards elementary mathematics field
CN107463553A (en) * 2017-09-12 2017-12-12 复旦大学 For the text semantic extraction, expression and modeling method and system of elementary mathematics topic
CN107885870A (en) * 2017-11-24 2018-04-06 北京神州泰岳软件股份有限公司 A kind of service profile formulas Extraction method and device
CN108133168A (en) * 2016-12-01 2018-06-08 北京新唐思创教育科技有限公司 Formula searching method and its device in a kind of text identification
CN108304383A (en) * 2018-01-29 2018-07-20 北京神州泰岳软件股份有限公司 The formula info extracting method and device of service profile
CN108319724A (en) * 2018-02-28 2018-07-24 北京仁和汇智信息技术有限公司 A kind of Homepage Publishing method and device with formula file
CN108399156A (en) * 2018-02-28 2018-08-14 北京仁和汇智信息技术有限公司 The composition method and device of formula in a kind of pdf document
CN110888993A (en) * 2018-08-20 2020-03-17 珠海金山办公软件有限公司 Composite document retrieval method and device and electronic equipment
CN111078724A (en) * 2019-12-11 2020-04-28 中国建设银行股份有限公司 Method, device and equipment for searching test questions in learning system and storage medium
CN111597393A (en) * 2020-04-14 2020-08-28 北京金山云网络技术有限公司 Theorem search method, device, equipment and storage medium
CN112613279A (en) * 2020-12-24 2021-04-06 北京乐学帮网络技术有限公司 File conversion method and device, computer device and readable storage medium
CN116108326A (en) * 2023-04-12 2023-05-12 山东工程职业技术大学 Mathematic tool software control method, device, equipment and storage medium
CN116483943A (en) * 2023-06-21 2023-07-25 山东网安安全技术有限公司 Full text retrieval method and full text retrieval system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101110077A (en) * 2007-08-24 2008-01-23 新诺亚舟科技(深圳)有限公司 Method for implementing associated searching on handhold learning terminal

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
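MICHAEL KOHLHASE et al.: "A search engine for mathematical formulae", 《ARTIFICIAL INTELLIGENCE AND SYMBOLIC COMPUTATION - 8TH INTERNATIONAL CONFERENCE》 *
LIU ZHIWEI: "Research on a mathematical index engine", 《China Master's Theses Full-text Database, Information Science and Technology》 *
CUI LINWEI et al.: "Nutch-based Web mathematical formula extraction", 《Journal of Guangxi Normal University (Natural Science Edition)》 *
YAN HUILI: "LaTeX mathematical formulas based on the Lucene framework", 《China Master's Theses Full-text Database, Information Science and Technology》 *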

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133168A (en) * 2016-12-01 2018-06-08 北京新唐思创教育科技有限公司 Formula searching method and its device in a kind of text identification
CN108133168B (en) * 2016-12-01 2021-04-30 北京新唐思创教育科技有限公司 Formula searching method and device in text recognition
CN107153640A (en) * 2017-05-08 2017-09-12 成都准星云学科技有限公司 A kind of segmenting method towards elementary mathematics field
CN107463553B (en) * 2017-09-12 2021-03-30 复旦大学 Text semantic extraction, representation and modeling method and system for elementary mathematic problems
CN107463553A (en) * 2017-09-12 2017-12-12 复旦大学 For the text semantic extraction, expression and modeling method and system of elementary mathematics topic
CN107885870A (en) * 2017-11-24 2018-04-06 北京神州泰岳软件股份有限公司 A kind of service profile formulas Extraction method and device
CN108304383A (en) * 2018-01-29 2018-07-20 北京神州泰岳软件股份有限公司 The formula info extracting method and device of service profile
CN108304383B (en) * 2018-01-29 2019-06-25 北京神州泰岳软件股份有限公司 The formula info extracting method and device of service profile
CN108319724A (en) * 2018-02-28 2018-07-24 北京仁和汇智信息技术有限公司 A kind of Homepage Publishing method and device with formula file
CN108399156A (en) * 2018-02-28 2018-08-14 北京仁和汇智信息技术有限公司 The composition method and device of formula in a kind of pdf document
CN110888993A (en) * 2018-08-20 2020-03-17 珠海金山办公软件有限公司 Composite document retrieval method and device and electronic equipment
CN111078724A (en) * 2019-12-11 2020-04-28 中国建设银行股份有限公司 Method, device and equipment for searching test questions in learning system and storage medium
CN111597393A (en) * 2020-04-14 2020-08-28 北京金山云网络技术有限公司 Theorem search method, device, equipment and storage medium
CN112613279A (en) * 2020-12-24 2021-04-06 北京乐学帮网络技术有限公司 File conversion method and device, computer device and readable storage medium
CN116108326A (en) * 2023-04-12 2023-05-12 山东工程职业技术大学 Mathematic tool software control method, device, equipment and storage medium
CN116483943A (en) * 2023-06-21 2023-07-25 山东网安安全技术有限公司 Full text retrieval method and full text retrieval system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160817