CN110134846A - Proper noun processing method and device for text, and computer equipment - Google Patents

Info

Publication number
CN110134846A
Authority
CN
China
Prior art keywords
proper noun
text
marked
word
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910311158.3A
Other languages
Chinese (zh)
Inventor
许剑勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd
Priority to CN201910311158.3A
Publication of CN110134846A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application relates to a proper noun processing method and device for text, a computer equipment, and a storage medium. A proper noun marking instruction sent by a user through a terminal is obtained, and each text to be marked is obtained according to that instruction. Each text to be marked is then analyzed automatically against a proprietary dictionary in a database, the dictionary having been established by means of big data, so that the proper nouns of each text to be marked are extracted and marked automatically. When an explanation request is received, triggered by the user through the terminal on a proper noun marked in the text, an access address is obtained according to the explanation request, and the page content at the access address is crawled by means of a regular expression to obtain the explanation information of the proper noun. The user does not need to take part in the marking process, which improves work efficiency, and the corresponding explanation can be obtained by clicking a proper noun on the terminal, which improves user experience.

Description

Proper noun processing method and device for text, and computer equipment
Technical field
This application relates to the field of computer technology, and in particular to a proper noun processing method and device for text, a computer equipment, and a storage medium.
Background
With the continuous development of Internet technology, many services are now completed over the Internet. Questionnaire surveys are one example: when a questionnaire deals with specialized matters, it may contain many technical terms and nouns whose meaning the user does not necessarily understand, which can lead the user to give irrelevant answers. To let the user understand the technical terms and nouns in the text while answering, the proper nouns in the text are picked out and marked manually before the user answers, or links to explanations of the proper nouns are associated manually with the proper nouns in the text. This typically requires a great deal of manual effort to add the marks and URLs, and work efficiency is low.
Summary of the invention
In view of the above technical problem, it is necessary to provide a proper noun processing method and device for text, a computer equipment, and a storage medium that can improve work efficiency.
A proper noun processing method for text, the method comprising:
Obtaining a proper noun marking instruction;
Obtaining each text to be marked according to the proper noun marking instruction;
Analyzing each text to be marked based on a proprietary dictionary in a database, and extracting the proper nouns of each text to be marked;
Marking the proper nouns of each text to be marked;
When an explanation request is received, triggered by a user through a terminal on a proper noun marked in the text, obtaining an access address according to the explanation request;
Crawling the page content at the access address by means of a regular expression to obtain the explanation information of the proper noun.
In one embodiment, the step of analyzing each text to be marked based on the proprietary dictionary in the database and extracting the proper nouns of each text to be marked comprises:
Performing word segmentation on the text to be marked to obtain its words;
Obtaining the intersection of those words and the proper nouns in the proprietary dictionary;
Determining the words in the intersection as the proper nouns of the text to be marked.
In one embodiment, the proprietary dictionary is established by:
Obtaining, by means of big data, texts related to the content of the text to be marked;
Performing word segmentation on the content of the related texts to obtain the words to be analyzed;
Analyzing the words to be analyzed to determine proper nouns;
Saving each proper noun to the proprietary dictionary.
In one embodiment, the step of analyzing the words to be analyzed to determine proper nouns comprises:
Obtaining the search data of a search engine;
Obtaining the intersection of the search terms in the search data and the words to be analyzed;
Determining each word to be analyzed in the intersection as a proper noun.
In one embodiment, the step of analyzing the words to be analyzed to determine proper nouns further comprises:
Analyzing the search terms that are not in the intersection to determine the number of searches for each such search term;
Determining the search terms whose number of searches is greater than a preset number as proper nouns.
In one embodiment, the step of obtaining an access address according to the explanation request, when an explanation request is received that was triggered by the user through the terminal on a proper noun marked in the text, comprises:
When the explanation request is received, obtaining the proper noun carried in the explanation request;
Splicing the proper noun with a preset access address template to obtain the access address.
In one embodiment, the step of crawling the page content at the access address by means of a regular expression to obtain the explanation information of the proper noun comprises:
Crawling the page at the access address by means of a regular expression to obtain the page content;
Intercepting information from the page content according to a preset information interception rule to obtain the explanation information of the proper noun.
A proper noun processing device for text, the device comprising:
An instruction acquisition module, configured to obtain a proper noun marking instruction;
A questionnaire acquisition module, configured to obtain each text to be marked according to the proper noun marking instruction;
A proper noun extraction module, configured to analyze each text to be marked based on a proprietary dictionary in a database and to extract the proper nouns of each text to be marked;
A proper noun marking module, configured to mark the proper nouns of each text to be marked;
An access address acquisition module, configured to obtain an access address according to an explanation request when the explanation request is received, the request being triggered by a user through a terminal on a proper noun marked in the text;
An explanation information acquisition module, configured to crawl the page content at the access address by means of a regular expression to obtain the explanation information of the proper noun.
A computer equipment, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the above method when executing the computer program.
A computer-readable storage medium on which a computer program is stored, wherein the steps of the above method are implemented when the computer program is executed by a processor.
With the above proper noun processing method and device for text, computer equipment, and storage medium, when a user needs the proper nouns in a text to be marked, the proper noun marking instruction sent by the user through the terminal is obtained, and each text to be marked is obtained according to that instruction. Each text to be marked is then analyzed automatically based on the proprietary dictionary in the database, and its proper nouns are extracted and marked automatically. When an explanation request is received, triggered by the user through the terminal on a proper noun marked in the text, an access address is obtained according to the explanation request, and the page content at the access address is crawled by means of a regular expression to obtain the explanation information of the proper noun. The user does not need to take part in the marking process, which improves work efficiency, and the corresponding explanation can be obtained by clicking a proper noun on the terminal, which improves user experience.
Brief description of the drawings
Fig. 1 is an application scenario diagram of the proper noun processing method for text in one embodiment;
Fig. 2 is a flow diagram of the proper noun processing method for text in one embodiment;
Fig. 3 is a flow diagram of the proper noun processing method for text in one embodiment;
Fig. 4 is a structural block diagram of the proper noun processing device for text in one embodiment;
Fig. 5 is an internal structure diagram of the computer equipment in one embodiment.
Detailed description
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the application, not to limit it.
The proper noun processing method for text provided by this application can be applied in the application environment shown in Fig. 1, in which a terminal 102 communicates with a server 104 over a network. The server 104 obtains the proper noun marking instruction sent by the terminal 102, obtains each text to be marked according to the instruction, analyzes each text to be marked based on the proprietary dictionary in the database, extracts the proper nouns of each text to be marked, and marks them. When an explanation request is received, triggered by a user through the terminal 102 on a proper noun marked in the text, the server 104 obtains an access address according to the explanation request and crawls the page content at the access address by means of a regular expression to obtain the explanation information of the proper noun. The terminal 102 may be, but is not limited to, a personal computer, laptop, smartphone, tablet computer, or portable wearable device. The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a proper noun processing method for text is provided. The method is described as applied to the server in Fig. 1 and includes steps S220 to S320:
Step S220: obtain a proper noun marking instruction.
When a user needs to upload a text to the server and the proper nouns in that text have not yet been marked, the user triggers the proper noun marking instruction on the server through the terminal so that the proper nouns in the text are marked. When a respondent later answers the questions, the explanation information of a proper noun can be obtained through its mark in the marked text.
Step S240: obtain each text to be marked according to the proper noun marking instruction.
The text to be marked may be a text newly edited by the question setter that has not yet been uploaded to the answering system, or it may be an earlier text. It may be a text that has never been marked, or a text that has already been marked and is marked again so that further proper noun marks can be added. After receiving the proper noun marking instruction, the server obtains each text to be marked. It may fetch the texts from a preset database used to store texts to be marked, or the proper noun marking instruction may call a file upload interface, through which the user uploads the texts to be marked from the terminal to the server. The text to be marked may be the text of a questionnaire, or a text used to introduce a product.
Step S260: analyze each text to be marked based on the proprietary dictionary in the database, and extract the proper nouns of each text to be marked.
The proprietary dictionary in the database is a dictionary of proper nouns: it stores the technical terms and nouns (collectively referred to here as proper nouns) that arise in the insurance industry, insurance clauses, and related texts. The proprietary dictionary can be established in advance and gradually accumulates more and more proper nouns. Each text to be marked is analyzed based on the proprietary dictionary in the database, and the proper nouns of each text to be marked are extracted. For example, the proprietary dictionary is compared with each text to be marked to identify the words in the text that also appear in the proprietary dictionary; the words that are both in the text to be marked and in the proprietary dictionary are determined to be the proper nouns of that text.
Step S280: mark the proper nouns of each text to be marked.
According to the proper nouns determined for each text to be marked, all proper nouns in each text are marked, and each mark is associated with a proper noun explanation instruction. When a user answering the text encounters a proper noun that the user does not understand, the user can click the marked proper noun to trigger the proper noun explanation instruction and obtain the explanation information of that proper noun.
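The marking in step S280 can be illustrated with a minimal sketch. The application does not specify a markup format, so the HTML `span` element, the `proper-noun` class, the `data-term` attribute, and the function name `mark_proper_nouns` are all assumptions made for illustration; a front end could attach its click handler (the explanation instruction) to such elements.

```python
import re
from html import escape


def mark_proper_nouns(text, proper_nouns):
    """Wrap each known proper noun in a clickable marker (hypothetical markup)."""
    # Try longer terms first so a long term wins over a shorter one it contains.
    terms = sorted((t for t in proper_nouns if t), key=len, reverse=True)
    if not terms:
        return escape(text)
    pattern = re.compile("|".join(re.escape(t) for t in terms))

    pieces, pos = [], 0
    for match in pattern.finditer(text):
        # Copy the unmarked text before the match, escaped for HTML.
        pieces.append(escape(text[pos:match.start()]))
        term = escape(match.group(0))
        pieces.append(
            f'<span class="proper-noun" data-term="{term}">{term}</span>'
        )
        pos = match.end()
    pieces.append(escape(text[pos:]))
    return "".join(pieces)
```

Carrying the term in the marker itself means the explanation request can include the proper noun without a separate lookup, which matches the later embodiment in which the request carries the proper noun.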
Step S300: when an explanation request is received, triggered by the user through the terminal on a proper noun marked in the text, obtain an access address according to the explanation request.
While viewing the text, the user may not understand one of its proper nouns. If that proper noun is marked, the user clicks it on the terminal, and the terminal sends an explanation request for the proper noun to the server. After receiving the explanation request, the server obtains the access address according to the request.
Step S320: crawl the page content at the access address by means of a regular expression to obtain the explanation information of the proper noun.
The page content at the access address is crawled by means of a regular expression, and the explanation information of the proper noun is obtained and displayed on the terminal. A regular expression is a logical formula for operating on character strings: predefined specific characters, and combinations of them, form a "rule string" that expresses a filtering logic for strings. Regular expressions are commonly used to retrieve or replace text that matches a given pattern (rule). The explanation information of the proper noun is obtained by crawling based on the access address and the regular expression. The access address leads to the page that contains the explanation information, but that page holds a great deal of content, whereas the user only needs to know roughly what the proper noun refers to. A Baidupedia (Baidu Baike) entry, for example, gives the origin of the term and other material in addition to its meaning. The page content therefore needs to be refined: the content of a preset field can be extracted and used as the explanation of the proper noun. In Baidupedia, for example, the first paragraph of the body text is generally the explanation of the proper noun.
In the above proper noun processing method for text, when a user needs the proper nouns in a text to be marked, the proper noun marking instruction sent by the user through the terminal is obtained, and each text to be marked is obtained according to that instruction. Each text to be marked is then analyzed automatically based on the proprietary dictionary in the database, and its proper nouns are extracted and marked automatically. When an explanation request is received, triggered by the user through the terminal on a proper noun marked in the text, an access address is obtained according to the explanation request, and the page content at the access address is crawled by means of a regular expression to obtain the explanation information of the proper noun. The user does not need to take part in the marking process, which improves work efficiency, and the corresponding explanation can be obtained by clicking a proper noun on the terminal, which improves user experience.
In one embodiment, the step of analyzing each text to be marked based on the proprietary dictionary in the database and extracting the proper nouns of each text to be marked comprises:
Performing word segmentation on the text to be marked to obtain its words; obtaining the intersection of those words and the proper nouns in the proprietary dictionary; and determining the words in the intersection as the proper nouns of the text to be marked.
Word segmentation cuts a sentence into individual words; that is, it is the process of recombining a continuous character sequence into a word sequence according to a given specification. Word segmentation may use a method based on string matching, a method based on understanding, a method based on statistics, and so on; its purpose is to cut the sentences of the text to be marked into individual words. The intersection is defined as follows: given two sets A and B, the set of all elements that belong to both A and B is called the intersection of A and B. For example, the intersection of {1, 2, 3} and {2, 3, 4} is {2, 3}. Obtaining the intersection of the words and the proper nouns in the proprietary dictionary means finding the words of each text to be marked that also appear in the proprietary dictionary in the database, that is, the intersection of the set of words in the text to be marked with the set of proper nouns in the proprietary dictionary. The words in the intersection are determined to be the proper nouns of the text to be marked. Word segmentation extracts the words of the text to be marked automatically, and the proprietary dictionary allows proper nouns to be identified automatically, which reduces the user's involvement in the marking process and thus improves work efficiency.
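As an illustration of the segmentation and intersection steps, the following sketch uses forward maximum matching, one simple string-matching segmentation method. The application does not prescribe a particular algorithm; the function names, the `max_len` parameter, and the sample vocabulary are hypothetical, and a production system would more likely rely on a dedicated segmentation library.

```python
def forward_max_match(sentence, vocabulary, max_len=6):
    """Segment a sentence by greedily taking the longest known word at each position."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # Fall back to a single character when no vocabulary word matches.
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words


def extract_proper_nouns(text, vocabulary, proprietary_dictionary):
    """Proper nouns of a text: its segmented words intersected with the dictionary."""
    return set(forward_max_match(text, vocabulary)) & set(proprietary_dictionary)


# Hypothetical sample data for illustration only.
vocabulary = {"保险", "合同", "免赔额", "是多少"}
dictionary = {"免赔额", "保单贷款"}
found = extract_proper_nouns("保险合同的免赔额是多少", vocabulary, dictionary)
```

Here the sentence is segmented into 保险 / 合同 / 的 / 免赔额 / 是多少, and only 免赔额 also appears in the sample dictionary, so it is the single proper noun extracted.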
In one embodiment, referring to Fig. 3, the proprietary dictionary is established through steps S520 to S580:
Step S520: obtain, by means of big data, texts related to the content of the text to be marked.
Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within a given time frame; it is a massive, fast-growing, and diverse information asset that requires new processing modes to deliver stronger decision-making power, insight, and process optimization capability. Such data sets are so large that acquiring, storing, managing, and analyzing them far exceeds the capability of traditional database software tools. Big data has four main characteristics: massive data scale, rapid data flow, diverse data types, and low value density. Texts related to the content of the text to be marked are obtained through big data analysis. For example, when the text to be marked is an insurance survey questionnaire, the related texts may be insurance clauses, insurance survey questionnaires, and similar texts. When the text to be marked is a product introduction, the related texts may be introductions of similar products, technical documents for the product, and so on.
Step S540: perform word segmentation on the content of the related texts to obtain the words to be analyzed.
Word segmentation cuts a sentence into individual words; that is, it is the process of recombining a continuous character sequence into a word sequence according to a given specification. Word segmentation may use a method based on string matching, a method based on understanding, a method based on statistics, and so on. Here its purpose is to cut the sentences in the content of the related texts into individual words, which yields the words to be analyzed.
Step S560: analyze the words to be analyzed to determine proper nouns.
The words to be analyzed, obtained by word segmentation from the content of the related texts, cannot yet be identified as proper nouns; each word to be analyzed must be analyzed to determine which of them are proper nouns. One option is to obtain the search data of a search engine, take the intersection of the search terms in the search data with the words to be analyzed, and determine each word to be analyzed in the intersection to be a proper noun. Another option is to obtain the words in a Chinese vocabulary, take their intersection with the words to be analyzed, and determine each word to be analyzed in the intersection to be a proper noun; the Chinese vocabulary may be a database in which Chinese words are stored.
Step S580: save each proper noun to the proprietary dictionary.
Each proper noun so determined is saved to the proprietary dictionary that is used to determine the proper nouns of texts to be marked. Steps S520 to S580 can be repeated to increase the number of proper nouns in the proprietary dictionary, so that more user needs can be met when the proper nouns of texts to be marked are marked.
In one embodiment, the step of analyzing the words to be analyzed to determine proper nouns comprises: obtaining the search data of a search engine; obtaining the intersection of the search terms in the search data and the words to be analyzed; and determining each word to be analyzed in the intersection as a proper noun.
A search engine is a system that collects information from the Internet with specific computer programs according to a certain strategy, organizes and processes that information, provides a retrieval service to users, and presents the information relevant to a search to the user. When a user searches with a keyword, the search engine looks in its database; if it finds websites whose content matches the request, it applies a special algorithm, generally based on how well the keyword matches the page, where it appears, how often it appears, and link quality, to compute the relevance and ranking of each page, and then returns the page links to the user in order of relevance. Each search made with a search engine leaves a corresponding search record, that is, search data. The search data includes the keyword the user searched with, that is, the search term. When a user searches with a word on its own, it can be inferred that the user wants to know the meaning of that search term, and the search term can be taken to be a word. If the word also appears among the words to be analyzed, it can be taken to be a technical term or noun arising in the insurance industry, insurance clauses, or related texts, and can therefore be determined to be a proper noun. This increases the number of proper nouns in the proprietary dictionary, so that more user needs can be met when the proper nouns of texts to be marked are marked.
In one embodiment, the step of analyzing the words to be analyzed to determine proper nouns further comprises: analyzing the search terms that are not in the intersection to determine the number of searches for each such search term; and determining the search terms whose number of searches is greater than a preset number as proper nouns.
A search term that is not in the intersection is a word that does not appear in the texts related to the content of the text to be marked. The number of searches is the number of times the same word appears across the searches that users make through the search engine. The preset number can be set according to actual conditions; 10 or 100, for example, are both acceptable. Even if a search term does not appear in the related texts, a high number of searches shows that in many cases users do not know its meaning. To meet this need, search terms with a large number of searches can also be determined to be proper nouns. This increases the number of proper nouns in the proprietary dictionary, so that more user needs can be met when the proper nouns of texts to be marked are marked.
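The two selection rules described above (intersection with search terms, plus a search-count threshold for terms outside the intersection) can be sketched as follows. The search log is assumed to be available as a plain list of single-word queries; that representation, the function name `select_proper_nouns`, and the default threshold are assumptions for illustration, not details given in the application.

```python
from collections import Counter


def select_proper_nouns(words_to_analyze, search_log, preset_count=100):
    """Pick proper nouns from candidate words and a log of search terms.

    search_log is assumed to be a list with one entry per single-word search.
    """
    counts = Counter(search_log)
    # Rule 1: candidate words that users have also searched for.
    in_intersection = set(words_to_analyze) & set(counts)
    # Rule 2: search terms outside the intersection that were searched
    # more than the preset number of times.
    frequent = {
        term
        for term, n in counts.items()
        if term not in in_intersection and n > preset_count
    }
    return in_intersection | frequent
```

For instance, with candidate words `{"deductible", "the"}`, a log containing one search for "deductible", three for "annuity", and one for "weather", and `preset_count=2`, the function returns "deductible" (rule 1) and "annuity" (rule 2), while "the" and "weather" are left out.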
In one embodiment, the step of obtaining an access address according to the explanation request, when an explanation request is received that was triggered by the user through the terminal on a proper noun marked in the text, comprises: when the explanation request is received, obtaining the proper noun carried in the explanation request; and splicing the proper noun with a preset access address template to obtain the access address.
The user triggers the explanation request for a proper noun by clicking the marked proper noun on the terminal, and the explanation request carries that proper noun. The proper noun is spliced with the preset access address template to generate the access address. The preset access address template can be determined from the access address of the explanation information of proper nouns. For example, when Baidupedia (Baidu Baike) is used, its URL is https://baike.baidu.com/search/none?word=xxx&pn=0&rn=1&enc=utf8, where xxx is the placeholder to be replaced by the proper noun, and the trailing &pn=0&rn=1&enc=utf8 indicate, respectively, that pageNo is page 0, that rn, the number, is 1, and that enc, the encoding, is utf8. The URL is spliced dynamically: xxx is replaced with the proper noun to be explained to generate the access address. Because the access address is generated automatically, the user does not need to insert an access link into each marked proper noun in advance, which simplifies the workflow and improves work efficiency.
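The splicing step can be sketched in a few lines. The template below is the Baidupedia URL quoted in the text with xxx replaced by a `{word}` placeholder; the constant and function names are hypothetical, and percent-encoding the proper noun is an added precaution that the application does not mention.

```python
from urllib.parse import quote

# Template taken from the URL in the text, with xxx replaced by a placeholder.
ACCESS_ADDRESS_TEMPLATE = (
    "https://baike.baidu.com/search/none?word={word}&pn=0&rn=1&enc=utf8"
)


def build_access_address(proper_noun, template=ACCESS_ADDRESS_TEMPLATE):
    """Splice the proper noun into the preset access address template."""
    # Percent-encode the term (UTF-8) so characters such as '&' or spaces
    # cannot break the query string.
    return template.format(word=quote(proper_noun, safe=""))
```

Keeping the template as a parameter makes it straightforward to point the same logic at a different explanation source.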
In one embodiment, when a user needs to fill in a text questionnaire, the user can send a questionnaire fill-in request to the server through the terminal. After receiving the request, the server sends the corresponding questionnaire to the terminal for display, and the user can answer according to the content of each item. While answering, the user may not understand one of the proper nouns in the text. If that proper noun is marked, the user clicks it, and the terminal sends an explanation request for the proper noun to the server. After receiving the explanation request, the server, based on the proper noun in the request, crawls the page content at the access address by means of a regular expression, obtains the explanation information of the proper noun, and has it displayed on the terminal.
The step of crawling the page content at the access address by means of a regular expression to obtain the explanation information of the proper noun comprises: crawling the page at the access address by means of a regular expression to obtain the page content; and intercepting information from the page content according to a preset information interception rule to obtain the explanation information of the proper noun.
The explanation information of the proper noun is obtained by crawling based on the access address and the regular expression. The access address leads to the page that contains the explanation information, but that page holds a great deal of content, whereas the user only needs to know roughly what the proper noun refers to. A Baidupedia (Baidu Baike) entry, for example, gives the origin of the term and other material in addition to its meaning. The page content therefore needs to be refined by intercepting it according to a preset information interception rule, which can be set according to the characteristics of the crawled page. In Baidupedia, for example, the first paragraph of the body text is generally the explanation of the proper noun, so only the first paragraph is extracted as the explanation. The preset information interception rule is then set to intercept a preset field, and the content of that field is taken as the explanation of the proper noun and displayed on the terminal. The user can obtain the corresponding explanation by clicking a proper noun, which improves user experience.
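A minimal sketch of the interception rule "take the first paragraph of the body text" is shown below. It assumes the page has already been fetched as an HTML string and that the explanation sits in the first `<p>` element; the real markup of any given site may differ, so the pattern, the `max_chars` limit, and the function name are illustrative assumptions only.

```python
import re
from html import unescape

# Hypothetical interception rule: the first <p>...</p> element of the page.
FIRST_PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")


def extract_explanation(page_html, max_chars=200):
    """Return the text of the first paragraph, or None if there is none."""
    match = FIRST_PARAGRAPH.search(page_html)
    if match is None:
        return None
    # Drop nested tags, decode entities, and collapse whitespace.
    text = unescape(ANY_TAG.sub("", match.group(1)))
    text = " ".join(text.split())
    return text[:max_chars]
```

Regular expressions are workable for a fixed, well-known page layout like this; if the layout varies, an HTML parser is usually the more robust choice.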
In one embodiment, there may be not labeled proper noun in text, when user is doing the process inscribed In, occur having in the stem of text not labeled word not understand the meaning, user can select to be ignorant of by selecting long-pressing Word is clicked and is determined, after server receives determine instruction, obtains the word of user's selection, the word which is selected As proper noun, request is explained in triggering, according to the explanation request access address, by regular expression to access Content of pages in location is crawled, and the explanation information of proper noun is obtained.It obtains corresponding proper noun and explains content, and will The explanation content is shown in terminal, and the word is saved into proprietary dictionary.When user is unknown to unlabelled word The word is obtained proper noun as proper noun and explains content, improve user experience by Bai Shi by manually selecting.
It should be understood that although the steps in the flowcharts of Figs. 2-3 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in Figs. 2-3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; these sub-steps or stages are also not necessarily executed sequentially, but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 4, a proper noun processing device for text is provided, comprising: an instruction acquisition module 310, a questionnaire acquisition module 320, a proper noun extraction module 330, a proper noun marking module 340, an access address acquisition module 350, and an explanation information acquisition module 360, wherein:
The instruction acquisition module 310 is configured to obtain a proper noun marking instruction.
The questionnaire acquisition module 320 is configured to obtain each text to be marked according to the proper noun marking instruction.
The proper noun extraction module 330 is configured to analyze each text to be marked based on the proprietary dictionary in the database and to extract the proper nouns of each text to be marked.
The proper noun marking module 340 is configured to mark the proper nouns of each text to be marked.
The access address acquisition module 350 is configured to obtain an access address according to an explanation request when the explanation request, triggered by a user through a terminal for a proper noun marked in the text, is received.
The explanation information acquisition module 360 is configured to crawl the page content at the access address by means of a regular expression and to obtain the explanation information of the proper noun.
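The marking performed by the proper noun marking module 340 is not specified in detail. One possible sketch, assuming that marking means wrapping each occurrence in delimiters that the terminal can render as a clickable item, is shown below; the `[[` and `]]` delimiters and the function name are assumptions made for illustration.

```python
import re
from typing import Iterable


def mark_proper_nouns(
    text: str,
    proper_nouns: Iterable[str],
    open_mark: str = "[[",
    close_mark: str = "]]",
) -> str:
    """Wrap every occurrence of a proper noun in the given marks."""
    # Longest terms first, so "annuity insurance" wins over "annuity".
    ordered = sorted((n for n in proper_nouns if n), key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile("|".join(re.escape(n) for n in ordered))
    return pattern.sub(lambda m: open_mark + m.group(0) + close_mark, text)
```

Sorting by length avoids marking only part of a longer term when one proper noun is a prefix of another.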
In one embodiment, the proper noun extraction module 330 includes: a word segmentation unit, configured to perform word segmentation on the text to be marked to obtain words; an intersection computing unit, configured to obtain the intersection of the words and the proper nouns in the proprietary dictionary; and a first proper noun determination unit, configured to determine the words in the intersection as the proper nouns in each text to be marked.
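The segmentation and intersection steps can be sketched as below. The `segment` function is a placeholder that splits on non-word characters; this is an assumption that only suits space-delimited text, and Chinese text would need a dedicated word segmenter, which the application does not name.

```python
import re
from typing import List, Set


def segment(text: str) -> List[str]:
    """Placeholder segmenter: split on non-word characters."""
    return re.findall(r"\w+", text)


def extract_proper_nouns(text: str, lexicon: Set[str]) -> Set[str]:
    """Return the words of the text that also appear in the proprietary dictionary."""
    words = set(segment(text))
    # The proper nouns of the text are the intersection of the two sets.
    return words & lexicon
```

Because the comparison is a set intersection, a multi-word term in the dictionary is only found if the segmenter emits it as a single word.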
In one embodiment, the proper noun processing device for text further includes: a text acquisition module, configured to obtain, by means of big data, texts related to the content of the text to be marked; a to-be-analyzed word acquisition module, configured to perform word segmentation on the content of the texts related to the content of the text to be marked to obtain words to be analyzed; a proper noun determination module, configured to analyze the words to be analyzed and determine proper nouns; and a saving module, configured to save each proper noun into the proprietary dictionary.
In one embodiment, the proper noun determination module includes: a data acquisition unit, configured to obtain search data of a search engine; an intersection acquisition unit, configured to obtain the intersection of the search terms in the search data and the words to be analyzed; and a second proper noun determination unit, configured to determine each word to be analyzed in the intersection as a proper noun.
In one embodiment, the proper noun determination module further includes: a search count determination unit, configured to analyze the search terms that are not in the intersection and determine the number of searches for each such search term; and a third proper noun determination unit, configured to determine the search terms whose number of searches is greater than a preset number as proper nouns.
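The two determination rules above (intersection with search terms, then a search-count threshold for the remaining search terms) can be combined in a short sketch. It assumes the search data is available as a flat list of searched terms, one entry per search; the function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def determine_proper_nouns(
    words_to_analyze: Iterable[str],
    search_log: Iterable[str],
    preset_count: int,
) -> Set[str]:
    """Pick proper nouns from candidate words and search-engine data."""
    counts = Counter(search_log)
    # Rule 1: words to be analyzed that also occur as search terms.
    in_intersection = set(words_to_analyze) & set(counts)
    # Rule 2: search terms outside the intersection searched more than preset_count times.
    frequent = {
        term
        for term, n in counts.items()
        if term not in in_intersection and n > preset_count
    }
    return in_intersection | frequent
```

The returned set is what the saving module would then write into the proprietary dictionary.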
In one embodiment, the access address acquisition module 350 includes: a proper noun acquisition unit, configured to obtain the proper noun carried in an explanation request when the explanation request, triggered by a user through a terminal for a proper noun marked in the text, is received; and an access address obtaining unit, configured to splice the proper noun with a preset access address template to obtain the access address.
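Splicing a proper noun into a preset access address template can be sketched as follows. The template URL is a placeholder on `example.com`, not an address given in the application, and percent-encoding the term is an added assumption so that spaces and non-ASCII characters yield a valid URL.

```python
from urllib.parse import quote

# Hypothetical template; {term} marks where the proper noun is spliced in.
ACCESS_ADDRESS_TEMPLATE = "https://example.com/item/{term}"


def build_access_address(proper_noun: str, template: str = ACCESS_ADDRESS_TEMPLATE) -> str:
    """Splice a percent-encoded proper noun into the access address template."""
    return template.format(term=quote(proper_noun, safe=""))
```

The resulting address is what the explanation information acquisition module would then crawl.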
In one embodiment, the explanation information acquisition module 360 is further configured to: crawl the page content at the access address by means of a regular expression to obtain the page content; and intercept information from the page content according to a preset information interception rule to obtain the explanation information of the proper noun.
For the specific limitations of the proper noun processing device for text, reference may be made to the limitations of the proper noun processing method for text above, which are not repeated here. Each module in the above proper noun processing device for text can be implemented in whole or in part by software, hardware, or a combination thereof. The above modules can be embedded in, or independent of, the processor of the computer equipment in the form of hardware, or can be stored in the memory of the computer equipment in the form of software, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer equipment is provided. The computer equipment may be a server, and its internal structure may be as shown in Fig. 5. The computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer equipment provides computing and control capabilities. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer equipment is used to store the proper noun data of the proprietary dictionary. The network interface of the computer equipment is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a proper noun processing method for text.
Those skilled in the art will understand that the structure shown in Fig. 5 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer equipment to which the solution is applied. A specific computer equipment may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer equipment is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program:
obtaining a proper noun marking instruction; obtaining each text to be marked according to the proper noun marking instruction; analyzing each text to be marked based on the proprietary dictionary in the database, and extracting the proper nouns of each text to be marked; marking the proper nouns of each text to be marked; when an explanation request, triggered by a user through a terminal for a proper noun marked in the text, is received, obtaining an access address according to the explanation request; and crawling the page content at the access address by means of a regular expression to obtain the explanation information of the proper noun.
In one embodiment, the processor further implements the following steps when executing the computer program: performing word segmentation on the text to be marked to obtain words; obtaining the intersection of the words and the proper nouns in the proprietary dictionary; and determining the words in the intersection as the proper nouns in each text to be marked.
In one embodiment, the processor further implements the following steps when executing the computer program: obtaining, by means of big data, texts related to the content of the text to be marked; performing word segmentation on the content of the texts related to the content of the text to be marked to obtain words to be analyzed; analyzing the words to be analyzed and determining proper nouns; and saving each proper noun into the proprietary dictionary.
In one embodiment, the processor further implements the following steps when executing the computer program: obtaining search data of a search engine; obtaining the intersection of the search terms in the search data and the words to be analyzed; and determining each word to be analyzed in the intersection as a proper noun.
In one embodiment, the processor further implements the following steps when executing the computer program: analyzing the search terms that are not in the intersection and determining the number of searches for each such search term; and determining the search terms whose number of searches is greater than a preset number as proper nouns.
In one embodiment, the processor further implements the following steps when executing the computer program: when an explanation request, triggered by a user through a terminal for a proper noun marked in the text, is received, obtaining the proper noun carried in the explanation request; and splicing the proper noun with a preset access address template to obtain the access address.
In one embodiment, the processor further implements the following steps when executing the computer program: crawling the page content at the access address by means of a regular expression to obtain the page content; and intercepting information from the page content according to a preset information interception rule to obtain the explanation information of the proper noun.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps:
obtaining a proper noun marking instruction; obtaining each text to be marked according to the proper noun marking instruction; analyzing each text to be marked based on the proprietary dictionary in the database, and extracting the proper nouns of each text to be marked; marking the proper nouns of each text to be marked; when an explanation request, triggered by a user through a terminal for a proper noun marked in the text, is received, obtaining an access address according to the explanation request; and crawling the page content at the access address by means of a regular expression to obtain the explanation information of the proper noun.
In one embodiment, the computer program further implements the following steps when executed by the processor: performing word segmentation on the text to be marked to obtain words; obtaining the intersection of the words and the proper nouns in the proprietary dictionary; and determining the words in the intersection as the proper nouns in each text to be marked.
In one embodiment, the computer program further implements the following steps when executed by the processor: obtaining, by means of big data, texts related to the content of the text to be marked; performing word segmentation on the content of the texts related to the content of the text to be marked to obtain words to be analyzed; analyzing the words to be analyzed and determining proper nouns; and saving each proper noun into the proprietary dictionary.
In one embodiment, the computer program further implements the following steps when executed by the processor: obtaining search data of a search engine; obtaining the intersection of the search terms in the search data and the words to be analyzed; and determining each word to be analyzed in the intersection as a proper noun.
In one embodiment, the computer program further implements the following steps when executed by the processor: analyzing the search terms that are not in the intersection and determining the number of searches for each such search term; and determining the search terms whose number of searches is greater than a preset number as proper nouns.
In one embodiment, the computer program further implements the following steps when executed by the processor: when an explanation request, triggered by a user through a terminal for a proper noun marked in the text, is received, obtaining the proper noun carried in the explanation request; and splicing the proper noun with a preset access address template to obtain the access address.
In one embodiment, the computer program further implements the following steps when executed by the processor: crawling the page content at the access address by means of a regular expression to obtain the page content; and intercepting information from the page content according to a preset information interception rule to obtain the explanation information of the proper noun.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (10)

1. A proper noun processing method for text, the method comprising:
obtaining a proper noun marking instruction;
obtaining each text to be marked according to the proper noun marking instruction;
analyzing each text to be marked based on the proprietary dictionary in a database, and extracting the proper nouns of each text to be marked;
marking the proper nouns of each text to be marked;
when an explanation request, triggered by a user through a terminal for a proper noun marked in the text, is received, obtaining an access address according to the explanation request; and
crawling the page content at the access address by means of a regular expression to obtain the explanation information of the proper noun.
2. The method according to claim 1, wherein the step of analyzing each text to be marked based on the proprietary dictionary in the database and extracting the proper nouns of each text to be marked comprises:
performing word segmentation on the text to be marked to obtain words;
obtaining the intersection of the words and the proper nouns in the proprietary dictionary; and
determining the words in the intersection as the proper nouns in each text to be marked.
3. The method according to claim 1, wherein the proprietary dictionary is established by:
obtaining, by means of big data, texts related to the content of the text to be marked;
performing word segmentation on the content of the texts related to the content of the text to be marked to obtain words to be analyzed;
analyzing the words to be analyzed and determining proper nouns; and
saving each proper noun into the proprietary dictionary.
4. The method according to claim 3, wherein the step of analyzing the words to be analyzed and determining proper nouns comprises:
obtaining search data of a search engine;
obtaining the intersection of the search terms in the search data and the words to be analyzed; and
determining each word to be analyzed in the intersection as a proper noun.
5. The method according to claim 4, wherein the step of analyzing the words to be analyzed and determining proper nouns further comprises:
analyzing the search terms that are not in the intersection and determining the number of searches for each such search term; and
determining the search terms whose number of searches is greater than a preset number as proper nouns.
6. The method according to claim 1, wherein the step of obtaining an access address according to the explanation request, when the explanation request triggered by a user through a terminal for a proper noun marked in the text is received, comprises:
when the explanation request, triggered by the user through the terminal for the proper noun marked in the text, is received, obtaining the proper noun carried in the explanation request; and
splicing the proper noun with a preset access address template to obtain the access address.
7. The method according to claim 1, wherein the step of crawling the page content at the access address by means of a regular expression to obtain the explanation information of the proper noun comprises:
crawling the page content at the access address by means of a regular expression to obtain the page content; and
intercepting information from the page content according to a preset information interception rule to obtain the explanation information of the proper noun.
8. A proper noun processing device for text, the device comprising:
an instruction acquisition module, configured to obtain a proper noun marking instruction;
a questionnaire acquisition module, configured to obtain each text to be marked according to the proper noun marking instruction;
a proper noun extraction module, configured to analyze each text to be marked based on the proprietary dictionary in a database and to extract the proper nouns of each text to be marked;
a proper noun marking module, configured to mark the proper nouns of each text to be marked;
an access address acquisition module, configured to obtain an access address according to an explanation request when the explanation request, triggered by a user through a terminal for a proper noun marked in the text, is received; and
an explanation information acquisition module, configured to crawl the page content at the access address by means of a regular expression and to obtain the explanation information of the proper noun.
9. A computer equipment, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.
CN201910311158.3A 2019-04-18 2019-04-18 Proper noun processing method, device and the computer equipment of text Pending CN110134846A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910311158.3A CN110134846A (en) 2019-04-18 2019-04-18 Proper noun processing method, device and the computer equipment of text

Publications (1)

Publication Number Publication Date
CN110134846A true CN110134846A (en) 2019-08-16

Family

ID=67570219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910311158.3A Pending CN110134846A (en) 2019-04-18 2019-04-18 Proper noun processing method, device and the computer equipment of text

Country Status (1)

Country Link
CN (1) CN110134846A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569370A (en) * 2019-09-16 2019-12-13 北京百度网讯科技有限公司 Knowledge graph construction method and device, electronic equipment and storage medium
CN112434137A (en) * 2020-12-11 2021-03-02 乐山师范学院 Poetry retrieval method and system based on artificial intelligence
CN112784006A (en) * 2020-06-05 2021-05-11 珠海金山办公软件有限公司 Book recommendation method and device, electronic equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1609835A (en) * 2003-10-21 2005-04-27 国际商业机器公司 Comment method, apparatus and system for electronic file
CN101901236A (en) * 2009-05-26 2010-12-01 英业达股份有限公司 System and method for explaining technical terms
CN105843962A (en) * 2016-04-18 2016-08-10 百度在线网络技术(北京)有限公司 Information processing and displaying methods, information processing and displaying devices as well as information processing and displaying system
CN106528796A (en) * 2016-11-11 2017-03-22 苏州工讯科技有限公司 Method for quickly identifying proper nouns in industrial product e-commerce search engine
CN108418745A (en) * 2018-02-09 2018-08-17 深圳百诺国际生命科技有限公司 Information insertion method based on instant communication information between doctors and patients and system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569370A (en) * 2019-09-16 2019-12-13 北京百度网讯科技有限公司 Knowledge graph construction method and device, electronic equipment and storage medium
CN110569370B (en) * 2019-09-16 2022-09-02 北京百度网讯科技有限公司 Knowledge graph construction method and device, electronic equipment and storage medium
CN112784006A (en) * 2020-06-05 2021-05-11 珠海金山办公软件有限公司 Book recommendation method and device, electronic equipment and readable storage medium
CN112434137A (en) * 2020-12-11 2021-03-02 乐山师范学院 Poetry retrieval method and system based on artificial intelligence
CN112434137B (en) * 2020-12-11 2023-04-11 乐山师范学院 Poetry retrieval method and system based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination