CN104133913A - System and method for automatically establishing city shop information library based on video analysis, searching and aggregation - Google Patents


Publication number
CN104133913A
Authority
CN
China
Prior art keywords
information
classification
business information
unit
error correction
Legal status
Granted
Application number
CN201410391136.XA
Other languages
Chinese (zh)
Other versions
CN104133913B (en)
Inventor
朱明�
雷鸣
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Application filed by University of Science and Technology of China USTC
Priority to CN201410391136.XA
Publication of CN104133913A
Application granted
Publication of CN104133913B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The invention discloses a system and method for automatically establishing a city shop information library based on video analysis, searching and aggregation. The system comprises an automatic merchant information and group buying information searching and aggregating unit, a unit for establishing an ontology knowledge base of merchant types, a unit for obtaining merchant information to be classified, a unit for establishing a merchant information correcting word bank and correcting information and an automatic merchant information classifying unit. The automatic merchant information and group buying information searching and aggregating unit is used for searching and aggregating merchant information. The unit for establishing the ontology knowledge base of the merchant types is used for classifying unclassified merchant information. The unit for obtaining the merchant information to be classified is used for obtaining the merchant information to be classified in a wide range. The unit for establishing the merchant information correcting word bank and correcting the information is used for correcting the merchant information which is identified to be incorrect. The automatic merchant information classifying unit is used for automatically obtaining the correct type of the merchant information. By the adoption of the system and method for automatically establishing the city shop information library based on video analysis, searching and aggregation, execution is more convenient and more efficient, and information is more accurate.

Description

A system and method for automatically building a city shop information library based on video analysis and search aggregation
Technical field
The present invention relates to a system for automatically building a city shop information library based on video analysis and search aggregation, and belongs to the field of Internet and image recognition technology.
Background
At present the Internet hosts many merchant websites and group-buying websites. They carry a large volume of information of many intricate types, and there is no unified classification standard. A consumer looking for a particular product, without knowing which category it belongs to, has to search the websites one by one. For example, Dianping divides its listings into as many as 9 categories, and Meituan into more than 8. Because each website uses a different classification standard, the same product is placed on pages of different categories. After browsing one website the consumer often has to open another and locate the page that holds the product, which can take a dozen or even dozens of clicks. Some group-buying aggregation websites have appeared on the Internet, but their information is often incomplete, and they repeatedly show serious mismatches between products and their categories. There is therefore an urgent need for a way to search and aggregate the mainstream merchant and group-buying websites on a large scale and to organize the information according to a self-defined classification standard, so that it is accurate, efficient to use, and clear at a glance.
Searching and aggregating a large number of merchant and group-buying websites inevitably produces a large amount of merchant information, and the category of each item must be known. Classifying the items manually one by one would be an enormous and impractical undertaking. A merchant information library for one's own city would achieve far more with far less effort, and no such work has been reported so far.
Summary of the invention
Technical problem solved by the invention: to overcome the deficiencies of the prior art by providing a system and method for automatically building a city shop information library based on video analysis and search aggregation, improving efficiency and reducing the complexity of the information.
Technical solution of the invention: a system for automatically building a city shop information library based on video analysis and search aggregation, comprising: a merchant and group-buying information automatic search and aggregation unit; a merchant category ontology knowledge base creation unit; a to-be-classified merchant information acquisition unit; a merchant information error-correction dictionary creation and information correction unit; and a merchant information automatic classification unit.
The merchant and group-buying information automatic search and aggregation unit automatically searches the Internet for the relevant popular websites and creates a corresponding web crawler framework for each website. As required, it determines the format of the information to be crawled, the categories into which the information is divided, and the format in which the information is stored. It also sets a timer so that crawling starts on schedule and the data are updated regularly.
The merchant category ontology knowledge base creation unit preprocesses the data obtained by the merchant and group-buying information automatic search and aggregation unit, stores them in one text document per category, and then uses Lucene to build an index over all the documents.
The to-be-classified merchant information acquisition unit records the trade names of the merchants along a street or road section by shooting video (photographs may be used instead). The video is then cut into pictures, and image recognition is performed on them to obtain the corresponding merchant information.
The merchant information error-correction dictionary creation and information correction unit stores the merchant information crawled by the merchant and group-buying information automatic search and aggregation unit in a fixed format to form an error-correction dictionary. Merchant information that image recognition has recognized incorrectly is then corrected against this dictionary to obtain the correct merchant information.
The merchant information automatic classification unit obtains the merchant information to be classified, performs word segmentation on it to obtain a keyword set, and looks the keywords up in the ontology knowledge base created by the merchant category ontology knowledge base creation unit. Using the index of that knowledge base, it calculates the sum of the similarities of the keyword set within each category document; the similarity calculation is based on a dynamic programming algorithm. The category of the document with the largest similarity sum is the category of the merchant information.
A method for automatically building a city shop information library based on video analysis and search aggregation, with the following steps:
(1) Merchant and group-buying information automatic search and aggregation step: automatically search the Internet for the relevant popular websites and create a corresponding web crawler framework for each website; determine, as required, the format of the information to be crawled, the categories into which the information is divided, and the format in which the information is stored; and set a timer so that crawling starts on schedule and the data are updated regularly;
(2) Merchant category ontology knowledge base creation step: preprocess the data obtained in the merchant and group-buying information automatic search and aggregation step, store them in one text document per category, and then use Lucene to build an index over all the documents;
(3) Merchant information acquisition step: record the trade names of the merchants along a street or road section by shooting video (photographs may be used instead), cut the video into pictures, and perform image recognition on them to obtain the corresponding merchant information;
(4) Merchant information error-correction dictionary creation and information correction step: store the merchant information crawled in the merchant and group-buying information automatic search and aggregation step in a fixed format to form an error-correction dictionary, then correct the merchant information that image recognition has recognized incorrectly against this dictionary to obtain the correct merchant information;
(5) Merchant information automatic classification step: obtain the merchant information to be classified, perform word segmentation on it to obtain a keyword set, and look the keywords up in the ontology knowledge base created in the merchant category ontology knowledge base creation step; using the index of that knowledge base, calculate the sum of the similarities of the keyword set within each category document, the similarity calculation being based on a dynamic programming algorithm; the category of the document with the largest similarity sum is the category of the merchant information.
Compared with the prior art, the present invention has the following advantages. By automatically searching, crawling and aggregating merchant and group-buying information, it yields data that can be used for scientific research or commercially; for example, popular products can be found by ranking, and prime locations and other patterns can be found from the addresses of popular merchants. Building an ontology knowledge base for merchant categories makes information classification more accurate. The merchant information acquisition unit records the shop names along a street or road by video (or by photographs), then cuts the video and performs image recognition to identify the shop names one by one. The merchant information error-correction dictionary allows trade names that picture recognition got wrong to be corrected. Automatic classification assigns each item of merchant information to a category, which makes the information better organized, clear at a glance and easier for people to accept, improves the efficiency with which the information is used, and is convenient for developers.
Brief description of the drawings
Fig. 1 is a schematic diagram of the automatic search for merchant and group-buying information according to the present invention;
Fig. 2 is a schematic diagram of the creation of the merchant information ontology knowledge base according to the present invention;
Fig. 3 is a schematic diagram of the acquisition of merchant information to be classified according to the present invention;
Fig. 4 is a schematic diagram of the creation of the merchant information error-correction dictionary and of information correction according to the present invention;
Fig. 5 is a schematic diagram of the automatic classification of merchant information according to the present invention;
Fig. 6 is a schematic diagram of the overall workflow of the present invention.
Detailed description of the embodiments
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced. The drawings described here show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that a person of ordinary skill in the art obtains from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention. The examples of the present invention are implemented in the Java programming language.
As shown in Figure 1, the embodiment of the present invention provides an automatic search and aggregation unit for merchant and group-buying information. It automatically searches for merchant and group-buying websites, determines the information format, creates web crawler frameworks and crawls data on a schedule, deduplicates and merges the data, and updates the database:
First, the system automatically searches the Internet for several popular merchant websites (such as Dianping) and determines the format of the information to be crawled, for example merchant name, city, address, contact details, longitude and latitude, and tags. Several distinguishable category attributes also need to be determined, such as food, life services, hotel and beauty. Each of these categories can be divided further as needed: food can be subdivided into local cuisine, foreign cuisine, hot pot, bread and desserts, and other; life services can be subdivided into housekeeping, home renovation, education and training, shopping, and so on. Targeted crawling is then carried out according to the chosen categories.
Then, the system automatically searches the Internet for several popular group-buying websites (such as Meituan) and determines the format of the information to be crawled, for example trade name, group-buying information, city, address, contact details, longitude and latitude, number of purchases, and the closing date of the group-buying offer. Several distinguishable category attributes are again determined, consistent with those used when crawling the merchant websites, and targeted crawling is carried out according to the chosen categories.
To create the web crawler framework and crawl data on a schedule, different crawling rules and strategies are created according to the type of each website.
The crawling rules and strategies are written in Java and XPath. XPath is the XML Path Language, a language for locating parts of an XML document (XML being a subset of the Standard Generalized Markup Language). XPath is based on the tree structure of XML and provides the ability to find nodes in that tree. The HTML source of a web page is obtained first, and the XPath of each item of information to be extracted is then found by inspection and recorded as a crawling rule. The jar package for XPath is imported, the relevant functions are written against the XPath API, and the information is then extracted from the HTML source of the page using these functions and the XPath rules.
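The extraction step described above can be sketched as follows. This is a minimal illustration only: it uses the JDK's built-in `javax.xml.xpath` package rather than whichever XPath jar the embodiment imports, the class name `XPathFieldExtractor` and the sample rules are hypothetical, and it assumes the page source is already well-formed markup (real HTML usually needs a tidying step first).

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: applies one XPath crawling rule to well-formed page markup.
class XPathFieldExtractor {
    static String extract(String markup, String rule) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // Parses the markup and returns the string value of the first matching node,
            // or an empty string when nothing matches.
            return xpath.evaluate(rule, new InputSource(new StringReader(markup))).trim();
        } catch (XPathExpressionException e) {
            // Covers both an invalid rule and markup that cannot be parsed.
            throw new IllegalArgumentException("Cannot apply rule: " + rule, e);
        }
    }
}
```

A rule such as `//h1[@class='name']` would then return the text of the first matching heading, and an empty result signals that the rule no longer fits the page.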
When an unexpected situation arises while the program is running, such as incorrect information or information that cannot be obtained (every field is saved as NULL), the time of the error and the error itself are sent to the user by e-mail. The user can locate the failure from the error type, find the cause, and correct the XPath rule, so that data can again be crawled quickly and efficiently. To crawl data on a schedule, a time must be set; at each scheduled time the system automatically runs the crawl and stores the results in the database.
The schedule is set with Quartz, an open-source job scheduling framework. It is enough to create a Java class that implements the org.quartz.Job interface, which contains a single method:
public void execute(JobExecutionContext context)
    throws JobExecutionException;
The logic to be run is added to the execute() method of the class that implements the Job interface. The two basic units of Quartz scheduling are the job and the trigger: a job is a task that can be scheduled, and a trigger provides the schedule for a job.
Quartz also requires a cron expression. For example, "0 0 12 * * ?" fires every day at 12 noon, at which point the program starts running. Here the merchant information websites are crawled every Monday at 1:00 AM and the group-buying websites are updated every day at 1:00 AM; the corresponding expressions are "0 0 1 ? * MON" and "0 0 1 * * ?" respectively.
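Quartz is a third-party library, so the sketch below does not call it. It uses only `java.time` to show the two schedules just described (weekly on Monday at 1:00 AM, daily at 1:00 AM) as next-run calculations. The class `CrawlSchedule` and its method names are hypothetical and stand in for what a Quartz trigger would compute.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.TemporalAdjusters;

// Hypothetical stand-in for a scheduler trigger: computes when the next crawl should start.
class CrawlSchedule {
    // Next daily run at the given time, strictly after 'now'.
    static LocalDateTime nextDaily(LocalDateTime now, LocalTime at) {
        LocalDateTime candidate = now.toLocalDate().atTime(at);
        return candidate.isAfter(now) ? candidate : candidate.plusDays(1);
    }

    // Next weekly run on the given weekday and time, strictly after 'now'.
    static LocalDateTime nextWeekly(LocalDateTime now, DayOfWeek day, LocalTime at) {
        LocalDateTime candidate = now.toLocalDate()
                .with(TemporalAdjusters.nextOrSame(day))
                .atTime(at);
        return candidate.isAfter(now) ? candidate : candidate.plusWeeks(1);
    }

    // Delay to hand to a timer or executor service.
    static Duration delayUntil(LocalDateTime now, LocalDateTime next) {
        return Duration.between(now, next);
    }
}
```

The resulting delay could be passed to a `ScheduledExecutorService`; a real deployment would more likely leave this calculation to the scheduler itself.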
Deduplication and merging: when the merchant information and the group-buying information contain entries for the same merchant, the entries are merged and the duplicate data are deleted.
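One way to picture the deduplication and merging step is the sketch below. The source does not say how two entries are recognized as the same merchant, so the identity key (name plus address), the field names and the class `MerchantMerger` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical merger: each record maps a field name (e.g. "name", "address") to its value.
class MerchantMerger {
    static List<Map<String, String>> merge(List<Map<String, String>> records) {
        Map<String, Map<String, String>> byKey = new LinkedHashMap<>();
        for (Map<String, String> rec : records) {
            // Assumed identity key: merchant name plus address.
            String key = rec.get("name") + "|" + rec.get("address");
            Map<String, String> merged = byKey.computeIfAbsent(key, k -> new LinkedHashMap<>());
            // Keep the first non-empty value seen for each field; later duplicates only fill gaps.
            for (Map.Entry<String, String> field : rec.entrySet()) {
                if (field.getValue() != null && !field.getValue().isEmpty()) {
                    merged.putIfAbsent(field.getKey(), field.getValue());
                }
            }
        }
        return new ArrayList<>(byKey.values());
    }
}
```

With this rule a merchant record and a group-buying record for the same shop collapse into one entry that carries the fields of both.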
Database update: information that is not yet in the database is added, and outdated or no longer existing information is deleted.
With the merchant and group-buying information automatic search and aggregation unit of this embodiment, the user sets the schedule and the information format, and the system searches the Internet for the required information, crawls it automatically, and merges and updates it. This keeps the information up to date (group-buying information is updated once a day) and maintains its quantity (about 100,000 entries), readability, accuracy and availability, while reducing its complexity, so that the information is more user-friendly and more efficient to use.
Automatic search and aggregation of merchant and group-buying information makes information faster to obtain and more fully used. The approach has good prospects in the field of information search and aggregation and can be applied widely to the search and aggregation of news, text, video and picture information on the Internet.
As shown in Figure 2, the embodiment of the present invention provides a unit for building the shop information ontology knowledge base. It builds the basic merchant ontology knowledge base, preprocesses the data, and creates the index:
First, the required information, such as merchant names and group-buying information, is extracted from the data obtained by the merchant information search and aggregation described above and stored in text documents, which form the basic merchant ontology knowledge base.
Then the data in the knowledge base are preprocessed. The data are partitioned by category attribute and stored in one text document per category; for example, merchant information in the food category is stored in the food text document, so that each text document holds the merchant information of one category. Strings of differing forms in the text documents are converted to a unified standard form to simplify later word segmentation and similarity matching; in particular, full-width punctuation is converted to half-width punctuation.
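The full-width to half-width conversion mentioned above can be sketched as follows. The class name `TextNormalizer` is hypothetical. The sketch relies on the Unicode layout in which the full-width forms U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from their ASCII counterparts, so it converts full-width letters and digits as well as punctuation; CJK-only punctuation such as the ideographic full stop is left unchanged.

```java
// Hypothetical normalizer for the preprocessing step.
class TextNormalizer {
    static String toHalfWidth(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\u3000') {
                // Ideographic space becomes an ASCII space.
                sb.append(' ');
            } else if (c >= '\uFF01' && c <= '\uFF5E') {
                // Full-width ASCII variants map to ASCII by a fixed offset.
                sb.append((char) (c - 0xFEE0));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Applying this to every line of a category document before indexing keeps later string matching from failing on purely typographic differences.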
Next, Lucene is used to create an index over all the documents, and each document is given a unique ID. For example, if the information is divided into the five categories food, beauty, life services, hotel and leisure, the IDs can be meishi, liren, shenguo, jiudian, xiuxian, or alternatively $s_1, s_2, s_3, s_4, s_5$. This prepares for the subsequent classification work.
As shown in Figure 3, the embodiment of the present invention provides a unit for acquiring the merchant information to be classified. What is acquired here is mainly the trade name of each merchant (other information, such as prices and featured products, can also be acquired). The trade names of all the merchants along a street or road are recorded by shooting video.
The video is cut into individual pictures that contain the merchants' trade names.
The pictures that can be recognized are selected and then cropped.
The pictures are then processed with image recognition software (OCR software is used here), and the recognized string is taken as the merchant name.
The merchant name obtained is first passed to the shop information automatic classification unit. If the category finally output is not the expected correct category, the cause of the error is analyzed; if the cause is that picture recognition produced a wrong trade name, the name is corrected.
As shown in Figure 4, the merchant information error-correction dictionary creation and information correction unit covers creating the error-correction dictionary, preprocessing the data, and obtaining the correct merchant name. The error-correction dictionary is the standard against which erroneous information is corrected. It is built by selecting one type of information (here the trade name) from the information obtained by the merchant and group-buying information automatic search and aggregation unit of Figure 1.
Next the data are processed to remove the redundant information they contain. Regular-expression matching is used here because of its strong string-processing capability. For example, "Jin Fu River Self-Service Barbecue, Chuzhou Road Branch" becomes "Jin Fu River Self-Service Barbecue" after processing and is then stored in a text document.
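The source does not give the regular expression it uses, so the following is only a sketch under an assumed rule: a trailing parenthesized suffix, in ASCII or full-width brackets, is treated as a branch or location tag and removed. The class `TradeNameCleaner` and the sample names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical cleaner for trade names before they enter the error-correction dictionary.
class TradeNameCleaner {
    // Assumed rule: optional whitespace, an opening bracket, any non-bracket text,
    // a closing bracket, then optional whitespace at the end of the name.
    private static final Pattern BRANCH_SUFFIX =
            Pattern.compile("\\s*[(\\uFF08][^()\\uFF08\\uFF09]*[)\\uFF09]\\s*$");

    static String clean(String rawName) {
        return BRANCH_SUFFIX.matcher(rawName).replaceAll("").trim();
    }
}
```

Names without such a suffix pass through unchanged, so the rule can be applied to every crawled trade name; other suffix conventions would need their own patterns.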
Finally, the trade names obtained by the to-be-classified merchant information acquisition unit of Figure 3 are corrected. A similarity calculation based on a dynamic programming algorithm is used here: the trade name to be corrected is matched for similarity against the entries in the error-correction dictionary described above.
The idea of the similarity calculation is as follows:
(1) The jcseg word segmentation tool segments the string to be processed. The number of words obtained is num, and the words are stored in a string array str;
(2) Set $i = 0$, then match the strings in the array str against each entry in the error-correction dictionary. For a given entry, $i$ is incremented by 1 each time a string from str matches. If in the end $n$ strings in str match a given entry (that is, the final value of $i$ is $n$), the similarity between the array str (the string to be processed) and that entry is $s = (n / \mathrm{num}) \times 100\%$;
(3) Suppose the similarities between the string to be processed and the entries in the error-correction dictionary are $s_1, s_2, s_3, s_4$. The maximum similarity is $s_{\max} = \max(s_1, s_2, s_3, s_4)$. The entry with the maximum similarity is recorded and returned; this entry is the correct trade name.
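Steps (1) to (3) can be sketched as below. Word segmentation is done by jcseg in the embodiment; here the tokens are assumed to be supplied already segmented, and "matches" is interpreted as "the entry contains the token". The class `TradeNameCorrector` is hypothetical, and the similarity is returned as a fraction rather than a percentage.

```java
import java.util.List;

// Hypothetical corrector: picks the dictionary entry most similar to an OCR result.
class TradeNameCorrector {
    // s = n / num, where n tokens out of num occur in the dictionary entry.
    static double similarity(List<String> tokens, String entry) {
        if (tokens.isEmpty()) {
            return 0.0;
        }
        int n = 0;
        for (String token : tokens) {
            if (entry.contains(token)) {
                n++;
            }
        }
        return (double) n / tokens.size();
    }

    // Returns the entry with the largest similarity, or null for an empty dictionary.
    // On a tie the earlier entry wins.
    static String correct(List<String> tokens, List<String> dictionary) {
        String best = null;
        double bestScore = -1.0;
        for (String entry : dictionary) {
            double s = similarity(tokens, entry);
            if (s > bestScore) {
                bestScore = s;
                best = entry;
            }
        }
        return best;
    }
}
```

For an OCR result segmented into three tokens of which two occur in an entry, the similarity to that entry is 2/3. A practical system would probably also require a minimum similarity before accepting a correction; the source does not mention one.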
As shown in Figure 5, the embodiment of the present invention provides a merchant information automatic classification unit. It obtains the shop information to be classified, performs word segmentation, calculates similarities against the ontology knowledge base, and obtains the correct category:
The merchant information to be classified is obtained, such as the merchant's trade name or product information.
Word segmentation requires a segmentation tool to split the merchant information into words; the jcseg tool is used here. The keywords are then extracted from the segmented words to obtain the corresponding keyword set (for example, "Han Site self-service barbecue" can be split into the keyword set "Han Site", "self-service", "barbecue").
To calculate similarity against the ontology knowledge base, the keyword set is matched, using the knowledge base index created in Figure 2, against the data in each category document of the knowledge base one by one, and the similarity of each keyword is calculated. The similarity calculation used here is implemented with a dynamic programming algorithm.
To obtain the correct category, the sum of the similarities of the keyword set within each category document is calculated; the category of the document with the largest similarity sum is the correct category for the merchant information.
The whole calculation proceeds as follows:
(1) First count the number $a$ of entries contained in each category of the ontology knowledge base (if there are five categories, food, leisure, beauty, life services and hotel, the counts are $a_1, a_2, a_3, a_4, a_5$ respectively);
(2) Then use jcseg to segment the merchant information to be classified, obtaining a keyword set of size $n$, and store the keywords in a string array str;
(3) Then match each string in the array str against each category text document and count the number of times the string occurs in each document: every entry in a category document that contains the string counts as one occurrence. Let the number of occurrences of the $i$-th string in the five categories be $b_{i1}, b_{i2}, b_{i3}, b_{i4}, b_{i5}$ ($i = 1, 2, 3, \dots, n$). The similarity between the merchant information and each category document is then $s_1 = (b_{11} + b_{21} + b_{31} + b_{41} + \dots + b_{n1}) / a_1 \times 100\%$, $s_2 = (b_{12} + b_{22} + b_{32} + b_{42} + \dots + b_{n2}) / a_2 \times 100\%$, and likewise for $s_3$, $s_4$ and $s_5$;
(4) The maximum similarity is $s_{\max} = \max(s_1, s_2, s_3, s_4, s_5)$, and the category it corresponds to is the category of the merchant information.
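The calculation in steps (1) to (4) can be sketched as follows. The sketch works on in-memory lists instead of the Lucene index, assumes the keywords have already been produced by a segmenter, and counts an occurrence whenever an entry contains the keyword. The class `MerchantClassifier` and the way category documents are represented are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical classifier: each category id maps to the entries of its category document.
class MerchantClassifier {
    // Computes s_j = (b_1j + b_2j + ... + b_nj) / a_j for every category j, as a fraction.
    static Map<String, Double> scores(List<String> keywords, Map<String, List<String>> categories) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> category : categories.entrySet()) {
            List<String> entries = category.getValue();
            int hits = 0; // running sum of b_ij over all keywords i
            for (String keyword : keywords) {
                for (String entry : entries) {
                    if (entry.contains(keyword)) {
                        hits++; // one occurrence per entry that contains the keyword
                    }
                }
            }
            // a_j is the number of entries; an empty category scores 0.
            result.put(category.getKey(), entries.isEmpty() ? 0.0 : (double) hits / entries.size());
        }
        return result;
    }

    // Returns the category id with the largest score, or null when there are no categories.
    static String classify(List<String> keywords, Map<String, List<String>> categories) {
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Double> score : scores(keywords, categories).entrySet()) {
            if (score.getValue() > bestScore) {
                bestScore = score.getValue();
                best = score.getKey();
            }
        }
        return best;
    }
}
```

Because the hits of all keywords are summed before dividing by the entry count, a score can exceed 1 (that is, 100%) when several keywords are common in a category; only the ranking between categories matters for the final choice.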
With the shop information automatic classification unit of this embodiment, the user can pass unclassified merchant information through the above flow and obtain the category it belongs to. The approach has broad prospects in the field of information classification and can be applied widely to web page classification, news classification, film classification, song classification, picture classification and similar fields, which is a great convenience for users in both daily life and research work.
Figure 6 shows the overall flow of the system for automatically building and classifying the city merchant information library by search and aggregation:
1. Automatically search and aggregate merchant and group-buying information;
2. Build the ontology knowledge base from the merchant information;
3. Acquire the merchant information to be classified;
4. Build the merchant information error-correction dictionary and correct the merchant information;
5. Classify the information that needs a correct category and obtain its category.
A person of ordinary skill in the art will understand that all or part of the flows in the above embodiments can be implemented in a computer programming language. The program can run under Windows or Linux and, when executed, can include the flows of the method embodiments described above. The programming language can be a mainstream language such as Java or Python.

Claims (2)

  1. A system for automatically building a city shop information library based on video analysis and search aggregation, characterized by comprising: a merchant and group-buying information automatic search and aggregation unit; a merchant category ontology knowledge base creation unit; a to-be-classified merchant information acquisition unit; a merchant information error-correction dictionary creation and information correction unit; and a merchant information automatic classification unit;
    the merchant and group-buying information automatic search and aggregation unit automatically searches the Internet for the relevant popular websites, creates a corresponding web crawler framework for each website, determines as required the format of the information to be crawled, the categories into which the information is divided and the format in which the information is stored, and sets a timer so that crawling starts on schedule and the data are updated regularly;
    the merchant category ontology knowledge base creation unit preprocesses the data obtained by the merchant and group-buying information automatic search and aggregation unit, stores them in one text document per category, and then uses Lucene to build an index over all the documents;
    the to-be-classified merchant information acquisition unit records the trade names of the merchants along a street or road section by shooting video, or alternatively by taking photographs, cuts the video into pictures, and performs image recognition on them to obtain the corresponding merchant information;
    the merchant information error-correction dictionary creation and information correction unit stores the merchant information crawled by the merchant and group-buying information automatic search and aggregation unit in a fixed format to form an error-correction dictionary, and corrects the merchant information that image recognition has recognized incorrectly against this dictionary to obtain the correct merchant information;
    the merchant information automatic classification unit obtains the merchant information to be classified, performs word segmentation on it to obtain a keyword set, looks the keywords up in the ontology knowledge base created by the merchant category ontology knowledge base creation unit and, using the index of that knowledge base, calculates the sum of the similarities of the keyword set within each category document, the similarity calculation being based on a dynamic programming algorithm; the category of the document with the largest similarity sum is the category of the merchant information.
  2. A method for automatically building a city shop information library based on video analysis and search aggregation, characterized by the following steps:
    (1) Step of automatic search and aggregation of business information and group-buying information: automatically search the Internet for the relevant major websites, create a corresponding web crawler framework for each website, and determine as needed the format of the information to be crawled, the categories into which the information is to be divided, and the format in which the information is stored; also set a timer so that crawling can start on a schedule and the data is updated periodically;
    (2) Step of creating the ontology knowledge base of business categories: preprocess the data obtained in the step of automatic search and aggregation of business information and group-buying information, store it in a text document for each category, and then use Lucene to build a corresponding index for all documents;
    (3) Step of obtaining business information: capture the shop names along a street or road section by shooting video, or alternatively by taking photographs; then split the video into frames to obtain pictures, and perform image recognition on them to obtain the corresponding business information;
    (4) Step of creating the business information error correction dictionary and correcting the information: store, in a defined format, the business information crawled in the step of automatic search and aggregation of business information and group-buying information, thereby forming the error correction dictionary; then use the error correction dictionary to correct the business information that image recognition identified incorrectly, obtaining the correct business information;
    (5) Step of automatic classification of business information: obtain the business information to be classified, perform word segmentation on it to obtain a keyword set, and submit the keyword set to the ontology knowledge base created in the step of creating the ontology knowledge base of business categories; using the index created there, compute the sum of the similarities of the keyword set in each category document, where the similarity is computed with a dynamic programming algorithm; obtain the category document with the largest similarity sum, whose category is the category of the business information.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410391136.XA CN104133913B (en) 2014-08-07 2014-08-07 System and method for automatically establishing city shop information library based on video analysis, searching and aggregation

Publications (2)

Publication Number Publication Date
CN104133913A true CN104133913A (en) 2014-11-05
CN104133913B CN104133913B (en) 2017-06-16

Family

ID=51806591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410391136.XA Active CN104133913B (en) 2014-08-07 2014-08-07 System and method for automatically establishing city shop information library based on video analysis, searching and aggregation

Country Status (1)

Country Link
CN (1) CN104133913B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001202475A (en) * 2000-01-19 2001-07-27 Sharp Corp Character recognizer and its control method
KR20130026601A (en) * 2011-08-12 2013-03-14 주식회사 케이티 Method and system of recognizing and storing business card information using business card management server

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106412008A (en) * 2016-08-26 2017-02-15 乐视控股(北京)有限公司 Identifier correcting method and device
CN108268883A (en) * 2016-12-31 2018-07-10 上海交通大学 Mobile terminal information model based on open data builds system certainly
CN109271850A (en) * 2018-08-02 2019-01-25 北京三快在线科技有限公司 A kind of Business Information method for uploading, device, electronic equipment and storage medium
WO2020024625A1 (en) * 2018-08-02 2020-02-06 北京三快在线科技有限公司 Merchant information upload
CN109271850B (en) * 2018-08-02 2021-08-20 北京三快在线科技有限公司 Merchant information uploading method and device, electronic equipment and storage medium
CN111626049A (en) * 2020-05-27 2020-09-04 腾讯科技(深圳)有限公司 Title correction method and device for multimedia information, electronic equipment and storage medium
CN111626049B (en) * 2020-05-27 2022-12-16 深圳市雅阅科技有限公司 Title correction method and device for multimedia information, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN104133913B (en) 2017-06-16

Similar Documents

Publication Publication Date Title
CN104050196B (en) A kind of interest point data redundant detecting method and device
CN102880649B (en) A kind of customized information disposal route and system
CN104636402A (en) Classification, search and push methods and systems of service objects
CN104933164A (en) Method for extracting relations among named entities in Internet massive data and system thereof
CN102831121A (en) Method and system for extracting webpage information
CN105164710A (en) Entity bidding
CN104685495A (en) A system and method for automatic generation of information-rich content from multiple microblogs, each microblog containing only sparse information
CN110188107A (en) A kind of method and device of the Extracting Information from table
CN103294781A (en) Method and equipment used for processing page data
Markou et al. Predicting taxi demand hotspots using automated internet search queries
Sharma et al. Latent DIRICHLET allocation (LDA) based information modelling on BLOCKCHAIN technology: a review of trends and research patterns used in integration
CN108021715B (en) Heterogeneous label fusion system based on semantic structure feature analysis
CN103984705A (en) Search result displaying method, device and system
Alves et al. A spatial and temporal sentiment analysis approach applied to Twitter microtexts
CN104133913A (en) System and method for automatically establishing city shop information library based on video analysis, searching and aggregation
CN112487109A (en) Entity relationship extraction method, terminal and computer readable storage medium
CN105718457B (en) Information pushing method and system based on electronic bill
Lee et al. Research trend analysis for sustainable qr code use-focus on big data analysis
CN103699568A (en) Method for extracting hyponymy relation of field terms from wikipedia
CN103761312B (en) Information extraction system and method for multi-recording webpage
CN109062551A (en) Development Framework based on big data exploitation command set
Rajšp et al. Neo4j graph dataset of cycling paths in Slovenia
Suman et al. Direct marketing with the application of data mining
CN106716403A (en) Automated generation of web site entry pages
Cao E-Commerce Big Data Mining and Analytics

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant