CN103544186B - Method and apparatus for mining subject keywords in a picture - Google Patents
Method and apparatus for mining subject keywords in a picture
- Publication number
- CN103544186B CN201210246688.2A
- Authority
- CN
- China
- Prior art keywords
- candidate keywords
- term
- picture
- key word
- identification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5846—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/24—Character recognition characterised by the processing or recognition method
- G06V30/242—Division of the character sequences into groups prior to recognition; Selection of dictionaries
- G06V30/244—Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
Abstract
The present invention relates to a method and apparatus for mining subject keywords in a picture. The method includes: an initial search term identification step of identifying keywords in the picture as initial search terms; a candidate keyword extraction step of using the search terms to retrieve subject web pages related to the picture and extracting candidate keywords from them; a search term selection step of selecting, according to the link relationship between the candidate keywords and the search terms used to find them, a subset of the candidate keywords as the search terms for the next candidate keyword extraction step; and repeating the candidate keyword extraction step and the search term selection step until a predetermined condition is met.
Description
Technical field
The present invention relates to the field of information processing, and in particular to a method and apparatus for mining subject keywords in a picture.
Background
The text in a picture is often essential to understanding the picture's content. For example, the text in an advertising picture plays an important role in helping customers understand the advertisement. Combining character recognition results (for example, OCR) with information from the web makes it possible to extract the advertisement's text content more completely; by mining this information and extracting the advertisement's theme, extended applications or services can be recommended to customers.
Because character recognition alone cannot pin down the keywords that represent the theme of a picture (for example, an advertising picture), the large volume of text available on the Internet is used to verify and extract the text in the advertising image. Data mining techniques applied to the character recognition results, such as keyword retrieval, text clustering and matching, can find subject web pages related to the advertisement (that is, retrieved web pages that express the same content as the advertisement itself). However, character recognition results are to some degree incomplete or incorrect, so the web pages retrieved with some keywords may diverge and produce noisy data; and when the web pages retrieved with a keyword diverge, that input keyword is discarded even if it was recognized correctly, and cannot be recalled.
Therefore, a technique that can solve the above problems is needed.
Summary of the invention
A brief summary of the invention is given below to provide a basic understanding of certain aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description that follows.
A main object of the present invention is to provide a method and apparatus for mining subject keywords in a picture.
According to one aspect of the present invention, there is provided a method for mining subject keywords in a picture, including: an initial search term identification step of identifying keywords in the picture as initial search terms; a candidate keyword extraction step of using the search terms to retrieve subject web pages related to the picture and extracting candidate keywords from them; a search term selection step of selecting, according to the link relationship between the candidate keywords and the search terms used to find them, a subset of the candidate keywords as the search terms for the next candidate keyword extraction step; and repeating the candidate keyword extraction step and the search term selection step until a predetermined condition is met.
According to another aspect of the present invention, there is provided an apparatus for mining subject keywords in a picture, including: an initial search term identification module configured to identify keywords in the picture as initial search terms; a candidate keyword extraction module configured to use the search terms to search for subject web pages related to the picture and extract candidate keywords from them; a search term selection module configured to select, according to the link relationship between the candidate keywords and the search terms used to find them, a subset of the candidate keywords as the search terms for the next search by the candidate keyword extraction module; and a control module configured to control the candidate keyword extraction module and the search term selection module to operate in a loop until a predetermined condition is met.
In addition, embodiments of the present invention provide a computer program for implementing the above method.
Embodiments of the present invention further provide a computer program product, at least in the form of a computer-readable medium, on which computer program code for implementing the above method is recorded.
These and other advantages of the present invention will become more apparent from the following detailed description of preferred embodiments of the invention taken in conjunction with the accompanying drawings.
Brief description of the drawings
The above and other objects, features and advantages of the present invention will be more readily understood from the following description of embodiments of the invention with reference to the accompanying drawings. The components in the drawings serve only to illustrate the principles of the invention. In the drawings, identical or similar technical features or components are denoted by identical or similar reference numerals.
Fig. 1 is a flowchart illustrating a method for mining subject keywords in a picture according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating a method for mining subject keywords in a picture according to an example of the present invention;
Fig. 3 is a schematic diagram illustrating the selection of candidate keywords by feature fusion;
Fig. 4 shows an example of a picture according to the present invention;
Fig. 5 shows an example of a retrieved web page according to the present invention;
Fig. 6 is a schematic diagram illustrating the link relationship between search terms and candidate keywords;
Fig. 7 is a block diagram illustrating an apparatus for mining subject keywords in a picture according to an embodiment of the present invention;
Fig. 8 is a block diagram illustrating the configuration of the search term selection module;
Fig. 9 is a block diagram illustrating an apparatus for mining subject keywords in a picture according to another embodiment of the present invention;
Fig. 10 is a block diagram illustrating the configuration of the candidate keyword extraction module; and
Fig. 11 is a structural diagram of an example computing device that can be used to implement the method and apparatus for mining subject keywords in a picture of the present invention.
Detailed description
Embodiments of the present invention are described below with reference to the accompanying drawings. Elements and features described in one drawing or one embodiment of the invention may be combined with elements and features shown in one or more other drawings or embodiments. It should be noted that, for clarity, the drawings and the description omit the representation and description of components and processes that are unrelated to the invention and known to those of ordinary skill in the art.
Fig. 1 is a flowchart illustrating a method 100 for mining subject keywords in a picture according to an embodiment of the present invention.
As shown in Fig. 1, in step S102, keywords in the picture may be identified as initial search terms. For example, the keywords may be identified by OCR (Optical Character Recognition). The character recognition method is not limited to OCR, however, and any suitable method may be used. The picture may be any picture that needs to be processed, for example an advertising picture, a frame captured from a video, or any other picture.
In step S104, the search terms may be used to retrieve subject web pages related to the picture, from which candidate keywords are extracted.
In step S106, a subset of the candidate keywords may be selected, according to the link relationship between the candidate keywords and the search terms used to find them, as the search terms for the next candidate keyword extraction step. For example, candidate keywords retrieved by more search terms may be preferred as the search terms for the next candidate keyword extraction step.
In step S108, it is determined whether a predetermined condition is met.
If it is determined in step S108 that the predetermined condition is not met, the flow returns to step S104.
If it is determined in step S108 that the predetermined condition is met, the flow ends.
The predetermined condition may be any suitable condition, including but not limited to a predetermined convergence condition, a predetermined number of iterations, or a combination thereof.
When the search term selection step S106 is performed, the similarity between the keywords identified from the picture and the candidate keywords may also be used. For example, a subset of the candidate keywords may be selected as the search terms for the next candidate keyword extraction step S104 according to both that similarity and the link relationship between the candidate keywords and the search terms used to find them.
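The loop of steps S104 to S108 can be sketched as follows. This is only an illustration under stated assumptions: `mine_keywords`, `extract_candidates`, `top_k` and `max_rounds` are hypothetical names, the search and extraction are abstracted into a caller-supplied function, and the stopping rule (an unchanged search term set or a fixed number of rounds) is just one possible predetermined condition.

```python
from typing import Callable, Dict, Iterable, List, Set


def mine_keywords(
    initial_terms: Iterable[str],
    extract_candidates: Callable[[str], Set[str]],
    top_k: int = 2,
    max_rounds: int = 5,
) -> List[str]:
    """Alternate candidate extraction (S104) and search term selection (S106)."""
    terms = list(initial_terms)
    # links[c] holds every search term that has retrieved candidate c.
    links: Dict[str, Set[str]] = {}
    for _ in range(max_rounds):  # predetermined number of iterations (S108)
        for term in terms:
            for cand in extract_candidates(term):
                links.setdefault(cand, set()).add(term)
        # Prefer candidates retrieved by more search terms; ties break by name.
        ranked = sorted(links, key=lambda c: (-len(links[c]), c))
        next_terms = ranked[:top_k]
        if set(next_terms) == set(terms):  # simple convergence test (assumption)
            break
        terms = next_terms
    return sorted(links, key=lambda c: (-len(links[c]), c))
```

The returned list is ordered by how many distinct search terms retrieved each candidate, which is the link-count preference described above; the similarity term mentioned for step S106 is deliberately left out of this sketch.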
The flow 200 of mining subject keywords in a picture according to an example of the present invention is described below with reference to Fig. 2.
First, in step S202, characters in the picture are recognized by a suitable text recognition method such as OCR (Optical Character Recognition).
Then, in step S204-1, keywords in the picture are extracted from the recognized characters (hereinafter, "keywords identified from the picture"). Initially, the keywords identified from the picture are used directly as the result of steps S206 and S208, that is, as part of the initial search terms in step S210.
In addition, entity names may be extracted from the recognized characters in step S204-2. Entity names may include person names, place names, organization names, times, quantities and other user-defined entity names, such as trade names that appear in the picture. Because these entity names are strong indicators when searching for related web pages, in step S210 search terms are generated by combining the entity names extracted in step S204-2 with the OCR keywords extracted in step S204-1. In other words, a search term generated in step S210 may take the form of one keyword combined with one or more entity names. The form of a search term is not limited to this, however. For example, a search term may consist of one or more keywords only, without any entity name.
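A minimal sketch of how step S210 might combine keywords and entity names is given below. The function name `build_search_terms`, the `max_entities` parameter and the choice of joining the parts with a space are assumptions made for illustration; the text only states that a search term may be one keyword combined with one or more entity names, or keywords alone.

```python
from itertools import combinations
from typing import List, Sequence


def build_search_terms(
    keywords: Sequence[str],
    entity_names: Sequence[str],
    max_entities: int = 1,
) -> List[str]:
    """Combine each keyword with up to `max_entities` entity names."""
    terms: List[str] = []
    for keyword in keywords:
        terms.append(keyword)  # a keyword on its own is also a valid search term
        for size in range(1, max_entities + 1):
            for combo in combinations(entity_names, size):
                terms.append(" ".join((keyword, *combo)))
    return terms
```

For example, `build_search_terms(["afternoon"], ["COSTA", "bank"])` yields the keyword alone plus one combined term per entity name.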
Then, in step S212, the search terms generated in step S210 are submitted to a search engine for retrieval.
Subject web pages are extracted by text clustering in step S214 and by text matching in step S216.
Specifically, text clustering groups the retrieved web pages, because web pages that cluster together are more likely to describe the theme related to the picture.
However, although clustered web pages are similar to one another, there is no guarantee that they all describe the theme related to the picture. For example, if the input is an entity name such as a person name, place name or organization name, the clustered web pages may describe only the details of that entity name rather than the theme related to the picture. Referring to the picture in Fig. 4, if web pages are retrieved with "bank" as the search term and then clustered, the clustered pages may describe only "bank" rather than "coffee", the theme related to the picture. Therefore, in step S216, text matching is used to further mine the subject web pages that describe the theme related to the picture. Specifically, in step S216, building on the text clustering of step S214, a matching score is computed between each web page and the OCR recognition result of the picture.
Then, in step S218, the web pages are ranked by their text matching scores in order to select the web pages that describe the theme related to the picture, that is, the subject web pages.
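The matching and ranking of steps S216 and S218 can be illustrated with the sketch below. The text does not specify how the matching score is computed, so the score used here (the fraction of OCR keywords that occur in the page text) is purely an assumption, as are the names `match_score` and `rank_pages`.

```python
from typing import List, Sequence


def match_score(page_text: str, ocr_keywords: Sequence[str]) -> float:
    """Fraction of OCR keywords found verbatim in the page (assumed score)."""
    if not ocr_keywords:
        return 0.0
    hits = sum(1 for keyword in ocr_keywords if keyword in page_text)
    return hits / len(ocr_keywords)


def rank_pages(pages: Sequence[str], ocr_keywords: Sequence[str], top_n: int = 3) -> List[str]:
    """Sort pages by descending match score and keep the best `top_n`."""
    ranked = sorted(pages, key=lambda page: match_score(page, ocr_keywords), reverse=True)
    return ranked[:top_n]
```

With such a score, a page that mentions only "bank" ranks below a page that also mentions the other recognized keywords, which is the filtering effect the "bank" versus "coffee" example describes.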
It should be understood that although subject web pages are described here as being obtained through text clustering and text matching, the subsequent steps may instead be performed directly on the retrieved web pages without clustering or matching, or the web pages may be filtered using only one of text clustering and text matching.
Then, in step S220, it is determined whether a predetermined condition is met. The predetermined condition may be any suitable condition, including but not limited to a predetermined convergence condition, a predetermined number of iterations, or a combination thereof.
If it is determined in step S220 that the predetermined condition is not met, the flow proceeds to step S206.
In step S206, candidate keywords are extracted from the subject web pages according to the similarity between the text of the subject web pages and the keywords identified from the picture. Preferably, the similarity is computed using the specific edit distance formula described later together with the fusion of multiple features.
In step S208, a subset of the candidate keywords may be selected according to the link relationship between the candidate keywords and the search terms used to find them. For example, one or more candidate keywords retrieved by more search terms may be preferred as subsequent search terms, or as one part of a search term (the other part may be an entity name), as described in detail later.
For example, the candidate keyword retrieved by the most search terms may be combined with an entity name to generate the search term used the next time step S210 is performed.
Steps S212 to S220 are then performed. If it is determined in step S220 that the predetermined condition is not met, the flow again proceeds to step S206. When it is determined in step S220 that the predetermined condition is met, for example when the keywords satisfy the predetermined condition, the flow ends. Here, the predetermined condition may be a manually set threshold.
Next, the calculation of the similarity between the keywords identified from the picture and the candidate keywords is described. The similarity calculation involves the edit distance and the selection and fusion of multiple features.
First, an edit distance calculation based on the confidence of the keywords identified in the picture is described.
Because character recognition is not perfectly accurate (it may, for example, produce errors or noise), an edit distance algorithm can be used to extract matches for the keywords identified from the picture (that is, the initial search terms or part of them). The edit distance is computed by dynamic programming, which finds the minimum editing cost. There are three kinds of editing cost: the cost of inserting a character, the cost of deleting a character, and the cost of substituting a character.
In one embodiment of the present invention, the general edit distance algorithm is improved.
Each recognized character has a confidence value, which indicates how accurate the recognition is: the higher the confidence, the more accurate the recognition. The present invention therefore modifies the edit cost function by setting the substitution cost of each character to the confidence of that character.
Assume that the keyword string identified from the picture is O = O_1, O_2, …, O_m and the corresponding candidate keyword string is C = C_1, C_2, …, C_n. The edit distance δ(O, C) from string O to string C is then:
$$\delta(O, C) = \min \{\, \gamma(S) \mid S \text{ is an edit sequence from } O \text{ to } C \,\} \tag{1}$$
This formula can be defined recursively as follows:
$$\delta(O_{1..i}, C_{1..j}) = \min \begin{cases} \delta(O_{1..i-1}, C_{1..j}) + \gamma(O_i \to \varepsilon) \\ \delta(O_{1..i}, C_{1..j-1}) + \gamma(\varepsilon \to C_j) \\ \delta(O_{1..i-1}, C_{1..j-1}) + \gamma(O_i \to C_j) \end{cases} \tag{2}$$
Here γ(S) denotes the cost function of the edit sequence S, ε denotes the empty string, γ(O_i → ε) denotes the cost of deleting character O_i, and the substitution cost is changed to the confidence value confidence(O_i).
Fig. 4 shows an example of a picture according to the present invention.
The picture in Fig. 4 is an advertising picture. Each character of one of the keywords identified from this picture, "cangue 1 afternoon," (characters "cangue", "1", "", "noon", "afterwards", ","), has a confidence value. Specifically, the overall confidence of "cangue 1 afternoon," is 0.8827; the confidence of "cangue" is 0.3346, of "1" is 0.7777, of "" is 0.8571, of "noon" is 0.9577, of "afterwards" is 0.9417, and of "," is -1.0000.
The edit distance between this keyword and a candidate keyword is computed as follows.
edit(i, j) denotes the edit distance from the substring O_i of O spanning [0…i] to the substring C_j of C spanning [0…j], and f(i, j) denotes the cost of transforming the i-th character O(i) of O into the j-th character C(j) of C. If O(i) = C(j), no operation is needed and f(i, j) = 0; otherwise a substitution is needed and f(i, j) = conf(i, j).
If i = 0 and j = 0, then edit(0, 0) = 1
If i = 0 and j > 0, then edit(0, j) = edit(0, j-1) + 1
If i > 0 and j = 0, then edit(i, 0) = edit(i-1, 0) + 1
If i > 0 and j > 0, then edit(i, j) = min(edit(i-1, j) + 1, edit(i, j-1) + 1, edit(i-1, j-1) + f(i, j))
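The recurrence above can be sketched in code as follows. The function name `confidence_edit_distance` is hypothetical. Two choices are assumptions rather than part of the text: the sketch uses the conventional empty-prefix base case of 0 (the text states edit(0, 0) = 1), and it clamps negative confidences, such as the -1.0000 reported for the comma, to 0.

```python
from typing import List, Sequence


def confidence_edit_distance(
    recognized: str,
    confidences: Sequence[float],
    candidate: str,
) -> float:
    """Edit distance whose substitution cost is the OCR confidence of the
    recognized character; insertions and deletions cost 1."""
    m, n = len(recognized), len(candidate)
    if len(confidences) != m:
        raise ValueError("one confidence value is required per recognized character")
    # dist[i][j]: distance between the first i recognized and first j candidate chars.
    dist: List[List[float]] = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + 1.0  # delete recognized[i-1]
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + 1.0  # insert candidate[j-1]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if recognized[i - 1] == candidate[j - 1]:
                substitution = 0.0  # identical characters need no operation
            else:
                # Assumption: negative confidences are clamped to zero.
                substitution = max(confidences[i - 1], 0.0)
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,
                dist[i][j - 1] + 1.0,
                dist[i - 1][j - 1] + substitution,
            )
    return dist[m][n]
```

Under this cost model, replacing a character that was recognized with low confidence is cheap, so a candidate that differs from the OCR output only at doubtful characters stays close to it.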
The selection and fusion of multiple features is described below. Fig. 3 is a schematic diagram illustrating the selection of candidate keywords by feature fusion.
The features of the keywords identified from the picture and of the subject web pages play an important role in selecting candidate keywords; these features are shown in Fig. 3.
Feature fusion can be used to compute the similarity Sim(O, C) between a keyword O identified from the picture and a candidate keyword C as follows:
$$Sim(O, C) = \alpha_1 f_1 + \alpha_2 f_2 + \dots + \alpha_n f_n \tag{3}$$
where α_1, α_2, …, α_n are the feature weights, f_1, f_2, …, f_n are the selectable features, O is the keyword identified from the picture, and C is the candidate keyword.
The features f_1, f_2, …, f_n may include at least one of the following: the size of the keyword identified from the picture; the position of the candidate keyword in its text; the common substring of the candidate keyword and the keyword identified from the picture; the geometric distance in the picture between keywords identified from the picture; the mutual information of the candidate keywords in their text; and the edit distance between the keyword identified from the picture and the candidate keyword.
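Formula (3) is a weighted sum, which can be sketched as below. The function name `fused_similarity` is hypothetical, and how each feature is scaled and how the weights are chosen is not specified in the text, so the caller is assumed to supply both.

```python
from typing import Sequence


def fused_similarity(features: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of feature values, as in formula (3)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(weight * value for weight, value in zip(weights, features))
```

A caller would first compute the individual features (size, position, common substring, geometric distance, mutual information, edit distance) and then pass them in a fixed order together with matching weights.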
The size of a keyword identified from the picture reflects the importance of its information. The larger the keyword, the more it conveys what the picture itself wants to present to the user, and the better it represents the meaning of the picture. For example, the size of the keywords identified from the picture can be normalized by formula (4) below and used as one of the above features.
$$\mathrm{Normalization}_i = \frac{\mathrm{Size}_i}{\mathrm{Max}(\mathrm{Size})} \tag{4}$$
where Normalization_i denotes the normalized size of the i-th keyword identified from the picture, Size_i denotes the un-normalized size of the i-th keyword, and Max(Size) denotes the size of the largest keyword.
Those skilled in the art will understand that normalization is not mandatory and that the keyword size may be used directly.
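The size normalization can be sketched as dividing every keyword size by the largest one, which is what the definitions of Normalization_i, Size_i and Max(Size) imply. The function name `normalize_sizes` is hypothetical, and the unit of the sizes (for example, bounding-box height in pixels) is an assumption.

```python
from typing import List, Sequence


def normalize_sizes(sizes: Sequence[float]) -> List[float]:
    """Scale keyword sizes so that the largest keyword has size 1.0."""
    largest = max(sizes)
    if largest <= 0:
        raise ValueError("sizes must contain a positive value")
    return [size / largest for size in sizes]
```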
Candidate keywords come from the text of web page content, and the position in which they appear carries a different weight: the title, abstract and body, for example, have different weights. The position of a candidate keyword in the text is therefore a key feature.
The common substring of a candidate keyword C and a keyword O identified from the picture indicates how similar the candidate keyword extracted from the web page is to the keyword identified from the picture. The number of common substrings therefore also affects the credibility of the selected candidate keyword.
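One way to obtain a common-substring feature is the standard dynamic-programming search for the longest common substring, sketched below. The function name `longest_common_substring` is hypothetical, and using the longest substring (rather than, say, a count of all shared substrings) is an assumption.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b."""
    best_len, best_end = 0, 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_end = current[j], i
        previous = current
    return a[best_end - best_len:best_end]
```

The length of the returned string, possibly divided by the length of the recognized keyword, could then serve as one of the features in formula (3).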
The way text is laid out in a picture reflects how strongly the picture's important pieces of information depend on one another. Geometrically, characters that are arranged close together in the picture express the same meaning, or jointly elaborate on the features of one activity or product. The degree of co-occurrence of multiple characters in the text therefore explains the picture's information in more detail. The coordinate information from character recognition is used to extract the Euclidean distance between keywords as a feature, where X and Y are keywords identified from the picture, and the subscripts left, right, up and down denote the left, right, top and bottom coordinates of a keyword identified from the picture, respectively.
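A geometric distance between two recognized keywords can be sketched from their bounding-box coordinates. The exact distance formula is not available here, so measuring the Euclidean distance between box centers is an assumption, as are the names `Box` and `center_distance`.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of a recognized keyword, in picture coordinates."""
    left: float
    right: float
    top: float
    bottom: float


def center_distance(x: Box, y: Box) -> float:
    """Euclidean distance between the centers of two boxes (assumed measure)."""
    x_center = ((x.left + x.right) / 2, (x.top + x.bottom) / 2)
    y_center = ((y.left + y.right) / 2, (y.top + y.bottom) / 2)
    return math.hypot(x_center[0] - y_center[0], x_center[1] - y_center[1])
```

Keywords whose boxes lie close together would then receive a small distance, matching the idea that tightly arranged text expresses one meaning.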
The mutual information between candidate keywords in the text of the subject web pages describes how strongly they depend on one another in the text: the larger the mutual information, the greater the co-occurrence and the more complete the picture information. The mutual information I(A, B) is calculated from P(A), the probability of word A in the text, P(B), the probability of word B in the text, and P(A, B), the joint probability of A and B in the text.
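A sketch of the mutual information feature is given below. It assumes the common pointwise form log(P(A, B) / (P(A) P(B))) and estimates the probabilities as the fraction of text units (for example, sentences represented as sets of words) that contain the words; both the formula and the estimation scheme are assumptions, and `pointwise_mutual_information` is a hypothetical name.

```python
import math
from typing import Sequence, Set


def pointwise_mutual_information(units: Sequence[Set[str]], a: str, b: str) -> float:
    """log(P(a, b) / (P(a) * P(b))) with probabilities estimated over text units."""
    total = len(units)
    count_a = sum(1 for unit in units if a in unit)
    count_b = sum(1 for unit in units if b in unit)
    count_ab = sum(1 for unit in units if a in unit and b in unit)
    if count_a == 0 or count_b == 0 or count_ab == 0:
        return float("-inf")  # the words never co-occur in the given text
    p_a, p_b, p_ab = count_a / total, count_b / total, count_ab / total
    return math.log(p_ab / (p_a * p_b))
```

Words that appear together more often than chance predicts receive a positive value, and words that never co-occur receive negative infinity in this sketch.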
Through analysis, one or more of the above features can be fused so that candidate keywords in the text of the subject web pages are selected on the basis of the keywords identified from the picture. An example of producing candidate keywords through this feature fusion is described with reference to Fig. 4 and Fig. 5. Fig. 4 shows an example of a picture according to the present invention. Fig. 5 shows an example of a retrieved web page according to the present invention.
As shown in Fig. 4, the characters inside the rectangles are the recognized strings. The coordinates and normalized sizes of these strings are shown in Table 1 below.
Table 1:
Using the feature fusion described above, candidate keywords (marked with rectangles) are extracted from the web page in Fig. 5, namely "the pleasantly surprised courteous reception or treatment of the brush credit card is shared only needs half cost", "leisurely afternoon, please work together and have a cup of coffee", "tacit agreement is more further, cost only needs half" and "satisfied originally simple thing".
As shown in Fig. 5, "leisurely afternoon, please work together and have a cup of coffee", "further, cost only needs half for tacit agreement" and "satisfied originally simple thing" have the longest common substrings with the keywords identified from the picture numbered 4, 5 and 6, respectively. In addition, the Euclidean distances between the coordinates of the characters in these candidate keywords are small. It can be seen that feature fusion makes it easy to extract candidate keywords from web pages.
An example method for calculating the word score is described below.
To mine the subject keywords of a picture, full use is made of the candidate keyword information computed in each round: on the one hand, candidate keywords are used as the next search terms; on the other hand, the pointing relationship between search terms and candidate keywords is analyzed. When a search term produces a candidate keyword, the search term has one pointer to that candidate keyword. As the operation is repeated, a candidate keyword accumulates pointers from multiple search terms; the more pointers it has, the better it describes the picture's information and the better it represents the picture's subject keywords. For this scenario, a new word scoring method is proposed. The algorithm uses the information of the keywords identified from the picture to compute the relationship between search terms and candidate keywords, and mines the keywords of the picture's theme. The algorithm involves two kinds of words: search terms S and candidate keywords C. At initialization, the algorithm involves only search terms S. Each search term S_i (i = 1, 2, …, n) is input into the retrieval system and, through retrieval, web page clustering and web page matching, produces candidate keywords. Each candidate keyword produced is then used as a new search term, and the above operations are repeated.
Fig. 6 is a schematic diagram illustrating the link relationship between search terms and candidate keywords.
In Fig. 6, each box represents a search term (or candidate keyword). There are two kinds of boxes: boxes with a blank background represent the search terms in the initial state (numbered 1, 2, 3, 4, 5, 6), and boxes with a shaded background represent newly discovered candidate keywords, which also serve as search terms for the next retrieval (numbered 7, 8). When a search term a produces a candidate keyword and that candidate keyword is used as a search term b, an arrow points from a to b. The size of each box is determined by the number of boxes pointing to it and by their sizes. As shown in Fig. 6, the size of box 4 is determined by the sum of the sizes of boxes 1, 2, 3 and 7; the size of box 5 by the sum of the sizes of boxes 2 and 3; the size of box 7 by the size of box 4 alone; and the size of box 8 by the size of box 5. In this example, the size of a box can be understood as the magnitude of its word score. In other words, a candidate keyword retrieved by more search terms has a larger word score. In the next candidate keyword extraction step, candidate keywords with larger word scores are preferred as search terms.
According to one embodiment of the present invention, assume that n search terms (S_1, S_2, …, S_n) point to a candidate keyword C, that is, C can be retrieved by these n search terms. The word score PR(C) can then be calculated as:
$$PR(C) = (1 - d) + d \left( \frac{PR(S_1)}{O(S_1)} + \frac{PR(S_2)}{O(S_2)} + \dots + \frac{PR(S_n)}{O(S_n)} \right) \tag{7}$$
where PR(S_i) denotes the word score of the i-th search term S_i pointing to candidate keyword C, O(S_i) denotes the number of candidate keywords produced by retrieving with search term S_i, and d denotes the damping coefficient. The factor 1/O(S_i) in formula (7) expresses that the candidate keywords produced by search term S_i are equally probable.
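The equal-probability word score can be computed iteratively, as sketched below on the link graph described for Fig. 6. The names `word_scores` and `FIG6_LINKS` are hypothetical, the initial score of 1.0 for every word is an assumption, and the defaults (damping coefficient 0.5, 13 iterations) simply mirror the values mentioned for the example picture.

```python
from typing import Dict, List

# Arrows of Fig. 6 as described in the text: search term -> words it produced.
FIG6_LINKS: Dict[str, List[str]] = {
    "1": ["4"],
    "2": ["4", "5"],
    "3": ["4", "5"],
    "7": ["4"],
    "4": ["7"],
    "5": ["8"],
}


def word_scores(links: Dict[str, List[str]], d: float = 0.5, iterations: int = 13) -> Dict[str, float]:
    """Iterate PR(C) = (1 - d) + d * sum(PR(S_i) / O(S_i)) over all words."""
    nodes = set(links) | {c for produced in links.values() for c in produced}
    scores = {node: 1.0 for node in nodes}  # assumed starting value
    for _ in range(iterations):
        updated = {}
        for word in nodes:
            incoming = sum(
                scores[term] / len(produced)
                for term, produced in links.items()
                if word in produced
            )
            updated[word] = (1 - d) + d * incoming
        scores = updated
    return scores
```

On this graph box 4, which is pointed to by four boxes, ends up with the largest score, in line with the statement that words retrieved by more search terms score higher.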
According to another embodiment of the present invention, candidate keywords may have different probabilities of occurring in the text. The word score PR(C) can then be calculated as:
$$PR(C) = (1 - d) + d \left( P(S_1 \to C)\, PR(S_1) + P(S_2 \to C)\, PR(S_2) + \dots + P(S_n \to C)\, PR(S_n) \right) \tag{8}$$
where P(S_i → C) is the probability that search term S_i produces candidate keyword C, PR(S_i) is the word score of search term S_i, i = 1, 2, …, n, and d is the damping coefficient.
In P(S_i → C_j), every candidate keyword C_j produced by a search term S_i is obtained by verification against the character recognition result, so its similarity value is used as its weight in the calculation. Here O_k denotes a keyword identified from the picture, and the similarity is the one computed between O_k and the candidate keyword compared with it. The probability with which a candidate keyword occurs is also used; because candidate keywords come from the text of the subject web pages, its value is the candidate keyword's probability within the candidate keyword set.
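Formula (8) differs from the equal-probability case only in that each link carries its own probability. The sketch below takes those probabilities as given input, because how P(S_i → C) is derived from the similarity and occurrence probability is not reproduced here; `weighted_word_scores` and the `transition` mapping are hypothetical names, and the starting score of 1.0 is an assumption.

```python
from typing import Dict, Tuple


def weighted_word_scores(
    transition: Dict[Tuple[str, str], float],
    d: float = 0.5,
    iterations: int = 13,
) -> Dict[str, float]:
    """Iterate PR(C) = (1 - d) + d * sum(P(S_i -> C) * PR(S_i)).

    transition[(s, c)] is the probability that search term s produces
    candidate keyword c.
    """
    nodes = {word for pair in transition for word in pair}
    scores = dict.fromkeys(nodes, 1.0)  # assumed starting value
    for _ in range(iterations):
        # The comprehension reads the previous scores and builds a new dict.
        scores = {
            word: (1 - d)
            + d * sum(p * scores[s] for (s, c), p in transition.items() if c == word)
            for word in nodes
        }
    return scores
```

A candidate that a search term produces with high probability inherits more of that term's score than one produced with low probability.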
Because the number of web pages actually processed is enormous and the number of candidate keywords keeps growing, the word scores can be calculated iteratively. Table 2 shows the result after 13 iterations for the candidate keywords of the example picture, with the damping coefficient d set to 0.5.
Table 2: Word scores of the candidate keywords in the example picture
The table shows that after repeated iterations the word score of each candidate keyword converges to a stable value. Analysis further shows that the magnitude of the word score reflects how strongly a keyword bears on the theme of the picture. Sorted by influence, the keywords are: "COSTA", "credit card", "leisurely in the afternoon please work together and have a cup of coffee", "share and only need half cost", "pleasantly surprised courteous reception or treatment", "satisfied originally simple thing", "meeting originally simple thing", "the originally simple thing of joy". The keywords "leisurely in the afternoon please work together and have a cup of coffee" and "satisfied originally simple thing" were not recognized correctly in the text recognition stage, but they are identified by the solution of this patent and become subject keywords, which improves recall and solves the problem described earlier. It can be seen that the selected subset of candidate keywords can be used to mine the subject keywords in the picture.
Fig. 7 is a block diagram illustrating equipment 700 for mining subject keywords in a picture according to an embodiment of the present invention.
As shown in Fig. 7, equipment 700 includes an initial search term identification module 702, a candidate keyword extraction module 704, a search term selection module 706, and a control module 708.
The initial search term identification module 702 can identify keywords in the picture as initial search terms.
The candidate keyword extraction module 704 can use the search terms to search for subject web pages related to the picture and extract candidate keywords from them.
The search term selection module 706 can, according to the link relationship between the candidate keywords extracted by the candidate keyword extraction module 704 and the search terms used by the candidate keyword extraction module 704 to perform the search, select a portion of the candidate keywords as the search terms used by the candidate keyword extraction module 704 in its next search. For example, the search term selection module 706 can preferentially select candidate keywords that are retrieved by more search terms as the search terms for the next search performed by the candidate keyword extraction module 704.
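As a rough illustration of the preference for candidate keywords retrieved by more search terms, the sketch below simply counts, for each candidate, how many distinct terms retrieved it and keeps the top `k`. The function name `select_by_retrieval_count`, the cutoff `k`, the alphabetical tie-break, and the sample data are assumptions made for this example only.

```python
from collections import Counter
from typing import Dict, List


def select_by_retrieval_count(retrieved: Dict[str, List[str]], k: int) -> List[str]:
    """retrieved[term] lists the candidate keywords found when searching with term."""
    counts = Counter()
    for candidates in retrieved.values():
        # Count each candidate at most once per search term.
        counts.update(set(candidates))
    # Most frequently retrieved first; ties broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return ranked[:k]


retrieved = {"a": ["x", "y"], "b": ["x"], "c": ["x", "z"]}
next_terms = select_by_retrieval_count(retrieved, k=2)
```

Here "x" is retrieved by three terms and is selected first; "y" and "z" tie with one retrieval each.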
When selecting a portion of the candidate keywords as the search terms for the next search performed by the candidate keyword extraction module 704, the search term selection module 706 can also take into account the similarity between the keywords recognized from the picture and the candidate keywords. In other words, the search term selection module 706 can select a portion of the candidate keywords as the search terms for the next search performed by the candidate keyword extraction module 704 according to both the similarity between the keywords recognized from the picture and the candidate keywords and the link relationship between the candidate keywords and the search terms used to search for them.
The control module 708 can control the candidate keyword extraction module 704 and the search term selection module 706 to operate in a loop until a predetermined condition is met. The predetermined condition includes a predetermined convergence condition and/or a predetermined number of iterations.
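The loop driven by the control module can be sketched as below. This is an illustrative outline rather than the patented implementation: `search_and_extract` and `select_terms` stand in for modules 704 and 706 and are passed in as hypothetical callables, `max_rounds` plays the role of the predetermined number of iterations, and "no new search terms" is assumed as the convergence condition.

```python
from typing import Callable, Iterable, List, Set


def mine_candidates(
    initial_terms: Iterable[str],
    search_and_extract: Callable[[List[str]], Set[str]],
    select_terms: Callable[[Set[str], List[str]], List[str]],
    max_rounds: int = 5,
) -> Set[str]:
    terms = list(initial_terms)
    seen_terms = set(terms)
    candidates: Set[str] = set()
    for _ in range(max_rounds):            # predetermined number of iterations
        found = search_and_extract(terms)  # stand-in for module 704
        candidates |= found
        # Stand-in for module 706; terms already used are not searched again.
        next_terms = [t for t in select_terms(found, terms) if t not in seen_terms]
        if not next_terms:                 # assumed convergence condition
            break
        seen_terms.update(next_terms)
        terms = next_terms
    return candidates


# Toy "web": each search term maps to the candidate keywords it would yield.
toy_web = {
    "COSTA": {"coffee", "credit card"},
    "coffee": {"COSTA", "latte"},
    "credit card": {"COSTA"},
    "latte": set(),
}


def toy_search(terms: List[str]) -> Set[str]:
    found: Set[str] = set()
    for term in terms:
        found |= toy_web.get(term, set())
    return found


def toy_select(found: Set[str], terms: List[str]) -> List[str]:
    return sorted(found)  # trivially selects every candidate


result = mine_candidates(["COSTA"], toy_search, toy_select)
```

With the toy data the loop stops after the third round, when searching with "latte" yields no new search terms.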
Fig. 8 is a block diagram illustrating the configuration of the search term selection module 706.
As shown in Fig. 8, the search term selection module 706 can include a vocabulary score calculation unit 706-2 and a search term selection unit 706-4.
The vocabulary score calculation unit 706-2 can calculate the vocabulary score PR(C) of each candidate keyword C as
$$PR(C) = (1-d) + d\left(\frac{PR(S_1)}{O(S_1)} + \frac{PR(S_2)}{O(S_2)} + \dots + \frac{PR(S_n)}{O(S_n)}\right)$$
where $S_i$ is the i-th search term used to retrieve the candidate keyword C, $PR(S_i)$ is the vocabulary score of the search term $S_i$, $O(S_i)$ is the number of candidate keywords produced by searching with the search term $S_i$, i = 1, 2, ..., n, and d is the damping coefficient.
Alternatively, the vocabulary score calculation unit 706-2 can calculate the vocabulary score PR(C) of each candidate keyword C according to the following equation:
$$PR(C) = (1-d) + d\,\bigl(P(S_1 \to C)\times PR(S_1) + P(S_2 \to C)\times PR(S_2) + \dots + P(S_n \to C)\times PR(S_n)\bigr)$$
where $P(S_i \to C)$ is the probability that the search term $S_i$ produces the candidate keyword C, $PR(S_i)$ is the vocabulary score of the search term $S_i$, i = 1, 2, ..., n, and d is the damping coefficient. In the definition of $P(S_i \to C)$, $O_k$ denotes a keyword recognized from the picture, and the other quantities are the candidate keyword compared with $O_k$, the similarity between $O_k$ and that candidate keyword, and the probability that the candidate keyword occurs.
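A single evaluation of the weighted score PR(C) = (1-d) + d·Σ P(S_i→C)·PR(S_i) can be written as a short function. In this sketch the transition probabilities P(S_i→C) are supplied by the caller, because how they are derived from similarity and occurrence probability is not reproduced here; the function name `weighted_score` and the sample numbers are hypothetical.

```python
from typing import Dict


def weighted_score(transition_probs: Dict[str, float],
                   term_scores: Dict[str, float],
                   d: float = 0.5) -> float:
    """transition_probs[s] is P(s -> C); term_scores[s] is PR(s)."""
    total = sum(p * term_scores[s] for s, p in transition_probs.items())
    return (1 - d) + d * total


# Two search terms S1 and S2 that both retrieved candidate keyword C.
score = weighted_score({"S1": 0.5, "S2": 0.25}, {"S1": 1.0, "S2": 2.0})
```

If every search term distributes its score evenly, so that P(S_i→C) equals 1 divided by the number of candidates S_i produced, this expression has the same shape as the unweighted PageRank-style score.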
The search term selection unit 706-4 can preferentially select candidate keywords C with a high vocabulary score PR(C) as the search terms for the next search performed by the candidate keyword extraction module 704.
The similarity is calculated from features of the keywords recognized from the picture and of the candidate keywords.
The features used to calculate the similarity include at least one of the following: the size of a keyword recognized from the picture, the position of a candidate keyword in its corresponding text, the common substring of a candidate keyword and a keyword recognized from the picture, the geometric distance within the picture between keywords recognized from the picture, the mutual information of a candidate keyword in its corresponding text, and the edit distance between a keyword recognized from the picture and a candidate keyword.
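One of the listed features, the common substring of a candidate keyword and a recognized keyword, can be computed with a standard dynamic-programming routine. The sketch below returns the longest common substring; treating "common substring" as the longest one, and the function name and sample strings, are assumptions for illustration.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # lengths of common suffixes for the previous row
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# A candidate keyword versus an OCR result in which "O" was misread as "0".
shared = longest_common_substring("COSTA coffee", "C0STA")
```

The length of the shared substring relative to the keyword length could then serve as one similarity feature among the others.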
Preferably, the character that can be calculated according to the confidence level of the key word of identification from picture in editing distance is replaced
Cost.
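A confidence-dependent substitution cost can be illustrated with a small variant of the Levenshtein distance. The text does not give the exact cost function, so this sketch assumes one plausible choice: replacing a recognized character costs its recognition confidence (a low-confidence character is cheap to replace), while insertions and deletions cost 1. The function name, the per-character confidence list, and the sample values are hypothetical.

```python
from typing import Sequence


def weighted_edit_distance(recognized: str, confidences: Sequence[float],
                           candidate: str) -> float:
    """Edit distance where substituting recognized[i] costs confidences[i]."""
    if len(recognized) != len(confidences):
        raise ValueError("one confidence value is required per recognized character")
    n, m = len(recognized), len(candidate)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = float(i)  # delete all recognized characters
    for j in range(1, m + 1):
        dist[0][j] = float(j)  # insert all candidate characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if recognized[i - 1] == candidate[j - 1]:
                substitution = 0.0
            else:
                substitution = confidences[i - 1]  # assumed cost model
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,               # deletion
                dist[i][j - 1] + 1.0,               # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[n][m]


# "0" was recognized with low confidence, so correcting it to "O" is cheap.
low = weighted_edit_distance("C0STA", [0.9, 0.2, 0.9, 0.9, 0.9], "COSTA")
plain = weighted_edit_distance("C0STA", [1.0] * 5, "COSTA")
```

With all confidences set to 1.0 the function reduces to the ordinary edit distance, so the weighting only matters where the recognizer was unsure.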
The search terms used when performing retrieval can also include entity names; the entity names include words recognized from the picture that relate to time, place, and name.
Fig. 9 is a block diagram illustrating equipment 700' for mining subject keywords in a picture according to another embodiment of the present invention.
The equipment 700' in Fig. 9 differs from the equipment 700 in Fig. 7 in that the equipment 700' also includes a subject keyword mining module 710.
The subject keyword mining module 710 can use the selected portion of the candidate keywords to mine the subject keywords in the picture.
Fig. 10 is a block diagram illustrating the configuration of the candidate keyword extraction module 704.
As shown in Fig. 10, the candidate keyword extraction module 704 can include a text matching unit 704-2, a subject web page selection unit 704-4, and a candidate keyword extraction unit 704-6.
The text matching unit 704-2 can perform text matching between the web pages found by searching with the search terms and the recognition result of the picture.
The subject web page selection unit 704-4 can select, according to the text matching result, the subject web pages related to the picture from the web pages found.
The candidate keyword extraction unit 704-6 can extract candidate keywords from the subject web pages.
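The three units of the candidate keyword extraction module can be pictured with the toy pipeline below. It is a sketch under simplifying assumptions: text matching is reduced to the fraction of recognized keywords that appear in a page, subject pages are those at or above a hypothetical `threshold`, and candidates are the punctuation-delimited phrases of the selected pages. All function names and the sample pages are invented for illustration.

```python
import re
from typing import Dict, List


def match_score(page_text: str, recognized: List[str]) -> float:
    """Fraction of recognized keywords that occur in the page text."""
    if not recognized:
        return 0.0
    text = page_text.lower()
    hits = sum(1 for word in recognized if word.lower() in text)
    return hits / len(recognized)


def select_subject_pages(pages: Dict[str, str], recognized: List[str],
                         threshold: float = 0.5) -> List[str]:
    """Keep the pages whose match score reaches the threshold."""
    return [url for url, text in pages.items()
            if match_score(text, recognized) >= threshold]


def extract_candidates(pages: Dict[str, str], urls: List[str]) -> List[str]:
    """Split the selected pages into phrases and return them without duplicates."""
    candidates: List[str] = []
    for url in urls:
        for phrase in re.split(r"[.,;!?\n]+", pages[url]):
            phrase = phrase.strip()
            if phrase and phrase not in candidates:
                candidates.append(phrase)
    return candidates


pages = {
    "p1": "COSTA coffee, pay by credit card",
    "p2": "weather today",
}
recognized = ["COSTA", "credit card"]
subject_pages = select_subject_pages(pages, recognized)
candidates = extract_candidates(pages, subject_pages)
```

A real system would use a search engine and a proper phrase extractor; the point here is only the order of the steps: match, select, then extract.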
In summary, in the above embodiments, based on the result of recognizing the picture with a character recognition technique (for example, OCR), Internet technology is used to perform retrieval with the OCR result, web page construction, web page matching, and candidate keyword selection; and, according to the link relationship between the search terms and the candidate keywords, a portion of the candidate keywords is selected as new search terms, and the web search and search term selection steps are repeated until a predetermined condition is met.
The overall framework is made up of one or more of several technical schemes, including entity name recognition, search technology, text clustering algorithms, document matching, and candidate keyword verification. For the candidate keyword verification step, this patent describes an improved edit distance algorithm, feature selection, and a vocabulary scoring method.
According to embodiments of the present invention, the character recognition (for example, OCR) result can be fused with various features of Internet text to extract keywords similar to the text of a picture (such as an advertising picture); the multiple keywords are scored with a new vocabulary scoring method, and the subject keywords are finally determined.
The basic principles of the present invention have been described above in connection with specific embodiments. It should be noted, however, that those of ordinary skill in the art will understand that all or any of the steps or components of the method and equipment of the present invention can be implemented in hardware, firmware, software, or a combination thereof, in any computing device (including processors, storage media, and the like) or network of computing devices; those of ordinary skill in the art can achieve this with their basic programming skills after reading the description of the present invention.
Therefore, the object of the present invention can also be achieved by running a program or a set of programs on any computing device. The computing device can be a well-known general-purpose device. Accordingly, the object of the present invention can also be achieved merely by providing a program product containing program code that implements the method or equipment. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. Obviously, the storage medium can be any known storage medium or any storage medium developed in the future.
In the case where embodiments of the present invention are implemented by software and/or firmware, a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure, such as the general-purpose computer 1100 shown in Fig. 11; when various programs are installed, the computer is able to perform various functions.
In Fig. 11, a central processing unit (CPU) 1101 performs various processes according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage section 1108 into a random access memory (RAM) 1103. The RAM 1103 also stores, as needed, the data required when the CPU 1101 performs the various processes. The CPU 1101, the ROM 1102, and the RAM 1103 are connected to one another via a bus 1104. An input/output interface 1105 is also connected to the bus 1104.
The following components are connected to the input/output interface 1105: an input section 1106 (including a keyboard, a mouse, etc.), an output section 1107 (including a display, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker, etc.), the storage section 1108 (including a hard disk, etc.), and a communication section 1109 (including a network interface card such as a LAN card, a modem, etc.). The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 can also be connected to the input/output interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it is installed into the storage section 1108 as needed.
In the case where the above series of processes is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1111.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 1111 shown in Fig. 11, in which the program is stored and which is distributed separately from the equipment to provide the program to users. Examples of the removable medium 1111 include a magnetic disk (including a floppy disk (registered trademark)), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium can be the ROM 1102, a hard disk contained in the storage section 1108, or the like, in which the program is stored and which is distributed to users together with the equipment containing it.
The present invention also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the above method according to embodiments of the present invention can be performed.
Correspondingly, a storage medium carrying the above program product storing machine-readable instruction code is also included in the disclosure of the present invention. The storage medium includes, but is not limited to, a floppy disk, an optical disc, a magneto-optical disk, a memory card, a memory stick, and the like.
Those skilled in the art should understand that what is enumerated here is exemplary, and the present invention is not limited to it.
In this specification, expressions such as "first", "second", and "n-th" are used to distinguish the described features in wording so that the present invention can be described clearly. They should therefore not be regarded as having any limiting meaning.
As an example, each step of the above method and each constituent module and/or unit of the above equipment can be implemented as software, firmware, hardware, or a combination thereof, as part of the corresponding equipment. The specific means or manner that can be used when the constituent modules and units of the above equipment are configured by software, firmware, hardware, or a combination thereof are well known to those skilled in the art and are not described again here.
As an example, in the case of implementation by software or firmware, a program constituting the software can be installed from a storage medium or a network onto a computer having a dedicated hardware structure (for example, the general-purpose computer 1100 shown in Fig. 11); when various programs are installed, the computer is able to perform various functions.
In the above description of specific embodiments of the present invention, features described and/or illustrated for one embodiment can be used in one or more other embodiments in the same or a similar manner, combined with features in other embodiments, or substituted for features in other embodiments.
It should be emphasized that the term "comprises/comprising", when used herein, refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.
In addition, the method of the present invention is not limited to being performed in the chronological order described in the specification; it can also be performed in another chronological order, in parallel, or independently. Therefore, the order of execution of the method described in this specification does not limit the technical scope of the present invention.
Although the present invention and its advantages have been described, it should be understood that various changes, substitutions, and alterations can be made without departing from the spirit and scope of the present invention as defined by the appended claims. Moreover, the scope of the present invention is not limited to the specific embodiments of the processes, equipment, means, methods, and steps described in the specification. One of ordinary skill in the art will readily appreciate from the disclosure of the present invention that processes, equipment, means, methods, or steps, existing or to be developed in the future, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein can be used according to the present invention. Accordingly, the appended claims are intended to include such processes, equipment, means, methods, or steps within their scope.
With regard to the implementations including the above embodiments, the following remarks are also disclosed.
Remark 1. A method for mining subject keywords in a picture, comprising:
an initial search term identification step of identifying keywords in the picture as initial search terms;
a candidate keyword extraction step of using the search terms to search for subject web pages related to the picture and extracting candidate keywords from them;
a search term selection step of selecting, according to the link relationship between the candidate keywords and the search terms used to search for the candidate keywords, a portion of the candidate keywords as the search terms used in the next candidate keyword extraction step; and
repeating the candidate keyword extraction step and the search term selection step until a predetermined condition is met.
Remark 2. The method according to Remark 1, wherein the search term selection step comprises:
selecting a portion of the candidate keywords as the search terms used in the next candidate keyword extraction step according to the similarity between the keywords recognized from the picture and the candidate keywords and according to the link relationship between the candidate keywords and the search terms used to search for the candidate keywords.
Remark 3. The method according to Remark 1 or 2, wherein selecting a portion of the candidate keywords as the search terms used in the next candidate keyword extraction step according to the link relationship between the candidate keywords and the search terms used to search for the candidate keywords comprises: when selecting a portion of the candidate keywords as the search terms used in the next candidate keyword extraction step, preferentially selecting candidate keywords that are retrieved by more search terms as the search terms used in the next candidate keyword extraction step.
Remark 4. The method according to Remark 3, wherein preferentially selecting candidate keywords that are retrieved by more search terms as the search terms used in the next candidate keyword extraction step comprises:
calculating the vocabulary score PR(C) of each candidate keyword C,
$$PR(C) = (1-d) + d\left(\frac{PR(S_1)}{O(S_1)} + \frac{PR(S_2)}{O(S_2)} + \dots + \frac{PR(S_n)}{O(S_n)}\right)$$
where $S_i$ is the i-th search term used to retrieve the candidate keyword C, $PR(S_i)$ is the vocabulary score of the search term $S_i$, $O(S_i)$ is the number of candidate keywords produced by searching with the search term $S_i$, i = 1, 2, ..., n, and d is the damping coefficient; and
the higher the vocabulary score PR(C) of a candidate keyword C, the more preferentially that candidate keyword C is selected as a search term used in the next candidate keyword extraction step.
Remark 5. The method according to Remark 3, wherein preferentially selecting candidate keywords that are retrieved by more search terms as the search terms used in the next candidate keyword extraction step comprises:
calculating the vocabulary score PR(C) of each candidate keyword C,
$$PR(C) = (1-d) + d\,\bigl(P(S_1 \to C)\times PR(S_1) + P(S_2 \to C)\times PR(S_2) + \dots + P(S_n \to C)\times PR(S_n)\bigr)$$
where $P(S_i \to C)$ is the probability that the search term $S_i$ produces the candidate keyword C, $PR(S_i)$ is the vocabulary score of the search term $S_i$, i = 1, 2, ..., n, and d is the damping coefficient,
where, in the definition of $P(S_i \to C)$, $O_k$ denotes a keyword recognized from the picture, and the other quantities are the candidate keyword compared with $O_k$, the similarity between $O_k$ and that candidate keyword, and the probability that the candidate keyword occurs, and
the higher the vocabulary score PR(C) of a candidate keyword C, the more preferentially that candidate keyword C is selected as a search term used in the next candidate keyword extraction step.
Remark 6. The method according to Remark 2 or 5, wherein the similarity is calculated from features of the keywords recognized from the picture and of the candidate keywords.
Remark 7. The method according to Remark 6, wherein the features include at least one of the following: the size of a keyword recognized from the picture, the position of a candidate keyword in its corresponding text, the common substring of a candidate keyword and a keyword recognized from the picture, the geometric distance within the picture between keywords recognized from the picture, the mutual information of a candidate keyword in its corresponding text, and the edit distance between a keyword recognized from the picture and a candidate keyword.
Remark 8. The method according to Remark 7, wherein the cost of a character substitution in the edit distance is calculated according to the confidence of the keyword recognized from the picture.
Remark 9. The method according to any one of Remarks 1 to 8, wherein the search terms also include entity names, and the entity names include words recognized from the picture that relate to time, place, and name.
Remark 10. The method according to any one of Remarks 1 to 8, further comprising:
mining the subject keywords in the picture using the selected portion of the candidate keywords.
Remark 11. The method according to any one of Remarks 1 to 8, wherein the candidate keyword extraction step comprises:
performing text matching between the web pages found by searching with the search terms and the recognition result of the picture;
selecting, according to the text matching result, the subject web pages related to the picture from the web pages found; and
extracting the candidate keywords from the subject web pages.
Remark 12. The method according to any one of Remarks 1 to 8, wherein the predetermined condition includes a predetermined convergence condition and/or a predetermined number of iterations.
Remark 13. Equipment for mining subject keywords in a picture, comprising:
an initial search term identification module configured to identify keywords in the picture as initial search terms;
a candidate keyword extraction module configured to use the search terms to search for subject web pages related to the picture and extract candidate keywords from them;
a search term selection module configured to select, according to the link relationship between the candidate keywords and the search terms used by the candidate keyword extraction module to perform the search, a portion of the candidate keywords as the search terms used by the candidate keyword extraction module in its next search; and
a control module configured to control the candidate keyword extraction module and the search term selection module to operate in a loop until a predetermined condition is met.
Remark 14. The equipment according to Remark 13, wherein the search term selection module is configured to:
select a portion of the candidate keywords as the search terms used by the candidate keyword extraction module in its next search according to the similarity between the keywords recognized from the picture and the candidate keywords and according to the link relationship between the candidate keywords and the search terms used to search for the candidate keywords.
Remark 15. The equipment according to Remark 13 or 14, wherein the search term selection module is configured to preferentially select candidate keywords that are retrieved by more search terms as the search terms used by the candidate keyword extraction module in its next search.
Remark 16. The equipment according to Remark 15, wherein the search term selection module comprises:
a vocabulary score calculation unit configured to calculate the vocabulary score PR(C) of each candidate keyword C,
$$PR(C) = (1-d) + d\left(\frac{PR(S_1)}{O(S_1)} + \frac{PR(S_2)}{O(S_2)} + \dots + \frac{PR(S_n)}{O(S_n)}\right)$$
where $S_i$ is the i-th search term used to retrieve the candidate keyword C, $PR(S_i)$ is the vocabulary score of the search term $S_i$, $O(S_i)$ is the number of candidate keywords produced by searching with the search term $S_i$, i = 1, 2, ..., n, and d is the damping coefficient; and
a search term selection unit configured to preferentially select candidate keywords C with a high vocabulary score PR(C) as the search terms used by the candidate keyword extraction module in its next search.
Remark 17. The equipment according to Remark 15, wherein the search term selection module comprises:
a vocabulary score calculation unit configured to calculate the vocabulary score PR(C) of each candidate keyword C according to the following equation:
$$PR(C) = (1-d) + d\,\bigl(P(S_1 \to C)\times PR(S_1) + P(S_2 \to C)\times PR(S_2) + \dots + P(S_n \to C)\times PR(S_n)\bigr)$$
where $P(S_i \to C)$ is the probability that the search term $S_i$ produces the candidate keyword C, $PR(S_i)$ is the vocabulary score of the search term $S_i$, i = 1, 2, ..., n, and d is the damping coefficient, and where, in the definition of $P(S_i \to C)$, $O_k$ denotes a keyword recognized from the picture, and the other quantities are the candidate keyword compared with $O_k$, the similarity between $O_k$ and that candidate keyword, and the probability that the candidate keyword occurs; and
a search term selection unit configured to preferentially select candidate keywords C with a high vocabulary score PR(C) as the search terms used by the candidate keyword extraction module in its next search.
Remark 18. The equipment according to Remark 14 or 17, wherein the similarity is calculated from features of the keywords recognized from the picture and of the candidate keywords.
Remark 19. The equipment according to Remark 18, wherein the features include at least one of the following: the size of a keyword recognized from the picture, the position of a candidate keyword in its corresponding text, the common substring of a candidate keyword and a keyword recognized from the picture, the geometric distance within the picture between keywords recognized from the picture, the mutual information of a candidate keyword in its corresponding text, and the edit distance between a keyword recognized from the picture and a candidate keyword.
Remark 20. The equipment according to Remark 19, wherein the cost of a character substitution in the edit distance is calculated according to the confidence of the keyword recognized from the picture.
Remark 21. The equipment according to any one of Remarks 13 to 20, wherein the search terms also include entity names, and the entity names include words recognized from the picture that relate to time, place, and name.
Remark 22. The equipment according to any one of Remarks 13 to 20, further comprising:
a subject keyword mining module configured to mine the subject keywords in the picture using the selected portion of the candidate keywords.
Remark 23. The equipment according to any one of Remarks 13 to 20, wherein the candidate keyword extraction module comprises:
a text matching unit configured to perform text matching between the web pages found by searching with the search terms and the recognition result of the picture;
a subject web page selection unit configured to select, according to the text matching result, the subject web pages related to the picture from the web pages found; and
a candidate keyword extraction unit configured to extract the candidate keywords from the subject web pages.
Remark 24. The equipment according to any one of Remarks 13 to 20, wherein the predetermined condition includes a predetermined convergence condition and/or a predetermined number of iterations.
Claims (10)
1. A method for mining subject keywords in a picture, comprising:
an initial search term identification step of identifying keywords in the picture as initial search terms;
a candidate keyword extraction step of using the search terms to search for subject web pages related to the picture and extracting candidate keywords from them;
a search term selection step of selecting, according to the link relationship between the candidate keywords and the search terms used to search for the candidate keywords, a portion of the candidate keywords as the search terms used in the next candidate keyword extraction step; and
repeating the candidate keyword extraction step and the search term selection step until a predetermined condition is met.
2. The method according to claim 1, wherein the search term selection step comprises:
selecting a portion of the candidate keywords as the search terms used in the next candidate keyword extraction step according to the similarity between the keywords recognized from the picture and the candidate keywords and according to the link relationship between the candidate keywords and the search terms used to search for the candidate keywords.
3. The method according to claim 1 or 2, wherein selecting a portion of the candidate keywords as the search terms used in the next candidate keyword extraction step according to the link relationship between the candidate keywords and the search terms used to search for the candidate keywords comprises: when selecting a portion of the candidate keywords as the search terms used in the next candidate keyword extraction step, preferentially selecting candidate keywords that are retrieved by more search terms as the search terms used in the next candidate keyword extraction step.
4. The method according to claim 3, wherein preferentially selecting candidate keywords that are retrieved by more search terms as the search terms used in the next candidate keyword extraction step comprises:
calculating the vocabulary score PR(C) of each candidate keyword C,
$$PR(C) = (1-d) + d\left(\frac{PR(S_1)}{O(S_1)} + \frac{PR(S_2)}{O(S_2)} + \dots + \frac{PR(S_n)}{O(S_n)}\right)$$
where $S_i$ is the i-th search term used to retrieve the candidate keyword C, $PR(S_i)$ is the vocabulary score of the search term $S_i$, $O(S_i)$ is the number of candidate keywords produced by searching with the search term $S_i$, i = 1, 2, ..., n, and d is the damping coefficient; and
the higher the vocabulary score PR(C) of a candidate keyword C, the more preferentially that candidate keyword C is selected as a search term used in the next candidate keyword extraction step.
5. The method according to claim 3, wherein preferentially selecting candidate keywords that are retrieved by more search terms as the search terms used in the next candidate keyword extraction step comprises:
calculating the vocabulary score PR(C) of each candidate keyword C,
$$PR(C) = (1-d) + d\,\bigl(P(S_1 \to C)\times PR(S_1) + P(S_2 \to C)\times PR(S_2) + \dots + P(S_n \to C)\times PR(S_n)\bigr)$$
where $P(S_i \to C)$ is the probability that the search term $S_i$ produces the candidate keyword C, $PR(S_i)$ is the vocabulary score of the search term $S_i$, i = 1, 2, ..., n, and d is the damping coefficient,
where, in the definition of $P(S_i \to C)$, $P(S_i \to j)$ is the probability that the search term $S_i$ produces the candidate keyword $C_j$, $O_k$ denotes a keyword recognized from the picture, the other quantities are the candidate keyword compared with $O_k$, the similarity between $O_k$ and that candidate keyword, and the probability that the candidate keyword occurs, and j = 1, ..., m, and
the higher the vocabulary score PR(C) of a candidate keyword C, the more preferentially that candidate keyword C is selected as a search term used in the next candidate keyword extraction step.
6. the method according to claim 2 or 5, wherein, according to the key word of identification and described candidate from described picture
The feature of key word is calculating described similarity.
7. method according to claim 6, wherein, described feature includes at least one in the following:From described figure
The size of key word of identification in piece, position in corresponding text for the described candidate keywords, described candidate keywords and from institute
State in picture identification the public substring of key word, from described picture identification geometry in described picture for the key word away from
From, described candidate keywords the mutual information in corresponding text and from described picture the key word of identification and described candidate
Editing distance between key word.
8. method according to claim 7, wherein, calculates according to the confidence level of the key word of identification from described picture
The cost that character in described editing distance is replaced.
9. method according to claim 1, wherein, described candidate keywords extraction step includes:
Text matches are carried out to the recognition result of the webpage being searched by described term and described picture;
The subject web page related to described picture is selected from the webpage searching according to text matches result;And
Described candidate keywords are extracted from described subject web page.
10. A device for mining the subject keywords in a picture, including:
an initial search term identification module, configured to recognize keywords in the picture as initial search terms;
a candidate keyword extraction module, configured to use the search terms to search for subject web pages related to the picture and to extract candidate keywords from them;
a search term selection module, configured to select, according to the linking relationship between the candidate keywords and the search terms used to find them, a part of the candidate keywords as the search terms for the next search performed by the candidate keyword extraction module; and
a control module, configured to control the candidate keyword extraction module and the search term selection module to operate in a loop until a predetermined condition is met.
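The confidence-dependent substitution cost of claim 8 can be illustrated with a small sketch. The claims do not give a formula, so the mapping used here is an assumption: each recognized character carries its own OCR confidence in [0, 1], replacing that character costs exactly its confidence, and insertions and deletions cost a fixed 1.0. The function name `weighted_edit_distance` and its parameters are hypothetical.

```python
from typing import Sequence


def weighted_edit_distance(
    recognized: str,
    confidences: Sequence[float],
    candidate: str,
) -> float:
    """Edit distance from an OCR-recognized keyword to a candidate keyword.

    Assumption (not stated in the claims): replacing recognized character i
    costs confidences[i], so doubtful OCR characters are cheap to correct.
    Insertions and deletions keep a fixed cost of 1.0.
    """
    if len(recognized) != len(confidences):
        raise ValueError("need one confidence value per recognized character")

    # prev[j] = cost of turning the recognized prefix seen so far into candidate[:j]
    prev = [float(j) for j in range(len(candidate) + 1)]
    for i, rec_char in enumerate(recognized, start=1):
        curr = [float(i)]  # deleting the first i recognized characters
        for j, cand_char in enumerate(candidate, start=1):
            sub_cost = 0.0 if rec_char == cand_char else confidences[i - 1]
            curr.append(
                min(
                    prev[j] + 1.0,           # delete rec_char
                    curr[j - 1] + 1.0,       # insert cand_char
                    prev[j - 1] + sub_cost,  # keep or replace rec_char
                )
            )
        prev = curr
    return prev[-1]
```

Under these assumptions, a recognized string such as "c1oud" whose second character has low confidence ends up much closer to the candidate "cloud" than it would under a plain edit distance, while a confidently recognized character remains expensive to replace. With every confidence set to 1.0 the function reduces to the ordinary Levenshtein distance.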
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210246688.2A CN103544186B (en) | 2012-07-16 | 2012-07-16 | The method and apparatus excavating the subject key words in picture |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210246688.2A CN103544186B (en) | 2012-07-16 | 2012-07-16 | The method and apparatus excavating the subject key words in picture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103544186A (en) | 2014-01-29 |
CN103544186B (en) | 2017-03-01 |
Family
ID=49967649
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210246688.2A Expired - Fee Related CN103544186B (en) | The method and apparatus excavating the subject key words in picture | 2012-07-16 | 2012-07-16 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103544186B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105812231B (en) * | 2014-12-29 | 2019-11-05 | 阿里巴巴集团控股有限公司 | The method for quickly identifying and its device of chat record |
CN108572971B (en) * | 2017-03-09 | 2022-11-01 | 百度在线网络技术(北京)有限公司 | Method and device for mining keywords related to search terms |
CN110020042B (en) * | 2017-08-25 | 2021-09-10 | 杭州海康威视数字技术股份有限公司 | Image acquisition method and device based on webpage |
CN107633460A (en) * | 2017-09-18 | 2018-01-26 | 北京奇艺世纪科技有限公司 | Content distribution control method and device |
CN107798070A (en) * | 2017-09-26 | 2018-03-13 | 平安普惠企业管理有限公司 | A kind of web data acquisition methods and terminal device |
CN111488512A (en) * | 2019-01-25 | 2020-08-04 | 深信服科技股份有限公司 | Target to be collected obtaining method, device, equipment and storage medium |
CN111859095A (en) * | 2019-04-02 | 2020-10-30 | 搜狗(杭州)智能科技有限公司 | Picture identification method and device |
CN113590861A (en) * | 2020-04-30 | 2021-11-02 | 北京搜狗科技发展有限公司 | Picture information processing method and device and electronic equipment |
CN112199545B (en) * | 2020-11-23 | 2021-09-07 | 湖南蚁坊软件股份有限公司 | Keyword display method and device based on picture character positioning and storage medium |
CN114547404B (en) * | 2022-01-10 | 2023-02-17 | 普瑞纯证医疗科技(苏州)有限公司 | Big data platform system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1763798A1 (en) * | 2004-06-17 | 2007-03-21 | Nokia Corporation | System and method for search operations |
CN101464903A (en) * | 2009-01-09 | 2009-06-24 | 江阴明伦科技有限公司 | OCR picture and text recognition and retrieval method and system through web mode |
CN102073653A (en) * | 2009-11-20 | 2011-05-25 | 富士通株式会社 | Information extraction method and device |
Also Published As
Publication number | Publication date |
---|---|
CN103544186A (en) | 2014-01-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN103544186B (en) | The method and apparatus excavating the subject key words in picture | |
CN104239300B (en) | Method and apparatus for mining semantic keywords from text | |
Culotta et al. | Reducing labeling effort for structured prediction tasks | |
US8812299B1 (en) | Class-based language model and use | |
US8082151B2 (en) | System and method of generating responses to text-based messages | |
US9245243B2 (en) | Concept-based analysis of structured and unstructured data using concept inheritance | |
US20110231347A1 (en) | Named Entity Recognition in Query | |
CN101799802B (en) | Method and system for extracting entity relationship by using structural information | |
US20080005051A1 (en) | Lexicon generation methods, computer implemented lexicon editing methods, lexicon generation devices, lexicon editors, and articles of manufacture | |
CN106815307A (en) | Public Culture knowledge mapping platform and its use method | |
CN103365849B (en) | Keyword retrieval method and apparatus | |
US10949452B2 (en) | Constructing content based on multi-sentence compression of source content | |
US20070233668A1 (en) | Method, system, and computer program product for semantic annotation of data in a software system | |
CN103577414B (en) | Data processing method and device | |
KR100835290B1 (en) | System and method for classifying document | |
WO2014081762A1 (en) | Mobile-commerce store generator that automatically extracts and converts data | |
CN110110218B (en) | Identity association method and terminal | |
CN107329770A (en) | Personalized recommendation method for software security bug fixing | |
JP2004318510A (en) | Original and translation information creating device, its program and its method, original and translation information retrieval device, its program and its method | |
US11663407B2 (en) | Management of text-item recognition systems | |
US20050065947A1 (en) | Thesaurus maintaining system and method | |
CN107784019A (en) | Search term processing method and system in a search service | |
JP2012221489A (en) | Method and apparatus for efficiently processing query | |
CN117252186A (en) | XAI-based information processing method, device, equipment and storage medium | |
CN103514194B (en) | Method and apparatus for determining the relevance of a corpus to an entity, and classifier training method |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170301 Termination date: 20180716 |