CN109635135A - Image index generation method, device, terminal and storage medium - Google Patents
Image index generation method, device, terminal and storage medium
- Publication number
- CN109635135A (application CN201811457455.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- descriptive statement
- index
- keyword
- language description
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present application discloses an image index generation method, apparatus, terminal, and storage medium. The method includes: obtaining a first image; performing image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image; generating a descriptive sentence through a language description model; and storing the descriptive sentence in association with the first image to obtain an index of the first image. In the embodiments of the present application, the recognition result of each object contained in an image is identified, a descriptive sentence that contains those recognition results and describes the first image is generated by the language description model, and the descriptive sentence is determined as the index of the image. When the user later needs to search for the image, the user can input a word contained in the index, or a word whose meaning is similar to a word contained in the index, and the terminal can accurately find the first image according to the word entered by the user, which improves the efficiency of searching for images in an album.
Description
Technical field
The embodiments of the present application relate to the field of search technology, and in particular to an image index generation method, apparatus, terminal, and storage medium.
Background technique
At present, an album application is usually installed on a terminal. The album application is commonly used to store images captured by the camera, images saved from the network, and the like.

When many images are saved in the album and the user needs to quickly find a desired image among them, the terminal needs to build image indexes in the album. Then, when the user later needs to search for a certain image, the user only needs to input the index corresponding to that image, and the terminal can quickly locate the image according to the index and display it.
Summary of the invention
The embodiments of the present application provide an image index generation method, apparatus, terminal, and storage medium. The technical solutions are as follows:

In one aspect, an embodiment of the present application provides an image index generation method. The method includes:

obtaining a first image;

performing image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image;

generating a descriptive sentence through a language description model, the descriptive sentence including the recognition results corresponding to the at least two objects, and the descriptive sentence being used to describe the first image; and

storing the descriptive sentence in association with the first image to obtain an index of the first image.

In another aspect, an embodiment of the present application provides an image index generation apparatus. The apparatus includes:

an image obtaining module, configured to obtain a first image;

an image recognition module, configured to perform image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image;

a sentence generation module, configured to generate a descriptive sentence through a language description model, the descriptive sentence including the recognition results corresponding to the at least two objects and being used to describe the first image; and

an index generation module, configured to store the descriptive sentence in association with the first image to obtain an index of the first image.

In another aspect, an embodiment of the present application provides a terminal. The terminal includes a processor and a memory, the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the image index generation method described above.

In another aspect, an embodiment of the present application provides a computer-readable storage medium. A computer program is stored in the computer-readable storage medium, and the computer program is loaded and executed by a processor to implement the image index generation method described above.

The technical solutions provided by the embodiments of the present application can bring the following beneficial effects:

By identifying the recognition result corresponding to each object contained in an image, generating through the language description model a descriptive sentence that contains those recognition results and describes the first image, and determining the descriptive sentence as the index of the image, the user can later search for the image by entering a word contained in the index, or a word whose meaning is similar to a word contained in the index, and the terminal can accurately find the first image according to the word entered by the user, which improves the efficiency of searching for images in an album.
Brief description of the drawings

Fig. 1 is a flowchart of an image index generation method provided by an embodiment of the present application;

Fig. 2 is a flowchart of an image index generation method provided by an embodiment of the present application;

Fig. 3 is a flowchart of an image index generation method provided by an embodiment of the present application;

Fig. 4 is a flowchart of an image index generation method provided by an embodiment of the present application;

Fig. 5 is a block diagram of an image index generation apparatus provided by an embodiment of the present application;

Fig. 6 is a block diagram of a terminal provided by an embodiment of the present application.
Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.

The embodiments of the present application provide an image index generation method, apparatus, terminal, and storage medium. The recognition result corresponding to each object contained in an image is identified, a descriptive sentence that contains those recognition results and describes the first image is generated through a language description model, and the descriptive sentence is determined as the index of the image. When the user later needs to search for the image, the user can input a word contained in the index, or a word whose meaning is similar to a word contained in the index, and the terminal can accurately find the first image according to the word entered by the user, which improves the efficiency of searching for images in an album.

In the technical solutions provided by the embodiments of the present application, the execution subject of each step is a terminal. Optionally, an album application is installed on the terminal. The terminal may be a mobile phone, a tablet computer, a personal computer, or the like.
Referring to Fig. 1, which shows a flowchart of an image index generation method according to an embodiment of the present application, the method may include the following steps.

Step 101: obtain a first image.

In one possible implementation, the first image is an image captured by a camera of the terminal. Optionally, the terminal is provided with a camera and has a photographing application installed. When the photographing application is running and the terminal receives a trigger signal acting on the shooting control of the current shooting interface, the terminal obtains the image captured by the camera as the first image.

In another possible implementation, the first image is an image from the network. Optionally, when an image is displayed on the display interface of the terminal and the terminal receives a save instruction corresponding to the image, the terminal obtains the image from the network as the first image according to the save instruction.

In addition, the embodiments of the present application do not limit the manner and timing of obtaining the first image.
Step 102: perform image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image.

The first image may contain one or more objects, such as persons, animals, buildings, landscapes, and so on. In the embodiments of the present application, the terminal determines the category of each object as follows: image recognition is performed on the first image through an image recognition model, and the categories of the at least two objects in the first image are obtained.

The image recognition model is obtained by training a deep learning network with multiple sample images, where the object to be recognized in each sample image is annotated with a classification label. In some embodiments of the present application, the image recognition model includes an input layer, at least one convolutional layer (for example, three convolutional layers: a first, a second, and a third convolutional layer), at least one fully connected layer (for example, two fully connected layers: a first and a second fully connected layer), and an output layer. The input of the input layer is the first image, and the output of the output layer is the categories of the at least two objects contained in the first image. The image recognition process is as follows: the first image is fed into the input layer of the image recognition model, the convolutional layers of the model extract features of the first image, the fully connected layers of the model combine and abstract those features into data suitable for classification by the output layer, and finally the output layer outputs the recognition results corresponding to the at least two objects contained in the first image.
The embodiments of the present application do not limit the specific structure of the convolutional layers and fully connected layers of the image recognition model; the image recognition model described above is merely exemplary and explanatory and is not intended to limit the present disclosure. In general, the more layers a convolutional neural network has, the better its accuracy but the longer its computation time. In practical applications, a convolutional neural network with an appropriate number of layers is designed according to the requirements on recognition accuracy and efficiency.
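As an illustration only, the following is a minimal sketch of an image recognition model with the layer structure described above (three convolutional layers plus two fully connected layers); the class name, channel widths, input size, and number of categories are assumptions made for the example, not taken from the patent.

```python
import torch
import torch.nn as nn

class ImageRecognitionModel(nn.Module):
    """Minimal sketch: 3 convolutional layers + 2 fully connected layers, as described above.
    Channel widths and num_classes are illustrative assumptions."""
    def __init__(self, num_classes: int = 80):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 28 * 28, 256), nn.ReLU(),   # assumes a 224x224 input image
            nn.Linear(256, num_classes),                # one score per object category
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Usage sketch: multi-label output, one score per object category.
model = ImageRecognitionModel()
scores = model(torch.randn(1, 3, 224, 224))   # shape: (1, num_classes)
detected = torch.sigmoid(scores) > 0.5        # categories recognized in the image
```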
A sample image is a picture selected in advance for training the deep learning network. A sample image carries scene labels, which are usually determined manually and are used to describe the scene, objects, persons, and so on in the sample image.

Optionally, the deep learning network may use an AlexNet network, a VGG-16 network, a GoogLeNet network, a Deep Residual Learning (ResNet) network, or the like, which is not limited in the embodiments of the present application. In addition, the algorithm used when training the deep learning network to obtain the image recognition model may be BP (Back-Propagation), Faster R-CNN (Regions with Convolutional Neural Network features), or the like, which is not limited in the embodiments of the present application.
Taking BP as the algorithm used when training the deep learning network to obtain the image recognition model as an example, the training process of the image recognition model is as follows: first, the parameters of each layer of the deep learning network are set randomly; next, a sample image is fed into the deep learning network to obtain a recognition result; the recognition result is then compared with the classification label to obtain the error between them; finally, the parameters of each layer of the deep learning network are adjusted based on the error, and the above steps are repeated until the error between the recognition result and the classification label is smaller than a preset value, at which point the image recognition model is obtained.
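A minimal training-loop sketch of the back-propagation procedure just described; the loss function, optimizer, learning rate, and error threshold are illustrative assumptions rather than values specified by the patent.

```python
import torch
import torch.nn as nn

def train_recognition_model(model, dataloader, error_threshold=0.05, max_epochs=100):
    """Sketch of the BP training loop described above: forward pass, compare with the
    classification labels, back-propagate the error, repeat until the error is below a preset value."""
    criterion = nn.BCEWithLogitsLoss()                      # multi-label classification loss (assumed)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(max_epochs):
        epoch_error = 0.0
        for images, labels in dataloader:                   # labels: per-category classification tags
            optimizer.zero_grad()
            error = criterion(model(images), labels)        # error between recognition result and label
            error.backward()                                # back-propagation
            optimizer.step()                                # adjust the parameters of each layer
            epoch_error += error.item()
        if epoch_error / len(dataloader) < error_threshold:
            break                                           # error below the preset value: model obtained
    return model
```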
Step 103: generate a descriptive sentence through a language description model.

The descriptive sentence is used to describe the first image and contains the recognition results corresponding to the at least two objects. Optionally, the descriptive sentence also contains other words, which may describe at least one of the following: the positional relationship between the at least two objects, an action being performed by a certain object, the state of a certain object, and so on. For example, image recognition is performed on the first image, the objects obtained from the first image include a dog and a lawn, and the posture of the dog on the lawn is running; the descriptive sentence corresponding to the first image is then "a dog is running on the lawn".
In some embodiments of the present application, the language description model includes an input layer, at least one convolutional layer (for example, three convolutional layers: a first, a second, and a third convolutional layer), at least one fully connected layer (for example, two fully connected layers: a first and a second fully connected layer), and an output layer. The input of the input layer is the classification labels of the objects in the first image together with the first image, and the output of the output layer is the descriptive sentence corresponding to the first image. The generation process of the descriptive sentence is as follows: the classification labels of the objects in the first image and the first image are fed into the input layer of the language description model, the convolutional layers of the model extract features of the input, the fully connected layers of the model combine and abstract those features into data suitable for the output layer, and finally the output layer outputs the descriptive sentence corresponding to the first image.

The embodiments of the present application do not limit the specific structure of the convolutional layers and fully connected layers of the language description model; the language description model described above is merely exemplary and explanatory and is not intended to limit the present disclosure. In general, the more layers a convolutional neural network has, the better its accuracy but the longer its computation time. In practical applications, a convolutional neural network with an appropriate number of layers is designed according to the requirements on recognition accuracy and efficiency.
Optionally, step 103 may include the following sub-steps:

Step 103a: convert the recognition results into first word vectors.

Step 103b: process the first word vectors through the language description model to obtain the descriptive sentence.

In the embodiments of the present application, the terminal converts the recognition results into the corresponding word vectors through a word vector model and feeds the word vectors into the language description model, which outputs the descriptive sentence. The word vector model may be a word2vec model.
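A minimal sketch of sub-steps 103a and 103b, under the assumption that the word vector model is a pre-trained word2vec-style embedding lookup and the language description model is a recurrent encoder-decoder; the class names, dimensions, and `word2vec` mapping are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class LanguageDescriptionModel(nn.Module):
    """Sketch: encodes the first word vectors (one per recognition result) and
    decodes a descriptive sentence word by word. The architecture is an assumption."""
    def __init__(self, embed_dim=128, hidden_dim=256, vocab_size=10000):
        super().__init__()
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, first_word_vectors, prev_word_vectors):
        # Step 103b: process the first word vectors to obtain the descriptive sentence.
        _, state = self.encoder(first_word_vectors)           # summarize the recognition results
        dec_out, _ = self.decoder(prev_word_vectors, state)   # generate the sentence token by token
        return self.out(dec_out)                              # scores over the vocabulary

def to_word_vectors(recognition_results, word2vec):
    """Step 103a: convert recognition results (e.g. ["dog", "lawn"]) into first word vectors.
    `word2vec` is assumed to be a mapping from word to a 128-d torch vector."""
    return torch.stack([word2vec[w] for w in recognition_results]).unsqueeze(0)
```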
Optionally, step 103b may also be implemented as follows:

Step 103b1: when the first image is an image captured by the camera of the terminal, obtain the location information of the first image.

Step 103b2: convert the location information into a second word vector.

Step 103b3: process the first word vectors and the second word vector through the language description model to obtain the descriptive sentence.

The location information indicates the geographical location where the first image was captured and may be obtained by a positioning component in the terminal. The manner of converting the location information into a word vector can refer to step 103a and is not repeated here. In the embodiments of the present application, the descriptive sentence corresponding to the first image is generated in combination with the geographical location where the first image was captured, so the first image can be described more completely, and the user can later search for the first image with multiple different keywords, which improves the convenience of searching.
For example, image recognition is performed on the first image, the objects obtained from the first image include a dog and a lawn, the posture of the dog on the lawn is running, and the geographical location where the first image was captured is XX Park; the descriptive sentence corresponding to the first image is then "a dog is running on the lawn of XX Park".
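A small sketch of step 103b3, showing one way the second word vector derived from the location could be appended to the first word vectors before being fed to the language description model; the concatenation strategy and the `word2vec` lookup are assumptions made for the example.

```python
import torch

def build_model_input(first_word_vectors, location, word2vec):
    """Append the second word vector (step 103b2) to the first word vectors so the
    language description model (step 103b3) sees both the objects and the shooting location.
    `location` is assumed to be a place name such as "XX Park"."""
    second_word_vector = word2vec[location].unsqueeze(0).unsqueeze(0)   # shape (1, 1, embed_dim)
    return torch.cat([first_word_vectors, second_word_vector], dim=1)   # (1, num_objects + 1, embed_dim)
```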
Step 104: store the descriptive sentence in association with the first image to obtain an index of the first image.

The terminal stores the descriptive sentence in association with the first image, and the result is the index of the first image. If the user later needs to search for the first image, the user only needs to input at least one word contained in the descriptive sentence, or a word whose similarity with a word in the descriptive sentence is greater than a preset threshold, and the terminal can find the first image according to the word entered by the user and display it to the user.

In addition, the embodiments of the present application do not limit the path used to store the descriptive sentence and the first image; it may be preset by the terminal or customized by the user.
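An illustrative sketch of step 104, storing each descriptive sentence together with the path of its image in a small SQLite table; the database file name, table name, and schema are assumptions made for the example.

```python
import sqlite3

def store_index(db_path, image_path, descriptive_sentence):
    """Step 104 sketch: store the descriptive sentence in association with the image,
    so the sentence can later serve as the image's index."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_index (image_path TEXT PRIMARY KEY, description TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO image_index (image_path, description) VALUES (?, ?)",
        (image_path, descriptive_sentence),
    )
    conn.commit()
    conn.close()

# Usage sketch
store_index("album_index.db", "/album/IMG_0001.jpg", "a dog is running on the lawn of XX Park")
```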
In conclusion technical solution provided by the embodiments of the present application, by identifying each object included in image
Corresponding recognition result, and generated by language description model including above-mentioned recognition result, and for describing the first figure
Foregoing description sentence is determined as the index of the image by the descriptive statement of picture, subsequent when user needs to search for the image, can be with
Included word is inputted in the index, or word similar in the meaning with word included in the index, terminal can be with
First image is accurately searched according to the word that user inputs, improves the search efficiency for searching for image in photograph album.
Referring to Fig. 2, which shows a flowchart of an image index generation method provided by an embodiment of the present application, the method may include the following steps.

Step 201: obtain a first image.

Step 202: perform image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image.

Step 203: generate a descriptive sentence through a language description model.

Step 204: display inquiry information.

The inquiry information is used to ask whether to confirm generating the index of the first image. For example, the inquiry information is: "The descriptive sentence corresponding to this image is 'watching a concert at the Bird's Nest'. Confirm?".

In the embodiments of the present application, the user can preview the descriptive sentence generated by the language description model and decide whether to use it as the index of the first image.

Step 205: when a confirmation instruction corresponding to the inquiry information is received, store the descriptive sentence in association with the first image to obtain the index of the first image.

If the user decides to use the generated descriptive sentence as the index of the image, the user can issue a confirmation instruction. The confirmation instruction corresponding to the inquiry information indicates that the generated descriptive sentence is confirmed as the index of the image. Optionally, a confirmation control is displayed beside the inquiry information; when the terminal receives a trigger signal acting on the confirmation control, it receives the confirmation instruction corresponding to the inquiry information.

Step 206: when no confirmation instruction is received, display an input box.

The input box is used to input the descriptive sentence corresponding to the first image. Optionally, if the terminal does not receive a trigger signal acting on the confirmation control within a preset time, the terminal has not received the confirmation instruction. Optionally, a deny control is also displayed beside the inquiry information; when the terminal receives a trigger signal corresponding to the deny control, the terminal has not received the confirmation instruction.

Step 207: receive the sentence entered in the input box.

In the embodiments of the present application, when the user is not satisfied with the generated descriptive sentence, the user can enter a descriptive sentence describing the target image.

Step 208: store the entered sentence in association with the first image to obtain the index of the first image.

In summary, in the technical solution provided by the embodiments of the present application, when the user is not satisfied with the descriptive sentence generated by the terminal, the user enters the descriptive sentence corresponding to the image, so that the user can later search for the image according to the descriptive sentence the user entered.
After the index of the first image is generated, the user can search for the first image in the album according to the index. The search process is explained below. In an optional embodiment provided on the basis of the embodiment shown in Fig. 1 or Fig. 2, after step 104, or alternatively after step 208, the method further includes the following steps.

Step 301: display a search box.

The search box is used for the user to input a search keyword, so that the terminal can search for images matching the search keyword. In one possible implementation, the search box is displayed on the main interface of the album application. In another possible implementation, a search control is displayed on the main interface of the album application; when the user triggers the search control, the terminal receives a trigger signal corresponding to the search control and displays the search box according to the trigger signal.

Step 302: receive a first keyword entered in the search box.

The first keyword is entered by the user and may be, for example, "the Forbidden City", "cat", or "rose", which is not limited in the embodiments of the present application.

Step 303: search the album for a second image matching the first keyword.

There may be one or more second images. The descriptive sentence corresponding to a second image is used to describe that second image and contains a first target keyword. The first target keyword may be the recognition result corresponding to an object contained in the second image, or may be another word in the descriptive sentence other than the recognition results, which is not limited in the embodiments of the present application. In this way, the user can search for the same image with different keywords, which reduces the difficulty of searching for images.

The similarity between the first target keyword and the first keyword meets a preset condition. The preset condition may be that the similarity between the first target keyword and the first keyword is greater than a preset threshold, and the preset threshold may be set according to actual needs, which is not limited in the embodiments of the present application.

In the embodiments of the present application, the terminal first calculates the similarity between the first keyword and each word contained in each descriptive sentence stored by the terminal, then determines the words whose similarity with the first keyword meets the preset condition as first target keywords, and finally takes the images corresponding to the descriptive sentences containing a first target keyword as the second images matching the first keyword.

In addition, the embodiments of the present application calculate the similarity between the first keyword and a word contained in a descriptive sentence as follows: the terminal represents the first keyword as a first vector through the word vector model, represents the word contained in the descriptive sentence as a second vector, and then calculates the cosine distance between the first vector and the second vector to obtain the similarity between the first keyword and the word contained in the descriptive sentence.
Step 304: display the second image matching the first keyword.

The terminal displays the second image on a search result page. When there are multiple second images, the terminal may sort the second images according to the similarity between the first target keyword and the first keyword. Optionally, the greater the similarity between the first target keyword and the first keyword, the closer to the front the second image corresponding to the descriptive sentence containing that first target keyword is ranked on the search result page; the smaller the similarity, the closer to the back the corresponding second image is ranked.
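An illustrative sketch of steps 303 and 304 under the assumption that word vectors are plain numpy arrays: the cosine similarity between the first keyword and every word of every stored descriptive sentence is computed, images whose best-matching word exceeds a threshold are kept, and the results are sorted by decreasing similarity. The `word2vec` lookup and the threshold value are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two word vectors (step 303)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_album(first_keyword, index_entries, word2vec, threshold=0.6):
    """index_entries: list of (image_path, descriptive_sentence) pairs stored in step 104.
    Returns the second images matching the first keyword, ranked by similarity (step 304)."""
    first_vec = word2vec[first_keyword]
    results = []
    for image_path, sentence in index_entries:
        # Best similarity between the first keyword and any word of the descriptive sentence.
        best = max(
            (cosine_similarity(first_vec, word2vec[w]) for w in sentence.split() if w in word2vec),
            default=0.0,
        )
        if best > threshold:                                   # a first target keyword was found
            results.append((image_path, best))
    results.sort(key=lambda item: item[1], reverse=True)       # higher similarity ranked first
    return [path for path, _ in results]
```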
In conclusion technical solution provided by the embodiments of the present application, by according to foregoing embodiments image rope generated
Attract carry out picture search, user need to only input word included in the index, or with word included in the index
Meaning similar in word, terminal can according to user input word accurately search the image, improve and searched in photograph album
The search efficiency of rope image.
When the user enters a first keyword and the terminal finds many second images according to the first keyword, the user has to pick out the desired image from the many second images, and the search efficiency is still relatively low.
Referring to Fig. 4, which shows a flowchart of an image index generation method provided by an embodiment of the present application, the image index generation method can be used to solve the problem of low search efficiency when many second images are found according to the first keyword. The method includes the following steps.

Step 401: display a search box.

Step 402: receive a first keyword entered in the search box.

Step 403: search the album for a second image matching the first keyword.

Step 404: when the number of second images is greater than a preset number, display prompt information.

The preset number may be set according to actual needs, which is not limited in the embodiments of the present application; for example, the preset number is 10. The prompt information is used to prompt the user to enter a second keyword, which is different from the first keyword.

In the embodiments of the present application, when the terminal finds the second images matching the first keyword, it first detects whether the number of second images is greater than the preset number. If the number of second images is less than or equal to the preset number, the second images are displayed directly. If the number of second images is greater than the preset number, the user is prompted to enter more keywords, so that the terminal can continue to screen, among the second images matching the first keyword, the second images matching both the first keyword and the second keyword.
Step 405: obtain the second keyword.

The second keyword is also entered by the user and is different from the first keyword.

Step 406: search the album for a second image matching both the first keyword and the second keyword.

The descriptive sentence corresponding to such a second image contains a first target keyword and a second target keyword. The similarity between the second target keyword and the second keyword meets a second preset condition. The second preset condition may be that the similarity between the second target keyword and the second keyword is greater than a preset threshold, and the preset threshold may be set according to actual needs, which is not limited in the embodiments of the present application.

In the embodiments of the present application, the terminal first calculates the similarity between the first keyword and each word contained in each descriptive sentence stored by the terminal, and the similarity between the second keyword and each word contained in each descriptive sentence stored by the terminal; it then determines the words whose similarity with the first keyword meets the first preset condition as first target keywords, and the words whose similarity with the second keyword meets the second preset condition as second target keywords; and finally it takes the images corresponding to the descriptive sentences containing both a first target keyword and a second target keyword as the second images matching the first keyword and the second keyword. The way to calculate the similarity between the second keyword and the words contained in a descriptive sentence can refer to step 303 and is not repeated here.
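A small sketch of steps 404 to 406, reusing the hypothetical `search_album` helper from the earlier sketch: if the first search returns more than the preset number of images, the refined result set keeps only the images that match both keywords.

```python
def refine_search(first_keyword, second_keyword, index_entries, word2vec, preset_number=10):
    """Steps 404-406 sketch: when the first search returns more than the preset number of
    second images, ask for a second keyword and keep only images matching both keywords."""
    first_matches = search_album(first_keyword, index_entries, word2vec)
    if len(first_matches) <= preset_number:
        return first_matches                                   # few enough results: display directly
    second_matches = set(search_album(second_keyword, index_entries, word2vec))
    return [path for path in first_matches if path in second_matches]
```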
Step 407: display the second images matching both the first keyword and the second keyword.

In the embodiments of the present application, the second images here refer to the second images matching both the first keyword and the second keyword.

In summary, in the technical solution provided by the embodiments of the present application, when there are too many search results, the user is prompted to enter more keywords, so that the terminal can perform the image search according to the keywords entered in the two inputs, which improves the accuracy of the image search.
As mentioned in the embodiment of Fig. 1, the language description model is a pre-trained model for encoding at least two words into a complete sentence. The training process of the language description model is explained below.

Step 501: obtain a training sample set.

The training sample set includes multiple training sample images. The objects in each of the training sample images are annotated with classification labels, and each training sample image corresponds to an expected descriptive sentence. The classification labels of the objects in the training samples may be annotated manually or obtained through the image recognition model. The expected descriptive sentences may be annotated manually.

Step 502: for each training sample image, process the image through the initial language description model and output an actual descriptive sentence.

The initial language description model may be a deep learning network, for example, an AlexNet network, a VGG-16 network, a GoogLeNet network, or a Deep Residual Learning (ResNet) network. The parameters of the initial language description model may be set randomly, or set empirically by the relevant technical personnel. In the embodiments of the present application, each training sample image is fed into the initial language description model, and the initial language description model outputs the actual descriptive sentence.

Step 503: calculate the error between the expected descriptive sentence and the actual descriptive sentence.

Optionally, the terminal determines the difference between the expected descriptive sentence and the actual descriptive sentence as the error.

After the terminal calculates the error between the expected descriptive sentence and the actual descriptive sentence, it detects whether the error is greater than a preset threshold. If the error is greater than the preset threshold, the parameters of the initial language description model are adjusted, and the process restarts from the step of processing each training sample image through the initial language description model and outputting an actual descriptive sentence, that is, steps 502 and 503 are repeated. When the error is less than or equal to the preset threshold, the trained language description model is generated.
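A minimal sketch of steps 501 to 503, under the assumption that the error between the actual and the expected descriptive sentences is measured as a token-level cross-entropy; the tokenization, the `vocab.embed` helper, and the threshold are illustrative assumptions and reuse the hypothetical `to_word_vectors` helper from the earlier sketch.

```python
import torch
import torch.nn as nn

def train_language_description_model(model, training_samples, word2vec, vocab,
                                      preset_threshold=0.1, max_rounds=1000):
    """training_samples: list of (classification_label_words, expected_sentence_token_ids).
    Repeats steps 502-503 until the error is no greater than the preset threshold."""
    criterion = nn.CrossEntropyLoss()                          # error between actual and expected sentence (assumed)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(max_rounds):
        total_error = 0.0
        for label_words, expected_ids in training_samples:
            inputs = to_word_vectors(label_words, word2vec)    # step 103a helper from the sketch above
            prev = vocab.embed(expected_ids[:-1])              # hypothetical helper, shape (1, T, embed_dim)
            logits = model(inputs, prev)                       # step 502: output the actual descriptive sentence
            error = criterion(logits.squeeze(0), expected_ids[1:])   # step 503: error vs expected sentence
            optimizer.zero_grad()
            error.backward()
            optimizer.step()                                   # adjust the parameters of the model
            total_error += error.item()
        if total_error / len(training_samples) <= preset_threshold:
            break                                              # error below the preset threshold: training complete
    return model
```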
The following are apparatus embodiments of the present application, which can be used to carry out the method embodiments of the present application. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present application.

Referring to Fig. 5, which shows a block diagram of an image index generation apparatus provided by an embodiment of the present application, the apparatus has the function of implementing the above method; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus includes:
an image obtaining module 601, configured to obtain a first image;

an image recognition module 602, configured to perform image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image;

a sentence generation module 603, configured to generate a descriptive sentence through a language description model, the descriptive sentence including the recognition results corresponding to the at least two objects and being used to describe the first image; and

an index generation module 604, configured to store the descriptive sentence in association with the first image to obtain an index of the first image.
In conclusion technical solution provided by the embodiments of the present application, by identifying each object included in image
Corresponding recognition result, and generated by language description model including above-mentioned recognition result, and for describing the first figure
Foregoing description sentence is determined as the index of the image by the descriptive statement of picture, subsequent when user needs to search for the image, can be with
Included word is inputted in the index, or word similar in the meaning with word included in the index, terminal can be with
First image is accurately searched according to the word that user inputs, improves the search efficiency for searching for image in photograph album.
In an optional embodiment provided on the basis of the embodiment shown in Fig. 5, the sentence generation module 603 is configured to:

convert the recognition results into first word vectors; and

process the first word vectors through the language description model to obtain the descriptive sentence.

Optionally, the sentence generation module 603 is configured to:

when the first image is an image captured by the camera of the terminal, obtain the location information of the first image, the location information indicating the geographical location where the first image was captured;

convert the location information into a second word vector; and

process the first word vectors and the second word vector through the language description model to obtain the descriptive sentence.
In an optional embodiment provided on the basis of the embodiment shown in Fig. 5, the apparatus further includes a first display module (not shown in the figure).

The first display module is configured to display inquiry information, the inquiry information being used to ask whether to confirm generating the index of the first image.

The index generation module 604 is configured to, when a confirmation instruction corresponding to the inquiry information is received, perform the step of storing the descriptive sentence in association with the first image to obtain the index of the first image.

Optionally, the apparatus further includes a second display module and a first receiving module (not shown in the figure).

The second display module is configured to display an input box when no confirmation instruction is received, the input box being used to input the descriptive sentence corresponding to the first image.

The first receiving module is configured to receive the sentence entered in the input box.

The index generation module 604 is further configured to store the entered sentence in association with the first image to obtain the index of the first image.
In an optional embodiment provided on the basis of the embodiment shown in Fig. 5, the apparatus further includes a third display module, a second receiving module, a search module, and a fourth display module (not shown in the figure).

The third display module is configured to display a search box.

The second receiving module is configured to receive a first keyword entered in the search box.

The search module is configured to search the album for a second image matching the first keyword, the descriptive sentence corresponding to the second image containing a first target keyword, and the similarity between the first target keyword and the first keyword meeting a first preset condition.

The fourth display module is configured to display the second image.

Optionally, the apparatus further includes a fifth display module (not shown in the figure).

The fifth display module is configured to display prompt information when the number of second images is greater than a preset number, the prompt information being used to prompt the user to enter a second keyword.

The second receiving module is configured to obtain the second keyword.

The search module is further configured to search the album for a second image matching both the first keyword and the second keyword, the descriptive sentence corresponding to the second image containing the first target keyword and a second target keyword, and the similarity between the second target keyword and the second keyword meeting a second preset condition.
In an optional embodiment provided on the basis of the embodiment shown in Fig. 5, the image recognition module 602 is configured to perform image recognition on the first image through an image recognition model to obtain the recognition results corresponding to the at least two objects in the first image, where the image recognition model is obtained by training a deep learning network with multiple sample images, and the object to be recognized in each of the multiple sample images is annotated with a classification label.
In an optional embodiment provided on the basis of the embodiment shown in Fig. 5, the apparatus further includes a training module (not shown in the figure).

The training module is configured to:

obtain a training sample set, the training sample set including multiple training sample images, the objects in each of the multiple training sample images being annotated with classification labels, and each training sample image corresponding to an expected descriptive sentence;

for each training sample image, process the image through the initial language description model and output an actual descriptive sentence;

calculate the error between the expected descriptive sentence and the actual descriptive sentence; and

when the error is greater than a preset threshold, adjust the parameters of the initial language description model and restart from the step of, for each training sample image, processing the image through the initial language description model and outputting an actual descriptive sentence; and when the error is less than or equal to the preset threshold, generate the language description model.
It should be noted that when the apparatus provided by the above embodiments implements its functions, the division into the above functional modules is used only as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
Referring to Fig. 6, which shows a structural block diagram of a terminal provided by an exemplary embodiment of the present application, the terminal in the present application may include one or more of the following components: a processor 610 and a memory 620.

The processor 610 may include one or more processing cores. The processor 610 connects various parts of the entire terminal through various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 620 and calling the data stored in the memory 620. Optionally, the processor 610 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 610 may integrate a combination of one or more of a Central Processing Unit (CPU), a modem, and the like. The CPU mainly handles the operating system, application programs, and so on; the modem is used for handling wireless communication. It can be understood that the modem may also not be integrated into the processor 610 and may instead be implemented by a separate chip.

Optionally, when the processor 610 executes the program instructions in the memory 620, the image index generation method provided by the above method embodiments is implemented.

The memory 620 may include Random Access Memory (RAM) or Read-Only Memory (ROM). Optionally, the memory includes a non-transitory computer-readable storage medium. The memory 620 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 620 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for at least one function, instructions for implementing the above method embodiments, and so on; the data storage area may store data created according to the use of the terminal, and so on.

The structure of the above terminal is merely illustrative; in actual implementation, the terminal may include more or fewer components, such as a display screen, which is not limited in this embodiment.

Those skilled in the art can understand that the structure shown in Fig. 6 does not constitute a limitation on the terminal 600, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
An exemplary embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is loaded and executed by a processor, the image index generation method provided by the above method embodiments is implemented.

An exemplary embodiment of the present application also provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the image index generation method described in the above embodiments.
It should be understood that "multiple" herein refers to two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone. The character "/" generally indicates an "or" relationship between the associated objects.

The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.

The above are only exemplary embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the scope of protection of the present application.
Claims (12)
1. An image index generation method, characterized in that the method comprises:
obtaining a first image;
performing image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image;
generating a descriptive sentence through a language description model, the descriptive sentence comprising the recognition results corresponding to the at least two objects, and the descriptive sentence being used to describe the first image; and
storing the descriptive sentence in association with the first image to obtain an index of the first image.

2. The method according to claim 1, characterized in that generating a descriptive sentence through a language description model comprises:
converting the recognition results into first word vectors; and
processing the first word vectors through the language description model to obtain the descriptive sentence.

3. The method according to claim 2, characterized in that processing the first word vectors through the language description model to obtain the descriptive sentence comprises:
when the first image is an image captured by a camera of a terminal, obtaining location information of the first image, the location information indicating a geographical location where the first image was captured;
converting the location information into a second word vector; and
processing the first word vectors and the second word vector through the language description model to obtain the descriptive sentence.

4. The method according to claim 1, characterized in that before storing the descriptive sentence in association with the first image to obtain the index of the first image, the method further comprises:
displaying inquiry information, the inquiry information being used to ask whether to confirm generating the index of the first image; and
when a confirmation instruction corresponding to the inquiry information is received, performing the step of storing the descriptive sentence in association with the first image to obtain the index of the first image.

5. The method according to claim 4, characterized in that after displaying the inquiry information, the method further comprises:
when no confirmation instruction is received, displaying an input box, the input box being used to input the descriptive sentence corresponding to the first image;
receiving a sentence entered in the input box; and
storing the entered sentence in association with the first image to obtain the index of the first image.

6. The method according to any one of claims 1 to 5, characterized in that after storing the descriptive sentence in association with the first image to obtain the index of the first image, the method further comprises:
displaying a search box;
receiving a first keyword entered in the search box;
searching an album for a second image matching the first keyword, a descriptive sentence corresponding to the second image comprising a first target keyword, and a similarity between the first target keyword and the first keyword meeting a first preset condition; and
displaying the second image.

7. The method according to claim 6, characterized in that before displaying the second image, the method further comprises:
when the number of second images is greater than a preset number, displaying prompt information, the prompt information being used to prompt input of a second keyword;
obtaining the second keyword; and
searching the album for a second image matching both the first keyword and the second keyword, a descriptive sentence corresponding to the second image comprising the first target keyword and a second target keyword, and a similarity between the second target keyword and the second keyword meeting a second preset condition.

8. The method according to any one of claims 1 to 5, characterized in that performing image recognition on the first image to obtain the recognition results corresponding to the at least two objects in the first image comprises:
performing image recognition on the first image through an image recognition model to obtain the recognition results corresponding to the at least two objects in the first image, wherein the image recognition model is obtained by training a deep learning network with multiple sample images, and an object in each of the multiple sample images is annotated with a classification label.

9. The method according to any one of claims 1 to 5, characterized in that before generating the descriptive sentence through the language description model, the method further comprises:
obtaining a training sample set, the training sample set comprising multiple training sample images, objects in each of the multiple training sample images being annotated with classification labels, and each training sample image corresponding to an expected descriptive sentence;
for each training sample image, processing the training sample image through an initial language description model and outputting an actual descriptive sentence;
calculating an error between the expected descriptive sentence and the actual descriptive sentence; and
when the error is greater than a preset threshold, adjusting parameters of the initial language description model and restarting from the step of, for each training sample image, processing the training sample image through the initial language description model and outputting an actual descriptive sentence; and when the error is less than or equal to the preset threshold, generating the language description model.

10. An image index generation apparatus, characterized in that the apparatus comprises:
an image obtaining module, configured to obtain a first image;
an image recognition module, configured to perform image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image;
a sentence generation module, configured to generate a descriptive sentence through a language description model, the descriptive sentence comprising the recognition results corresponding to the at least two objects, and the descriptive sentence being used to describe the first image; and
an index generation module, configured to store the descriptive sentence in association with the first image to obtain an index of the first image.

11. A terminal, characterized in that the terminal comprises a processor and a memory, the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the image index generation method according to any one of claims 1 to 9.

12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program is loaded and executed by a processor to implement the image index generation method according to any one of claims 1 to 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811457455.0A CN109635135A (en) | 2018-11-30 | 2018-11-30 | Image index generation method, device, terminal and storage medium |
PCT/CN2019/115411 WO2020108234A1 (en) | 2018-11-30 | 2019-11-04 | Image index generation method, image search method and apparatus, and terminal, and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811457455.0A CN109635135A (en) | 2018-11-30 | 2018-11-30 | Image index generation method, device, terminal and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109635135A true CN109635135A (en) | 2019-04-16 |
Family
ID=66070700
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811457455.0A Pending CN109635135A (en) | 2018-11-30 | 2018-11-30 | Image index generation method, device, terminal and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109635135A (en) |
WO (1) | WO2020108234A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113326389A (en) * | 2021-05-26 | 2021-08-31 | 北京沃东天骏信息技术有限公司 | Image index processing method, image index processing device, image index processing apparatus, storage medium, and program |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103838724B (en) * | 2012-11-20 | 2018-04-13 | 百度在线网络技术(北京)有限公司 | Image search method and device |
CN107766853B (en) * | 2016-08-16 | 2021-08-06 | 阿里巴巴集团控股有限公司 | Image text information generation and display method and electronic equipment |
CN106708940B (en) * | 2016-11-11 | 2020-06-30 | 百度在线网络技术(北京)有限公司 | Method and device for processing pictures |
US11301509B2 (en) * | 2017-01-20 | 2022-04-12 | Rakuten Group, Inc. | Image search system, image search method, and program |
CN109635135A (en) * | 2018-11-30 | 2019-04-16 | Oppo广东移动通信有限公司 | Image index generation method, device, terminal and storage medium |
- 2018-11-30: CN application CN201811457455.0A (publication CN109635135A/en), status Pending
- 2019-11-04: WO application PCT/CN2019/115411 (publication WO2020108234A1/en), Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103136228A (en) * | 2011-11-25 | 2013-06-05 | 阿里巴巴集团控股有限公司 | Image search method and image search device |
CN106446782A (en) * | 2016-08-29 | 2017-02-22 | 北京小米移动软件有限公司 | Image identification method and device |
CN107908770A (en) * | 2017-11-30 | 2018-04-13 | 维沃移动通信有限公司 | A kind of photo searching method and mobile terminal |
CN108021654A (en) * | 2017-12-01 | 2018-05-11 | 北京奇安信科技有限公司 | A kind of photograph album image processing method and device |
CN108509521A (en) * | 2018-03-12 | 2018-09-07 | 华南理工大学 | A kind of image search method automatically generating text index |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020108234A1 (en) * | 2018-11-30 | 2020-06-04 | Oppo广东移动通信有限公司 | Image index generation method, image search method and apparatus, and terminal, and medium |
CN110083729A (en) * | 2019-04-26 | 2019-08-02 | 北京金山数字娱乐科技有限公司 | A kind of method and system of picture search |
CN110083729B (en) * | 2019-04-26 | 2023-10-27 | 北京金山数字娱乐科技有限公司 | Image searching method and system |
CN110362698A (en) * | 2019-07-08 | 2019-10-22 | 北京字节跳动网络技术有限公司 | A kind of pictorial information generation method, device, mobile terminal and storage medium |
CN112541091A (en) * | 2019-09-23 | 2021-03-23 | 杭州海康威视数字技术股份有限公司 | Image searching method, device, server and storage medium |
CN110704654A (en) * | 2019-09-27 | 2020-01-17 | 三星电子(中国)研发中心 | Picture searching method and device |
CN112925939A (en) * | 2019-12-05 | 2021-06-08 | 阿里巴巴集团控股有限公司 | Picture searching method, description information generating method, device and storage medium |
CN111046203A (en) * | 2019-12-10 | 2020-04-21 | Oppo广东移动通信有限公司 | Image retrieval method, image retrieval device, storage medium and electronic equipment |
CN111797765A (en) * | 2020-07-03 | 2020-10-20 | 北京达佳互联信息技术有限公司 | Image processing method, image processing apparatus, server, and storage medium |
CN111797765B (en) * | 2020-07-03 | 2024-04-16 | 北京达佳互联信息技术有限公司 | Image processing method, device, server and storage medium |
CN112711998A (en) * | 2020-12-24 | 2021-04-27 | 珠海新天地科技有限公司 | 3D model annotation system and method |
CN113961833A (en) * | 2021-10-20 | 2022-01-21 | 维沃移动通信有限公司 | Information searching method and device and electronic equipment |
CN118037888A (en) * | 2024-02-01 | 2024-05-14 | 嘉达鼎新信息技术(苏州)有限公司 | AI image generation method and system based on image analysis and language description |
CN118037888B (en) * | 2024-02-01 | 2024-10-01 | 嘉达鼎新信息技术(苏州)有限公司 | AI image generation method and system based on image analysis and language description |
Also Published As
Publication number | Publication date |
---|---|
WO2020108234A1 (en) | 2020-06-04 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109635135A (en) | Image index generation method, device, terminal and storage medium | |
CN110633745B (en) | Image classification training method and device based on artificial intelligence and storage medium | |
CN112348117B (en) | Scene recognition method, device, computer equipment and storage medium | |
AU2019264603A1 (en) | Method and system for information extraction from document images using conversational interface and database querying | |
CN110795527B (en) | Candidate entity ordering method, training method and related device | |
CN109902665A (en) | Similar face retrieval method, apparatus and storage medium | |
CN109831572A (en) | Chat picture control method, device, computer equipment and storage medium | |
EP3765995B1 (en) | Systems and methods for inter-camera recognition of individuals and their properties | |
CN108764334A (en) | Facial image face value judgment method, device, computer equipment and storage medium | |
CN105117399B (en) | Image searching method and device | |
CN110033023A (en) | It is a kind of based on the image processing method and system of drawing this identification | |
CN113641797B (en) | Data processing method, apparatus, device, storage medium and computer program product | |
CN113537206B (en) | Push data detection method, push data detection device, computer equipment and storage medium | |
CN115114448B (en) | Intelligent multi-mode fusion power consumption inspection method, device, system, equipment and medium | |
CN112784011B (en) | Emotion problem processing method, device and medium based on CNN and LSTM | |
US20210326383A1 (en) | Search method and device, and storage medium | |
CN107977676A (en) | Text similarity computing method and device | |
CN110609958A (en) | Data pushing method and device, electronic equipment and storage medium | |
CN108446688A (en) | Facial image Sexual discriminating method, apparatus, computer equipment and storage medium | |
WO2017202086A1 (en) | Image screening method and device | |
CN113806564B (en) | Multi-mode informative text detection method and system | |
JP2022100358A (en) | Search method and device in search support system | |
CN111046203A (en) | Image retrieval method, image retrieval device, storage medium and electronic equipment | |
CN116630749A (en) | Industrial equipment fault detection method, device, equipment and storage medium | |
CN111191065B (en) | Homologous image determining method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190416 |