CN109635135A - Image index generation method, device, terminal and storage medium - Google Patents
Image index generation method, device, terminal and storage medium
- Publication number
- CN109635135A (application CN201811457455.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- descriptive statement
- index
- keyword
- language description
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present application discloses an image index generation method, apparatus, terminal, and storage medium. The method includes: obtaining a first image; performing image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image; generating a descriptive sentence through a language description model; and storing the descriptive sentence in association with the first image to obtain an index of the first image. In the embodiments of the present application, the recognition result of each object contained in an image is identified, a descriptive sentence that contains those recognition results and describes the first image is generated by the language description model, and the descriptive sentence is determined as the index of the image. When the user later needs to search for the image, the user can input a word contained in the index, or a word whose meaning is similar to a word contained in the index, and the terminal can accurately find the first image according to the word entered by the user, which improves the efficiency of searching for images in an album.
Description
Technical field
The embodiments of the present application relate to the field of search technology, and in particular to an image index generation method, apparatus, terminal, and storage medium.
Background technique
At present, an album application is usually installed on a terminal. The album application is commonly used to store images captured by the camera, images saved from the network, and the like.

When many images are saved in the album and the user needs to quickly find a desired image among them, the terminal needs to build image indexes in the album. Then, when the user later needs to search for a certain image, the user only needs to input the index corresponding to that image, and the terminal can quickly locate the image according to the index and display it.
Summary of the invention
The embodiments of the present application provide an image index generation method, apparatus, terminal, and storage medium. The technical solutions are as follows:

In one aspect, an embodiment of the present application provides an image index generation method. The method includes:

obtaining a first image;

performing image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image;

generating a descriptive sentence through a language description model, the descriptive sentence including the recognition results corresponding to the at least two objects, and the descriptive sentence being used to describe the first image; and

storing the descriptive sentence in association with the first image to obtain an index of the first image.

In another aspect, an embodiment of the present application provides an image index generation apparatus. The apparatus includes:

an image obtaining module, configured to obtain a first image;

an image recognition module, configured to perform image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image;

a sentence generation module, configured to generate a descriptive sentence through a language description model, the descriptive sentence including the recognition results corresponding to the at least two objects and being used to describe the first image; and

an index generation module, configured to store the descriptive sentence in association with the first image to obtain an index of the first image.

In another aspect, an embodiment of the present application provides a terminal. The terminal includes a processor and a memory, the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the image index generation method described above.

In another aspect, an embodiment of the present application provides a computer-readable storage medium. A computer program is stored in the computer-readable storage medium, and the computer program is loaded and executed by a processor to implement the image index generation method described above.

The technical solutions provided by the embodiments of the present application can bring the following beneficial effects:

By identifying the recognition result corresponding to each object contained in an image, generating through the language description model a descriptive sentence that contains those recognition results and describes the first image, and determining the descriptive sentence as the index of the image, the user can later search for the image by entering a word contained in the index, or a word whose meaning is similar to a word contained in the index, and the terminal can accurately find the first image according to the word entered by the user, which improves the efficiency of searching for images in an album.
Brief description of the drawings

Fig. 1 is a flowchart of an image index generation method provided by an embodiment of the present application;

Fig. 2 is a flowchart of an image index generation method provided by an embodiment of the present application;

Fig. 3 is a flowchart of an image index generation method provided by an embodiment of the present application;

Fig. 4 is a flowchart of an image index generation method provided by an embodiment of the present application;

Fig. 5 is a block diagram of an image index generation apparatus provided by an embodiment of the present application;

Fig. 6 is a block diagram of a terminal provided by an embodiment of the present application.
Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.

The embodiments of the present application provide an image index generation method, apparatus, terminal, and storage medium. The recognition result corresponding to each object contained in an image is identified, a descriptive sentence that contains those recognition results and describes the first image is generated through a language description model, and the descriptive sentence is determined as the index of the image. When the user later needs to search for the image, the user can input a word contained in the index, or a word whose meaning is similar to a word contained in the index, and the terminal can accurately find the first image according to the word entered by the user, which improves the efficiency of searching for images in an album.

In the technical solutions provided by the embodiments of the present application, the execution subject of each step is a terminal. Optionally, an album application is installed on the terminal. The terminal may be a mobile phone, a tablet computer, a personal computer, or the like.
Referring to Fig. 1, which shows a flowchart of an image index generation method according to an embodiment of the present application, the method may include the following steps.

Step 101: obtain a first image.

In one possible implementation, the first image is an image captured by a camera of the terminal. Optionally, the terminal is provided with a camera and has a photographing application installed. When the photographing application is running and the terminal receives a trigger signal acting on the shooting control of the current shooting interface, the terminal obtains the image captured by the camera as the first image.

In another possible implementation, the first image is an image from the network. Optionally, when an image is displayed on the display interface of the terminal and the terminal receives a save instruction corresponding to the image, the terminal obtains the image from the network as the first image according to the save instruction.

In addition, the embodiments of the present application do not limit the manner and timing of obtaining the first image.
Step 102: perform image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image.

The first image may contain one or more objects, such as persons, animals, buildings, landscapes, and so on. In the embodiments of the present application, the terminal determines the category of each object as follows: image recognition is performed on the first image through an image recognition model, and the categories of the at least two objects in the first image are obtained.

The image recognition model is obtained by training a deep learning network with multiple sample images, where the object to be recognized in each sample image is annotated with a classification label. In some embodiments of the present application, the image recognition model includes an input layer, at least one convolutional layer (for example, three convolutional layers: a first, a second, and a third convolutional layer), at least one fully connected layer (for example, two fully connected layers: a first and a second fully connected layer), and an output layer. The input of the input layer is the first image, and the output of the output layer is the categories of the at least two objects contained in the first image. The image recognition process is as follows: the first image is fed into the input layer of the image recognition model, the convolutional layers of the model extract features of the first image, the fully connected layers of the model combine and abstract those features into data suitable for classification by the output layer, and finally the output layer outputs the recognition results corresponding to the at least two objects contained in the first image.
The embodiments of the present application do not limit the specific structure of the convolutional layers and fully connected layers of the image recognition model; the image recognition model described above is merely exemplary and explanatory and is not intended to limit the present disclosure. In general, the more layers a convolutional neural network has, the better its accuracy but the longer its computation time. In practical applications, a convolutional neural network with an appropriate number of layers is designed according to the requirements on recognition accuracy and efficiency.
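As an illustration only, the following is a minimal sketch of an image recognition model with the layer structure described above (three convolutional layers plus two fully connected layers); the class name, channel widths, input size, and number of categories are assumptions made for the example, not taken from the patent.

```python
import torch
import torch.nn as nn

class ImageRecognitionModel(nn.Module):
    """Minimal sketch: 3 convolutional layers + 2 fully connected layers, as described above.
    Channel widths and num_classes are illustrative assumptions."""
    def __init__(self, num_classes: int = 80):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 28 * 28, 256), nn.ReLU(),   # assumes a 224x224 input image
            nn.Linear(256, num_classes),                # one score per object category
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Usage sketch: multi-label output, one score per object category.
model = ImageRecognitionModel()
scores = model(torch.randn(1, 3, 224, 224))   # shape: (1, num_classes)
detected = torch.sigmoid(scores) > 0.5        # categories recognized in the image
```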
A sample image is a picture selected in advance for training the deep learning network. A sample image carries scene labels, which are usually determined manually and are used to describe the scene, objects, persons, and so on in the sample image.

Optionally, the deep learning network may use an AlexNet network, a VGG-16 network, a GoogLeNet network, a Deep Residual Learning (ResNet) network, or the like, which is not limited in the embodiments of the present application. In addition, the algorithm used when training the deep learning network to obtain the image recognition model may be BP (Back-Propagation), Faster R-CNN (Regions with Convolutional Neural Network features), or the like, which is not limited in the embodiments of the present application.
Taking BP as the algorithm used when training the deep learning network to obtain the image recognition model as an example, the training process of the image recognition model is as follows: first, the parameters of each layer of the deep learning network are set randomly; next, a sample image is fed into the deep learning network to obtain a recognition result; the recognition result is then compared with the classification label to obtain the error between them; finally, the parameters of each layer of the deep learning network are adjusted based on the error, and the above steps are repeated until the error between the recognition result and the classification label is smaller than a preset value, at which point the image recognition model is obtained.
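A minimal training-loop sketch of the back-propagation procedure just described; the loss function, optimizer, learning rate, and error threshold are illustrative assumptions rather than values specified by the patent.

```python
import torch
import torch.nn as nn

def train_recognition_model(model, dataloader, error_threshold=0.05, max_epochs=100):
    """Sketch of the BP training loop described above: forward pass, compare with the
    classification labels, back-propagate the error, repeat until the error is below a preset value."""
    criterion = nn.BCEWithLogitsLoss()                      # multi-label classification loss (assumed)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(max_epochs):
        epoch_error = 0.0
        for images, labels in dataloader:                   # labels: per-category classification tags
            optimizer.zero_grad()
            error = criterion(model(images), labels)        # error between recognition result and label
            error.backward()                                # back-propagation
            optimizer.step()                                # adjust the parameters of each layer
            epoch_error += error.item()
        if epoch_error / len(dataloader) < error_threshold:
            break                                           # error below the preset value: model obtained
    return model
```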
Step 103: generate a descriptive sentence through a language description model.

The descriptive sentence is used to describe the first image and contains the recognition results corresponding to the at least two objects. Optionally, the descriptive sentence also contains other words, which may describe at least one of the following: the positional relationship between the at least two objects, an action being performed by a certain object, the state of a certain object, and so on. For example, image recognition is performed on the first image, the objects obtained from the first image include a dog and a lawn, and the posture of the dog on the lawn is running; the descriptive sentence corresponding to the first image is then "a dog is running on the lawn".
In some embodiments of the present application, the language description model includes an input layer, at least one convolutional layer (for example, three convolutional layers: a first, a second, and a third convolutional layer), at least one fully connected layer (for example, two fully connected layers: a first and a second fully connected layer), and an output layer. The input of the input layer is the classification labels of the objects in the first image together with the first image, and the output of the output layer is the descriptive sentence corresponding to the first image. The generation process of the descriptive sentence is as follows: the classification labels of the objects in the first image and the first image are fed into the input layer of the language description model, the convolutional layers of the model extract features of the input, the fully connected layers of the model combine and abstract those features into data suitable for the output layer, and finally the output layer outputs the descriptive sentence corresponding to the first image.

The embodiments of the present application do not limit the specific structure of the convolutional layers and fully connected layers of the language description model; the language description model described above is merely exemplary and explanatory and is not intended to limit the present disclosure. In general, the more layers a convolutional neural network has, the better its accuracy but the longer its computation time. In practical applications, a convolutional neural network with an appropriate number of layers is designed according to the requirements on recognition accuracy and efficiency.
Optionally, step 103 may include the following sub-steps:

Step 103a: convert the recognition results into first word vectors.

Step 103b: process the first word vectors through the language description model to obtain the descriptive sentence.

In the embodiments of the present application, the terminal converts the recognition results into the corresponding word vectors through a word vector model and feeds the word vectors into the language description model, which outputs the descriptive sentence. The word vector model may be a word2vec model.
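A minimal sketch of sub-steps 103a and 103b, under the assumption that the word vector model is a pre-trained word2vec-style embedding lookup and the language description model is a recurrent encoder-decoder; the class names, dimensions, and `word2vec` mapping are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class LanguageDescriptionModel(nn.Module):
    """Sketch: encodes the first word vectors (one per recognition result) and
    decodes a descriptive sentence word by word. The architecture is an assumption."""
    def __init__(self, embed_dim=128, hidden_dim=256, vocab_size=10000):
        super().__init__()
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, first_word_vectors, prev_word_vectors):
        # Step 103b: process the first word vectors to obtain the descriptive sentence.
        _, state = self.encoder(first_word_vectors)           # summarize the recognition results
        dec_out, _ = self.decoder(prev_word_vectors, state)   # generate the sentence token by token
        return self.out(dec_out)                              # scores over the vocabulary

def to_word_vectors(recognition_results, word2vec):
    """Step 103a: convert recognition results (e.g. ["dog", "lawn"]) into first word vectors.
    `word2vec` is assumed to be a mapping from word to a 128-d torch vector."""
    return torch.stack([word2vec[w] for w in recognition_results]).unsqueeze(0)
```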
Optionally, step 103b may also be implemented as follows:

Step 103b1: when the first image is an image captured by the camera of the terminal, obtain the location information of the first image.

Step 103b2: convert the location information into a second word vector.

Step 103b3: process the first word vectors and the second word vector through the language description model to obtain the descriptive sentence.

The location information indicates the geographical location where the first image was captured and may be obtained by a positioning component in the terminal. The manner of converting the location information into a word vector can refer to step 103a and is not repeated here. In the embodiments of the present application, the descriptive sentence corresponding to the first image is generated in combination with the geographical location where the first image was captured, so the first image can be described more completely, and the user can later search for the first image with multiple different keywords, which improves the convenience of searching.
For example, image recognition is performed on the first image, the objects obtained from the first image include a dog and a lawn, the posture of the dog on the lawn is running, and the geographical location where the first image was captured is XX Park; the descriptive sentence corresponding to the first image is then "a dog is running on the lawn of XX Park".
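A small sketch of step 103b3, showing one way the second word vector derived from the location could be appended to the first word vectors before being fed to the language description model; the concatenation strategy and the `word2vec` lookup are assumptions made for the example.

```python
import torch

def build_model_input(first_word_vectors, location, word2vec):
    """Append the second word vector (step 103b2) to the first word vectors so the
    language description model (step 103b3) sees both the objects and the shooting location.
    `location` is assumed to be a place name such as "XX Park"."""
    second_word_vector = word2vec[location].unsqueeze(0).unsqueeze(0)   # shape (1, 1, embed_dim)
    return torch.cat([first_word_vectors, second_word_vector], dim=1)   # (1, num_objects + 1, embed_dim)
```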
Step 104: store the descriptive sentence in association with the first image to obtain an index of the first image.

The terminal stores the descriptive sentence in association with the first image, and the result is the index of the first image. If the user later needs to search for the first image, the user only needs to input at least one word contained in the descriptive sentence, or a word whose similarity with a word in the descriptive sentence is greater than a preset threshold, and the terminal can find the first image according to the word entered by the user and display it to the user.

In addition, the embodiments of the present application do not limit the path used to store the descriptive sentence and the first image; it may be preset by the terminal or customized by the user.
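An illustrative sketch of step 104, storing each descriptive sentence together with the path of its image in a small SQLite table; the database file name, table name, and schema are assumptions made for the example.

```python
import sqlite3

def store_index(db_path, image_path, descriptive_sentence):
    """Step 104 sketch: store the descriptive sentence in association with the image,
    so the sentence can later serve as the image's index."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_index (image_path TEXT PRIMARY KEY, description TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO image_index (image_path, description) VALUES (?, ?)",
        (image_path, descriptive_sentence),
    )
    conn.commit()
    conn.close()

# Usage sketch
store_index("album_index.db", "/album/IMG_0001.jpg", "a dog is running on the lawn of XX Park")
```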
In conclusion technical solution provided by the embodiments of the present application, by identifying each object included in image
Corresponding recognition result, and generated by language description model including above-mentioned recognition result, and for describing the first figure
Foregoing description sentence is determined as the index of the image by the descriptive statement of picture, subsequent when user needs to search for the image, can be with
Included word is inputted in the index, or word similar in the meaning with word included in the index, terminal can be with
First image is accurately searched according to the word that user inputs, improves the search efficiency for searching for image in photograph album.
Referring to Fig. 2, which shows a flowchart of an image index generation method provided by an embodiment of the present application, the method may include the following steps.

Step 201: obtain a first image.

Step 202: perform image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image.

Step 203: generate a descriptive sentence through a language description model.

Step 204: display inquiry information.

The inquiry information is used to ask whether to confirm generating the index of the first image. For example, the inquiry information is: "The descriptive sentence corresponding to this image is 'watching a concert at the Bird's Nest'. Confirm?".

In the embodiments of the present application, the user can preview the descriptive sentence generated by the language description model and decide whether to use it as the index of the first image.

Step 205: when a confirmation instruction corresponding to the inquiry information is received, store the descriptive sentence in association with the first image to obtain the index of the first image.

If the user decides to use the generated descriptive sentence as the index of the image, the user can issue a confirmation instruction. The confirmation instruction corresponding to the inquiry information indicates that the generated descriptive sentence is confirmed as the index of the image. Optionally, a confirmation control is displayed beside the inquiry information; when the terminal receives a trigger signal acting on the confirmation control, it receives the confirmation instruction corresponding to the inquiry information.

Step 206: when no confirmation instruction is received, display an input box.

The input box is used to input the descriptive sentence corresponding to the first image. Optionally, if the terminal does not receive a trigger signal acting on the confirmation control within a preset time, the terminal has not received the confirmation instruction. Optionally, a deny control is also displayed beside the inquiry information; when the terminal receives a trigger signal corresponding to the deny control, the terminal has not received the confirmation instruction.

Step 207: receive the sentence entered in the input box.

In the embodiments of the present application, when the user is not satisfied with the generated descriptive sentence, the user can enter a descriptive sentence describing the target image.

Step 208: store the entered sentence in association with the first image to obtain the index of the first image.

In summary, in the technical solution provided by the embodiments of the present application, when the user is not satisfied with the descriptive sentence generated by the terminal, the user enters the descriptive sentence corresponding to the image, so that the user can later search for the image according to the descriptive sentence the user entered.
After the index of the first image is generated, the user can search for the first image in the album according to the index. The search process is explained below. In an optional embodiment provided on the basis of the embodiment shown in Fig. 1 or Fig. 2, after step 104, or alternatively after step 208, the method further includes the following steps.

Step 301: display a search box.

The search box is used for the user to input a search keyword, so that the terminal can search for images matching the search keyword. In one possible implementation, the search box is displayed on the main interface of the album application. In another possible implementation, a search control is displayed on the main interface of the album application; when the user triggers the search control, the terminal receives a trigger signal corresponding to the search control and displays the search box according to the trigger signal.

Step 302: receive a first keyword entered in the search box.

The first keyword is entered by the user and may be, for example, "the Forbidden City", "cat", or "rose", which is not limited in the embodiments of the present application.

Step 303: search the album for a second image matching the first keyword.

There may be one or more second images. The descriptive sentence corresponding to a second image is used to describe that second image and contains a first target keyword. The first target keyword may be the recognition result corresponding to an object contained in the second image, or may be another word in the descriptive sentence other than the recognition results, which is not limited in the embodiments of the present application. In this way, the user can search for the same image with different keywords, which reduces the difficulty of searching for images.

The similarity between the first target keyword and the first keyword meets a preset condition. The preset condition may be that the similarity between the first target keyword and the first keyword is greater than a preset threshold, and the preset threshold may be set according to actual needs, which is not limited in the embodiments of the present application.

In the embodiments of the present application, the terminal first calculates the similarity between the first keyword and each word contained in each descriptive sentence stored by the terminal, then determines the words whose similarity with the first keyword meets the preset condition as first target keywords, and finally takes the images corresponding to the descriptive sentences containing a first target keyword as the second images matching the first keyword.

In addition, the embodiments of the present application calculate the similarity between the first keyword and a word contained in a descriptive sentence as follows: the terminal represents the first keyword as a first vector through the word vector model, represents the word contained in the descriptive sentence as a second vector, and then calculates the cosine distance between the first vector and the second vector to obtain the similarity between the first keyword and the word contained in the descriptive sentence.
Step 304: display the second image matching the first keyword.

The terminal displays the second image on a search result page. When there are multiple second images, the terminal may sort the second images according to the similarity between the first target keyword and the first keyword. Optionally, the greater the similarity between the first target keyword and the first keyword, the closer to the front the second image corresponding to the descriptive sentence containing that first target keyword is ranked on the search result page; the smaller the similarity, the closer to the back the corresponding second image is ranked.
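An illustrative sketch of steps 303 and 304 under the assumption that word vectors are plain numpy arrays: the cosine similarity between the first keyword and every word of every stored descriptive sentence is computed, images whose best-matching word exceeds a threshold are kept, and the results are sorted by decreasing similarity. The `word2vec` lookup and the threshold value are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two word vectors (step 303)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_album(first_keyword, index_entries, word2vec, threshold=0.6):
    """index_entries: list of (image_path, descriptive_sentence) pairs stored in step 104.
    Returns the second images matching the first keyword, ranked by similarity (step 304)."""
    first_vec = word2vec[first_keyword]
    results = []
    for image_path, sentence in index_entries:
        # Best similarity between the first keyword and any word of the descriptive sentence.
        best = max(
            (cosine_similarity(first_vec, word2vec[w]) for w in sentence.split() if w in word2vec),
            default=0.0,
        )
        if best > threshold:                                   # a first target keyword was found
            results.append((image_path, best))
    results.sort(key=lambda item: item[1], reverse=True)       # higher similarity ranked first
    return [path for path, _ in results]
```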
In conclusion technical solution provided by the embodiments of the present application, by according to foregoing embodiments image rope generated
Attract carry out picture search, user need to only input word included in the index, or with word included in the index
Meaning similar in word, terminal can according to user input word accurately search the image, improve and searched in photograph album
The search efficiency of rope image.
When the user enters a first keyword and the terminal finds many second images according to the first keyword, the user has to pick out the desired image from the many second images, and the search efficiency is still relatively low.
Referring to Fig. 4, which shows a flowchart of an image index generation method provided by an embodiment of the present application, the image index generation method can be used to solve the problem of low search efficiency when many second images are found according to the first keyword. The method includes the following steps.

Step 401: display a search box.

Step 402: receive a first keyword entered in the search box.

Step 403: search the album for a second image matching the first keyword.

Step 404: when the number of second images is greater than a preset number, display prompt information.

The preset number may be set according to actual needs, which is not limited in the embodiments of the present application; for example, the preset number is 10. The prompt information is used to prompt the user to enter a second keyword, which is different from the first keyword.

In the embodiments of the present application, when the terminal finds the second images matching the first keyword, it first detects whether the number of second images is greater than the preset number. If the number of second images is less than or equal to the preset number, the second images are displayed directly. If the number of second images is greater than the preset number, the user is prompted to enter more keywords, so that the terminal can continue to screen, among the second images matching the first keyword, the second images matching both the first keyword and the second keyword.
Step 405: obtain the second keyword.

The second keyword is also entered by the user and is different from the first keyword.

Step 406: search the album for a second image matching both the first keyword and the second keyword.

The descriptive sentence corresponding to such a second image contains a first target keyword and a second target keyword. The similarity between the second target keyword and the second keyword meets a second preset condition. The second preset condition may be that the similarity between the second target keyword and the second keyword is greater than a preset threshold, and the preset threshold may be set according to actual needs, which is not limited in the embodiments of the present application.

In the embodiments of the present application, the terminal first calculates the similarity between the first keyword and each word contained in each descriptive sentence stored by the terminal, and the similarity between the second keyword and each word contained in each descriptive sentence stored by the terminal; it then determines the words whose similarity with the first keyword meets the first preset condition as first target keywords, and the words whose similarity with the second keyword meets the second preset condition as second target keywords; and finally it takes the images corresponding to the descriptive sentences containing both a first target keyword and a second target keyword as the second images matching the first keyword and the second keyword. The way to calculate the similarity between the second keyword and the words contained in a descriptive sentence can refer to step 303 and is not repeated here.
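A small sketch of steps 404 to 406, reusing the hypothetical `search_album` helper from the earlier sketch: if the first search returns more than the preset number of images, the refined result set keeps only the images that match both keywords.

```python
def refine_search(first_keyword, second_keyword, index_entries, word2vec, preset_number=10):
    """Steps 404-406 sketch: when the first search returns more than the preset number of
    second images, ask for a second keyword and keep only images matching both keywords."""
    first_matches = search_album(first_keyword, index_entries, word2vec)
    if len(first_matches) <= preset_number:
        return first_matches                                   # few enough results: display directly
    second_matches = set(search_album(second_keyword, index_entries, word2vec))
    return [path for path in first_matches if path in second_matches]
```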
Step 407: display the second images matching both the first keyword and the second keyword.

In the embodiments of the present application, the second images here refer to the second images matching both the first keyword and the second keyword.

In summary, in the technical solution provided by the embodiments of the present application, when there are too many search results, the user is prompted to enter more keywords, so that the terminal can perform the image search according to the keywords entered in the two inputs, which improves the accuracy of the image search.
As mentioned in the embodiment of Fig. 1, the language description model is a pre-trained model for encoding at least two words into a complete sentence. The training process of the language description model is explained below.

Step 501: obtain a training sample set.

The training sample set includes multiple training sample images. The objects in each of the training sample images are annotated with classification labels, and each training sample image corresponds to an expected descriptive sentence. The classification labels of the objects in the training samples may be annotated manually or obtained through the image recognition model. The expected descriptive sentences may be annotated manually.

Step 502: for each training sample image, process the image through the initial language description model and output an actual descriptive sentence.

The initial language description model may be a deep learning network, for example, an AlexNet network, a VGG-16 network, a GoogLeNet network, or a Deep Residual Learning (ResNet) network. The parameters of the initial language description model may be set randomly, or set empirically by the relevant technical personnel. In the embodiments of the present application, each training sample image is fed into the initial language description model, and the initial language description model outputs the actual descriptive sentence.

Step 503: calculate the error between the expected descriptive sentence and the actual descriptive sentence.

Optionally, the terminal determines the difference between the expected descriptive sentence and the actual descriptive sentence as the error.

After the terminal calculates the error between the expected descriptive sentence and the actual descriptive sentence, it detects whether the error is greater than a preset threshold. If the error is greater than the preset threshold, the parameters of the initial language description model are adjusted, and the process restarts from the step of processing each training sample image through the initial language description model and outputting an actual descriptive sentence, that is, steps 502 and 503 are repeated. When the error is less than or equal to the preset threshold, the trained language description model is generated.
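A minimal sketch of steps 501 to 503, under the assumption that the error between the actual and the expected descriptive sentences is measured as a token-level cross-entropy; the tokenization, the `vocab.embed` helper, and the threshold are illustrative assumptions and reuse the hypothetical `to_word_vectors` helper from the earlier sketch.

```python
import torch
import torch.nn as nn

def train_language_description_model(model, training_samples, word2vec, vocab,
                                      preset_threshold=0.1, max_rounds=1000):
    """training_samples: list of (classification_label_words, expected_sentence_token_ids).
    Repeats steps 502-503 until the error is no greater than the preset threshold."""
    criterion = nn.CrossEntropyLoss()                          # error between actual and expected sentence (assumed)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(max_rounds):
        total_error = 0.0
        for label_words, expected_ids in training_samples:
            inputs = to_word_vectors(label_words, word2vec)    # step 103a helper from the sketch above
            prev = vocab.embed(expected_ids[:-1])              # hypothetical helper, shape (1, T, embed_dim)
            logits = model(inputs, prev)                       # step 502: output the actual descriptive sentence
            error = criterion(logits.squeeze(0), expected_ids[1:])   # step 503: error vs expected sentence
            optimizer.zero_grad()
            error.backward()
            optimizer.step()                                   # adjust the parameters of the model
            total_error += error.item()
        if total_error / len(training_samples) <= preset_threshold:
            break                                              # error below the preset threshold: training complete
    return model
```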
The following are apparatus embodiments of the present application, which can be used to carry out the method embodiments of the present application. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present application.

Referring to Fig. 5, which shows a block diagram of an image index generation apparatus provided by an embodiment of the present application, the apparatus has the function of implementing the above method; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus includes:
an image obtaining module 601, configured to obtain a first image;

an image recognition module 602, configured to perform image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image;

a sentence generation module 603, configured to generate a descriptive sentence through a language description model, the descriptive sentence including the recognition results corresponding to the at least two objects and being used to describe the first image; and

an index generation module 604, configured to store the descriptive sentence in association with the first image to obtain an index of the first image.
In conclusion technical solution provided by the embodiments of the present application, by identifying each object included in image
Corresponding recognition result, and generated by language description model including above-mentioned recognition result, and for describing the first figure
Foregoing description sentence is determined as the index of the image by the descriptive statement of picture, subsequent when user needs to search for the image, can be with
Included word is inputted in the index, or word similar in the meaning with word included in the index, terminal can be with
First image is accurately searched according to the word that user inputs, improves the search efficiency for searching for image in photograph album.
In an optional embodiment provided on the basis of the embodiment shown in Fig. 5, the sentence generation module 603 is configured to:

convert the recognition results into first word vectors; and

process the first word vectors through the language description model to obtain the descriptive sentence.

Optionally, the sentence generation module 603 is configured to:

when the first image is an image captured by the camera of the terminal, obtain the location information of the first image, the location information indicating the geographical location where the first image was captured;

convert the location information into a second word vector; and

process the first word vectors and the second word vector through the language description model to obtain the descriptive sentence.
In an optional embodiment provided on the basis of the embodiment shown in Fig. 5, the apparatus further includes a first display module (not shown in the figure).

The first display module is configured to display inquiry information, the inquiry information being used to ask whether to confirm generating the index of the first image.

The index generation module 604 is configured to, when a confirmation instruction corresponding to the inquiry information is received, perform the step of storing the descriptive sentence in association with the first image to obtain the index of the first image.

Optionally, the apparatus further includes a second display module and a first receiving module (not shown in the figure).

The second display module is configured to display an input box when no confirmation instruction is received, the input box being used to input the descriptive sentence corresponding to the first image.

The first receiving module is configured to receive the sentence entered in the input box.

The index generation module 604 is further configured to store the entered sentence in association with the first image to obtain the index of the first image.
In an optional embodiment provided on the basis of the embodiment shown in Fig. 5, the apparatus further includes a third display module, a second receiving module, a search module, and a fourth display module (not shown in the figure).

The third display module is configured to display a search box.

The second receiving module is configured to receive a first keyword entered in the search box.

The search module is configured to search the album for a second image matching the first keyword, the descriptive sentence corresponding to the second image containing a first target keyword, and the similarity between the first target keyword and the first keyword meeting a first preset condition.

The fourth display module is configured to display the second image.

Optionally, the apparatus further includes a fifth display module (not shown in the figure).

The fifth display module is configured to display prompt information when the number of second images is greater than a preset number, the prompt information being used to prompt the user to enter a second keyword.

The second receiving module is configured to obtain the second keyword.

The search module is further configured to search the album for a second image matching both the first keyword and the second keyword, the descriptive sentence corresponding to the second image containing the first target keyword and a second target keyword, and the similarity between the second target keyword and the second keyword meeting a second preset condition.
In an optional embodiment provided on the basis of the embodiment shown in Fig. 5, the image recognition module 602 is configured to perform image recognition on the first image through an image recognition model to obtain the recognition results corresponding to the at least two objects in the first image, where the image recognition model is obtained by training a deep learning network with multiple sample images, and the object to be recognized in each of the multiple sample images is annotated with a classification label.
In an optional embodiment provided on the basis of the embodiment shown in Fig. 5, the apparatus further includes a training module (not shown in the figure).

The training module is configured to:

obtain a training sample set, the training sample set including multiple training sample images, the objects in each of the multiple training sample images being annotated with classification labels, and each training sample image corresponding to an expected descriptive sentence;

for each training sample image, process the image through the initial language description model and output an actual descriptive sentence;

calculate the error between the expected descriptive sentence and the actual descriptive sentence; and

when the error is greater than a preset threshold, adjust the parameters of the initial language description model and restart from the step of, for each training sample image, processing the image through the initial language description model and outputting an actual descriptive sentence; and when the error is less than or equal to the preset threshold, generate the language description model.
It should be noted that when the apparatus provided by the above embodiments implements its functions, the division into the above functional modules is used only as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
Referring to Fig. 6, which shows a structural block diagram of a terminal provided by an exemplary embodiment of the present application, the terminal in the present application may include one or more of the following components: a processor 610 and a memory 620.

The processor 610 may include one or more processing cores. The processor 610 connects various parts of the entire terminal through various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 620 and calling the data stored in the memory 620. Optionally, the processor 610 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 610 may integrate a combination of one or more of a Central Processing Unit (CPU), a modem, and the like. The CPU mainly handles the operating system, application programs, and so on; the modem is used for handling wireless communication. It can be understood that the modem may also not be integrated into the processor 610 and may instead be implemented by a separate chip.

Optionally, when the processor 610 executes the program instructions in the memory 620, the image index generation method provided by the above method embodiments is implemented.

The memory 620 may include Random Access Memory (RAM) or Read-Only Memory (ROM). Optionally, the memory includes a non-transitory computer-readable storage medium. The memory 620 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 620 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for at least one function, instructions for implementing the above method embodiments, and so on; the data storage area may store data created according to the use of the terminal, and so on.

The structure of the above terminal is merely illustrative; in actual implementation, the terminal may include more or fewer components, such as a display screen, which is not limited in this embodiment.

Those skilled in the art can understand that the structure shown in Fig. 6 does not constitute a limitation on the terminal 600, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
An exemplary embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is loaded and executed by a processor, the image index generation method provided by the above method embodiments is implemented.

An exemplary embodiment of the present application also provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the image index generation method described in the above embodiments.
It should be understood that "multiple" herein refers to two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone. The character "/" generally indicates an "or" relationship between the associated objects.

The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.

The above are only exemplary embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the scope of protection of the present application.
Claims (12)
1. An image index generation method, characterized in that the method comprises:
obtaining a first image;
performing image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image;
generating a descriptive sentence through a language description model, the descriptive sentence comprising the recognition results corresponding to the at least two objects, and the descriptive sentence being used to describe the first image; and
storing the descriptive sentence in association with the first image to obtain an index of the first image.

2. The method according to claim 1, characterized in that generating a descriptive sentence through a language description model comprises:
converting the recognition results into first word vectors; and
processing the first word vectors through the language description model to obtain the descriptive sentence.

3. The method according to claim 2, characterized in that processing the first word vectors through the language description model to obtain the descriptive sentence comprises:
when the first image is an image captured by a camera of a terminal, obtaining location information of the first image, the location information indicating a geographical location where the first image was captured;
converting the location information into a second word vector; and
processing the first word vectors and the second word vector through the language description model to obtain the descriptive sentence.

4. The method according to claim 1, characterized in that before storing the descriptive sentence in association with the first image to obtain the index of the first image, the method further comprises:
displaying inquiry information, the inquiry information being used to ask whether to confirm generating the index of the first image; and
when a confirmation instruction corresponding to the inquiry information is received, performing the step of storing the descriptive sentence in association with the first image to obtain the index of the first image.

5. The method according to claim 4, characterized in that after displaying the inquiry information, the method further comprises:
when no confirmation instruction is received, displaying an input box, the input box being used to input the descriptive sentence corresponding to the first image;
receiving a sentence entered in the input box; and
storing the entered sentence in association with the first image to obtain the index of the first image.

6. The method according to any one of claims 1 to 5, characterized in that after storing the descriptive sentence in association with the first image to obtain the index of the first image, the method further comprises:
displaying a search box;
receiving a first keyword entered in the search box;
searching an album for a second image matching the first keyword, a descriptive sentence corresponding to the second image comprising a first target keyword, and a similarity between the first target keyword and the first keyword meeting a first preset condition; and
displaying the second image.

7. The method according to claim 6, characterized in that before displaying the second image, the method further comprises:
when the number of second images is greater than a preset number, displaying prompt information, the prompt information being used to prompt input of a second keyword;
obtaining the second keyword; and
searching the album for a second image matching both the first keyword and the second keyword, a descriptive sentence corresponding to the second image comprising the first target keyword and a second target keyword, and a similarity between the second target keyword and the second keyword meeting a second preset condition.

8. The method according to any one of claims 1 to 5, characterized in that performing image recognition on the first image to obtain the recognition results corresponding to the at least two objects in the first image comprises:
performing image recognition on the first image through an image recognition model to obtain the recognition results corresponding to the at least two objects in the first image, wherein the image recognition model is obtained by training a deep learning network with multiple sample images, and an object in each of the multiple sample images is annotated with a classification label.

9. The method according to any one of claims 1 to 5, characterized in that before generating the descriptive sentence through the language description model, the method further comprises:
obtaining a training sample set, the training sample set comprising multiple training sample images, objects in each of the multiple training sample images being annotated with classification labels, and each training sample image corresponding to an expected descriptive sentence;
for each training sample image, processing the training sample image through an initial language description model and outputting an actual descriptive sentence;
calculating an error between the expected descriptive sentence and the actual descriptive sentence; and
when the error is greater than a preset threshold, adjusting parameters of the initial language description model and restarting from the step of, for each training sample image, processing the training sample image through the initial language description model and outputting an actual descriptive sentence; and when the error is less than or equal to the preset threshold, generating the language description model.

10. An image index generation apparatus, characterized in that the apparatus comprises:
an image obtaining module, configured to obtain a first image;
an image recognition module, configured to perform image recognition on the first image to obtain recognition results corresponding to at least two objects in the first image;
a sentence generation module, configured to generate a descriptive sentence through a language description model, the descriptive sentence comprising the recognition results corresponding to the at least two objects, and the descriptive sentence being used to describe the first image; and
an index generation module, configured to store the descriptive sentence in association with the first image to obtain an index of the first image.

11. A terminal, characterized in that the terminal comprises a processor and a memory, the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the image index generation method according to any one of claims 1 to 9.

12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program is loaded and executed by a processor to implement the image index generation method according to any one of claims 1 to 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811457455.0A CN109635135A (en) | 2018-11-30 | 2018-11-30 | Image index generation method, device, terminal and storage medium |
PCT/CN2019/115411 WO2020108234A1 (en) | 2018-11-30 | 2019-11-04 | Image index generation method, image search method and apparatus, and terminal, and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811457455.0A CN109635135A (en) | 2018-11-30 | 2018-11-30 | Image index generation method, device, terminal and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109635135A true CN109635135A (en) | 2019-04-16 |
Family
ID=66070700
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811457455.0A Pending CN109635135A (en) | 2018-11-30 | 2018-11-30 | Image index generation method, device, terminal and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109635135A (en) |
WO (1) | WO2020108234A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113326389A (en) * | 2021-05-26 | 2021-08-31 | 北京沃东天骏信息技术有限公司 | Image index processing method, image index processing device, image index processing apparatus, storage medium, and program |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103838724B (en) * | 2012-11-20 | 2018-04-13 | 百度在线网络技术(北京)有限公司 | Image search method and device |
CN107766853B (en) * | 2016-08-16 | 2021-08-06 | 阿里巴巴集团控股有限公司 | Image text information generation and display method and electronic equipment |
CN106708940B (en) * | 2016-11-11 | 2020-06-30 | 百度在线网络技术(北京)有限公司 | Method and device for processing pictures |
US11301509B2 (en) * | 2017-01-20 | 2022-04-12 | Rakuten Group, Inc. | Image search system, image search method, and program |
CN109635135A (en) * | 2018-11-30 | 2019-04-16 | Oppo广东移动通信有限公司 | Image index generation method, device, terminal and storage medium |
- 2018-11-30: CN application CN201811457455.0A (publication CN109635135A/en), status Pending
- 2019-11-04: WO application PCT/CN2019/115411 (publication WO2020108234A1/en), Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103136228A (en) * | 2011-11-25 | 2013-06-05 | 阿里巴巴集团控股有限公司 | Image search method and image search device |
CN106446782A (en) * | 2016-08-29 | 2017-02-22 | 北京小米移动软件有限公司 | Image identification method and device |
CN107908770A (en) * | 2017-11-30 | 2018-04-13 | 维沃移动通信有限公司 | A kind of photo searching method and mobile terminal |
CN108021654A (en) * | 2017-12-01 | 2018-05-11 | 北京奇安信科技有限公司 | A kind of photograph album image processing method and device |
CN108509521A (en) * | 2018-03-12 | 2018-09-07 | 华南理工大学 | A kind of image search method automatically generating text index |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020108234A1 (en) * | 2018-11-30 | 2020-06-04 | Oppo广东移动通信有限公司 | Image index generation method, image search method and apparatus, and terminal, and medium |
CN110083729A (en) * | 2019-04-26 | 2019-08-02 | 北京金山数字娱乐科技有限公司 | A kind of method and system of picture search |
CN110083729B (en) * | 2019-04-26 | 2023-10-27 | 北京金山数字娱乐科技有限公司 | Image searching method and system |
CN110362698A (en) * | 2019-07-08 | 2019-10-22 | 北京字节跳动网络技术有限公司 | A kind of pictorial information generation method, device, mobile terminal and storage medium |
CN112541091A (en) * | 2019-09-23 | 2021-03-23 | 杭州海康威视数字技术股份有限公司 | Image searching method, device, server and storage medium |
CN110704654A (en) * | 2019-09-27 | 2020-01-17 | 三星电子(中国)研发中心 | Picture searching method and device |
CN112925939A (en) * | 2019-12-05 | 2021-06-08 | 阿里巴巴集团控股有限公司 | Picture searching method, description information generating method, device and storage medium |
CN111046203A (en) * | 2019-12-10 | 2020-04-21 | Oppo广东移动通信有限公司 | Image retrieval method, image retrieval device, storage medium and electronic equipment |
CN111797765A (en) * | 2020-07-03 | 2020-10-20 | 北京达佳互联信息技术有限公司 | Image processing method, image processing apparatus, server, and storage medium |
CN111797765B (en) * | 2020-07-03 | 2024-04-16 | 北京达佳互联信息技术有限公司 | Image processing method, device, server and storage medium |
CN112711998A (en) * | 2020-12-24 | 2021-04-27 | 珠海新天地科技有限公司 | 3D model annotation system and method |
CN113961833A (en) * | 2021-10-20 | 2022-01-21 | 维沃移动通信有限公司 | Information searching method and device and electronic equipment |
CN118037888A (en) * | 2024-02-01 | 2024-05-14 | 嘉达鼎新信息技术(苏州)有限公司 | AI image generation method and system based on image analysis and language description |
CN118037888B (en) * | 2024-02-01 | 2024-10-01 | 嘉达鼎新信息技术(苏州)有限公司 | AI image generation method and system based on image analysis and language description |
Also Published As
Publication number | Publication date |
---|---|
WO2020108234A1 (en) | 2020-06-04 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109635135A (en) | Image index generation method, device, terminal and storage medium | |
CN110633745B (en) | Image classification training method and device based on artificial intelligence and storage medium | |
CN112348117B (en) | Scene recognition method, device, computer equipment and storage medium | |
AU2019264603A1 (en) | Method and system for information extraction from document images using conversational interface and database querying | |
CN110795527B (en) | Candidate entity ordering method, training method and related device | |
CN109902665A (en) | Similar face retrieval method, apparatus and storage medium | |
CN109831572A (en) | Chat picture control method, device, computer equipment and storage medium | |
EP3765995B1 (en) | Systems and methods for inter-camera recognition of individuals and their properties | |
CN108764334A (en) | Facial image face value judgment method, device, computer equipment and storage medium | |
CN105117399B (en) | Image searching method and device | |
CN110033023A (en) | It is a kind of based on the image processing method and system of drawing this identification | |
CN113641797B (en) | Data processing method, apparatus, device, storage medium and computer program product | |
CN113537206B (en) | Push data detection method, push data detection device, computer equipment and storage medium | |
CN115114448B (en) | Intelligent multi-mode fusion power consumption inspection method, device, system, equipment and medium | |
CN112784011B (en) | Emotion problem processing method, device and medium based on CNN and LSTM | |
US20210326383A1 (en) | Search method and device, and storage medium | |
CN107977676A (en) | Text similarity computing method and device | |
CN110609958A (en) | Data pushing method and device, electronic equipment and storage medium | |
CN108446688A (en) | Facial image Sexual discriminating method, apparatus, computer equipment and storage medium | |
WO2017202086A1 (en) | Image screening method and device | |
CN113806564B (en) | Multi-mode informative text detection method and system | |
JP2022100358A (en) | Search method and device in search support system | |
CN111046203A (en) | Image retrieval method, image retrieval device, storage medium and electronic equipment | |
CN116630749A (en) | Industrial equipment fault detection method, device, equipment and storage medium | |
CN111191065B (en) | Homologous image determining method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190416 |