CN110765301B - Picture processing method, device, equipment and storage medium - Google Patents

Picture processing method, device, equipment and storage medium

Info

Publication number
CN110765301B
CN110765301B (application CN201911075405.0A)
Authority
CN
China
Prior art keywords
picture
tag
search
label
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911075405.0A
Other languages
Chinese (zh)
Other versions
CN110765301A (en
Inventor
林立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911075405.0A priority Critical patent/CN110765301B/en
Publication of CN110765301A publication Critical patent/CN110765301A/en
Application granted granted Critical
Publication of CN110765301B publication Critical patent/CN110765301B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information

Abstract

The embodiments of the present application provide a picture processing method, apparatus, device, and storage medium. The method includes: receiving a picture search request, where the picture search request includes a search term; determining the distance between the word vector of the search term and the word vector of each tag in a picture tag set, where a tag is a word used to annotate picture content; determining a tag whose distance satisfies a condition as a target tag; and determining pictures whose tags include the target tag as the search result corresponding to the picture search request. With the method and apparatus, the pictures that a user wants to find can be located accurately, and the search recall rate for pictures is improved.

Description

Picture processing method, device, equipment and storage medium
Technical Field
The embodiments of the present application relate to the field of computers, and in particular, but not exclusively, to a picture processing method, apparatus, device, and storage medium.
Background
With the continuous development of intelligent terminals, current terminals offer high photographing performance and large storage capacity, and users store more and more pictures on them, which makes finding pictures and managing albums difficult. To help users manage their pictures efficiently, a method for quickly searching pictures is needed.
At present, methods for searching pictures on a terminal generally either analyze the storage paths of pictures and roughly infer tag attributes from those paths, or match the search term against picture tags by exact text matching.
However, whether pictures are searched by storage path or by text matching, the search capability is poor: the pictures the user wants cannot be found accurately, and the search recall rate is low.
Disclosure of Invention
The embodiments of the present application provide a picture processing method, apparatus, device, and storage medium, which can improve picture search capability, accurately find the pictures that users want, and improve the search recall rate of pictures.
The technical solutions in the embodiments of the present application are implemented as follows.
An embodiment of the present application provides a picture processing method, including the following steps:
receiving a picture search request, where the picture search request includes a search term;
determining the distance between the word vector of the search term and the word vector of each tag in a picture tag set, where a tag is a word used to annotate picture content;
determining a tag whose distance satisfies a condition as a target tag; and
determining pictures whose tags include the target tag as the search result corresponding to the picture search request.
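The four steps above can be sketched as a short program. The following is a minimal illustration only, not the patented implementation: the embedding table, the tag set, the cosine-distance condition, and the 0.1 threshold are all assumptions chosen for demonstration.

```python
import math

# Toy word-embedding table (assumed values; a real system uses a trained model).
EMBEDDINGS = {
    "food":    [0.9, 0.1, 0.0],
    "cuisine": [0.8, 0.2, 0.1],
    "beach":   [0.0, 0.9, 0.4],
}

# Picture tag set: each stored picture is annotated with words describing its content.
PICTURES = {
    "IMG_001.jpg": ["cuisine"],
    "IMG_002.jpg": ["beach"],
}

def cosine_distance(u, v):
    """Distance between two word vectors: 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def search(search_term, threshold=0.1):
    """Handle a picture search request containing `search_term`."""
    query_vec = EMBEDDINGS[search_term]
    # Tags whose distance to the search term satisfies the condition become target tags.
    target_tags = {tag for tag, vec in EMBEDDINGS.items()
                   if cosine_distance(query_vec, vec) <= threshold}
    # Pictures whose tags include a target tag form the search result.
    return [pic for pic, tags in PICTURES.items()
            if any(tag in target_tags for tag in tags)]
```

In this toy embedding space, "cuisine" lies close enough to "food" that `search("food")` returns IMG_001.jpg even though the two strings never match literally.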
An embodiment of the present application provides a picture processing apparatus, including:
a receiving module, configured to receive a picture search request, where the picture search request includes a search term;
a first determining module, configured to determine the distance between the word vector of the search term and the word vector of each tag in a picture tag set, where a tag is a word used to annotate picture content;
a second determining module, configured to determine a tag whose distance satisfies a condition as a target tag; and
a third determining module, configured to determine pictures whose tags include the target tag as the search result corresponding to the picture search request.
An embodiment of the present application provides a picture processing device, including:
a memory for storing executable instructions; and a processor configured to implement the above method when executing the executable instructions stored in the memory.
An embodiment of the present application provides a storage medium storing executable instructions that, when executed, cause a processor to implement the above method.
The embodiments of the present application have the following beneficial effects:
According to the distance between the word vector of the search term and the word vectors of the tags in the picture tag set, a tag whose distance satisfies a condition is determined as a target tag, and pictures whose tags include the target tag are determined as the search result corresponding to the picture search request. Because computing the distance between word vectors identifies the tag corresponding to the search term more accurately, pictures carrying that tag can be found accurately as the search result. This improves picture search capability, accurately finds the pictures the user wants, and improves the search recall rate of pictures.
Drawings
FIG. 1A is a schematic diagram of an alternative architecture of a picture processing system 10 according to an embodiment of the present application;
FIG. 1B is a schematic diagram of an alternative architecture of a picture processing system 20 according to an embodiment of the present application;
FIG. 2A is a schematic diagram of an alternative architecture of the picture processing system applied to a blockchain system according to an embodiment of the present application;
FIG. 2B is a schematic diagram of an alternative block structure according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a server according to an embodiment of the present application;
FIG. 4 is an alternative flowchart of a picture processing method according to an embodiment of the present application;
FIG. 5A is an alternative interface diagram of an album manager APP according to an embodiment of the present application;
FIG. 5B is an alternative interface diagram of an album manager APP according to an embodiment of the present application;
FIG. 5C is an alternative interface diagram of an album manager APP according to an embodiment of the present application;
FIG. 5D is an alternative interface diagram of an album manager APP according to an embodiment of the present application;
FIG. 6 is an alternative flowchart of a picture processing method according to an embodiment of the present application;
FIG. 7 is an interface diagram of multiple pictures found with the search term "food" according to an embodiment of the present application;
FIG. 8 is an alternative flowchart of a picture processing method according to an embodiment of the present application;
FIG. 9 is an alternative flowchart of a picture processing method according to an embodiment of the present application;
FIG. 10 is an alternative flowchart of a picture processing method according to an embodiment of the present application;
FIG. 11 is an alternative flowchart of a picture processing method according to an embodiment of the present application;
FIG. 12A is a product interface diagram according to an embodiment of the present application;
FIG. 12B is a search result presentation interface diagram according to an embodiment of the present application;
FIG. 13 is a schematic flowchart of picture tag analysis according to an embodiment of the present application;
FIG. 14 is a schematic flowchart of a picture query according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
In the following description, "some embodiments" refers to a subset of all possible embodiments; it is understood that "some embodiments" may refer to the same subset or different subsets of all possible embodiments, and that these may be combined with each other where no conflict arises. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the embodiments of the present application belong. The terminology used in the embodiments of the present application is for the purpose of describing those embodiments only and is not intended to limit the present application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Search accuracy (precision): the number of photos in the search results that belong to the category corresponding to the search term, divided by the total number of search results for that term; that is, the proportion of the returned results that are correct. Search accuracy evaluates the proportion of target results among the search results.
2) Search recall rate: the number of photos in the search results that belong to the category corresponding to the search term, divided by the total number of photos of that category among all searched objects; that is, the proportion of all correct objects that are actually returned. Search recall evaluates the proportion of the target category that is recalled.
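Under stated assumptions (the search results and the ground-truth category given as Python collections), the two metrics above can be computed as follows; the function name and the sample numbers are illustrative only.

```python
def precision_recall(results, relevant):
    """Search accuracy (precision) and search recall rate, per the definitions above."""
    results, relevant = set(results), set(relevant)
    correct = results & relevant              # returned photos that truly belong to the category
    precision = len(correct) / len(results)   # correct results / all returned results
    recall = len(correct) / len(relevant)     # correct results / all photos of the category
    return precision, recall

# Example: a search returns 10 photos, 8 of which really belong to the category,
# out of 16 photos of that category stored in total.
p, r = precision_recall(results=range(10), relevant=range(2, 18))
# p = 8/10 = 0.8, r = 8/16 = 0.5
```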
3) Word vector (word embedding): a collective term for a set of language-modeling and feature-learning techniques in natural language processing in which words or phrases from a vocabulary are mapped to vectors of real numbers. Conceptually, it involves a mathematical embedding from a space with one dimension per word into a continuous vector space of much lower dimension.
4) Convolutional Neural Network (CNN): a class of feedforward neural networks that contain convolution computations and have a deep structure; one of the representative algorithms of deep learning. Convolutional neural networks have representation-learning capability and can perform shift-invariant classification of input information according to their hierarchical structure, and are therefore also called "shift-invariant artificial neural networks" (SIANN).
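As a rough sketch of the convolution computation a CNN layer performs (pure Python, no framework; the image and kernel values are made up for illustration):

```python
def conv2d(image, kernel):
    """Minimal 'valid' 2D convolution (cross-correlation, as used in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A tiny image with one bright column, and a vertical-edge kernel.
img = [[0, 0, 5, 0],
       [0, 0, 5, 0],
       [0, 0, 5, 0]]
edge = [[-1, 1],
        [-1, 1]]
feature_map = conv2d(img, edge)
# Because the same kernel slides over the whole image, shifting the bright
# column would shift the response in the feature map by the same amount --
# the shift-invariance property mentioned above.
```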
5) Blockchain: an encrypted, chained transactional storage structure formed of blocks.
6) Blockchain network: the set of nodes that incorporate new blocks into the blockchain by consensus.
In order to better understand the picture processing method provided in the embodiment of the present application, a picture searching method in the related art is first described:
in the related art, when searching for a picture on a terminal, there are mainly the following ways:
the method comprises the steps of uploading a user picture to a server, analyzing the content of the picture, extracting a content tag of the picture, analyzing a search word input by the user when the user searches the picture, and matching a picture corresponding to the tag with the same text as the search word in a text matching mode.
And secondly, searching locally at the terminal, analyzing the storage path of the picture by the terminal, and roughly analyzing the label attribute of the picture of the user according to the storage path of the picture, so that rough searching can be performed.
However, the first way has at least the following problems. First, the pictures must be uploaded to the server and analyzed there to determine their tags, which consumes considerable time and data traffic; since users are sensitive to data usage, this approach is inefficient and degrades the user experience. Second, uploading pictures to a server raises personal-privacy concerns: users are very sensitive about their privacy, and uploading their pictures for analysis makes them distrustful of the application. Third, when searching, the search term entered by the user is often not literally identical to a picture tag even though the two have the same meaning. For example, if the search term and the tag are synonyms (such as "food" and "cuisine"), exact text matching cannot match them, so a purely text-matching search suffers from inaccurate matching and a low search recall rate whenever the entered search term does not literally match a tag in the tag library.
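The synonym problem can be made concrete with a toy example. The embedding values, the 0.1 threshold, and the synonym pair ("food"/"cuisine" stands in for the synonym pair in the original language) are assumptions for illustration:

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# Assumed toy embeddings in which the two synonyms lie close together.
vec = {"food": [0.9, 0.1], "cuisine": [0.88, 0.15]}

search_term, tag = "food", "cuisine"

# Exact text matching fails even though the words mean the same thing ...
text_match = (search_term == tag)                                    # False
# ... while the word-vector distance reveals the semantic match.
semantic_match = cosine_distance(vec[search_term], vec[tag]) < 0.1   # True
```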
The second way has at least the following problem: only the picture path is examined, and the analysis is not based on the picture content, so the pictures that the user wants cannot be found accurately.
To address at least one of the problems in the related art, an embodiment of the present application provides a picture processing method: the distance between the word vector of the search term and the word vector of each tag in a picture tag set is determined, a tag whose distance satisfies a condition is determined as a target tag, and pictures whose tags include the target tag are determined as the search result corresponding to the picture search request. Because tags are matched by semantic distance rather than literal text, pictures can be found accurately even when the search term and the tag differ in wording.
In addition, it should be noted that the solution provided in the embodiment of the present application also relates to an artificial intelligence model building technology, for example, a picture classification model for classifying pictures is built, or a model for determining picture labels is built, which will be described below.
Here, it should be noted that artificial intelligence is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence is a comprehensive discipline involving a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
An exemplary application of the image processing device provided by the embodiment of the present application is described below, and the device provided by the embodiment of the present application can be implemented as various types of terminals such as a notebook computer, a tablet computer, a desktop computer, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, and a portable game device), and can also be implemented as a server.
An exemplary application in which the device is implemented as a server is described first. Referring to FIG. 1A, FIG. 1A is a schematic diagram of an alternative architecture of a picture processing system 10 according to an embodiment of the present application. To support a picture processing application (for example, a picture presentation application or a picture search application), terminals (terminal 100-1 and terminal 100-2 are shown) connect to the server 300 corresponding to the client of the picture processing application through the network 200. A terminal obtains a picture search request and sends it to the server 300 through the network 200. In response, the server 300 determines the distance between the word vector of the search term and the word vector of each tag in the picture tag set, determines a tag whose distance satisfies a condition as a target tag, determines pictures whose tags include the target tag as the search result corresponding to the picture search request, and sends the search result to the terminal through the network 200. The network 200 may be a wide area network, a local area network, or a combination of the two. The terminal may display the pictures corresponding to the search result on the current interface (current interface 110-1 and current interface 110-2 are shown), that is, display the found pictures on the current interface.
An exemplary application in which the device is implemented as a terminal is described next. Referring to FIG. 1B, FIG. 1B is a schematic diagram of another alternative architecture of a picture processing system 20 according to an embodiment of the present application. To support a picture processing application (for example, a picture presentation application or a picture search application), terminals (terminal 100-3 and terminal 100-4 are shown) connect to the server 300 corresponding to the client of the picture processing application through the network 200. A terminal obtains a picture search request and sends a tag search request to the server 300 through the network 200; in response, the server 300 sends the word vector of the search term and the word vectors of the tags in the picture tag set to the terminal through the network 200. On receiving them, the terminal determines the distance between the word vector of the search term and the word vector of each tag in the picture tag set, determines a tag whose distance satisfies a condition as a target tag, and finally determines pictures whose tags include the target tag as the search result corresponding to the picture search request. The terminal may display the pictures corresponding to the search result on the current interface (current interface 110-3 and current interface 110-4 are shown).
The picture processing system 10 or 20 in the embodiments of the present application may also be a distributed system 101 of a blockchain system. Referring to FIG. 2A, FIG. 2A is a schematic diagram of an alternative architecture in which the picture processing system 10 is applied to a blockchain system. The distributed system 101 may be formed by a plurality of nodes 102 (computing devices of any form in the access network, such as servers and user terminals) and clients 103. A peer-to-peer (P2P) network is formed between the nodes; the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In a distributed system, any machine, such as a server or a terminal, can join and become a node; a node comprises a hardware layer, a middle layer, an operating-system layer, and an application layer.
Referring to the functions of each node in the blockchain system shown in fig. 2A, the functions involved include:
1) routing, a basic function that a node has, is used to support communication between nodes.
Besides the routing function, the node may also have the following functions:
2) Application: deployed in the blockchain to implement specific services according to actual business requirements. It records data related to those functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block when the source and integrity of the record data are verified successfully.
For example, the services implemented by the application include:
2.1) Wallet: provides electronic-money transaction functions, including initiating a transaction (that is, sending the transaction record of the current transaction to other nodes in the blockchain system; after the other nodes verify it successfully, the record data of the transaction is stored in a temporary block of the blockchain as acknowledgement that the transaction is valid). The wallet also supports querying the electronic money remaining at an electronic-money address.
2.2) Shared ledger: provides functions for operations such as storing, querying, and modifying account data. Record data of the operations on the account data is sent to other nodes in the blockchain system; after the other nodes verify its validity, the record data is stored in a temporary block as acknowledgement that the account data is valid, and a confirmation may be sent to the node that initiated the operation.
2.3) Smart contract: a computerized agreement that can enforce the terms of a contract. It is implemented by code deployed on the shared ledger and executed when certain conditions are met, completing automated transactions according to the actual business-requirement code; for example, querying the logistics status of goods purchased by a buyer, or transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods. Of course, smart contracts are not limited to contracts for executing transactions; they may also execute contracts that process received information.
3) Blockchain: a series of blocks linked to one another in the chronological order of their generation. Once a new block is added to the blockchain, it cannot be removed; the blocks record the record data submitted by nodes in the blockchain system.
4) Consensus: a process in a blockchain network used to reach agreement, among the nodes involved, on the transactions in a block, so that the agreed block can be appended to the end of the blockchain. Mechanisms for achieving consensus include Proof of Work (PoW), Proof of Stake (PoS), Delegated Proof of Stake (DPoS), Proof of Elapsed Time (PoET), and so on.
Referring to FIG. 2B, FIG. 2B is a schematic diagram of an alternative block structure according to this embodiment. Each block contains the hash value of the transaction records stored in that block (the hash value of the block) and the hash value of the previous block; blocks are linked by these hash values to form a blockchain. A block may also include information such as a timestamp indicating when the block was generated. A blockchain is essentially a decentralized database, a string of data blocks associated with one another using cryptography; each data block contains related information used to verify the validity (tamper-resistance) of its information and to generate the next block.
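The hash-linked block structure described above can be sketched with Python's standard library. The block fields and the fixed timestamp are simplifications for illustration, not the structure used by any particular blockchain system.

```python
import hashlib
import json

def make_block(records, prev_hash):
    """Build a block holding record data and the previous block's hash."""
    # Timestamp fixed at 0 for reproducibility; a real chain stores the generation time.
    block = {"records": records, "prev_hash": prev_hash, "timestamp": 0}
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

def valid_link(parent, child):
    """Check that `child` really points at `parent` by recomputing the parent's hash."""
    payload = {k: parent[k] for k in ("records", "prev_hash", "timestamp")}
    recomputed = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    return child["prev_hash"] == recomputed

# Genesis block, then a successor linked to it by hash.
genesis = make_block(["record A"], prev_hash="0" * 64)
block1 = make_block(["record B"], prev_hash=genesis["hash"])

# Tampering with an earlier block breaks the hash link to its successor,
# which is why blocks cannot be removed or altered once appended.
```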
Referring to FIG. 3, FIG. 3 is a schematic structural diagram of a server 300 according to an embodiment of the present application. When the device is implemented as a server, the server 300 shown in FIG. 3 includes: at least one processor 310, a memory 350, at least one network interface 320, and a user interface 330. The components in the server 300 are coupled together by a bus system 340, which enables communication among the connected components. In addition to a data bus, the bus system 340 includes a power bus, a control bus, and a status-signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 340 in FIG. 3.
The processor 310 may be an integrated-circuit chip with signal-processing capability, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor, any conventional processor, or the like.
The user interface 330 includes one or more output devices 331, including one or more speakers and/or one or more visual display screens, that enable presentation of media content. The user interface 330 also includes one or more input devices 332, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 350 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 350 optionally includes one or more storage devices physically located remote from processor 310. The memory 350 may include either volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 350 described in embodiments herein is intended to comprise any suitable type of memory. In some embodiments, memory 350 is capable of storing data, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below, to support various operations.
an operating system 351, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer;
a network communication module 352 for reaching other computing devices via one or more (wired or wireless) network interfaces 320, exemplary network interfaces 320 including Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), and the like;
an input processing module 353 for detecting one or more user inputs or interactions from one of the one or more input devices 332 and translating the detected inputs or interactions.
In some embodiments, the apparatus provided in the embodiments of the present application may be implemented in software. FIG. 3 shows a picture processing apparatus 354 stored in the memory 350. The picture processing apparatus 354 may be a picture processing apparatus in the server 300 and may be software in the form of programs, plug-ins, and the like, including the following software modules: a receiving module 3541, a first determining module 3542, a second determining module 3543, and a third determining module 3544. These modules are logical and thus may be arbitrarily combined or further split according to the functions implemented. The functions of each module are explained below.
In other embodiments, the picture processing apparatus 354 may instead be disposed on the terminal, as a picture processing apparatus in the terminal, and may likewise be software in the form of programs and plug-ins.
In other embodiments, the apparatus provided in the embodiments of the present application may be implemented in hardware. As an example, it may be a processor in the form of a hardware decoding processor programmed to execute the picture processing method provided in the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may be one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.
The following will explain the picture processing method provided by the embodiment of the present application. Referring to fig. 4, fig. 4 is an alternative flowchart of a picture processing method provided in the embodiment of the present application, and will be described with reference to the steps shown in fig. 4.
Step S401, receiving an image search request, wherein the image search request comprises search terms.
Here, the server may receive a picture search request sent by the terminal, or the terminal may receive a picture search request input by a user; that is, the picture processing apparatus of the embodiments of the present application may be implemented as a server or as a terminal. Unless otherwise stated, the embodiments of the present application describe the picture processing method taking the case where the picture processing apparatus is implemented as a server.
The picture search request is used for requesting a picture search and includes a search word, where the search word is a word describing a picture category or picture attribute; the search word may be a text word, a voice word, an expression word, or a gesture word input by the user through the terminal.
In the embodiments of the present application, a picture processing application (APP) may run on the terminal, and the picture processing application may implement at least one picture-related processing function, such as picture acquisition, picture storage, and picture display. For example, the picture processing application may be a photo album manager APP.
For example, when the search word is a text word input by a user through a terminal, an input box for inputting the search word by the user may be included on the current interface of the photo album manager APP. As shown in fig. 5A, which is an alternative interface diagram of the photo album manager APP provided in the embodiment of the present application, an input box 501 is provided on the current interface 50 of the photo album manager APP, and a user may input a keyword of a picture to be searched in the input box 501, where the keyword is the search word. For example, the user may input "food" in the input box 501, and the terminal adds the "food" as a search word to the picture search request and sends the picture search request to the server.
For another example, when the search word is a voice word input by the user through the terminal, the current interface of the photo album manager APP may include a voice input reminding identifier for reminding the user of voice input, and the photo album manager APP has a voice collecting and recognizing function. As shown in fig. 5B, which is an optional interface diagram of the photo album manager APP provided in the embodiment of the present application, the current interface 50 of the photo album manager APP has a voice input reminding identifier 502. When the user clicks the voice input reminding identifier 502, or wakes up the voice collecting and recognizing function of the photo album manager APP by voice, or wakes it up by biometric information, the voice input reminding identifier 502 enters a reminding state, and the client of the photo album manager APP collects the user's voice information through a voice collecting unit (e.g., a microphone) of the terminal, performs voice recognition on the collected voice information, and takes the recognition result as the search word. The voice input reminding identifier 502 being in the reminding state may mean that its color changes, that it is displayed in animated form, that its shape changes, that its size changes, or that any other aspect of the identifier's form changes. For example, the user may wake up the voice collecting and recognizing function of the photo album manager APP through a fingerprint and then speak "food" to the terminal, so that the terminal collects the user's voice word "food" and adds "food" as a search word to a picture search request sent to the server.
For another example, when the search word is an expression word input by the user through the terminal, the current interface of the photo album manager APP may include an expression input reminding identifier for reminding the user of expression input, and the photo album manager APP has a face collecting and recognizing function. As shown in fig. 5C, which is an optional interface diagram of the photo album manager APP provided in the embodiment of the present application, the current interface 50 of the photo album manager APP has an expression input reminding identifier 503. When the user clicks the expression input reminding identifier 503, or wakes up the face collecting and recognizing function of the photo album manager APP by voice, or wakes it up by biometric information, the expression input reminding identifier 503 enters a reminding state, and the client of the photo album manager APP collects the user's face information through an image collecting unit (e.g., a camera) of the terminal, performs expression recognition on the collected face information, and takes the recognition result as the search word. For example, the user may wake up the face collecting and recognizing function of the photo album manager APP through voice and then smile toward the terminal, so that the terminal collects and recognizes the user's expression vocabulary as "happy", and adds "happy" as a search word to a picture search request sent to the server.
For another example, when the search word is a gesture word input by the user through the terminal, the current interface of the photo album manager APP may include a gesture input reminding identifier for reminding the user of gesture input, and the photo album manager APP has a gesture collecting and recognizing function. As shown in fig. 5D, which is an optional interface diagram of the photo album manager APP provided in the embodiment of the present application, the current interface 50 of the photo album manager APP has a gesture input reminding identifier 504. When the user clicks the gesture input reminding identifier 504, or wakes up the gesture collecting and recognizing function of the photo album manager APP by voice, or wakes it up by biometric information, the gesture input reminding identifier 504 enters a reminding state, and the client of the photo album manager APP collects the user's gesture information through a video collecting unit (e.g., a camera) of the terminal, performs gesture recognition on the collected gesture information, and takes the recognition result as the search word. For example, the user may wake up the gesture collecting and recognizing function of the photo album manager APP through voice and then make a gesture (e.g., an open hand) toward the terminal, so that the terminal collects and recognizes the user's gesture vocabulary as "open", and adds "open" as a search word to a picture search request sent to the server.
Step S402, determining the distance between the word vector of the search word and the word vector of the label in the picture label set.
Here, after the server obtains the search word, it looks up the word vector of the search word in a word vector database, obtains the tags in the picture tag set, and looks up the word vector of each tag in the same word vector database. The tags in the picture tag set are words for labeling picture contents.
In the embodiment of the application, the distance between the word vector of the search word and the word vector of each label can be determined, the distance represents the similarity between the search word and the corresponding label, when the distance is larger, the similarity between the search word and the corresponding label is lower, and when the distance is smaller, the similarity between the search word and the corresponding label is higher.
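The distance computation described above can be illustrated with a minimal Python sketch; the three-dimensional vectors and tag names below are toy values for illustration only (a real word vector database would hold embeddings of several hundred dimensions):

```python
import math

def euclidean_distance(v1, v2):
    # Distance between two word vectors: smaller distance = higher similarity.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

# Toy word vectors; real embeddings would be looked up in a word vector database.
vectors = {
    "food":      [0.9, 0.1, 0.0],
    "cuisine":   [0.8, 0.2, 0.1],
    "landscape": [0.1, 0.9, 0.3],
}

# Distance from the search word's vector to the vector of each tag in the set.
search_vec = vectors["food"]
distances = {tag: euclidean_distance(search_vec, vec) for tag, vec in vectors.items()}
# "cuisine" ends up much closer to "food" than "landscape" does.
```

Whether Euclidean or cosine distance is used, the interpretation is the same: a larger distance means lower similarity between the search word and the corresponding tag.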
In step S403, the tag whose distance satisfies the condition is determined as the target tag.
Here, the condition is a preset condition for filtering the tags; for example, the condition may be that the distance between the word vector of the search word and the word vector of the tag falls within a certain range, or that the tag is among a preset number of tags with the smallest distances. In the embodiments of the present application, after the distance between the word vector of the search word and the word vector of each tag is determined, the tags whose distances satisfy the condition are determined as target tags; a target tag is a word with higher similarity to the search word.
Step S404, determining the picture with the tag including the target tag as a search result corresponding to the picture search request.
Here, for the pictures on the terminal, each picture has at least one tag. After the target tag is determined, all pictures on the terminal are matched against the target tag; when the tags of any one or more pictures include the target tag, those pictures are determined as the search result, that is, as the pictures matching the search word.
It should be noted that the search word may be the same as or different from the tags of the pictures on the terminal. For example, the search word may be "food", while the tags of pictures may be "food" or "cuisine"; when the target tags are determined, the tags satisfying the condition may include both "food" and "cuisine", so when picture matching is performed, pictures whose tags include "food" and pictures whose tags include "cuisine" can both be matched.
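The matching of step S404 can be sketched as a simple tag-set intersection; the picture records and the pair of synonym target tags below are hypothetical:

```python
# Hypothetical picture records: each picture carries at least one content tag.
pictures = [
    {"id": 1, "tags": ["food", "table"]},
    {"id": 2, "tags": ["cuisine"]},
    {"id": 3, "tags": ["landscape"]},
]

# Target tags assumed to have satisfied the distance condition for the
# search word "food" (both synonyms matched).
target_tags = {"food", "cuisine"}

# A picture is a search result if any of its tags is a target tag.
search_result = [p for p in pictures if target_tags & set(p["tags"])]
# Pictures 1 and 2 are returned; picture 3 is not.
```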
According to the picture processing method provided by the embodiments of the present application, the tags whose distances satisfy the condition are determined as target tags according to the distance between the word vector of the search word and the word vectors of the tags in the picture tag set, and the pictures whose tags include a target tag are determined as the search result corresponding to the picture search request. By calculating the distance between word vectors, the tag corresponding to the search word can be determined more accurately, so that the pictures carrying that tag are accurately found as the search result. This improves the picture search capability, accurately finds the pictures the user wants, and improves the search recall rate.
Fig. 6 is an optional flowchart of a picture processing method according to an embodiment of the present application, and as shown in fig. 6, the method includes the following steps:
step S601, when the picture is acquired, the terminal identifies the picture and determines the content corresponding to the picture.
Here, the picture acquired by the terminal may be a picture taken by the terminal through its own image acquisition unit (e.g., a camera), or a picture downloaded by the user when browsing a web page or using another application program, or a picture backed up by the user in the cloud, or a picture browsed by the user and cached by the terminal.
In the embodiments of the present application, when the terminal acquires a picture, the terminal performs feature extraction and feature recognition on the picture through a picture classification model corresponding to the picture processing application, thereby determining the content corresponding to the picture. The picture classification model may be a picture classification Software Development Kit (SDK) based on a deep convolutional neural network and obtained through training.
Step S602, the terminal determines the category of the picture according to the content corresponding to the picture.
Here, after the content corresponding to the picture is determined, the content is analyzed to determine the category of the picture.
For example, when it is determined that the content corresponding to the picture includes text content, determining that the category of the picture is a text category; when the content corresponding to the picture is determined to comprise the figure image, determining that the picture type is the figure type; and when the content corresponding to the picture is determined to comprise plants and streams, determining that the picture category is a landscape category and the like.
Step S603, the terminal determines the category of the picture as a label of the picture.
Here, after the category of the picture is determined, the category is determined as the label of the picture, for example, when the category of the picture is a landscape category, the label of the picture is a landscape; and when the picture category is the person category, the label of the picture is a person.
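The content-to-category mapping of steps S602 and S603 can be sketched as a small rule table; the rules and content names below are illustrative examples taken from the text, not an exhaustive classifier:

```python
def categorize(content_items):
    # Map recognized picture content to a category; the category then
    # serves directly as the picture's tag.
    if "text" in content_items:
        return "text"
    if "person" in content_items:
        return "person"
    if {"plants", "streams"} & set(content_items):
        return "landscape"
    return "other"

label = categorize(["plants", "streams"])  # -> "landscape"
```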
In step S604, the terminal receives a picture search request.
Here, the terminal receives a picture search request input by a user, wherein the picture search request comprises search words for searching.
And step S605, the terminal sends the picture searching request to a server.
Step S606, the server determines the distance between the word vector of the search word and the word vector of the tag in the picture tag set. The label is used for marking words of picture contents;
in step S607, the server determines the tag whose distance satisfies the condition as the target tag.
Step S608, the server determines the picture with the tag including the target tag as a search result corresponding to the picture search request.
Here, the server may acquire all pictures and a label of each picture on the terminal. And after the server determines the target label, matching the target label with the label of each picture, and determining the picture as a search result when the label of the picture contains the label same as the target label.
And step S609, the server sends the search result to the terminal.
In other embodiments, the server may perform background analysis on the search word only, returning the target tag matching the search word to the terminal, and the terminal performs the picture search locally according to the target tag; that is, the terminal determines the pictures whose tags include the target tag as the search result corresponding to the picture search request. In this way, user experience is not affected and picture search efficiency is improved.
And step S610, the terminal displays the picture corresponding to the search result on the current interface.
And after searching the picture corresponding to the search word input by the user, displaying the picture on the current interface so as to be convenient for the user to view.
In some embodiments, a judgment box for determining whether the search result is accurate may further be displayed on the current interface. As shown in fig. 7, multiple pictures are found according to the search word "food" input by the user and displayed on the current interface, and a judgment box 701 asking "Is the search accurate?" is displayed either on each picture or at any position on the current page (fig. 7 exemplarily shows the judgment box 701 displayed at the bottom of the current page; of course, in some embodiments the judgment box 701 may also be displayed at the position corresponding to each displayed picture). If the user clicks "yes", the picture is one the user wants to search for; if the user clicks "no", it is not. In this way, the accuracy of the search can be determined according to the user's click operation.
According to the picture processing method provided by the embodiments of the present application, the terminal analyzes the pictures and determines their tags without sending the pictures to the server for tagging, which is more efficient, imperceptible to the user, and does not affect the user's experience of the product. Meanwhile, the user's data does not need to be uploaded: picture analysis is performed on the terminal and only the search word undergoes background analysis, so that user experience is not affected and picture search efficiency is improved.
Fig. 8 is an optional flowchart of a picture processing method according to an embodiment of the present application, and as shown in fig. 8, the method includes the following steps:
step S801, when the picture is acquired, the terminal identifies the picture and determines the content corresponding to the picture.
Step S802, the terminal determines the type of the picture according to the content corresponding to the picture.
In step S803, the terminal determines the category of the picture as the label of the picture.
It should be noted that steps S801 to S803 are the same as steps S601 to S603, and the embodiments of the present application are not repeated.
Step S804, the terminal receives an image search request, wherein the image search request comprises search words.
Step S805, the terminal sends a tag search request to the server according to the picture search request.
Here, the tag search request includes the search word, and the tag search request is used to request to search for a word vector of the search word and a word vector of a tag in the picture tag set.
Step S806, the server responds to the tag search request, and obtains the word vector of the search word and the word vector of the tag in the picture tag set.
Step S807, the server sends the word vector of the search word and the word vector of the tag in the picture tag set to the terminal through the network.
Here, the server searches a word vector of the search word in a word vector database according to the search word in the tag search request, obtains tags in the picture tag set, and searches a word vector of each tag in the word vector database. And sending the obtained word vectors of the search words and the word vectors of the labels in the picture label set to the terminal through the network.
Step S808, the terminal determines a distance between the word vector of the search word and the word vector of the tag in the picture tag set.
Here, the tag is a word for labeling the picture content. After the terminal obtains the word vectors of the search words and the word vectors of the labels in the picture label set, the terminal determines the distance between the word vectors of the search words and the word vectors of the labels in the picture label set. In this way, the terminal determines the distance between the search word and the tag, that is, the determination process of the target tag is also executed on the terminal side, so that the workload of the server can be reduced, and the service delay caused by the server responding to a large number of picture search requests can be avoided.
And step S809, the terminal determines the label with the distance meeting the condition as a target label.
Step S810, the terminal determines the picture with the label including the target label as a search result corresponding to the picture search request.
And step S811, the terminal displays the picture corresponding to the search result on the current interface.
According to the picture processing method provided by the embodiments of the present application, the terminal analyzes the pictures and determines their tags without sending the pictures to the server for tagging, which is more efficient, imperceptible to the user, and does not affect the user's experience of the product. Meanwhile, after the terminal obtains the word vector of the search word and the word vectors of the tags in the picture tag set, the terminal itself determines the distances between them; that is, the determination of the target tag is also executed on the terminal side, which reduces the workload of the server and avoids the service delay caused by the server responding to a large number of picture search requests.
Based on fig. 4, fig. 9 is an optional flowchart of the picture processing method provided in the embodiment of the present application, in some embodiments, the picture search request includes search information, as shown in fig. 9, after step S401, the method may further include the following steps:
in step S901, after determining the picture search request, the search information is converted into text information.
Here, the picture search request includes the search information, which may be content the user inputs by voice; that is, the search information is not a single word but a sentence composed of multiple words. For example, the user may input the search information "please search for the landscape of Guangzhou" by voice. After receiving the search information, the terminal converts it into text information.
Step S902, performing word segmentation processing on the text information to obtain at least one word segmentation.
Here, the text information may be segmented using an open-source word segmentation tool to obtain at least one word. For example, segmenting "please search for the landscape of Guangzhou" yields the words "please", "search", "Guangzhou", "of", and "landscape".
Step S903, extracting the participle from the at least one participle according to the part of speech of the participle to obtain the search word.
Here, the word segmentation extraction refers to extracting at least one meaningful word from the word segmentation, for example, a word with a part of speech being a noun or a verb may be extracted as the search word.
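Steps S902 and S903 can be sketched with a toy part-of-speech dictionary standing in for a real open-source segmentation tool; the vocabulary and part-of-speech labels below are hypothetical:

```python
# Hypothetical part-of-speech dictionary; a real system would obtain these
# labels from the segmentation tool itself.
POS = {
    "please": "particle",
    "search": "verb",
    "Guangzhou": "noun",
    "of": "particle",
    "landscape": "noun",
}

def extract_search_words(segments, keep=("noun", "verb")):
    # Keep only segments whose part of speech is meaningful for matching.
    return [w for w in segments if POS.get(w) in keep]

words = extract_search_words(["please", "search", "Guangzhou", "of", "landscape"])
# -> ["search", "Guangzhou", "landscape"]
```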
Referring to fig. 9, in some embodiments, the determining the tag whose distance satisfies the condition as the target tag in step S403 may be implemented in the following two ways:
the first method is as follows: in step S4031, the tag having the smallest distance is determined as the target tag.
The second method comprises the following steps: step S4032, determine the tag whose distance is smaller than the threshold as the target tag.
When the target tags are determined by the first mode, the number of the determined target tags is one, and when the target tags are determined by the second mode, the number of the determined target tags is one or more.
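The two modes of determining the target tag can be sketched as follows; the distance values are toy numbers:

```python
def target_by_min(distances):
    # Mode 1: the single tag with the smallest distance.
    return min(distances, key=distances.get)

def targets_by_threshold(distances, threshold):
    # Mode 2: every tag whose distance is below the threshold.
    return [tag for tag, d in distances.items() if d < threshold]

distances = {"food": 0.0, "cuisine": 0.2, "landscape": 1.2}
target_by_min(distances)              # -> "food" (exactly one target tag)
targets_by_threshold(distances, 0.5)  # -> ["food", "cuisine"] (one or more)
```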
Based on fig. 6, fig. 10 is an optional flowchart illustration of the picture processing method provided in the embodiment of the present application, and as shown in fig. 10, before step S606, the method may further include the following steps:
step S1001, the server groups the labels in the picture label set to obtain at least one label group.
Here, the tags of the synonyms may be divided into a tag group, and in some embodiments, step S1001 may be implemented by:
step S1001a, determine a tag distance between word vectors of every two tags in the picture tag set.
Step S1001b, dividing at least two tags with the tag distance within a preset range into the same tag group to obtain the at least one tag group.
Here, the at least two tags whose tag distance is within the preset range are tags of similar words to each other, and the preset range may be set according to actual grouping needs, which is not limited in the embodiment of the present application.
Correspondingly, after step S607, the method may further include the steps of:
step S1002, acquiring a tag group where the target tag is located as a target tag group.
After the target tag is determined, determining a tag group in which the target tag is located as the target tag group, where the target tag group includes at least one tag.
Correspondingly, step S608 may be implemented by:
step S1003, determining the picture of the tag including the tag in the target tag group in the tag as the search result corresponding to the picture search request.
Here, since the target tag group includes at least one tag, a picture whose tags include any tag in the target tag group may be used as a search result. For example, if the tags of the target tag group include "food" and "cuisine", then at search time a picture whose tags include at least one of "food" and "cuisine" is determined as the search result.
In the embodiment of the application, the pictures are searched according to one or more labels of the similar words, so that a plurality of pictures similar to the meaning of the search words can be searched, the searching range is expanded, and the searching accuracy is improved.
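The grouping of steps S1001a and S1001b can be sketched as follows; the two-dimensional toy vectors and the preset range value are illustrative assumptions (tags whose pairwise word-vector distance falls within the preset range end up in the same group):

```python
import math

def distance(v1, v2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def group_tags(vectors, preset_range):
    # Put a tag into the first existing group containing a tag within the
    # preset range of it; otherwise start a new group.
    groups = []
    for tag, vec in vectors.items():
        for group in groups:
            if any(distance(vec, vectors[t]) <= preset_range for t in group):
                group.append(tag)
                break
        else:
            groups.append([tag])
    return groups

vectors = {
    "food":      [0.9, 0.1],
    "cuisine":   [0.8, 0.2],
    "landscape": [0.1, 0.9],
    "scenery":   [0.2, 0.8],
}
groups = group_tags(vectors, preset_range=0.3)
# -> [["food", "cuisine"], ["landscape", "scenery"]]
```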
Based on fig. 6, fig. 11 is an optional flowchart illustration of the picture processing method provided in the embodiment of the present application, and as shown in fig. 11, before step S606, the method may further include the following steps:
step S1101, the server groups the tags in the picture tag set to obtain at least one tag group.
In step S1102, similarity between tags in each tag group is determined.
Here, the tags in the same tag group may be tags of words with similar meanings to each other, or words with different meanings, so that the similarity between the tags may be determined according to the distance between word vectors of the words, the similarity between words with smaller distances is higher, and the similarity between words with larger distances is lower.
And S1103, sorting the tags in the corresponding tag groups according to the ascending or descending order of the similarity, so as to obtain a sorting result.
Correspondingly, after step S607, the method may further include the steps of:
step S1104, acquiring a tag group where the target tag is located as a target tag group.
And step S1105, according to the sorting result, sequentially determining the preset number of tags in the target tag group as the selected tags.
Here, when the tags in the target tag group are sorted in the order of increasing similarity, a preset number of tags may be sequentially determined as the selection tags from the tail of the sorting result according to the sorting result; when the tags in the target tag group are sorted in the order of decreasing similarity, a preset number of tags may be sequentially determined as selection tags from the head of the sorting result according to the sorting result. The preset number can be determined according to the condition of picture matching, and the preset number is used for representing the number of the determined labels which correspond to the search terms and are used for picture matching.
Correspondingly, step S608 may be implemented by: step S1106, determining the picture with the tag including the selected tag as a search result corresponding to the picture search request.
In the embodiment of the application, according to the sorting result of the sorting of the tags in the target tag group, a preset number of tags are sequentially selected as selection tags for subsequent picture matching search. Therefore, the preset number of tags which are closer to the meaning of the search word can be selected from the target tag group for picture matching search, and therefore more accurate pictures can be matched.
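The selection of a preset number of tags from the sorted target tag group, as in step S1105, can be sketched as follows; the tag names and their ordering are hypothetical:

```python
def select_tags(sorted_group, preset_number, descending=True):
    # sorted_group: tags in the target tag group, ordered by similarity.
    # Take the preset number of most-similar tags: from the head when the
    # group is sorted in descending order of similarity, from the tail when
    # it is sorted in ascending order.
    if descending:
        return sorted_group[:preset_number]
    return sorted_group[-preset_number:]

group_desc = ["cuisine", "food", "dish", "snack"]  # similarity high -> low
select_tags(group_desc, 2)                          # -> ["cuisine", "food"]

group_asc = ["snack", "dish", "food", "cuisine"]    # similarity low -> high
select_tags(group_asc, 2, descending=False)         # -> ["food", "cuisine"]
```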
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The embodiments of the present application provide a picture processing method in which text tags are attached to pictures based on their main contents and objects to identify the content in the pictures; a user can input a search word, and the pictures whose content is most similar to the search word are found through a matching algorithm.
In this method, the user's pictures are analyzed on the terminal, which is most efficient, imperceptible to the user, and does not affect the user experience of the product. Moreover, the user's data does not need to be uploaded: picture analysis is performed on the terminal, only the search word undergoes background analysis, and the matched tags are returned for local searching, so that user experience is not affected and picture search efficiency is improved. Meanwhile, matching the user's search word with a synonym analysis technique can greatly improve the recall rate of picture search.
The application scenario of the embodiments of the present application is a picture classified-search entrance integrated in the photo album manager application on the user's mobile phone; the program displays the related pictures once the user inputs words describing the content of the pictures to be searched in the search box. The product interface is shown in fig. 12A: a search box 1201 is included on the current interface. When the user inputs a search word 1202 in the search box, for example "food", pictures related to food are shown, as in fig. 12B, where the search result includes not only pictures stored locally but also related pictures backed up by the user in the cloud.
The image processing method according to the embodiment of the present application is mainly divided into two stages, where the first stage is a photo tag analysis stage on a terminal, fig. 13 is a flowchart illustrating the image tag analysis provided in the embodiment of the present application, the second stage is a process of analyzing a search term and querying an image, and fig. 14 is a flowchart illustrating the image query provided in the embodiment of the present application.
As shown in fig. 13, the method includes the steps of:
step S1301, the user installs the photo album manager APP.
Step S1302, the terminal obtains the user authorization, and scans and reads the picture of the user.
Step S1303, the picture is loaded into the high-quality picture classification model in the APP, and analysis yields the tags of the picture.
And step S1304, storing the photo label obtained by analysis in a local database.
In the embodiments of the present application, the purpose of picture analysis is to determine the classification and tags of the picture; one picture may carry multiple tags. A tag is a word describing the contents of the picture, such as landscape, group photo, baby, boy, girl, and the like. For the tag classification capability, the picture classification SDK developed by Tencent YouTu can be used; the YouTu picture classification SDK provides classification capability for 198 tags and is a picture classification model based on a deep convolutional neural network. Based on the YouTu picture classification capability, the photo album manager App first classifies the tags of the pictures on the user's mobile phone and stores the tags of each picture.
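Storing each picture's tags in a local database, as in step S1304, can be sketched with Python's built-in sqlite3 module; the table and column names are illustrative, and an in-memory database stands in for the on-device file:

```python
import sqlite3

# Minimal sketch of a local tag store; schema names are hypothetical.
conn = sqlite3.connect(":memory:")  # the App would use an on-device file
conn.execute("CREATE TABLE photo_tags (photo_id INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO photo_tags VALUES (?, ?)",
    [(1, "food"), (1, "table"), (2, "landscape")],
)

# Later, local search matches photos whose tags include a target tag.
rows = conn.execute(
    "SELECT DISTINCT photo_id FROM photo_tags WHERE tag = ?", ("food",)
).fetchall()
# -> [(1,)]
```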
As shown in fig. 14, the method includes the steps of:
in step S1401, the user inputs a search word in the picture search box.
In step S1402, the APP requests a search word to the server.
In step S1403, the server runs a word segmentation service to decompose the search word into individual words.
After the user inputs a search word on the App, the App sends the search word to an interface of the server, and the server analyzes it. The server first performs word segmentation on the search word, that is, it divides the user's input into a group of meaningful words; for example, if the user inputs "Guangzhou landscape", the meaningful words are "Guangzhou" and "landscape". The word segmentation capability may use an open-source segmentation tool, such as Jieba segmentation.
In step S1404, the word stock data is preloaded after the server data is initialized.
In step S1405, after the AI text model data is initialized, a preset number of most similar words are found for each YouTu tag, and the words are sorted by similarity.
In step S1406, the search result is stored in the database.
Step S1407, searching and matching each word against the word library to obtain the corresponding YouTu tag.
Step S1408, issues the obtained tag result to the terminal.
And step S1409, the terminal matches the pictures belonging to the label on the terminal according to the obtained label, and displays the search result.
The server also records the serial numbers of the 198 tags provided by the YouTu SDK. The algorithm's analysis of the search word consists in finding the tag closest to it; the matched closest tag is then issued to the terminal App, and the terminal App can directly match pictures on the mobile phone using the tag results analyzed in the previous stage.
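The word segmentation of step S1403 can be sketched as follows. The toy dictionary-based forward-maximum-matching segmenter below is an illustrative stand-in for an open-source segmentation tool such as Jieba; the function name and vocabulary are assumptions made for the example:

```python
def forward_max_match(text, vocab, max_len=4):
    """Toy forward-maximum-matching segmenter: greedily take the
    longest dictionary word starting at the current position,
    falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

# "广州风景" (Guangzhou landscape) splits into "广州" and "风景".
print(forward_max_match("广州风景", {"广州", "风景"}))
```

A production system would use a full segmenter with a large dictionary and statistical disambiguation; the sketch only shows why segmentation turns the raw input into a group of meaningful words.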
The key point is how to quickly find the tag closest to the user's search word. In the field of computer natural language processing, a word vector is generally used to represent a word, and the distance between word vectors measures similarity: the smaller the distance between two words, the closer their meanings, which makes it possible to judge whether two words are similar. For the search algorithm of the embodiment of the present application, one approach is to calculate the distance between the search word and each tag word; to calculate this, the word vector of each word must be found.
In the embodiment of the application, large-scale corpus training is needed to obtain word vectors for all Chinese words. For this purpose, an open-source Chinese word vector database can be used, which contains word vectors for 8 million common words. Each time the user searches, the word vector of the search word is looked up in this corpus, the distance between it and each tag's word vector is calculated, and the closest tag is obtained.
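A minimal sketch of this distance-based matching, assuming tiny made-up 3-dimensional vectors in place of real embeddings looked up from a pre-trained corpus (the tag names and vector values are illustrative):

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity: the smaller the distance, the closer the meanings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Illustrative stand-ins for word vectors obtained from a large corpus.
tag_vectors = {
    "landscape": [0.9, 0.1, 0.0],
    "food":      [0.0, 0.2, 0.9],
    "baby":      [0.1, 0.9, 0.1],
}

def closest_tag(query_vector, vectors):
    """Return the tag whose word vector is nearest to the search word's vector."""
    return min(vectors, key=lambda t: cosine_distance(query_vector, vectors[t]))

print(closest_tag([0.8, 0.2, 0.1], tag_vectors))  # a vector close to "landscape"
```

Cosine distance is one common choice; Euclidean distance over normalized vectors behaves equivalently for this nearest-tag selection.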
In some embodiments, an optimization may be performed, because looking up a word vector in a large corpus on every query is time consuming. First, the 200 closest words are found for each tag offline, forming a 198 x 200 table. The user's search word then only needs to be matched within this table: once a word consistent with the search word is found, the row it belongs to indicates the similar tag. This avoids searching the original large-scale corpus on each query and improves efficiency.
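The precomputed table can be sketched as follows, here with a handful of made-up English synonyms per tag instead of the 200 nearest Chinese words (all words and tags are illustrative, and the sketch assumes no word appears under two tags):

```python
# Offline step: for each tag, precompute its K most similar words
# (K=3 toy synonyms here, in place of the 200 used in the text).
top_similar = {
    "landscape": ["scenery", "view", "mountains"],
    "food":      ["dish", "meal", "cuisine"],
}

# Invert the table so a search word maps straight to its tag.
synonym_to_tag = {word: tag
                  for tag, words in top_similar.items()
                  for word in words}

def match_tag(search_word):
    """O(1) lookup replacing a scan over the full word-vector corpus."""
    return synonym_to_tag.get(search_word)

print(match_tag("meal"))
```

The inversion trades a small amount of memory (tags x K entries) for constant-time matching at query time, which is the efficiency gain the optimization describes.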
According to tests, with the picture processing method provided by the embodiment of the present application, the search recall rate reaches 33.8%, far higher than the 5.2% of other photo album products.
In some embodiments, for the picture content identification technology, in addition to the tag identification capability of YouTu, other picture content detection technologies with better content identification can be used, so that the picture content can be understood more accurately and comprehensively, bringing a greater gain to the search.
Continuing below with the exemplary structure in which the picture processing apparatus 354 provided in the embodiments of the present application is implemented as software modules: in some embodiments, as shown in fig. 3, the software modules of the picture processing apparatus 354 stored in the memory 350 may be a picture processing apparatus in the server 300, including:
a receiving module 3541, configured to receive a picture search request, where the picture search request includes search terms;
a first determining module 3542 for determining a distance between a word vector of the search word and a word vector of a tag in a picture tag set; the label is used for marking words of picture contents;
a second determining module 3543, configured to determine, as a target tag, a tag whose distance satisfies a condition;
a third determining module 3544, configured to determine a picture in the tag that includes the target tag as a search result corresponding to the picture search request.
In some embodiments, the apparatus further comprises: the picture identification module is used for identifying the picture and determining the content corresponding to the picture when the picture is acquired before receiving a picture search request; the picture category determining module is used for determining the category of the picture according to the content corresponding to the picture; and the picture label determining module is used for determining the category of the picture as the label of the picture.
In some embodiments, the second determination module is further configured to: determining a tag having a minimum distance as the target tag; or, determining the label with the distance smaller than the threshold value as the target label.
In some embodiments, the picture search request includes search information therein, the apparatus further comprising: the conversion module is used for converting the search information into text information after receiving the picture search request; the word segmentation processing module is used for carrying out word segmentation processing on the text information to obtain at least one word segmentation; and the word segmentation extraction module is used for performing word segmentation extraction in the at least one word segmentation according to the part of speech of the word segmentation so as to obtain the search word.
In some embodiments, the apparatus further comprises: the tag grouping module is used for grouping tags in the picture tag set to obtain at least one tag group; the acquisition module is used for acquiring a tag group where the target tag is located as a target tag group after the target tag is determined; and the third determining module is further configured to determine, as a search result corresponding to the picture search request, a picture including the tags in the target tag group in the tags.
In some embodiments, the tag grouping module is further to: determining a label distance between word vectors of every two labels in the picture label set; dividing at least two tags with the tag distance within a preset range into the same tag group to obtain at least one tag group.
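The grouping performed by this module can be sketched with union-find, so that tags within the preset distance of each other end up (transitively) in the same group; the vectors, the Euclidean metric, and the threshold value are illustrative assumptions:

```python
import math

def euclidean(u, v):
    """Distance between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def group_tags(tag_vectors, threshold=0.5):
    """Union-find grouping: any two tags whose word-vector distance is
    within the threshold are merged (transitively) into one group."""
    parent = {t: t for t in tag_vectors}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    tags = list(tag_vectors)
    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            if euclidean(tag_vectors[a], tag_vectors[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for t in tags:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())

groups = group_tags({"boy": [0.0, 0.0], "girl": [0.2, 0.0], "landscape": [5.0, 5.0]})
print(groups)
```

With the toy vectors above, "boy" and "girl" fall within the threshold and form one group, while "landscape" remains alone.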
In some embodiments, the apparatus further comprises: the fourth determining module is used for determining the similarity between the labels in each label group; the sorting module is used for sorting the tags in the corresponding tag groups according to the increasing or decreasing sequence of the similarity to obtain a sorting result; correspondingly, the third determining module is further configured to: according to the sorting result, sequentially determining a preset number of tags in the target tag group as selection tags; and determining the picture with the tag including the selection tag as a search result corresponding to the picture search request.
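The sorting and selection performed by the fourth determining module and the sorting module can be sketched as follows (the tag names, similarity values, and preset number are illustrative assumptions):

```python
def select_tags(target_group, similarity, preset_number=2):
    """Sort the tags in the target group by decreasing similarity,
    then take a preset number of them as the selection tags."""
    ranked = sorted(target_group, key=lambda t: similarity[t], reverse=True)
    return ranked[:preset_number]

# Illustrative similarities of each tag in the target group.
similarity = {"baby": 0.95, "boy": 0.80, "girl": 0.78}
print(select_tags(["girl", "baby", "boy"], similarity))
```

Limiting the match to the top-ranked tags keeps the broadened search (group-based matching) from pulling in the group's weakest members.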
It should be noted that the description of the apparatus in the embodiment of the present application is similar to the description of the method embodiment, and has similar beneficial effects to the method embodiment, and therefore, the description is not repeated. For technical details not disclosed in the embodiments of the apparatus, reference is made to the description of the embodiments of the method of the present application for understanding.
Embodiments of the present application provide a storage medium having stored therein executable instructions, which when executed by a processor, will cause the processor to perform a method provided by embodiments of the present application, for example, the method as illustrated in fig. 4.
In some embodiments, the storage medium may be a Ferroelectric Random Access Memory (FRAM), a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, a magnetic surface memory, an optical disc, or a Compact Disc Read Only Memory (CD-ROM), etc.; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). By way of example, executable instructions may be deployed to be executed on one computing device, or on multiple computing devices at one site, or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (10)

1. An image processing method, comprising:
receiving an image search request, wherein the image search request comprises search words;
determining the distance between the word vector of the search word and the word vector of the tag in the picture tag set; the label is used for marking words of picture contents;
determining the label with the distance meeting the condition as a target label; taking the tag group where the target tag is located as a target tag group; the label group comprises labels of similar words;
and matching in the pictures with at least one tag through the target tag group, and determining one or more pictures as search results when the tags of the one or more pictures contain any tag in the target tag group.
2. The method of claim 1, further comprising:
before receiving a picture search request, when the picture is acquired, identifying the picture and determining the content corresponding to the picture;
determining the category of the picture according to the content corresponding to the picture;
and determining the category of the picture as the label of the picture.
3. The method according to claim 1, wherein the determining the tag whose distance satisfies the condition as the target tag comprises:
determining a tag having a minimum distance as the target tag; or,
and determining the label with the distance smaller than the threshold value as the target label.
4. The method according to any one of claims 1 to 3, wherein the picture search request includes search information, the method further comprising:
after receiving a picture search request, converting the search information into text information;
performing word segmentation processing on the text information to obtain at least one word segmentation;
and performing word segmentation extraction in the at least one word segmentation according to the part of speech of the word segmentation to obtain the search word.
5. The method according to any one of claims 1 to 3, further comprising:
grouping the tags in the picture tag set to obtain at least one tag group;
correspondingly, after the target label is determined, acquiring a label group where the target label is located as a target label group;
and determining the picture of the tag, which comprises the tag in the target tag group, as a search result corresponding to the picture search request.
6. The method of claim 5, wherein the grouping tags in the picture tag set to obtain at least one tag group comprises:
determining a label distance between word vectors of every two labels in the picture label set;
dividing at least two tags with the tag distance within a preset range into the same tag group to obtain at least one tag group.
7. The method of claim 5, further comprising:
determining similarity between tags in each tag group;
sorting the labels in the corresponding label group according to the increasing or decreasing sequence of the similarity to obtain a sorting result;
correspondingly, the determining the picture of the tag, including the tag in the target tag group, as the search result corresponding to the picture search request includes:
according to the sorting result, sequentially determining a preset number of tags in the target tag group as selection tags;
and determining the picture with the tag including the selection tag as a search result corresponding to the picture search request.
8. A picture processing apparatus, comprising:
the receiving module is used for receiving an image searching request, and the image searching request comprises searching words;
the first determining module is used for determining the distance between the word vector of the search word and the word vector of the label in the picture label set; the label is used for marking words of picture contents;
the second determining module is used for determining the label with the distance meeting the condition as a target label; taking the tag group where the target tag is located as a target tag group; the label group comprises labels of similar words;
and the third determining module is used for matching in the pictures with at least one label through the target label group, and determining one or more pictures as the search result when any label in the target label group is contained in the labels of the one or more pictures.
9. A picture processing device, comprising:
a memory for storing executable instructions; a processor for implementing the method of any one of claims 1 to 7 when executing executable instructions stored in the memory.
10. A storage medium having stored thereon executable instructions for causing a processor to perform the method of any one of claims 1 to 7 when executed.
CN201911075405.0A 2019-11-06 2019-11-06 Picture processing method, device, equipment and storage medium Active CN110765301B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911075405.0A CN110765301B (en) 2019-11-06 2019-11-06 Picture processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110765301A CN110765301A (en) 2020-02-07
CN110765301B true CN110765301B (en) 2022-02-25

Family

ID=69336063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911075405.0A Active CN110765301B (en) 2019-11-06 2019-11-06 Picture processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110765301B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590861A (en) * 2020-04-30 2021-11-02 北京搜狗科技发展有限公司 Picture information processing method and device and electronic equipment
CN112464006A (en) * 2020-06-14 2021-03-09 黄雨勤 Data analysis method and system based on artificial intelligence and Internet
CN112202919B (en) * 2020-10-22 2022-06-17 中国科学院信息工程研究所 Picture ciphertext storage and retrieval method and system under cloud storage environment
CN116485587B (en) * 2023-04-21 2024-04-09 深圳润高智慧产业有限公司 Community service acquisition method, community service providing method, electronic device and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136228A (en) * 2011-11-25 2013-06-05 阿里巴巴集团控股有限公司 Image search method and image search device
JP2017188063A (en) * 2016-04-04 2017-10-12 公立大学法人大阪市立大学 Image search system, image search method, and image search program
TWI753034B (en) * 2017-03-31 2022-01-21 香港商阿里巴巴集團服務有限公司 Method, device and electronic device for generating and searching feature vector
CN107193983A (en) * 2017-05-27 2017-09-22 北京小米移动软件有限公司 Image search method and device
CN110069650B (en) * 2017-10-10 2024-02-09 阿里巴巴集团控股有限公司 Searching method and processing equipment
CN107741972A (en) * 2017-10-12 2018-02-27 广东欧珀移动通信有限公司 A kind of searching method of picture, terminal device and storage medium
CN107679208A (en) * 2017-10-16 2018-02-09 广东欧珀移动通信有限公司 A kind of searching method of picture, terminal device and storage medium
CN110019888A (en) * 2017-12-01 2019-07-16 北京搜狗科技发展有限公司 A kind of searching method and device
CN109947971B (en) * 2019-03-18 2023-04-18 Oppo广东移动通信有限公司 Image retrieval method, image retrieval device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110765301B (en) Picture processing method, device, equipment and storage medium
WO2020249125A1 (en) Method and system for automatically training machine learning model
CN111177569B (en) Recommendation processing method, device and equipment based on artificial intelligence
JP6894534B2 (en) Information processing method and terminal, computer storage medium
US20210097089A1 (en) Knowledge graph building method, electronic apparatus and non-transitory computer readable storage medium
CN110968684A (en) Information processing method, device, equipment and storage medium
WO2022100338A1 (en) Picture search method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN111696656B (en) Doctor evaluation method and device of Internet medical platform
Seetha et al. Modern technologies for big data classification and clustering
CN115687647A (en) Notarization document generation method and device, electronic equipment and storage medium
CN114372532A (en) Method, device, equipment, medium and product for determining label marking quality
Rai et al. Using open source intelligence as a tool for reliable web searching
CN114330476A (en) Model training method for media content recognition and media content recognition method
US20210357682A1 (en) Artificial intelligence driven image retrieval
Stuart Practical data science for information professionals
CN117171355A (en) Cultural gene knowledge graph construction method and device
KR102183412B1 (en) Method and artificial intelligence system of servicing affiliate marketing using personal data that is based on the results of fortune-telling information contents
CN112861474B (en) Information labeling method, device, equipment and computer readable storage medium
US11501071B2 (en) Word and image relationships in combined vector space
CN111259975B (en) Method and device for generating classifier and method and device for classifying text
Mounica et al. Feature selection method on twitter dataset with part-of-speech (PoS) pattern applied to traffic analysis
JP5700007B2 (en) Information processing apparatus, method, and program
CN113392294A (en) Sample labeling method and device
Devi et al. SoloDB for social media’s big data using deep natural language with AI applications and Industry 5.0
KR102214136B1 (en) goods image searching method based social networks

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40020405

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant