CN110163121A - Image processing method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN110163121A CN110163121A CN201910360905.2A CN201910360905A CN110163121A CN 110163121 A CN110163121 A CN 110163121A CN 201910360905 A CN201910360905 A CN 201910360905A CN 110163121 A CN110163121 A CN 110163121A
- Authority
- CN
- China
- Prior art keywords
- image
- term vector
- multiple images
- semantic feature
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/416—Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image processing method, an apparatus, a computer device, and a storage medium, belonging to the field of network technology. The method comprises: obtaining multiple pieces of context information for multiple images; inputting the multiple images and the multiple pieces of context information into a language model, and performing feature extraction on the multiple images through the language model and the multiple pieces of context information to obtain semantic features of the multiple images; and performing image processing based on the semantic features of the multiple images. By extracting the semantic features of images through a language model, the invention can complete image processing tasks in scenarios with high demands on image semantics, improving the accuracy of image processing.
Description
Technical field
The present invention relates to the field of network technology, and in particular to an image processing method, an apparatus, a computer device, and a storage medium.
Background technique
In social interaction today, images express a user's intended meaning more intuitively than text, and are more vivid and engaging. With the development of computer devices, a computer device can help users understand images; that is, the computer device can perform feature extraction on an image, thereby helping to improve the efficiency of operations such as replying to image messages or image development.
In traditional feature extraction methods, a computer device usually extracts an image's superficial, pixel-level features through a VGG (Visual Geometry Group) network. A VGG network is a model trained for a specific scenario (for example, image classification, image segmentation, or image recognition), so during feature extraction it can only extract the superficial features of interest to that scenario. For example, a VGG segmentation network extracts the pixel-level boundary information of each segmented region (such as in mammary gland segmentation), while a VGG classification network extracts the pixel-level class label of an image (such as cat-versus-dog classification).
In the above process, a VGG network can only extract an image's superficial pixel-level features and cannot fully understand image semantics, so in scenarios with high demands on image semantics, the accuracy of image processing is low.
Summary of the invention
Embodiments of the present invention provide an image processing method, an apparatus, a computer device, and a storage medium, which can solve the problem that a computer device cannot fully understand image semantics, causing low accuracy of image processing in scenarios with high demands on image semantics. The technical solution is as follows:
In one aspect, an image processing method is provided, the method comprising:
obtaining multiple pieces of context information for multiple images, the multiple pieces of context information being at least one of the text information before or after the position of the images in a text scene;
inputting the multiple images and the multiple pieces of context information into a language model, and performing feature extraction on the multiple images through the language model and the multiple pieces of context information to obtain semantic features of the multiple images;
performing image processing based on the semantic features of the multiple images.
In a possible embodiment, obtaining multiple second initial word vectors comprises:
performing embedding processing on the multiple pieces of context information, and obtaining pre-trained word vectors as the multiple second initial word vectors.
In one aspect, an image processing apparatus is provided, the apparatus comprising:
an obtaining module, configured to obtain multiple pieces of context information for multiple images, the multiple pieces of context information being at least one of the text information before or after the position of the images in a text scene;
a feature extraction module, configured to input the multiple images and the multiple pieces of context information into a language model, and perform feature extraction on the multiple images through the language model and the multiple pieces of context information to obtain semantic features of the multiple images;
an image processing module, configured to perform image processing based on the semantic features of the multiple images.
In a possible embodiment, the feature extraction module comprises:
an obtaining unit, configured to obtain multiple first initial word vectors, the multiple first initial word vectors corresponding to the multiple images;
the obtaining unit being further configured to obtain multiple second initial word vectors, the multiple second initial word vectors corresponding to the multiple pieces of context information;
an iterative training unit, configured to iteratively train the language model based on the multiple first initial word vectors and the multiple second initial word vectors;
and an obtaining unit, configured to obtain the semantic features of the multiple images when the loss function value is less than a target threshold or the number of iterations reaches a target number.
In a possible embodiment, the obtaining unit is configured to:
input the multiple images into a pixel feature extraction model, and extract the pixel features of the multiple images through the pixel feature extraction model;
perform clustering processing on the multiple images according to their pixel features to obtain class labels for the multiple images;
and assign the same random word vector as the first initial word vector to images with the same class label.
In a possible embodiment, the obtaining unit is configured to:
when the multiple images include a first image, extract text from the first image, perform embedding processing on the text to obtain a word vector for at least one word in the text, and obtain the average of the word vectors of the at least one word as the first initial word vector corresponding to the first image, the first image being an image that carries text;
when the multiple images include a second image, obtain a random word vector as the first initial word vector corresponding to the second image, the second image being an image that does not carry text.
In a possible embodiment, the obtaining unit is configured to:
perform embedding processing on the multiple pieces of context information, and obtain pre-trained word vectors as the multiple second initial word vectors.
In a possible embodiment, the iterative training unit is configured to:
during the iterative training of the language model, keep the multiple second initial word vectors fixed and adjust the values of the multiple first initial word vectors to obtain multiple first word vectors;
and obtaining the semantic features of the multiple images when the loss function value is less than the target threshold or the number of iterations reaches the target number comprises:
when the loss function value is less than the target threshold or the number of iterations reaches the target number, determining the multiple first word vectors as the semantic features of the multiple images.
In a possible embodiment, the image processing module comprises:
a storage processing unit, configured to store the semantic features of the multiple images into a database according to the image identifiers or class identifiers of the multiple images, and, when an image processing instruction carrying a target image is received, obtain the semantic feature of the target image from the database and perform image processing based on the semantic feature of the target image.
In a possible embodiment, the storage processing unit is configured to:
when the image processing instruction also carries the image identifier of the target image, determine the semantic feature corresponding to the image identifier in the database as the semantic feature of the target image; or,
perform clustering processing on the target image to obtain the class identifier of the target image, and determine the semantic feature corresponding to the class identifier in the database as the semantic feature of the target image.
In one aspect, a computer device is provided, the computer device comprising one or more processors and one or more memories, the one or more memories storing at least one instruction, the at least one instruction being loaded and executed by the one or more processors to implement the operations performed by the image processing method of any of the above possible implementations.
In one aspect, a computer-readable storage medium is provided, the storage medium storing at least one instruction, the at least one instruction being loaded and executed by a processor to implement the operations performed by the image processing method of any of the above possible implementations.
The beneficial effects brought by the technical solutions provided in the embodiments of the present invention include at least the following:
Multiple pieces of context information are obtained for multiple images, the context information being at least one of the text information before or after the position of the images in a text scene. After the multiple images and the multiple pieces of context information are input into a language model, feature extraction can be performed on the images through the language model under the influence of the context information, yielding the semantic features of the images. Because these semantic features are vector representations of the non-visual features expressed by the images as a whole at the semantic level, the server can better understand image semantics. Therefore, when image processing is performed based on the semantic features of the images, image processing tasks in scenarios with high demands on image semantics can be completed, improving the accuracy of image processing.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of an image processing method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of an image processing method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a clustering result provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the principle of a language model training process provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the principle of a language model training process provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the principle of database storage provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a computer device provided by an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment of an image processing method provided by an embodiment of the present invention. Referring to Fig. 1, the implementation environment may include at least one terminal 101 and a server 102, detailed as follows:
The terminal 101 may be any terminal capable of sending messages or images; after logging in to any terminal, a user can send messages or images to the server 102.
The server 102 may be any computer device capable of providing image processing services. When the server 102 receives an image from any of the at least one terminal 101, it can obtain the semantic feature of the image and perform image processing based on that semantic feature.
The embodiments of the present invention can be applied in human-computer interaction scenarios. In social interaction, users increasingly tend to substitute emoticon images for text messages, intuitively expressing their intended meaning and increasing the fun of social interaction. Therefore, when a user exchanges messages through a terminal with intelligent question-answering products such as chat robots, intelligent assistants, or intelligent customer service, the emoticon images likewise carry semantics that need to be conveyed. After the user sends an emoticon image through the terminal, the server 102 can extract the semantic feature of the emoticon image and perform corresponding image processing. For example, the server 102 can recommend from a database the response image with the highest degree of matching to the emoticon image and send that response image to the terminal. Compared with traditional intelligent question-answering products that can only reply with text messages, this image processing method based on the semantic features of emoticon images makes intelligent question-answering products more personable and intelligent, overcoming the shortcomings of text messages being overly blunt, prone to failing to convey the intended meaning, and insufficiently vivid and engaging.
Fig. 2 is a flowchart of an image processing method provided by an embodiment of the present invention. Referring to Fig. 2, the method is applied to a computer device; the embodiment is described in detail below taking the case where the computer device is a server as an example:
201. The server inputs multiple images into a pixel feature extraction model, and extracts the pixel features of the multiple images through the pixel feature extraction model.
The multiple images may contain any content; for example, they may include emoticon images or non-emoticon images. An emoticon image is an image used to express ideas during human-computer interaction, and it may carry text; further, emoticon images can be divided into portrait expressions, animal expressions, cartoon expressions, and so on.
The pixel feature extraction model is used to extract an image's pixel features, that is, the image's superficial features at the pixel level: the visual features such as texture, color, shape, or boundaries that the image visually presents.
In some embodiments, the pixel feature extraction model may be a CNN (convolutional neural network), a TCN (temporal convolutional network), a VGG (Visual Geometry Group) network, or the like.
Taking a CNN as the pixel feature extraction model as an example, the CNN may include an input layer, at least one convolutional layer, and an output layer connected in series. The input layer is used to decode the input image, the at least one convolutional layer is used to perform convolution processing on the decoded image, and the output layer is used to perform nonlinear processing and normalization on the convolved image. In some embodiments, at least one pooling layer may be inserted between the convolutional layers; a pooling layer is used to compress the feature map output by a convolutional layer, reducing the size of the feature map.
In some embodiments, residual connections may be used between the convolutional layers. A residual connection means that, for each convolutional layer, the feature map output by a preceding convolutional layer can be added to the feature map output by the current convolutional layer to obtain a residual block, and the residual block is used as an input feature map of the next convolutional layer, which helps solve the degradation problem of deep networks. For example, a residual connection may be made after every convolutional layer, or after every two convolutional layers; the embodiment of the present invention does not specifically limit the number of convolutional layers between residual connections.
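As a rough illustration (not the patent's own implementation), the residual connection described above can be sketched in NumPy. The toy `conv_layer` stand-in, the layer shapes, and the random weights are all assumptions chosen so that input and output shapes match, which the element-wise skip addition requires:

```python
import numpy as np

def conv_layer(x, w):
    """Toy stand-in for a convolutional layer: a linear map followed by
    ReLU. Shapes are chosen so input and output match, as the residual
    addition requires."""
    return np.maximum(0.0, x @ w)

def residual_block(x, w1, w2):
    """Two stacked layers with a residual (skip) connection: the block
    input x is added to the output of the second layer, mirroring
    'a residual connection after every two convolutional layers'."""
    out = conv_layer(x, w1)
    out = conv_layer(out, w2)
    return out + x  # skip connection: input feature map + block output

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))          # a toy "feature map"
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
y = residual_block(x, w1, w2)
print(y.shape)  # (4, 8): same shape as the input, as the skip requires
```

Note the design consequence: because the skip path is the identity, a block with zero weights simply passes its input through, which is what counteracts the degradation problem mentioned above.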
In the above case, step 201 means: the server inputs the multiple images into a pixel feature extraction model in the form of a convolutional neural network, performs convolution processing on the multiple images through the at least one convolutional layer in the convolutional neural network, and outputs the pixel features of the multiple images. Of course, when the pixel feature extraction model is a temporal convolutional network, a similar process can be performed, except that each convolutional layer in the temporal convolutional network performs causal convolution on the multiple images, which is not described here.
In some embodiments, the pixel feature extraction model may also be a VGG network (a special kind of CNN). A VGG network includes multiple convolutional layers and multiple pooling layers; each convolutional layer uses small 3x3 convolution kernels, each pooling layer uses 2x2 max pooling kernels, and a residual connection is made after every two convolutional layers, so that after each pooling the image size is halved and the depth is doubled as the VGG network deepens, simplifying the structure of the CNN. For example, the VGG network may be VGG-16 or the like; the embodiment of the present invention does not specifically limit the depth of the VGG network.
202. The server performs clustering processing on the multiple images according to their pixel features to obtain class labels for the multiple images.
In the above process, the server can perform clustering processing on the multiple images based on a KNN (k-nearest neighbor) algorithm, using the multiple similarities corresponding to the pixel features of the multiple images, to obtain the class labels of the multiple images.
In some embodiments, the server can construct a KNN model based on a training image set containing multiple training images, each training image including a pixel feature and a class label. It should be noted that, when training precision requirements are low, the number of class labels in the training image set can be set below a first target number, so that clustering completes quickly; when training precision requirements are high, the number of class labels in the training image set can be set greater than or equal to the first target number, so that the classes of the images can be divided more finely during clustering. The first target number can be any value greater than or equal to 1.
In the above case, the server can input the pixel features of the multiple images into the KNN model in turn, obtain through the KNN model the multiple similarities between the pixel features of the multiple images and the pixel features of the multiple training images, and obtain the class labels of the multiple images according to the multiple similarities.
Specifically, taking any one of the multiple images as an example: after the server inputs the image into the KNN model, it obtains, based on the KNN model, the multiple similarities between the image and the multiple training images in the training image set, sorts the multiple training images in descending order of similarity, and determines, among the class labels of the top second-target-number training images by similarity, the class label with the highest number of occurrences as the class label of the image. The second target number can be any value greater than or equal to 1.
In some embodiments, the server can obtain the reciprocal of the Euclidean distance between the pixel feature of any image and the pixel feature of any training image as the similarity between the two. Since the Euclidean distance measures the absolute distance between different pixel features in feature space, the reciprocal of the Euclidean distance describes the similarity between the pixel feature of an image and the pixel feature of a training image well.
In some embodiments, the server can also obtain the reciprocal of the Manhattan distance between the pixel feature of any image and the pixel feature of any training image as the similarity between the two. Since the Manhattan distance measures the absolute axis-wise distance of different image features in feature space (the axes here being coordinate axes), the reciprocal of the Manhattan distance can also describe the similarity between the pixel feature of an image and the pixel feature of a training image well.
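The two similarity measures above (the reciprocal of the Euclidean distance and the reciprocal of the Manhattan distance) can be sketched as follows. The small epsilon guarding against division by zero when two features coincide is an illustrative assumption, not something the patent specifies:

```python
import numpy as np

def similarity(a, b, metric="euclidean", eps=1e-12):
    """Similarity between two pixel-feature vectors as the reciprocal
    of their distance. eps avoids division by zero when the two
    features coincide (an illustrative choice)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    if metric == "euclidean":
        dist = np.sqrt(np.sum(diff ** 2))   # absolute distance in feature space
    elif metric == "manhattan":
        dist = np.sum(np.abs(diff))         # absolute axis-wise distance
    else:
        raise ValueError("unknown metric")
    return 1.0 / (dist + eps)

p = [1.0, 2.0, 2.0]
q = [1.0, 0.0, 2.0]
print(similarity(p, q, "euclidean"))   # distance 2, so similarity is about 0.5
print(similarity(p, q, "manhattan"))   # distance 2, so similarity is about 0.5
```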
For example, suppose the pixel feature of an image P is input into a KNN model whose training image set includes 20 training images. The server determines the reciprocals of the Euclidean distances between the pixel feature of image P and the pixel features of the 20 training images as 20 corresponding similarities, sorts the 20 training images in descending order of similarity, and obtains the top 5. Since 4 of the top 5 training images belong to class label A and only 1 belongs to class label B, the class label A with the highest number of occurrences is determined as the class label of image P.
In some embodiments, after obtaining the class label of an image, the server can also add the image's pixel feature and class label to the training image set of the KNN model, so that during the clustering of multiple images the training image set of the KNN model is continuously expanded, improving the clustering accuracy of the KNN model.
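The KNN classification of step 202, including the running expansion of the training set, can be sketched as follows. The toy feature vectors and labels are invented for illustration; a real system would use the pixel features from step 201:

```python
import numpy as np
from collections import Counter

def knn_label(feature, train_feats, train_labels, k):
    """Sort training images by similarity (reciprocal Euclidean
    distance) and return the most frequent label among the top k."""
    feature = np.asarray(feature, dtype=float)
    sims = [1.0 / (np.linalg.norm(feature - np.asarray(f, dtype=float)) + 1e-12)
            for f in train_feats]
    top_k = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    return Counter(train_labels[i] for i in top_k).most_common(1)[0][0]

# Invented toy training set: two clusters of pixel features.
train_feats = [[0, 0], [0, 1], [1, 0], [1, 1], [9, 9], [9, 8]]
train_labels = ["A", "A", "A", "A", "B", "B"]

label = knn_label([0.5, 0.5], train_feats, train_labels, k=5)
print(label)  # "A": 4 of the top-5 neighbours carry label A, as in the example

# Expand the training set with the newly labelled image, as described
# above, so later clustering benefits from it.
train_feats.append([0.5, 0.5])
train_labels.append(label)
```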
Fig. 3 is a schematic diagram of a clustering result provided by an embodiment of the present invention. Referring to Fig. 3, suppose the training image set of the KNN model includes the class labels "staring", "speechless", and "looking up". After 8 images are input into the KNN model, a clustering result as shown in Fig. 3 can be obtained.
203. The server assigns the same random word vector to images with the same class label as their first initial word vector.
A random word vector can be any randomly generated word vector, and a first initial word vector is the word vector obtained by initializing any image.
Through steps 201-203, the server obtains multiple first initial word vectors corresponding to the multiple images. Further, the server can assign the same first initial word vector to images with the same class label, so that those images share a first initial word vector and thus share parameter adjustments during subsequent iterative training, so that images with the same class label ultimately have the same semantic feature.
In the above process, the server first performs clustering processing on the images and then initializes their word vectors based on class labels. In this way, if some images appear in only a small number of dialogues, so that their context information is not rich enough, the clustering processing can improve the accuracy of the semantic features of those images, while also reducing the training duration and the training computation of the language model in subsequent training.
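Step 203's shared initialization can be sketched as drawing one random word vector per class label and mapping every image in that cluster to it. The vector dimension, the random seed, and the image identifiers are illustrative assumptions:

```python
import numpy as np

def init_first_vectors(image_labels, dim=16, seed=0):
    """Assign the same random word vector to all images sharing a
    class label, so clustered images share one first initial word
    vector (and hence share parameter updates during training)."""
    rng = np.random.default_rng(seed)
    label_vec = {}  # one random vector per class label
    vectors = {}
    for image_id, label in image_labels.items():
        if label not in label_vec:
            label_vec[label] = rng.standard_normal(dim)
        vectors[image_id] = label_vec[label]
    return vectors

labels = {"img1": "staring", "img2": "staring", "img3": "speechless"}
vecs = init_first_vectors(labels)
print(np.array_equal(vecs["img1"], vecs["img2"]))  # True: same cluster, shared vector
print(np.array_equal(vecs["img1"], vecs["img3"]))  # False: different clusters
```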
It should be noted that, in the embodiments of the present invention, the extracted semantic feature refers to a vector representation of the non-visual features expressed by the image as a whole at the semantic level, not a vector representation of traditional visual image features expressed at the pixel level.
In some embodiments, the server may also skip steps 201-203, that is, not perform clustering processing on the multiple images, but directly initialize a word vector for each image. In that case, steps 201-203 can be replaced with the following method: when the multiple images include a first image, extract text from the first image, perform embedding processing on the text to obtain a word vector for at least one word in the text, and obtain the average of the word vectors of the at least one word as the first initial word vector corresponding to the first image, where the first image is an image that carries text. Optionally, when the multiple images include a second image, the server can also obtain a random word vector as the first initial word vector corresponding to the second image, where the second image is an image that does not carry text. In this way, targeted initialization is performed for first images and second images respectively, optimizing the processing logic of the word vector initialization process and shortening the training duration.
In the above process, when extracting text from the multiple images, OCR (optical character recognition) technology can be used to recognize the text in the multiple images. The server does not perform clustering processing here; instead, the semantic feature of each individual image is trained during iterative training, making the semantic features of the images more targeted.
For example, Fig. 4 is a schematic diagram of the principle of a language model training process provided by an embodiment of the present invention. Referring to Fig. 4, suppose text 401, image 402, image 403, and text 404 together form a dialogue between user A and user B. During language model training, this dialogue is treated as a long text carrying images. For text 401 and text 404, pre-trained word vectors are obtained as their second initial word vectors based on the method in step 204 below. For image 402, since it is a first image carrying text, the server extracts the text "looking up at a big shot" in image 402 based on OCR technology, obtains the word vectors of the words in the text, and determines the average of those word vectors as the first initial word vector of image 402. For image 403, since it is a second image carrying no text, the server can directly obtain any random word vector as the first initial word vector of image 403. Thus, based on word embedding, the server completes the initialization of all texts and images in this long text, and then performs step 205 below.
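The initialization illustrated by Fig. 4 can be sketched as follows. The tiny embedding table and the pre-split OCR text are assumptions for illustration; a real system would use an OCR library and a full pre-trained embedding table:

```python
import numpy as np

# Assumed toy embedding table standing in for pre-trained word vectors.
EMBED = {
    "looking": np.array([1.0, 0.0]),
    "up":      np.array([0.0, 1.0]),
}

def first_initial_vector(ocr_text, dim=2, rng=None):
    """If OCR found text in the image (a 'first image'), return the
    average of its word vectors; otherwise (a 'second image') return
    a random word vector, as in the alternative to steps 201-203."""
    rng = rng or np.random.default_rng(0)
    words = [w for w in ocr_text.split() if w in EMBED] if ocr_text else []
    if words:
        return np.mean([EMBED[w] for w in words], axis=0)
    return rng.standard_normal(dim)

print(first_initial_vector("looking up"))   # [0.5 0.5]: average of the two word vectors
print(first_initial_vector("").shape)       # (2,): random vector for a text-free image
```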
204. The server obtains multiple pieces of context information for the multiple images, performs embedding processing on the multiple pieces of context information, and obtains pre-trained word vectors as multiple second initial word vectors.
Wherein, multiple contextual information is in the text information in text scene before or after image present position
At least one of, optionally, text scene can be session context, and multiple contextual information is image in a session at this time
At least one of in session before present position or session later, optionally, text scene is also possible to carrying figure
The long text scene of picture, multiple contextual information is in the text in long text before image or text later at this time
At least one of, the embodiment of the present invention does not limit the form of text scene specifically.
In step 204, the server obtains multiple second initial word vectors corresponding to the multiple pieces of contextual information. In some embodiments, the server performs embedding processing on the contextual information as follows: for each piece of contextual information, the server multiplies the one-hot encoding corresponding to that contextual information by a pretrained weight matrix, thereby mapping the one-hot encoding into the word-vector space and obtaining a pretrained word vector, and determines that pretrained word vector as the second initial word vector of the contextual information. By obtaining pretrained word vectors, the duration required for training can be shortened.
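The one-hot-times-weight-matrix operation in step 204 amounts to selecting one row of the pretrained weight matrix. A small sketch, with a vocabulary and matrix values invented purely for illustration:

```python
import numpy as np

vocab = ["hello", "world", "image"]       # hypothetical vocabulary
rng = np.random.default_rng(1)
W = rng.normal(size=(len(vocab), 5))      # pretrained weight matrix, shape (V, d)

def second_initial_vector(word):
    """Multiply the word's one-hot encoding by the pretrained weight
    matrix, mapping the word into the word-vector space."""
    one_hot = np.zeros(len(vocab))
    one_hot[vocab.index(word)] = 1.0
    return one_hot @ W                    # equals the row W[vocab.index(word)]

v = second_initial_vector("world")
```

In practice the matrix multiplication is never performed explicitly; an embedding lookup of the corresponding row is equivalent and faster.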
205. The server iteratively trains the language model based on the multiple first initial word vectors and the multiple second initial word vectors.

In the above process, the language model may be any natural language processing (NLP) model. For example, the language model may be an n-gram model (also called an N-gram model), an NNLM (neural network language model), ELMo (Embeddings from Language Models), or BERT (Bidirectional Encoder Representations from Transformers). The embodiment of the present invention does not specifically limit the structure of the language model.
During the iterative training, the server inputs the multiple first initial word vectors and the multiple second initial word vectors into the language model and obtains a loss function value according to the prediction result of the language model. When the loss function value is greater than a target threshold, the parameters of the language model may be adjusted based on the backpropagation (BP) algorithm. The above process is performed iteratively, and training may stop when the loss function value is less than or equal to the target threshold or the number of iterations reaches a target number.
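The two stop criteria in step 205 — loss at or below a target threshold, or a target number of iterations — can be sketched with a generic gradient-descent loop. The quadratic toy loss below is an assumption for illustration only, not the model's actual loss function:

```python
import numpy as np

def iterative_train(params, loss_fn, grad_fn, threshold=1e-3, max_iters=1000, lr=0.1):
    """Adjust parameters by backpropagation-style updates until the loss is
    at or below the target threshold or the iteration budget is used up."""
    loss = loss_fn(params)
    for _ in range(max_iters):
        if loss <= threshold:
            break
        params = params - lr * grad_fn(params)  # parameter adjustment step
        loss = loss_fn(params)
    return params, loss

# Toy quadratic loss pulling the parameters toward a target vector.
target = np.array([1.0, -2.0])
params, loss = iterative_train(
    np.zeros(2),
    loss_fn=lambda p: float(np.sum((p - target) ** 2)),
    grad_fn=lambda p: 2 * (p - target),
)
```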
In the above process, because the text word vectors (second initial word vectors) and the image word vectors (first initial word vectors) are input into the language model together during training, the language model can extract deep semantic-level features of the images using processing logic similar to that used for extracting the semantic features of text. This frees the model from the constraint of pixel features, bridges the processing barrier between text and images, and yields the semantic features of the multiple images.
In some embodiments, while iteratively training the language model, the server may keep the multiple second initial word vectors fixed so that only the values of the multiple first initial word vectors are adjusted during training. This guarantees that, when training stops, the resulting multiple first word vectors and the multiple second initial word vectors lie in the same vector space; that is, the semantic features of the images and the semantic features of the contextual information lie in the same feature space. By this control-variable method, the server makes the semantic features of the images express semantics more accurately.
Fig. 5 is a schematic diagram of a language model training process provided by an embodiment of the present invention. Referring to Fig. 5, assume that text 401, image 402, image 403, and text 404 form a dialogue between user A and user B. During language model training, this dialogue is treated as one long text carrying images. For text 401 and text 404, pretrained word vectors are obtained as second initial word vectors. For image 402 and image 403, the cluster class numbers of image 402 and image 403 are first acquired (for example, the class labels provided by an embodiment of the present invention, or the class identifiers mapped from the class labels), and two random word vectors corresponding to the respective cluster class numbers are obtained according to those numbers. The two random word vectors serve as the first initial word vectors of image 402 and image 403 respectively, so that, based on the word embedding method, the server completes initialization for all texts and images in the long text. The two first initial word vectors and the two second initial word vectors are then input into the language model, which is iteratively trained based on the backpropagation algorithm until the loss function value is less than the target threshold or the number of iterations reaches the target number, triggering the server to perform step 206 below.
206. When the loss function value is less than the target threshold or the number of iterations reaches the target number, the server determines the multiple first word vectors as the semantic features of the multiple images.

Here, the multiple first word vectors are the word vectors obtained after parameter adjustment of the multiple first initial word vectors, and the target threshold may be any value greater than or equal to 0.

It should be noted that, because the multiple first initial word vectors and the multiple second initial word vectors are also parameters of the language model, parameter adjustment may also be performed on each first initial word vector and each second initial word vector during the iterative training. Training stops when the loss function value is less than the target threshold or the number of iterations reaches the target number; at that point, the parameter-adjusted first initial word vectors may be called the multiple first word vectors, and the parameter-adjusted second initial word vectors may be called the multiple second word vectors.
In the above process, as the language processing performance of the language model itself improves, each first initial word vector and each second initial word vector also become more and more accurate, meaning that the multiple first word vectors express the semantics of the images better and better, and the multiple second word vectors express the semantics of the contextual information better and better. Therefore, besides determining the multiple first word vectors as the semantic features of the images when training stops, the multiple second word vectors can also be determined as the semantic features of the contextual information.
In steps 201-206, the server inputs the multiple images and the multiple pieces of contextual information into the language model, and performs feature extraction on the images through the language model and the contextual information to obtain the semantic features of the images. During feature extraction, the server iteratively trains the language model, and because parameter adjustment can be performed on the first initial word vectors and the second initial word vectors during training, the semantic features of the multiple images and of the multiple pieces of contextual information are available when the loss function value is less than the target threshold or the number of iterations reaches the target number. This bridges the barrier between text and images and extracts deep semantic-level features of the images.
The following takes ELMo as an example of the language model. ELMo includes a bidirectional LSTM (long short-term memory) language model, that is, a forward LSTM and a backward LSTM. The server inputs the multiple first initial word vectors and the multiple second initial word vectors into the forward LSTM, which extracts the forward-direction semantic features of the images and the contextual information, and inputs them into the backward LSTM, which extracts the backward-direction semantic features of the images and the contextual information. Further, the maximum likelihood estimate (MLE) of the forward LSTM and the backward LSTM may be used as the loss function value, and training stops when this estimate is less than or equal to the target threshold, yielding semantic features that express the semantics of the multiple images more accurately.
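In the spirit of the ELMo description above — a forward model scoring each token given its left context and a backward model scoring it given its right context — a toy bidirectional log-likelihood can be computed as follows. The uniform probability model is a placeholder standing in for trained LSTMs:

```python
import math

def bidirectional_log_likelihood(tokens, forward_prob, backward_prob):
    """Sum of forward and backward token log-probabilities: the quantity
    that maximum likelihood estimation would maximise during training."""
    fwd = sum(math.log(forward_prob(tokens[:i], tokens[i]))
              for i in range(len(tokens)))
    bwd = sum(math.log(backward_prob(tokens[i + 1:], tokens[i]))
              for i in range(len(tokens)))
    return fwd + bwd

# Placeholder model: every token is equally likely in a 4-word vocabulary.
uniform = lambda context, token: 0.25
ll = bidirectional_log_likelihood(["text", "image", "text"], uniform, uniform)
```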
207. The server stores the semantic features of the multiple images into a database according to the class identifiers of the multiple images.

The class identifier may be the class label, or an identification code mapped from the class label. For example, the class identifier may be the class label "no language"; when the server establishes a mapping between the class label "no language" and the identification code "3055", the class identifier may be "3055".

In the above process, the server may map the class labels of the images to obtain the class identifiers of the images, that is, store them in the database as key-value pairs: the server stores the class identifier of an image as the key name and the semantic feature of the image as the key value, which facilitates subsequent reads of the data in the database.
In some embodiments, when the server does not perform steps 201-203, each image corresponds uniquely to one semantic feature, so the server may also store the semantic features of the images into the database according to the image identifiers of the multiple images, which likewise facilitates subsequent reads of the data in the database.
Fig. 6 is a schematic diagram of database storage provided by an embodiment of the present invention. Referring to Fig. 6, the server stores the semantic feature of each image into the database according to the image identifier. The database illustrated in Fig. 6 is precisely the word-vector library that stores each piece of contextual information; text word vectors and image word vectors are stored in this library, and each text word vector and each image word vector corresponds to a respective ID (identification code). For an image word vector, the ID may be an image ID or a class ID. In some embodiments, the word vectors with IDs 1 to n are text word vectors and the word vectors with IDs n to N are image word vectors, where n is any value greater than or equal to 1 and N is any value greater than n. Of course, if storage is by class identifier, the correspondence between images and class identifiers may also be stored in the database.
208. When an image processing instruction carrying a target image is received, the server obtains the semantic feature of the target image from the database and performs image processing based on the semantic feature of the target image.

The image processing instruction may carry the target image and a processing type, which may be semantic segmentation, image classification, image generation, and so on. Optionally, the image processing instruction may also carry the image identifier of the target image.

Because the server stored the semantic features of the multiple images into the database in step 207, when the target image hits any image in the database, the semantic feature of that image can be read directly from the database and image processing can be performed based on it, greatly saving the time spent processing the image.
When performing image processing based on the database, the server may, upon receiving an image processing instruction, obtain the image carried by the instruction, perform clustering on the image to obtain its class label, obtain the semantic feature (word vector) corresponding to that class label from the database, and perform image processing based on that semantic feature. This saves the time of extracting image features and optimizes the efficiency of image processing.
In some embodiments, if the server stored the semantic features of the multiple images by image identifier and the image processing instruction also carries the image identifier of the target image, the server may determine the semantic feature corresponding to that image identifier in the database as the semantic feature of the target image, so that the semantic feature of the target image is obtained quickly, facilitating the downstream image processing task.

In this determination process, the server may use the image identifier of the target image as an index and retrieve the index content corresponding to that index (that is, the semantic feature of an image) in the database; when the index hits the semantic feature of any image, that semantic feature is determined as the semantic feature of the target image.
In some embodiments, if the server stored the semantic features of the multiple images by class identifier, the server may directly perform clustering on the target image; the specific clustering process is similar to step 202 and is not repeated here. After clustering, the server obtains the class label of the target image, then maps the class label to obtain the class identifier of the target image, and determines the semantic feature corresponding to that class identifier in the database as the semantic feature of the target image, which copes with image processing requirements in broader scenarios. In this determination process, the server may also use the class identifier of the target image as an index and perform steps similar to those above for determining the semantic feature of the target image, which are not repeated here.
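The two lookup paths of step 208 — by image identifier when the instruction carries one, otherwise by clustering the target image to obtain a class identifier — can be sketched as follows. The clustering function here is a stub standing in for the real clustering of step 202, and the database contents are invented:

```python
def lookup_semantic_feature(feature_db, image, image_id=None, cluster=None):
    """Return the stored semantic feature of a target image, preferring a
    direct image-identifier lookup and falling back to the class
    identifier produced by clustering the image."""
    if image_id is not None and image_id in feature_db:
        return feature_db[image_id]
    return feature_db.get(cluster(image))

db = {"img-7": [1.0, 0.0], "3055": [0.1, 0.2]}
by_id = lookup_semantic_feature(db, "target.png", image_id="img-7")
by_class = lookup_semantic_feature(db, "target.png", cluster=lambda img: "3055")
```

Either path avoids re-extracting features for the target image, which is the efficiency gain the text describes.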
In steps 207-208, the server performs image processing based on the semantic features of the multiple images, so that deep semantic-level features of images are extracted through the language model. This allows computer equipment to better understand image semantics and to complete image processing tasks in scenarios with higher demands on image semantics. The image processing tasks are described in detail below by way of several examples, and include but are not limited to the following examples.
In some embodiments, the image processing may be image generation. In this case, the image processing instruction may carry text or an image, so that the server, based on that text or image, obtains from the database the semantic feature with the highest contextual match to the text or image and determines the image corresponding to that semantic feature as the output image. Based on an image generation instruction, the output image with the highest contextual match is thus obtained. In some human-machine dialogue scenarios, the server may send the output image to a terminal, replying to the terminal with an image, which adds interest to the dialogue process and improves the intelligence of the chat robot.
In some embodiments, the image processing may also be image classification. In this case, the image processing instruction may carry multiple images to be processed. The server obtains the semantic features of the images to be processed according to the process in step 208, and may determine images whose semantic features have a similarity greater than a target threshold as belonging to the same class. Image classification is thus realized based on the semantic features of the images, grouping semantically similar images, which can achieve better classification performance than traditional methods that classify by pixel features.
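The classification rule above — images whose semantic features exceed a similarity threshold share a class — can be sketched with cosine similarity. The threshold and feature vectors are illustrative:

```python
import numpy as np

def group_by_similarity(features, threshold=0.9):
    """Assign images to the same class when the cosine similarity of
    their semantic features exceeds the threshold."""
    labels = [-1] * len(features)
    next_label = 0
    for i, f in enumerate(features):
        if labels[i] != -1:
            continue  # already grouped with an earlier image
        labels[i] = next_label
        for j in range(i + 1, len(features)):
            g = features[j]
            sim = np.dot(f, g) / (np.linalg.norm(f) * np.linalg.norm(g))
            if labels[j] == -1 and sim > threshold:
                labels[j] = next_label
        next_label += 1
    return labels

labels = group_by_similarity([np.array([1.0, 0.0]),
                              np.array([0.99, 0.1]),
                              np.array([0.0, 1.0])])
```

This greedy single-pass grouping is only one way to realize the threshold rule; any clustering over the semantic features would serve.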
All the above optional technical solutions can be combined in any manner to form optional embodiments of the present disclosure, which are not described one by one here.
In the method provided by the embodiment of the present invention, multiple pieces of contextual information of multiple images are obtained, the contextual information being at least one of the text information before or after the position of an image in a text scene. After the multiple images and the contextual information are input into the language model, feature extraction can be performed on the images through the language model and the contextual information, under the influence of the contextual information, to obtain the semantic features of the images. Because these semantic features are vector representations of the holistic non-visual features of the images expressed at the semantic level, the server can better understand image semantics. Therefore, when image processing is performed based on the semantic features of the images, image processing tasks in scenarios with higher demands on image semantics can be completed, improving the accuracy of image processing.
Further, clustering is first performed on the images, and word-vector initialization is then performed on the images based on the class labels, so that if some images appear only in a small number of dialogues and their contextual information is therefore not rich enough, the accuracy of the semantic features of these images can be improved through clustering, while also reducing the training duration and the training computation of the language model.

Further, images with the same class label can be assigned the same first initial word vector, so that images with the same class label share one first initial word vector and thus share parameter adjustments during the subsequent iterative training, with the result that images with the same class label eventually have the same semantic feature.
Further, when the multiple images include a first image, text is extracted from the first image, embedding processing is performed on the text to obtain the word vector of at least one word in the text, and the average vector of the word vectors of the at least one word is obtained as the first initial word vector corresponding to the first image; when the multiple images include a second image, a random word vector is obtained as the first initial word vector corresponding to the second image. The first image and the second image are thus each given targeted initialization, which optimizes the processing logic of the word-vector initialization procedure and shortens the training duration. On the other hand, embedding processing is performed on the contextual information and pretrained word vectors are obtained as the multiple second initial word vectors, so that by obtaining pretrained word vectors, the duration required for training is further shortened.
Further, the semantic features of the multiple images are stored into the database according to the image identifiers or class identifiers of the images, so that the trained semantic features of the images are properly stored for downstream image processing tasks to call. When an image processing instruction carrying a target image is received, the semantic feature of the target image is obtained from the database and image processing is performed based on it, greatly improving the efficiency and accuracy of the image processing process.
Fig. 7 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present invention. Referring to Fig. 7, the apparatus includes:

an obtaining module 701, configured to obtain multiple pieces of contextual information of multiple images, the contextual information being at least one of the text information before or after the position of an image in a text scene;

a feature extraction module 702, configured to input the multiple images and the multiple pieces of contextual information into a language model, and perform feature extraction on the multiple images through the language model and the contextual information to obtain the semantic features of the multiple images;

an image processing module 703, configured to perform image processing based on the semantic features of the multiple images.
In the apparatus provided by the embodiment of the present invention, multiple pieces of contextual information of multiple images are obtained, the contextual information being at least one of the text information before or after the position of an image in a text scene. After the multiple images and the contextual information are input into the language model, feature extraction can be performed on the images through the language model and the contextual information, under the influence of the contextual information, to obtain the semantic features of the images. Because these semantic features are vector representations of the holistic non-visual features of the images expressed at the semantic level, the server can better understand image semantics. Therefore, when image processing is performed based on the semantic features of the images, image processing tasks in scenarios with higher demands on image semantics can be completed, improving the accuracy of image processing.
In a possible embodiment, the feature extraction module 702 includes:

an obtaining unit, configured to obtain multiple first initial word vectors, the multiple first initial word vectors corresponding to the multiple images;

the obtaining unit being further configured to obtain multiple second initial word vectors, the multiple second initial word vectors corresponding to the multiple pieces of contextual information;

an iterative training unit, configured to iteratively train the language model based on the multiple first initial word vectors and the multiple second initial word vectors;

an obtaining unit, configured to obtain the semantic features of the multiple images when the loss function value is less than a target threshold or the number of iterations reaches a target number.
In a possible embodiment, the obtaining unit is configured to:

input the multiple images into a pixel feature extraction model, and extract the pixel features of the multiple images through the pixel feature extraction model;

perform clustering on the multiple images according to their pixel features to obtain the class labels of the multiple images;

assign the same random word vector as the first initial word vector to images with the same class label.
In a possible embodiment, the obtaining unit is configured to:

when the multiple images include a first image, extract text from the first image, perform embedding processing on the text to obtain the word vector of at least one word in the text, and obtain the average vector of the word vectors of the at least one word as the first initial word vector corresponding to the first image, the first image being an image carrying text;

when the multiple images include a second image, obtain a random word vector as the first initial word vector corresponding to the second image, the second image being an image carrying no text.
In a possible embodiment, the obtaining unit is configured to:

perform embedding processing on the multiple pieces of contextual information and obtain pretrained word vectors as the multiple second initial word vectors.
In a possible embodiment, the iterative training unit is configured to:

while iteratively training the language model, keep the multiple second initial word vectors fixed and adjust the values of the multiple first initial word vectors to obtain multiple first word vectors;

and obtaining the semantic features of the multiple images when the loss function value is less than the target threshold or the number of iterations reaches the target number includes:

determining the multiple first word vectors as the semantic features of the multiple images when the loss function value is less than the target threshold or the number of iterations reaches the target number.
In a possible embodiment, the image processing module 703 includes:

a storage processing unit, configured to store the semantic features of the multiple images into a database according to the image identifiers or class identifiers of the multiple images, and, when an image processing instruction carrying a target image is received, obtain the semantic feature of the target image from the database and perform image processing based on the semantic feature of the target image.
In a possible embodiment, the storage processing unit is configured to:

when the image processing instruction also carries the image identifier of the target image, determine the semantic feature corresponding to that image identifier in the database as the semantic feature of the target image; or,

perform clustering on the target image to obtain the class identifier of the target image, and determine the semantic feature corresponding to that class identifier in the database as the semantic feature of the target image.
All the above optional technical solutions can be combined in any manner to form optional embodiments of the present disclosure, which are not described one by one here.
It should be understood that the division into the above functional modules is used only as an example when the image processing apparatus provided by the above embodiment processes an image; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the computer equipment may be divided into different functional modules to complete all or part of the functions described above. In addition, the image processing apparatus provided by the above embodiment and the image processing method embodiment belong to the same concept; for the specific implementation process, refer to the image processing method embodiment, which is not repeated here.
Fig. 8 is a schematic structural diagram of computer equipment provided by an embodiment of the present invention. The computer equipment may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPUs) 801 and one or more memories 802, the memory 802 storing at least one instruction that is loaded and executed by the processor 801 to implement the image processing method provided by each of the above image processing method embodiments. Of course, the computer equipment may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may also include other components for realizing the functions of the equipment, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including at least one instruction, the at least one instruction being executable by a processor in a terminal to complete the image processing method in the above embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware, the program being storable in a computer-readable storage medium; the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (10)
1. An image processing method, characterized in that the method comprises:

obtaining multiple pieces of contextual information of multiple images, the multiple pieces of contextual information being at least one of the text information before or after the position of an image in a text scene;

inputting the multiple images and the multiple pieces of contextual information into a language model, and performing feature extraction on the multiple images through the language model and the multiple pieces of contextual information to obtain semantic features of the multiple images;

performing image processing based on the semantic features of the multiple images.
2. The method according to claim 1, characterized in that performing feature extraction on the multiple images through the language model and the multiple pieces of contextual information to obtain the semantic features of the multiple images comprises:

obtaining multiple first initial word vectors, the multiple first initial word vectors corresponding to the multiple images;

obtaining multiple second initial word vectors, the multiple second initial word vectors corresponding to the multiple pieces of contextual information;

iteratively training the language model based on the multiple first initial word vectors and the multiple second initial word vectors;

obtaining the semantic features of the multiple images when a loss function value is less than a target threshold or the number of iterations reaches a target number.
3. The method according to claim 2, characterized in that obtaining the multiple first initial word vectors comprises:

inputting the multiple images into a pixel feature extraction model, and extracting pixel features of the multiple images through the pixel feature extraction model;

performing clustering on the multiple images according to the pixel features of the multiple images to obtain class labels of the multiple images;

assigning the same random word vector as the first initial word vector to images with the same class label.
4. The method according to claim 2, wherein obtaining the multiple first initial word vectors comprises:
when the multiple images include a first image, extracting text from the first image, performing embedding on the text to obtain a word vector of at least one word in the text, and taking the average of the word vectors of the at least one word as the first initial word vector corresponding to the first image, the first image being an image that carries text; and
when the multiple images include a second image, taking a random word vector as the first initial word vector corresponding to the second image, the second image being an image that does not carry text.
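The two branches of claim 4 amount to: average the embeddings of any text found in the image, otherwise fall back to a random vector. A minimal sketch follows; the `EMBEDDINGS` table and `first_initial_vector` are hypothetical stand-ins for a trained embedding layer and an OCR front end, neither of which the claim specifies.

```python
import random

# Hypothetical embedding table; a real system would use a trained embedding layer.
EMBEDDINGS = {"cat": [1.0, 0.0], "sits": [0.0, 1.0], "here": [1.0, 1.0]}

def first_initial_vector(ocr_text, dim=2, seed=0):
    """Average the word vectors of text extracted from the image (first image);
    fall back to a random vector when the image carries no text (second image)."""
    words = [w for w in (ocr_text or "").split() if w in EMBEDDINGS]
    if words:  # first image: carries text, average its word vectors
        vecs = [EMBEDDINGS[w] for w in words]
        return [sum(col) / len(col) for col in zip(*vecs)]
    # second image: no text, use a random initial word vector
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(dim)]
```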
5. The method according to claim 2, wherein iteratively training the language model based on the multiple first initial word vectors and the multiple second initial word vectors comprises:
during the iterative training of the language model, keeping the multiple second initial word vectors unchanged while adjusting the values of the multiple first initial word vectors to obtain multiple first word vectors; and
wherein obtaining the semantic features of the multiple images when the loss function value is less than the target threshold or the number of iterations reaches the target number comprises:
determining the multiple first word vectors as the semantic features of the multiple images when the loss function value is less than the target threshold or the number of iterations reaches the target number.
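The training regime of claim 5, in which the context (second) vectors stay frozen and only the image (first) vectors are updated until the loss drops below a threshold or an iteration budget is exhausted, can be illustrated with a toy objective. The squared-error loss and every name below are invented for the sketch; the patent does not specify the language model's actual loss.

```python
def train_image_vectors(first_vecs, second_vecs, target_vecs,
                        threshold=1e-3, max_iters=500, lr=0.1):
    """Iterate: keep second_vecs fixed, adjust only first_vecs by gradient
    descent on a toy objective (first + second should approximate target).
    Stop when the loss falls below threshold or max_iters is reached."""
    iters = 0
    while iters < max_iters:
        loss = 0.0
        for i, (fv, sv, tv) in enumerate(zip(first_vecs, second_vecs,
                                             target_vecs)):
            # Gradient of the toy squared error; second vectors never change.
            grad = [2 * (f + s - t) for f, s, t in zip(fv, sv, tv)]
            loss += sum((f + s - t) ** 2 for f, s, t in zip(fv, sv, tv))
            first_vecs[i] = [f - lr * g for f, g in zip(fv, grad)]
        iters += 1
        if loss < threshold:  # loss below the target threshold: stop early
            break
    # The adjusted first vectors serve as the images' semantic features.
    return first_vecs, iters
```

Freezing the context vectors forces all of the fitting pressure onto the image vectors, which is what lets them absorb a semantic representation of the images.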
6. The method according to claim 1, wherein performing image processing based on the semantic features of the multiple images comprises:
storing the semantic features of the multiple images in a database according to image identifiers or class identifiers of the multiple images; and
when an image processing instruction carrying a target image is received, obtaining the semantic feature of the target image from the database, and performing image processing based on the semantic feature of the target image.
7. The method according to claim 6, wherein obtaining the semantic feature of the target image from the database comprises:
when the image processing instruction also carries an image identifier of the target image, determining the semantic feature corresponding to the image identifier in the database as the semantic feature of the target image; or
clustering the target image to obtain a class identifier of the target image, and determining the semantic feature corresponding to the class identifier in the database as the semantic feature of the target image.
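The two retrieval paths of claim 7 reduce to a keyed lookup with a clustering fallback. A minimal sketch, assuming a dict-backed database and a hypothetical `classify` callable standing in for the clustering step (none of these names come from the patent):

```python
def lookup_semantic_feature(db, image_id=None, classify=None, target=None):
    """Prefer the image-identifier key; otherwise derive a class identifier
    for the target image via `classify` and look that up instead."""
    if image_id is not None and image_id in db["by_image"]:
        # First branch: the instruction carried an image identifier.
        return db["by_image"][image_id]
    # Second branch: cluster the target image to obtain its class identifier.
    class_id = classify(target)
    return db["by_class"][class_id]
```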
8. An image processing apparatus, wherein the apparatus comprises:
an obtaining module, configured to obtain multiple pieces of context information of multiple images, the multiple pieces of context information being at least one of the text information before or after the position of each image in a text scene;
a feature extraction module, configured to input the multiple images and the multiple pieces of context information into a language model, and to perform feature extraction on the multiple images through the language model and the multiple pieces of context information to obtain semantic features of the multiple images; and
an image processing module, configured to perform image processing based on the semantic features of the multiple images.
9. A computer device, wherein the computer device comprises one or more processors and one or more memories, the one or more memories storing at least one instruction that is loaded and executed by the one or more processors to implement the operations performed by the image processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the storage medium stores at least one instruction that is loaded and executed by a processor to implement the operations performed by the image processing method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910360905.2A CN110163121B (en) | 2019-04-30 | 2019-04-30 | Image processing method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110163121A true CN110163121A (en) | 2019-08-23 |
CN110163121B CN110163121B (en) | 2023-09-05 |
Family
ID=67633043
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910360905.2A Active CN110163121B (en) | 2019-04-30 | 2019-04-30 | Image processing method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110163121B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106462574A (en) * | 2014-06-24 | 2017-02-22 | 谷歌公司 | Techniques for machine language translation of text from an image based on non-textual context information from the image |
US20170147910A1 (en) * | 2015-10-02 | 2017-05-25 | Baidu Usa Llc | Systems and methods for fast novel visual concept learning from sentence descriptions of images |
CN107423277A (en) * | 2016-02-16 | 2017-12-01 | 中兴通讯股份有限公司 | A kind of expression input method, device and terminal |
WO2018049960A1 (en) * | 2016-09-14 | 2018-03-22 | 厦门幻世网络科技有限公司 | Method and apparatus for matching resource for text information |
CN109034203A (en) * | 2018-06-29 | 2018-12-18 | 北京百度网讯科技有限公司 | Training, expression recommended method, device, equipment and the medium of expression recommended models |
CN109255047A (en) * | 2018-07-18 | 2019-01-22 | 西安电子科技大学 | Based on the complementary semantic mutual search method of image-text being aligned and symmetrically retrieve |
US20190050639A1 (en) * | 2017-08-09 | 2019-02-14 | Open Text Sa Ulc | Systems and methods for generating and using semantic images in deep learning for classification and data extraction |
CN109447990A (en) * | 2018-10-22 | 2019-03-08 | 北京旷视科技有限公司 | Image, semantic dividing method, device, electronic equipment and computer-readable medium |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866908A (en) * | 2019-11-12 | 2020-03-06 | 腾讯科技(深圳)有限公司 | Image processing method, image processing apparatus, server, and storage medium |
CN111080660A (en) * | 2019-11-14 | 2020-04-28 | 中国科学院深圳先进技术研究院 | Image segmentation method and device, terminal equipment and storage medium |
CN111080660B (en) * | 2019-11-14 | 2023-08-08 | 中国科学院深圳先进技术研究院 | Image segmentation method, device, terminal equipment and storage medium |
CN111385188A (en) * | 2019-11-22 | 2020-07-07 | 百度在线网络技术(北京)有限公司 | Recommendation method and device for dialog elements, electronic equipment and medium |
CN111666439A (en) * | 2020-05-28 | 2020-09-15 | 重庆渝抗医药科技有限公司 | Working method for rapidly extracting and dividing medical image big data aiming at cloud environment |
CN111783557A (en) * | 2020-06-11 | 2020-10-16 | 北京科技大学 | Wearable blind guiding equipment based on depth vision and server |
CN111783557B (en) * | 2020-06-11 | 2023-08-15 | 北京科技大学 | Wearable blind guiding equipment based on depth vision and server |
CN112364200A (en) * | 2021-01-15 | 2021-02-12 | 清华大学 | Brain-like imaging method, device, equipment and storage medium |
CN112364200B (en) * | 2021-01-15 | 2021-04-13 | 清华大学 | Brain-like imaging method, device, equipment and storage medium |
CN112861934A (en) * | 2021-01-25 | 2021-05-28 | 深圳市优必选科技股份有限公司 | Image classification method and device of embedded terminal and embedded terminal |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110163121A (en) | Image processing method, device, computer equipment and storage medium | |
CN108304882B (en) | Image classification method and device, server, user terminal and storage medium | |
CN114357973B (en) | Intention recognition method and device, electronic equipment and storage medium | |
WO2024098533A1 (en) | Image-text bidirectional search method, apparatus and device, and non-volatile readable storage medium | |
CN108419094A (en) | Method for processing video frequency, video retrieval method, device, medium and server | |
CN109857846B (en) | Method and device for matching user question and knowledge point | |
WO2018196718A1 (en) | Image disambiguation method and device, storage medium, and electronic device | |
CN113298197B (en) | Data clustering method, device, equipment and readable storage medium | |
CN111506709B (en) | Entity linking method and device, electronic equipment and storage medium | |
CN111666400B (en) | Message acquisition method, device, computer equipment and storage medium | |
CN112183083A (en) | Abstract automatic generation method and device, electronic equipment and storage medium | |
CN114328988A (en) | Multimedia data feature extraction method, multimedia data retrieval method and device | |
CN117114063A (en) | Method for training a generative large language model and for processing image tasks | |
CN111368066B (en) | Method, apparatus and computer readable storage medium for obtaining dialogue abstract | |
CN113254575B (en) | Machine reading understanding method and system based on multi-step evidence reasoning | |
CN114266252A (en) | Named entity recognition method, device, equipment and storage medium | |
CN113887169A (en) | Text processing method, electronic device, computer storage medium, and program product | |
CN113704534A (en) | Image processing method and device and computer equipment | |
CN116883740A (en) | Similar picture identification method, device, electronic equipment and storage medium | |
CN110674716A (en) | Image recognition method, device and storage medium | |
CN114490974A (en) | Automatic information reply method, device, system, electronic equipment and readable medium | |
CN114329005A (en) | Information processing method, information processing device, computer equipment and storage medium | |
CN113869068A (en) | Scene service recommendation method, device, equipment and storage medium | |
Zhang et al. | Learning cross-modal aligned representation with graph embedding | |
Wang et al. | Capsule network based on multi-granularity attention model for text classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||